A semantic relatedness preserved subset extraction method for language corpora based on pseudo-Boolean optimization (Q2193277)

From MaRDI portal
Property / reviewed by: Jaak Henno
Property / MaRDI profile type: MaRDI publication profile
Property / full work available at URL: https://doi.org/10.1016/j.tcs.2020.07.020
Property / OpenAlex ID: W3044921385
Property / cites work: Quadratization of symmetric pseudo-Boolean functions
Property / cites work: Maximizing a supermodular pseudoboolean function: A polynomial algorithm for supermodular cubic functions
Property / cites work: A Selection Problem of Shared Fixed Costs and Network Flows
Property / cites work: A discrete Lagrangian-based global-search method for solving satisfiability problems
Property / cites work: Q4607913
 

Latest revision as of 08:15, 23 July 2024

Language: English
Label: A semantic relatedness preserved subset extraction method for language corpora based on pseudo-Boolean optimization
Description: scientific article

    Statements

    A semantic relatedness preserved subset extraction method for language corpora based on pseudo-Boolean optimization (English)
    25 August 2020
    Text corpora in natural-language research contain billions of words and keep growing, which raises the problem of extracting smaller subsets whose semantics change as little as possible. Let \(T=\{t_1,\dots,t_n\}\) be a set of tokens (e.g. words) in an annotated text corpus with real-valued unary and binary attributes and semantic relatedness relations \(S^1\in\mathbb{R}^n\), \(S^2\in\mathbb{R}^{n\times n}\), \(S^3\in\mathbb{R}^{n\times n\times n}\), and let \(X=(x_1,\dots,x_n)\in\{0,1\}^n\) be a vector of Boolean variables encoding a subset of \(T\), where \(x_i=1\) means that \(t_i\) is selected. The problem of semantic relatedness preservation in corpus subset extraction is to find an `optimal' (minimal) subset of \(T\), encoded by \(X\), that maximizes \(\sum\limits_{i=1}^ns_i^1x_i+\sum\limits_{i,j=1}^ns_{ij}^2x_ix_j+\sum\limits_{i,j,k=1}^ns_{ijk}^3x_ix_jx_k\) under constraints on the attributes (here, one unary and one binary attribute constraint are considered). This NP-hard problem is transformed into the problem of finding a maximum flow in an equivalent graph and is solved using the discrete Lagrangian iteration method.
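To make the objective concrete, the following is a minimal Python sketch that evaluates the cubic pseudo-Boolean function from the review on a toy instance and maximizes it by exhaustive enumeration. It is not the max-flow transformation or the discrete Lagrangian method of the paper. The function names, the toy coefficients, and the cardinality bound `max_size` (a stand-in for the attribute constraints, which the review does not spell out) are all assumptions made for illustration.

```python
from itertools import product


def objective(x, s1, s2, s3):
    """Evaluate sum_i s1[i]x_i + sum_ij s2[i][j]x_i x_j + sum_ijk s3[i][j][k]x_i x_j x_k."""
    n = len(x)
    total = sum(s1[i] * x[i] for i in range(n))
    total += sum(s2[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    total += sum(
        s3[i][j][k] * x[i] * x[j] * x[k]
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )
    return total


def brute_force(s1, s2, s3, max_size):
    """Try every 0/1 vector with at most max_size ones (hypothetical constraint).

    Exponential in n, so only usable on tiny instances.
    """
    n = len(s1)
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=n):
        if sum(x) > max_size:
            continue
        val = objective(x, s1, s2, s3)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Toy instance with n = 3 tokens (made-up relatedness values).
n = 3
s1 = [1.0, 0.5, 0.2]
s2 = [[0.0] * n for _ in range(n)]
s2[0][1] = 2.0  # tokens 0 and 1 are strongly related
s3 = [[[0.0] * n for _ in range(n)] for _ in range(n)]
s3[0][1][2] = 1.0  # a single ternary relation among tokens 0, 1, 2

best_x, best_val = brute_force(s1, s2, s3, max_size=2)
print(best_x, best_val)
```

With at most two tokens allowed, the enumeration selects tokens 0 and 1, because their pairwise term dominates. Exhaustive search visits all \(2^n\) vectors, which is why the NP-hard problem calls for a reformulation such as the one the paper describes.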
    semantic relatedness
    subset extraction
    language intelligence
    pseudo-Boolean optimization
    discrete Lagrangian method

    Identifiers