A semantic relatedness preserved subset extraction method for language corpora based on pseudo-Boolean optimization (Q2193277): Difference between revisions

Text corpora used in natural-language research contain billions of words and keep growing, which raises the problem of extracting smaller subsets whose semantics changes as little as possible. Let \(T=\{t_1,\dots,t_n\}\) be a set of tokens (e.g. words) in an annotated text corpus with real-valued unary and binary attributes and semantic relatedness relations \(S^1\in\mathcal{R}^n\), \(S^2\in\mathcal{R}^{n\times n}\), \(S^3\in\mathcal{R}^{n\times n\times n}\), and let \(X=(x_1,\dots,x_n)\in\{0,1\}^n\) be Boolean variables indicating which tokens of \(T\) belong to a subset. The problem of semantic relatedness preservation in corpus subset extraction is to find an "optimal" (minimal) subset of \(T\), encoded by \(X\), which maximizes \(\sum\limits_{i=1}^ns_i^1{x_i}+\sum\limits_{i,j=1}^ns_{ij}^2x_ix_j+\sum\limits_{i,j,k = 1}^ns_{ijk}^3x_ix_jx_k\) under constraints on the attributes (here, one unary and one binary attribute constraint are considered). This NP-hard problem is transformed into the problem of finding the maximum flow in an equivalent graph and is solved using the discrete Lagrangian iteration method.
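The objective in the review can be made concrete with a small sketch. The Python below only evaluates the cubic pseudo-Boolean objective and maximizes it by exhaustive enumeration on a toy instance; it is not the max-flow transformation or the discrete Lagrangian iteration used in the paper, and it is exponential in \(n\). The names `objective`, `brute_force`, `a` and `budget`, the form of the unary constraint (\(\sum_i a_i x_i \le\) `budget`), and all numbers are assumptions made for illustration; the binary attribute constraint is omitted.

```python
from itertools import product


def objective(x, s1, s2, s3):
    """Evaluate sum s1[i]*x_i + sum s2[i][j]*x_i*x_j + sum s3[i][j][k]*x_i*x_j*x_k."""
    chosen = [i for i, xi in enumerate(x) if xi]  # terms with any x = 0 vanish
    value = sum(s1[i] for i in chosen)
    value += sum(s2[i][j] for i in chosen for j in chosen)
    value += sum(s3[i][j][k] for i in chosen for j in chosen for k in chosen)
    return value


def brute_force(s1, s2, s3, a, budget):
    """Try every x in {0,1}^n that satisfies the assumed unary constraint."""
    n = len(s1)
    best_x, best_value = None, float("-inf")
    for x in product((0, 1), repeat=n):
        # Hypothetical unary attribute constraint: sum of a[i]*x[i] <= budget.
        if sum(a[i] * x[i] for i in range(n)) > budget:
            continue
        value = objective(x, s1, s2, s3)
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Toy instance with made-up relatedness values for n = 3 tokens.
n = 3
s1 = [1, 2, 3]
s2 = [[0] * n for _ in range(n)]
s2[0][1] = 4        # tokens 0 and 1 are strongly related
s3 = [[[0] * n for _ in range(n)] for _ in range(n)]
s3[0][1][2] = -10   # taking all three tokens together is penalized
a = [1, 1, 1]       # unit attribute, so the constraint caps the subset size
budget = 2

best_x, best_value = brute_force(s1, s2, s3, a, budget)
print(best_x, best_value)
```

On this toy instance the best feasible selection keeps tokens 0 and 1, because the pairwise term outweighs the larger unary score of token 2.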

zbMATH Keywords

semantic relatedness

subset extraction

language intelligence

pseudo-Boolean optimization

discrete Lagrangian method

reviewed by

Jaak Henno

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.1016/j.tcs.2020.07.020

cites work

Quadratization of symmetric pseudo-Boolean functions

Maximizing a supermodular pseudoboolean function: A polynomial algorithm for supermodular cubic functions

A Selection Problem of Shared Fixed Costs and Network Flows

A discrete Lagrangian-based global-search method for solving satisfiability problems

Q4607913

Identifiers

zbMATH Open document ID

1461.68237

Mathematics Subject Classification ID

0 references

0 references

0 references

0 references

10.1016/J.TCS.2020.07.020

Sitelinks

Mathematics (1 entry)

mardi Publication:2193277

@@ Property / DOI @@
-10.1016/j.tcs.2020.07.020
@@ Property / DOI: 10.1016/j.tcs.2020.07.020 / rank @@
-Normal rank
@@ Property / reviewed by @@
-Jaak Henno
@@ Property / reviewed by: Jaak Henno / rank @@
-Normal rank
@@ Property / reviewed by @@
+Jaak Henno
@@ Property / reviewed by: Jaak Henno / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1016/j.tcs.2020.07.020
+Normal rank
@@ Property / OpenAlex ID @@
+W3044921385
@@ Property / OpenAlex ID: W3044921385 / rank @@
+Normal rank
@@ Property / cites work @@
+Quadratization of symmetric pseudo-Boolean functions
+Normal rank
@@ Property / cites work @@
+Maximizing a supermodular pseudoboolean function: A polynomial algorithm for supermodular cubic functions
+Normal rank
@@ Property / cites work @@
+A Selection Problem of Shared Fixed Costs and Network Flows
+Normal rank
@@ Property / cites work @@
+A discrete Lagrangian-based global-search method for solving satisfiability problems
+Normal rank
@@ Property / cites work @@
+Q4607913
@@ Property / cites work: Q4607913 / rank @@
+Normal rank
@@ Property / DOI @@
+10.1016/J.TCS.2020.07.020
@@ Property / DOI: 10.1016/J.TCS.2020.07.020 / rank @@
+Normal rank
@@ links / mardi / name / links / mardi / name @@
+Publication:2193277