NP-hard problems in hierarchical-tree clustering (Q1102746): Difference between revisions
From MaRDI portal
Added link to MaRDI item. |
Set profile property. |
||
Property / MaRDI profile type | |||
Property / MaRDI profile type: MaRDI publication profile / rank | |||
Normal rank |
Revision as of 02:13, 5 March 2024
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | NP-hard problems in hierarchical-tree clustering |
scientific article |
Statements
NP-hard problems in hierarchical-tree clustering (English)
0 references
1986
0 references
Given an n-element set \(X=\{x_ 1,...,x_ n\}\) and a symmetric matrix \(D=(d_{i,j})\) whose elements are 0 on the diagonal and positive real numbers elsewhere, a hierarchical tree T over X is defined as a sequence of pairs \((P_ i, l_ i)\) \((i=1,...,q)\) where the \(P_ i\) are partitions of X such that \(P_ 1=\{\{x_ 1\}\), \(\{x_ 2\},...,\{x_ n\}\}\), \(P_ q=\{X\}\) and \(P_ i\) is a proper refinement of \(P_{i+1}\) (1\(\leq i\leq q-1)\), and the \(l_ i\) are integers with \(0=l_ 1<l_ 2<...<l_ q\). The integer q is called the height of T and \(l_ k\) the level of \(P_ k\) in T. If \(d_{i,j}\) is a measure of ``dissimilarity'' among elements \(x_ i\) and \(x_ j\), the hierarchical clustering problem (HIC) is intuitively the problem of finding a hierarchical tree \(T^*\) over X which is ``optimal'' in the sense that it minimizes the difference among the dissimilarity of two elements and the level in the tree where they appear for the first time in the same class in the partition. More formally, the tree \(T^*\) should minimize the value of an objective function F, defined from the set of hierarchical trees on X to the set of nonnegative reals, as follows: \[ F(T)=\sum_{1\leq i<j\leq n}| d_{i,j}-u_ T(x_ i,x_ j)| \] where \(u_ T\) is the function: \[ u_ T(x_ i,x_ j)=Min\{l_ k | \quad \exists M\in P_ k\quad such\quad that\quad \{x_ i,x_ j\}\subseteq M\}\quad. \] A variant of the problem, called \(HIC_ q\), consists in finding the optimal tree of a fixed height q (2\(\leq q\leq n).\) In the paper it is shown that problems HIC and \(HIC_ q\), for \(q\geq 3\), are NP-hard. The proof is based on a chain of polynomial reductions. First, it is shown that \(HIC_ q\propto HIC_{q+1}\), where \(\propto\) denotes the polynomial reducibility relation. Then, the binary restrictions \({}^ bHIC\) and \({}^ bHIC_ q\) of the above problems, i.e. the restrictions to dissimilarity matrices in which each off-diagonal element equals either 1 or 2, are considered, and it is proved that \({}^ bHIC_ 3\propto^ bHIC\). At this point, the NP-hardness of \({}^ bHIC_ 3\) is proved by reduction from the NP-complete problem EC3 (exact cover by ordered 3-tuples): \(EC3\propto^ bHIC_ 3\).
0 references
cluster analysis
0 references
optimization problems
0 references
hierarchical-tree clustering
0 references
NP-completeness
0 references