NP-hard problems in hierarchical-tree clustering (Q1102746)

From MaRDI portal
Revision as of 01:42, 31 January 2024 by Import240129110113 (talk | contribs) (Added link to MaRDI item.)
scientific article
Language Label Description Also known as
English
NP-hard problems in hierarchical-tree clustering
scientific article

    Statements

    NP-hard problems in hierarchical-tree clustering (English)
    0 references
    0 references
    0 references
    1986
    0 references
    Given an n-element set \(X=\{x_ 1,...,x_ n\}\) and a symmetric matrix \(D=(d_{i,j})\) whose elements are 0 on the diagonal and positive real numbers elsewhere, a hierarchical tree T over X is defined as a sequence of pairs \((P_ i, l_ i)\) \((i=1,...,q)\) where the \(P_ i\) are partitions of X such that \(P_ 1=\{\{x_ 1\}\), \(\{x_ 2\},...,\{x_ n\}\}\), \(P_ q=\{X\}\) and \(P_ i\) is a proper refinement of \(P_{i+1}\) (1\(\leq i\leq q-1)\), and the \(l_ i\) are integers with \(0=l_ 1<l_ 2<...<l_ q\). The integer q is called the height of T and \(l_ k\) the level of \(P_ k\) in T. If \(d_{i,j}\) is a measure of ``dissimilarity'' among elements \(x_ i\) and \(x_ j\), the hierarchical clustering problem (HIC) is intuitively the problem of finding a hierarchical tree \(T^*\) over X which is ``optimal'' in the sense that it minimizes the difference among the dissimilarity of two elements and the level in the tree where they appear for the first time in the same class in the partition. More formally, the tree \(T^*\) should minimize the value of an objective function F, defined from the set of hierarchical trees on X to the set of nonnegative reals, as follows: \[ F(T)=\sum_{1\leq i<j\leq n}| d_{i,j}-u_ T(x_ i,x_ j)| \] where \(u_ T\) is the function: \[ u_ T(x_ i,x_ j)=Min\{l_ k | \quad \exists M\in P_ k\quad such\quad that\quad \{x_ i,x_ j\}\subseteq M\}\quad. \] A variant of the problem, called \(HIC_ q\), consists in finding the optimal tree of a fixed height q (2\(\leq q\leq n).\) In the paper it is shown that problems HIC and \(HIC_ q\), for \(q\geq 3\), are NP-hard. The proof is based on a chain of polynomial reductions. First, it is shown that \(HIC_ q\propto HIC_{q+1}\), where \(\propto\) denotes the polynomial reducibility relation. Then, the binary restrictions \({}^ bHIC\) and \({}^ bHIC_ q\) of the above problems, i.e. the restrictions to dissimilarity matrices in which each off-diagonal element equals either 1 or 2, are considered, and it is proved that \({}^ bHIC_ 3\propto^ bHIC\). At this point, the NP-hardness of \({}^ bHIC_ 3\) is proved by reduction from the NP-complete problem EC3 (exact cover by ordered 3-tuples): \(EC3\propto^ bHIC_ 3\).
    0 references
    cluster analysis
    0 references
    optimization problems
    0 references
    hierarchical-tree clustering
    0 references
    NP-completeness
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references