Matrix versions of the Hellinger distance (Q2318010)

From MaRDI portal
scientific article

    Statements

    Matrix versions of the Hellinger distance (English)
    13 August 2019
    Let \((p_{1},p_{2},\dots,p_{n})\) and \((q_{1},q_{2},\dots,q_{n})\) be two probability distributions. The Hellinger distance between them is defined to be \(\left\{ \sum_{i}\left(\frac{1}{2}(p_{i}+q_{i})-\sqrt{p_{i}q_{i}}\right)\right\} ^{1/2}\). In terms of the diagonal matrices \(P:=\mathrm{diag}(p_{1},p_{2},\dots,p_{n})\) and \(Q:=\mathrm{diag}(q_{1},q_{2},\dots,q_{n})\), this can be written \[ d_{H}(P,Q):=\sqrt{\operatorname{tr}\mathcal{A}(P,Q)-\operatorname{tr}\mathcal{G}(P,Q)}, \] where \(\mathcal{A}\) and \(\mathcal{G}\) denote the arithmetic and geometric means of \(P\) and \(Q\), respectively. The goal of the paper is to examine some extensions of this definition to general \(n\times n\) complex positive semidefinite matrices. Although there is a unique natural way to define \(\mathcal{A}\), there is more than one way to define the square root of a product of positive semidefinite matrices, and hence more than one way to define \(\mathcal{G}\). Let \(A\) and \(B\) be arbitrary positive semidefinite matrices and let \(A^{1/2}\) and \(B^{1/2}\) be their (unique) positive semidefinite square roots. Write \(\left\Vert \ \right\Vert _{2}\) for the Frobenius norm and \(\mathbb{P}\) for the set of \(n\times n\) positive definite matrices. Then the following functions are considered: \(d_{1}(A,B):=\left\Vert A^{1/2}-B^{1/2}\right\Vert _{2}=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}A^{1/2}B^{1/2}\right\} ^{1/2}\); \(d_{2}(A,B):=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}(A^{1/2}BA^{1/2})^{1/2}\right\} ^{1/2}\); \(d_{3}(A,B):=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}A\#B\right\} ^{1/2}\), where \(A\#B:=A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}\); and \(d_{4}(A,B):=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}\mathcal{L}(A,B)\right\} ^{1/2}\), where \(\mathcal{L}(A,B):=\exp\left( \frac{1}{2}(\log A+\log B)\right) \) (defined only for strictly positive definite matrices).
The functions \(d_{1}\) and \(d_{2}\) define metrics (\(d_{2}\) is sometimes called the Bures distance or Wasserstein metric), but \(d_{3}\) and \(d_{4}\) fail to satisfy the triangle inequality and so do not define metrics. The main results of the paper concern the functions \(\Phi_{k}(A,B):=d_{k}(A,B)^{2}\) for \(k=3\) and \(4\). In particular, it is shown that \(\Phi_{3}\) and \(\Phi_{4}\) are divergence functions (see [\textit{S.-i. Amari}, Information geometry and its applications. Tokyo: Springer (2016; Zbl 1350.94001)]) and have useful convexity properties such as (Theorem 8): for each \(A\in\mathbb{P}\) the function \(X\longmapsto\Phi_{4}(A,X)\) is strictly convex on \(\mathbb{P}\).
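
The four functions can be evaluated numerically for small matrices. The following is only a minimal sketch, assuming NumPy is available and that both inputs are Hermitian positive definite; the helper names (`_apply`, `sqrtm_psd`, `_dist`, `d1` to `d4`) are hypothetical and are not taken from the paper. Each \(d_{k}\) is written with the factor \(2\) in front of the trace of the mean, as in \(d_{1}\), \(d_{3}\) and \(d_{4}\) above.

```python
import numpy as np

def _apply(A, f):
    """Apply a scalar function f to a Hermitian matrix via its eigendecomposition."""
    A = (A + A.conj().T) / 2            # symmetrize away round-off
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T      # V diag(f(w)) V^*

def sqrtm_psd(A):
    """Positive semidefinite square root; tiny negative eigenvalues are clipped."""
    return _apply(A, lambda w: np.sqrt(np.clip(w, 0.0, None)))

def _dist(A, B, mean_trace):
    """{tr(A+B) - 2 * mean_trace}^{1/2}, guarded against round-off below zero."""
    val = np.trace(A + B).real - 2.0 * mean_trace
    return float(np.sqrt(max(val, 0.0)))

def d1(A, B):
    # Frobenius norm of A^{1/2} - B^{1/2}
    return float(np.linalg.norm(sqrtm_psd(A) - sqrtm_psd(B), "fro"))

def d2(A, B):
    # uses tr (A^{1/2} B A^{1/2})^{1/2}
    rA = sqrtm_psd(A)
    return _dist(A, B, np.trace(sqrtm_psd(rA @ B @ rA)).real)

def d3(A, B):
    # uses the geometric mean A # B; needs A invertible
    rA = sqrtm_psd(A)
    irA = _apply(A, lambda w: w ** -0.5)
    G = rA @ sqrtm_psd(irA @ B @ irA) @ rA
    return _dist(A, B, np.trace(G).real)

def d4(A, B):
    # uses the log-Euclidean mean exp((log A + log B) / 2); needs A, B > 0
    S = _apply(A, np.log) + _apply(B, np.log)
    L = _apply(S, lambda w: np.exp(0.5 * w))
    return _dist(A, B, np.trace(L).real)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, Y = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
    A, B = X @ X.T + np.eye(3), Y @ Y.T + np.eye(3)   # positive definite
    for name, d in [("d1", d1), ("d2", d2), ("d3", d3), ("d4", d4)]:
        print(name, d(A, B))
```

When \(A\) and \(B\) commute (for instance, when both are diagonal), the four means coincide, so all four functions return the same value, namely \(\sqrt{2}\) times \(d_{H}\) as normalized above.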
    geometric mean
    matrix divergence
    relative entropy
    strict convexity
    barycentre