Matrix versions of the Hellinger distance (Q2318010)

Let \((p_{1},p_{2},\dots,p_{n})\) and \((q_{1},q_{2},\dots,q_{n})\) be two probability distributions. The Hellinger distance between them is defined to be \(\left\{ \sum_{i}\left(\frac{1}{2}(p_{i}+q_{i})-\sqrt{p_{i}q_{i}}\right)\right\} ^{1/2}\). In terms of the diagonal matrices \(P:=\mathrm{diag}(p_{1},p_{2},\dots,p_{n})\) and \(Q:=\mathrm{diag}(q_{1},q_{2},\dots,q_{n})\), this can be written \[ d_{H}(P,Q):=\sqrt{\operatorname{tr}\mathcal{A}(P,Q)-\operatorname{tr}\mathcal{G}(P,Q)}, \] where \(\mathcal{A}\) and \(\mathcal{G}\) denote the arithmetic and geometric means of \(P\) and \(Q\), respectively. The goal of the present paper is to examine some extensions of this definition to general \(n\times n\) complex positive semidefinite matrices. Although there is a single natural way to define \(\mathcal{A}\), there is more than one way to define the square root of a product of positive semidefinite matrices, and hence more than one way to define \(\mathcal{G}\). Let \(A\) and \(B\) be arbitrary positive semidefinite matrices and let \(A^{1/2}\) and \(B^{1/2}\) be their (unique) positive semidefinite square roots. Write \(\left\Vert \ \right\Vert _{2}\) for the Frobenius norm and \(\mathbb{P}\) for the set of \(n\times n\) positive definite matrices. The following functions are considered: \(d_{1}(A,B):=\left\Vert A^{1/2}-B^{1/2}\right\Vert _{2}=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}A^{1/2}B^{1/2}\right\} ^{1/2}\); \(d_{2}(A,B):=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}(A^{1/2}BA^{1/2})^{1/2}\right\} ^{1/2}\); \(d_{3}(A,B):=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}A\#B\right\} ^{1/2}\), where \(A\#B:=A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}\) (for invertible \(A\)); and \(d_{4}(A,B):=\left\{ \operatorname{tr}(A+B)-2\operatorname{tr}\mathcal{L}(A,B)\right\} ^{1/2}\), where \(\mathcal{L}(A,B):=\exp\left( \frac{1}{2}(\log A+\log B)\right) \) (defined only for positive definite matrices).
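To make the four definitions concrete, here is a minimal pure-Python sketch, assuming real symmetric \(2\times2\) inputs so that matrix functions can be computed from a closed-form eigendecomposition (the paper itself treats complex \(n\times n\) matrices). All helper names (`spectral`, `msqrt`, `geometric_mean`, `log_euclidean_mean`, `d1` to `d4`) are hypothetical and chosen for illustration only. `d2` is written with the factor 2 in front of \(\operatorname{tr}(A^{1/2}BA^{1/2})^{1/2}\), as in the Bures distance, and `d3`, `d4` require positive definite arguments.

```python
import math


def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(X):
    return X[0][0] + X[1][1]


def spectral(M, f):
    """Apply the scalar function f to a 2x2 real symmetric matrix M.

    Closed-form eigendecomposition: the unit eigenvector for the larger
    eigenvalue is (cos t, sin t) with t = atan2(2b, a - c) / 2.
    """
    a, c = M[0][0], M[1][1]
    b = (M[0][1] + M[1][0]) / 2.0  # symmetrize away rounding noise
    mean, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    t = 0.5 * math.atan2(2.0 * b, a - c)
    v1, v2 = (math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))
    f1, f2 = f(mean + rad), f(mean - rad)
    return [[f1 * v1[i] * v1[j] + f2 * v2[i] * v2[j] for j in range(2)]
            for i in range(2)]


def msqrt(M):
    # clamp tiny negative eigenvalues produced by rounding
    return spectral(M, lambda x: math.sqrt(max(x, 0.0)))


def _finish(A, B, mean_trace):
    # sqrt(tr(A + B) - 2 tr(mean)), clamped at 0 before the square root
    return math.sqrt(max(trace(A) + trace(B) - 2.0 * mean_trace, 0.0))


def d1(A, B):
    return _finish(A, B, trace(matmul(msqrt(A), msqrt(B))))


def d2(A, B):
    rA = msqrt(A)
    return _finish(A, B, trace(msqrt(matmul(matmul(rA, B), rA))))


def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}; A must be positive definite."""
    rA = msqrt(A)
    irA = spectral(A, lambda x: 1.0 / math.sqrt(x))
    middle = msqrt(matmul(matmul(irA, B), irA))
    return matmul(matmul(rA, middle), rA)


def d3(A, B):
    return _finish(A, B, trace(geometric_mean(A, B)))


def log_euclidean_mean(A, B):
    """L(A, B) = exp((log A + log B) / 2); A and B must be positive definite."""
    lA, lB = spectral(A, math.log), spectral(B, math.log)
    half_sum = [[(lA[i][j] + lB[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    return spectral(half_sum, math.exp)


def d4(A, B):
    return _finish(A, B, trace(log_euclidean_mean(A, B)))
```

For commuting (e.g. diagonal) inputs all four means reduce to \(A^{1/2}B^{1/2}\), so with these normalisations every \(d_k(P,Q)\) equals \(\sqrt{2}\,d_H(P,Q)\). For non-commuting inputs the values differ; one always has \(d_2\le d_1\), because \(\operatorname{tr}A^{1/2}B^{1/2}\le\operatorname{tr}(A^{1/2}BA^{1/2})^{1/2}\).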
The functions \(d_{1}\) and \(d_{2}\) are metrics (\(d_{2}\) is sometimes called the Bures distance or the Wasserstein metric), but \(d_{3}\) and \(d_{4}\) fail to satisfy the triangle inequality and so are not metrics. The main results of this paper concern the functions \(\Phi_{k}(A,B):=d_{k}(A,B)^{2}\) for \(k=3\) and \(4\). In particular, it is shown that \(\Phi_{3}\) and \(\Phi_{4}\) are divergence functions (see \textit{S.-i. Amari} [Information geometry and its applications. Tokyo: Springer (2016; Zbl 1350.94001)]) and have useful convexity properties; for example (Theorem 8), for each \(A\in\mathbb{P}\) the function \(X\longmapsto\Phi_{4}(A,X)\) is strictly convex on \(\mathbb{P}\).
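The convexity statement of Theorem 8 can be spot-checked numerically, although a check of this kind only illustrates the result and does not prove it. The sketch below again assumes real symmetric \(2\times2\) positive definite matrices; `spectral`, `phi4` and `midpoint_gap` are hypothetical names. Strict convexity of \(X\mapsto\Phi_4(A,X)\) means that the average of \(\Phi_4(A,\cdot)\) at \(X\) and \(Y\) strictly exceeds its value at \((X+Y)/2\) whenever \(X\neq Y\).

```python
import math


def spectral(M, f):
    """Apply the scalar function f to a 2x2 real symmetric matrix M via its
    closed-form eigendecomposition (eigenvector angle t = atan2(2b, a - c) / 2)."""
    a, c = M[0][0], M[1][1]
    b = (M[0][1] + M[1][0]) / 2.0
    mean, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    t = 0.5 * math.atan2(2.0 * b, a - c)
    v1, v2 = (math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))
    f1, f2 = f(mean + rad), f(mean - rad)
    return [[f1 * v1[i] * v1[j] + f2 * v2[i] * v2[j] for j in range(2)]
            for i in range(2)]


def phi4(A, X):
    """Phi_4(A, X) = tr A + tr X - 2 tr exp((log A + log X) / 2), for positive definite A, X."""
    lA, lX = spectral(A, math.log), spectral(X, math.log)
    half_sum = [[(lA[i][j] + lX[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    L = spectral(half_sum, math.exp)
    return A[0][0] + A[1][1] + X[0][0] + X[1][1] - 2.0 * (L[0][0] + L[1][1])


def midpoint_gap(A, X, Y):
    """Mean of Phi_4(A, .) at X and Y minus its value at the midpoint.

    Positive for X != Y if X -> Phi_4(A, X) is strictly convex.
    """
    M = [[(X[i][j] + Y[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    return 0.5 * (phi4(A, X) + phi4(A, Y)) - phi4(A, M)
```

For instance, with the arbitrarily chosen matrices \(A=\begin{pmatrix}2&1\\1&2\end{pmatrix}\), \(X=\begin{pmatrix}3&0\\0&1\end{pmatrix}\) and \(Y=\begin{pmatrix}1&0.5\\0.5&2\end{pmatrix}\), `midpoint_gap(A, X, Y)` should be positive, while `phi4(A, A)` vanishes up to rounding, consistent with \(\Phi_4\) being a divergence.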

Reviewed by John D. Dixon
