Probabilistic Fréchet means for time varying persistence diagrams (Q2346527)

This paper proposes a way to allow for a statistical approach to persistent homology, which is an algebraic-topological theory of great use in topological data analysis. In plain words, this theory considers filtrations of topological spaces depending on a parameter \(r\) and describes the births and deaths of \(n\)-dimensional holes that happen while \(r\) increases. The collection of the pairs (birth time, death time) for each \(n\)-dimensional hole is called the \textit{persistence diagram} in degree \(n\) of the given filtration. Previous research has shown the difficulty of introducing good notions of mean and variance for a set of persistence diagrams. This paper faces that problem and proposes a solution that is based on changing persistence diagrams into probability distributions, for which the concept of mean probability distribution is available. The paper starts by recalling the definitions of persistence diagram, \(p\)th-Wasserstein distance between persistence diagrams, vineyard (i.e., a persistence diagram depending on a time parameter [\textit{D. Cohen-Steiner} et al., in: Proceedings of the 22nd annual symposium on computational geometry, SCG'06. New York, NY: Association for Computing Machinery (ACM). 119--126 (2006; Zbl 1153.68388)]), and the concept of mean introduced in [\textit{Y. Mileyko} et al., Inverse Probl. 27, No. 12, Article ID 124007, 22 p. (2011; Zbl 1247.68310)]. The authors highlight the fact that this last mean is not unique and does not depend continuously on the time parameter when the mean is computed for vineyards. This paper proposes to solve that problem by defining the mean of a set of diagrams as a probability distribution over diagrams instead of a single persistence diagram. This is done by introducing the concept of \textit{grouping} for a finite family \(X=\{X_1,\ldots, X_N\}\) of persistence diagrams. This concept generalizes the one of matching between two persistence diagrams to a finite family of diagrams. In plain words, it consists in selecting a family of \(N\)-tuples \(S^j=(p^j_1,\ldots, p^j_N)\) in \(X_1\times \ldots\times X_N\) in such a way that the elements of each \(X_i\) are given by the points \(\{p^j_i\}\) varying \(j\) (here multiplicities must be counted appropriately, and the points may belong to the diagonal \(x=y\)). Informally speaking, the points \(S^j\) correspond to each other in the grouping. Each grouping \(G\) for \(X\) allows to define a unique diagram \(\text{mean}_X(G)\) obtained by taking the collection of the averages of the sets \(S^j\). A grouping is called \textit{optimal} if it produces a diagram \(\text{mean}_X(G)\) that belongs to the Fréchet mean of the family \(X\), i.e., the sum of the squares of the \(2\)nd-Wasserstein distances from \(X_1,\ldots, X_N\) in the space of persistence diagrams takes its global minimum at the diagram \(\text{mean}_X(G)\). After introducing the concept of grouping, the authors define the \textit{probabilistic Fréchet mean} (PFM) for a finite family \(X=\{X_1,\ldots, X_N\}\) of persistence diagrams as a probability distribution over the set of persistence diagrams. This probability distribution \(\mu_X\) is defined by considering a suitable probability space of perturbations \(\{X'_1,\ldots, X'_N\}\) of the family \(\{X_1,\ldots, X_N\}\) so that for each grouping \(G\) of \(\{X_1,\ldots, X_N\}\) we can compute the probability \(P(G)\) that the grouping \(G\) is optimal for \(\{X'_1,\ldots, X'_N\}\) (we observe that each grouping \(G\) of \(\{X_1,\ldots, X_N\}\) naturally induces a grouping of \(\{X'_1,\ldots, X'_N\}\), which will be denoted by the same symbol \(G\)). The probability distribution \(\mu_X\) is a function whose support is the set of the diagrams \(\text{mean}_X(G)\) of \(X=\{X_1,\ldots, X_N\}\) varying \(G\) among the possible groupings of \(X\), defined by setting \(\mu_X(\text{mean}_X(G)):=P(G)\). Finally, the authors prove several results, including the fact that the map that takes a set of diagrams to its PFM is Hölder continuous. Some examples of the probabilistic Fréchet mean for a set of diagrams are also given. An appendix concerning algorithms concludes the paper. A statement at the very beginning of the paper claims that the field of topological data analysis was first introduced in 2000. This statement is not correct. Although under a different name, the concept of persistence was already used at the beginning of the '90s for topological analysis of data in computer vision and pattern recognition (cf., e.g., [\textit{A. Verri} et al., Biol. Cybern. 70, No. 2, 99--107 (1993; Zbl 0789.92030); \textit{S. Biasotti} et al., ``Describing shapes by geometrical-topological properties of real functions'', ACM Computing Surveys 40, No. 4, Article ID 12 (2008; \url{doi:10.1145/1391729.1391731})]). Even though the idea of computing averages in persistent homology by changing persistence diagrams into continuous functions related to probability distributions is not new (cf., e.g., [\textit{P. Donatini} et al., ``Size functions for signature recognition'', in: Vision geometry VII. Proceedings of the SPIE 3454, 178--183 (1998; \url{doi:10.1117/12.323253})], this paper has the merit of introducing an approach that could be of great interest for further applications of persistent homology to topological data analysis.

0 references

zbMATH Keywords

topological data analysis (TDA)

0 references

persistent homology

0 references

Fréchet mean

0 references

vineyard

0 references

time-varying data

0 references