Probabilistic Fréchet means for time varying persistence diagrams (Q2346527): Difference between revisions

This paper proposes a way to allow for a statistical approach to persistent homology, which is an algebraic-topological theory of great use in topological data analysis. In plain words, this theory considers filtrations of topological spaces depending on a parameter \(r\) and describes the births and deaths of \(n\)-dimensional holes that happen while \(r\) increases. The collection of the pairs (birth time, death time) for each \(n\)-dimensional hole is called the \textit{persistence diagram} in degree \(n\) of the given filtration. Previous research has shown the difficulty of introducing good notions of mean and variance for a set of persistence diagrams. This paper faces that problem and proposes a solution that is based on changing persistence diagrams into probability distributions, for which the concept of mean probability distribution is available. The paper starts by recalling the definitions of persistence diagram, \(p\)th-Wasserstein distance between persistence diagrams, vineyard (i.e., a persistence diagram depending on a time parameter [\textit{D. Cohen-Steiner} et al., in: Proceedings of the 22nd annual symposium on computational geometry, SCG'06. New York, NY: Association for Computing Machinery (ACM). 119--126 (2006; Zbl 1153.68388)]), and the concept of mean introduced in [\textit{Y. Mileyko} et al., Inverse Probl. 27, No. 12, Article ID 124007, 22 p. (2011; Zbl 1247.68310)]. The authors highlight the fact that this last mean is not unique and does not depend continuously on the time parameter when the mean is computed for vineyards. This paper proposes to solve that problem by defining the mean of a set of diagrams as a probability distribution over diagrams instead of a single persistence diagram. This is done by introducing the concept of \textit{grouping} for a finite family \(X=\{X_1,\ldots, X_N\}\) of persistence diagrams. This concept generalizes the one of matching between two persistence diagrams to a finite family of diagrams. In plain words, it consists in selecting a family of \(N\)-tuples \(S^j=(p^j_1,\ldots, p^j_N)\) in \(X_1\times \ldots\times X_N\) in such a way that the elements of each \(X_i\) are given by the points \(\{p^j_i\}\) varying \(j\) (here multiplicities must be counted appropriately, and the points may belong to the diagonal \(x=y\)). Informally speaking, the points \(S^j\) correspond to each other in the grouping. Each grouping \(G\) for \(X\) allows to define a unique diagram \(\text{mean}_X(G)\) obtained by taking the collection of the averages of the sets \(S^j\). A grouping is called \textit{optimal} if it produces a diagram \(\text{mean}_X(G)\) that belongs to the Fréchet mean of the family \(X\), i.e., the sum of the squares of the \(2\)nd-Wasserstein distances from \(X_1,\ldots, X_N\) in the space of persistence diagrams takes its global minimum at the diagram \(\text{mean}_X(G)\). After introducing the concept of grouping, the authors define the \textit{probabilistic Fréchet mean} (PFM) for a finite family \(X=\{X_1,\ldots, X_N\}\) of persistence diagrams as a probability distribution over the set of persistence diagrams. This probability distribution \(\mu_X\) is defined by considering a suitable probability space of perturbations \(\{X'_1,\ldots, X'_N\}\) of the family \(\{X_1,\ldots, X_N\}\) so that for each grouping \(G\) of \(\{X_1,\ldots, X_N\}\) we can compute the probability \(P(G)\) that the grouping \(G\) is optimal for \(\{X'_1,\ldots, X'_N\}\) (we observe that each grouping \(G\) of \(\{X_1,\ldots, X_N\}\) naturally induces a grouping of \(\{X'_1,\ldots, X'_N\}\), which will be denoted by the same symbol \(G\)). The probability distribution \(\mu_X\) is a function whose support is the set of the diagrams \(\text{mean}_X(G)\) of \(X=\{X_1,\ldots, X_N\}\) varying \(G\) among the possible groupings of \(X\), defined by setting \(\mu_X(\text{mean}_X(G)):=P(G)\). Finally, the authors prove several results, including the fact that the map that takes a set of diagrams to its PFM is Hölder continuous. Some examples of the probabilistic Fréchet mean for a set of diagrams are also given. An appendix concerning algorithms concludes the paper. A statement at the very beginning of the paper claims that the field of topological data analysis was first introduced in 2000. This statement is not correct. Although under a different name, the concept of persistence was already used at the beginning of the '90s for topological analysis of data in computer vision and pattern recognition (cf., e.g., [\textit{A. Verri} et al., Biol. Cybern. 70, No. 2, 99--107 (1993; Zbl 0789.92030); \textit{S. Biasotti} et al., ``Describing shapes by geometrical-topological properties of real functions'', ACM Computing Surveys 40, No. 4, Article ID 12 (2008; \url{doi:10.1145/1391729.1391731})]). Even though the idea of computing averages in persistent homology by changing persistence diagrams into continuous functions related to probability distributions is not new (cf., e.g., [\textit{P. Donatini} et al., ``Size functions for signature recognition'', in: Vision geometry VII. Proceedings of the SPIE 3454, 178--183 (1998; \url{doi:10.1117/12.323253})], this paper has the merit of introducing an approach that could be of great interest for further applications of persistent homology to topological data analysis.

0 references

zbMATH Keywords

topological data analysis (TDA)

0 references

persistent homology

0 references

Fréchet mean

0 references

vineyard

0 references

time-varying data

0 references

persistence diagram

0 references

variance

0 references

reviewed by

Patrizio Frosini

0 references

describes a project that uses

SW1PerS

0 references

bootstrap

0 references

MaRDI profile type

MaRDI publication profile

0 references

cites work

Extreme elevation on a 2-manifold

0 references

Interface surfaces for protein-protein complexes

0 references

Robust statistics, hypothesis testing, and confidence intervals for persistent homology on metric measure spaces

0 references

Statistical topological data analysis using persistence landscapes

0 references

Proximity of persistence modules and their diagrams

0 references

Stability of persistence diagrams

0 references

Q3601530

0 references

Q3514524

0 references

Morse-smale complexes for piecewise linear 3-manifolds

0 references

Topological persistence and simplification

0 references

Q4318617

0 references

Exploring uses of persistent homology for statistical analysis of landmark-based shape data

0 references

Probability measures on the space of persistence diagrams

0 references

Algorithms for the Assignment and Transportation Problems

0 references

Q3827224

0 references

Sliding windows and persistence: an application of topological methods to signal analysis

0 references

Reexamination of the perfectness concept for equilibrium points in extensive games

0 references

Fréchet means for distributions of persistence diagrams

0 references

Identifiers

zbMATH Open document ID

1348.68285

0 references

DOI

10.1214/15-EJS1030

0 references

Mathematics Subject Classification ID

0 references

0 references

0 references

0 references

0 references

0 references

0 references

Sitelinks

Mathematics(1 entry)

mardi Publication:2346527

Revision as of 03:40, 10 July 2024 ReferenceBot (talk \| contribs) Bots 1,895,659 edits ‎Changed an Item ← Older edit	Latest revision as of 08:23, 30 July 2024 Openalex240730090724 (talk \| contribs) 579,206 edits Set OpenAlex properties.
	Property / OpenAlex ID
		W1819250359
	Property / OpenAlex ID: W1819250359 / rank
		Normal rank