Learning algebraic varieties from samples (Q1790935)

From MaRDI portal
scientific article
Language Label Description Also known as
English
Learning algebraic varieties from samples
scientific article

    Statements

    Learning algebraic varieties from samples (English)
    0 references
    0 references
    0 references
    0 references
    4 October 2018
    0 references
    Suppose a sample of points is given in a real Euclidean space from an unknown algebraic variety (that is, the zero locus of multivariate polynomial equations). The purpose of the article under review is to identify characteristics of the unknown variety from the sampling of points. This is, as the authors indicate, `a fundamental problem at the interface of data science and algebraic geometry.' The authors survey a wide range of literature and pose a number of new ideas for algorithms to approach these questions. This article is a useful reference at many levels -- from those starting to learn techniques of data science and algebraic geometry to experts looking for different approaches to try. Especially useful is the long list of examples in Section 2 and the accompanying software described in Section 7 (which is openly available to the reader). The authors pose five questions about the unknown variety \(V\) around which they organize the paper: \begin{itemize} \item[1.] What is the dimension of \(V\)? \item[2.] Which polynomials vanish on \(V\)? \item[3.] What is the degree of \(V\)? \item[4.] What are the irreducible components of \(V\)? \item[5.] What are the homology groups of \(V\)? \end{itemize} In Section 3 the authors consider the problem of dimension estimation, which is standard in the area of manifold learning [\textit{J. A. Lee} and \textit{M. Verleysen}, Nonlinear dimensionality reduction. New York, NY: Springer (2007; Zbl 1128.68024)]. The authors survey six different approaches to dimension estimation. These are non-linear principal component analysis, box counting, correlation dimension, persistent homology curve dimension [\textit{H. Adams} et al., Abel Symp. 15, 1--31 (2020; Zbl 1448.62211)], maximum likelihood estimation [\textit{E. Levina} and \textit{P. Bickel}, ``Maximum likelihood estimation of intrinsic dimension,'' Adv. Neural Inf. Process. Syst. 17, 777--784 (2004)], and an analysis of variance estimation [\textit{M. Díaz} et al., J. Multivariate Anal. 173, 229--247 (2019; Zbl 1422.62182)]. The authors compare the different approaches for their running examples in a \textit{dimension diagram} inspired by the barcodes of persistent homology. In Section 4 the authors proceed to persistent homology, which is a staple of topological data analysis [\textit{G. Carlsson}, Bull. Am. Math. Soc., New Ser. 46, No. 2, 255--308 (2009; Zbl 1172.62002)]. The authors explore using the tangent space of the unknown variety \(V\) at the sample points to get ellipsoids that refine the usual input of spheres for the Vietoris-Rips complex. (The tangent space can be estimated once some equations are known -- which is a problem considered in Section 5.) The authors also indicate the importance of the \textit{reach} of a manifold in learning its homologies [\textit{P. Niyogi} et al., Discrete Comput. Geom. 39, No. 1--3, 419--441 (2008; Zbl 1148.68048)]. Section 5 analyzes the problem of finding equations for the unknown variety \(V\). The standard approach here, when looking for polynomial equations of degree \(d\) that vanish on a set of points, is to `linearize' the question by applying the \(d\)th Veronese map. Under this map, the space of hypersurfaces of degree \(d\) vanishing on the points become a linear space, more precisely they are the kernel of a \textit{Vandermonde} matrix (this is the matrix whose columns record the images of the sample points under the \(d\)th Veronese map). The authors outline three standard methods for computing this kernel; it is particularly important to do this numerically for sample points which are given with floating point coordinates. A problem here, indicated by the authors, is that the Vandermonde matrices are notoriously ill-conditioned [\textit{V. Y. Pan}, SIAM J. Matrix Anal. Appl. 37, No. 2, 676--694 (2016; Zbl 1382.15008)]. There is a trade-off between trying to find \textit{sparse} equations and avoiding the ill-conditioning by choosing an orthogonal basis of polynomials that likely destroys any desired sparse representation. In Section 6 the authors describe methods for extracting meaningful information about the variety \(V\) from the equations derived in Section 5. This is the realm of computational algebraic geometry. Largely thanks to Groebner bases, symbolic methods are now well-established and available in symbolic computer algebra systems such as \texttt{Macaulay2}, \texttt{CoCoA}, and \texttt{Singular}. More recently, \textit{numerical algebraic geometry}, based on homotopy continuation methods, has been a growing field. The software \texttt{Bertini} implements these numerical methods (and, more recently, the \texttt{HomotopyContinuation.jl} package in \texttt{Julia}). The authors describe how to use these software packages to address the remaining questions for the given samples (degree of \(V\), irreducible components of \(V\), etc.) Finally, section 7 demonstrates the methods described in the paper using a \texttt{Julia} package developed by the authors called \texttt{LearningAlgebraicVarieties}. The authors give a step-by-step tutorial for using the package.
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    real algebraic geometry
    0 references
    point cloud data
    0 references
    persistent homology
    0 references
    interpolation
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references