Distance geometry and data science (Q2192022)

Language: English
Label: Distance geometry and data science
Description: scientific article

    Statements

    Distance geometry and data science (English)
    Publication date: 26 June 2020
    The aim of this extensive survey is to discuss various approaches to the basic distance geometry problem and its applications in data science, thereby visiting many current highlights of optimization research. First the field of Mathematical Programming (MP) is formalized and classified, including reformulations, relaxations and approximations. Then Distance Geometry (DG) is defined as the NP-hard problem of embedding an undirected edge-weighted graph in a Euclidean space of given dimension such that the edge weights equal the distances between the corresponding vertices, with applications in engineering, protein folding and data mining, the last of which is developed further.

    After showing how different kinds of data, such as process descriptions, text, databases and abductive inference, may be represented as weighted graphs, it is argued that the data science tasks of classification and clustering may be applied to graph data after vectorization by DG, using e.g. \(k\)-means or Artificial Neural Networks (ANN), while the vertices of a graph may be clustered directly by spectral or modularity clustering.

    Several MP methods for obtaining robust approximate solutions to DG are detailed for the usual case of perturbed weight data. An unconstrained quartic formulation, minimizing the sum of squared differences of squared distances, and two constrained variants may be tackled by a local nonlinear solver. A relaxation of DG may be cast as a semidefinite programming (SDP) problem, solvable in low dimensions by interior point methods, which for high dimensions may be further relaxed to a diagonally dominant form yielding a linear program. Some fast and exact, but very high dimensional, embeddings may be obtained from incidence vectors, the Fréchet max-norm embedding, or multidimensional scaling. The embedding dimension may then be reduced using principal component analysis, Isomap, Barvinok's probabilistic 'naive method', and finally random projections, which preserve Euclidean norms on average.

    A serious obstacle for most of these methods is the phenomenon of distance instability and concentration of distances in high dimension: it is shown that as the dimension increases, the relative difference between the smallest and largest distance among pairs of points tends to zero in probability. The survey ends with an exercise in natural language clustering by means of ANN, comparing several of the techniques described.
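    Where the review mentions spectral clustering of graph vertices, the following minimal Python sketch (our illustration, not code from the paper) shows the standard recipe: embed the vertices with the bottom eigenvectors of the graph Laplacian, then run \(k\)-means on the rows. All function and parameter names are ours.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(A, k):
    """A: symmetric (n, n) adjacency matrix; returns one cluster label per vertex."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                   # combinatorial graph Laplacian
    _, V = np.linalg.eigh(L)               # eigenvectors in ascending eigenvalue order
    U = V[:, :k]                           # k smallest eigenvectors as vertex features
    _, labels = kmeans2(U, k, minit="++")  # k-means on the spectral embedding
    return labels

# Two triangles joined by a single edge should split into two clusters.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
print(spectral_clustering(A, 2))
```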
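    The unconstrained quartic formulation described above can be sketched as follows, minimizing the sum of squared differences between squared realized distances and squared edge weights with a generic local solver (here scipy's L-BFGS-B). This illustrates the formulation under our own choices of names and starting point, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def quartic_dgp(edges, weights, n, K, seed=0):
    """Embed n vertices in R^K so that ||x_u - x_v||^2 ~ d_uv^2 on the edges."""
    rng = np.random.default_rng(seed)

    def objective(x):
        X = x.reshape(n, K)
        err = 0.0
        for (u, v), d in zip(edges, weights):
            diff = X[u] - X[v]
            err += (diff @ diff - d * d) ** 2   # quartic penalty per edge
        return err

    x0 = rng.standard_normal(n * K)             # random starting realization
    res = minimize(objective, x0, method="L-BFGS-B")
    return res.x.reshape(n, K), res.fun

# Example: a unit square in the plane (a 4-cycle plus one diagonal).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
weights = [1, 1, 1, 1, np.sqrt(2)]
X, err = quartic_dgp(edges, weights, n=4, K=2)
print("residual:", err)
```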
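    The SDP relaxation can be illustrated along these lines: search for a positive semidefinite Gram matrix whose induced squared distances match the squared edge weights, then round to dimension K by eigendecomposition. The absolute-residual objective and the solver defaults below are our assumptions, sketched with cvxpy.

```python
import numpy as np
import cvxpy as cp

def sdp_dgp(edges, weights, n, K):
    """Relaxed embedding via a PSD Gram matrix G with G_uu + G_vv - 2 G_uv ~ d_uv^2."""
    G = cp.Variable((n, n), PSD=True)
    residuals = [G[u, u] + G[v, v] - 2 * G[u, v] - d * d
                 for (u, v), d in zip(edges, weights)]
    prob = cp.Problem(cp.Minimize(sum(cp.abs(r) for r in residuals)))
    prob.solve()
    # PCA-style rounding: keep the K leading eigenvectors of the Gram matrix.
    w, V = np.linalg.eigh(G.value)
    idx = np.argsort(w)[::-1][:K]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```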
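    Classical multidimensional scaling, one of the exact but high-dimensional embeddings mentioned, admits a compact sketch: double-center the matrix of squared distances to obtain a Gram matrix, then keep the leading eigenvectors. The function name is ours.

```python
import numpy as np

def classical_mds(D, K):
    """D: (n, n) matrix of pairwise distances; returns (n, K) coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of the centered points
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:K]         # K largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```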
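    The random-projection step rests on the fact that a suitably scaled Gaussian matrix preserves Euclidean norms on average (the Johnson-Lindenstrauss phenomenon). A small empirical check, with illustrative sizes of our choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 10_000, 200                    # points, original dim, target dim
X = rng.standard_normal((n, d))
T = rng.standard_normal((d, k)) / np.sqrt(k)  # scaled Gaussian projector
Y = X @ T

# Pairwise distances before and after projection should nearly agree.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    orig = np.linalg.norm(X[i] - X[j])
    proj = np.linalg.norm(Y[i] - Y[j])
    print(f"pair ({i},{j}): ratio = {proj / orig:.3f}")
```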
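    The concentration-of-distances phenomenon is easy to observe empirically: for uniform random points, the relative gap between the largest and smallest distance to a query point shrinks as the dimension grows. A toy experiment with our own parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
for dim in [2, 10, 100, 1_000, 10_000]:
    P = rng.random((500, dim))                # 500 uniform random points
    q = rng.random(dim)                       # a query point
    dists = np.linalg.norm(P - q, axis=1)
    gap = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:>6}: relative gap = {gap:.3f}")
```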
    Keywords: Euclidean distance; isometric embedding; random projection; mathematical programming; machine learning; artificial neural networks