Efficient Classification for Metric Data
From MaRDI portal
Publication:2986185
DOI: 10.1109/TIT.2014.2339840
zbMATH Open: 1360.62332
arXiv: 1306.2547
MaRDI QID: Q2986185
FDO: Q2986185
Authors: Robert Krauthgamer, Lee-Ad Gottlieb, Aryeh Kontorovich
Publication date: 16 May 2017
Published in: IEEE Transactions on Information Theory
Abstract: Recent advances in large-margin classification of data residing in general metric spaces (rather than Hilbert spaces) enable classification under various natural metrics, such as string edit and earthmover distance. A general framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004] left open the questions of computational efficiency and of providing direct bounds on generalization error. We design a new algorithm for classification in general metric spaces, whose runtime and accuracy depend on the doubling dimension of the data points, and can thus achieve superior classification performance in many common scenarios. The algorithmic core of our approach is an approximate (rather than exact) solution to the classical problems of Lipschitz extension and of Nearest Neighbor Search. The algorithm's generalization performance is guaranteed via the fat-shattering dimension of Lipschitz classifiers, and we present experimental evidence of its superiority to some common kernel methods. As a by-product, we offer a new perspective on the nearest neighbor classifier, which yields significantly sharper risk asymptotics than the classic analysis of Cover and Hart [IEEE Trans. Info. Theory, 1967].
Full work available at URL: https://arxiv.org/abs/1306.2547
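The abstract above describes classification via an approximate Lipschitz extension of the training labels. As a minimal illustration (not the paper's algorithm, which uses approximate nearest-neighbor search and doubling-dimension data structures for efficiency), the sketch below evaluates the exact McShane-Whitney extension naively over all training points; the metric, labels, and Lipschitz constant in the toy example are assumptions for illustration only.

```python
# Simplified sketch: classify by the sign of a Lipschitz extension of the
# training labels. f_upper / f_lower are the largest and smallest L-Lipschitz
# functions agreeing with the labeled sample; their average is L-Lipschitz.
# The paper computes an *approximate* extension efficiently; this is the
# naive exact version, linear in the sample size per query.

def lipschitz_classify(train, x, metric, L):
    """train: list of (point, label) with label in {-1, +1};
    metric: a distance function on the points; L: Lipschitz constant."""
    f_upper = min(y + L * metric(x, xi) for xi, y in train)
    f_lower = max(y - L * metric(x, xi) for xi, y in train)
    f = 0.5 * (f_upper + f_lower)
    return 1 if f >= 0 else -1

# Toy example on the real line with the absolute-value metric (an assumption;
# the method applies in any metric space, e.g. edit or earthmover distance).
metric = lambda a, b: abs(a - b)
train = [(0.0, -1), (1.0, -1), (4.0, 1), (5.0, 1)]
print(lipschitz_classify(train, 0.5, metric, L=1.0))  # -1: near the -1 cluster
print(lipschitz_classify(train, 4.5, metric, L=1.0))  # 1: near the +1 cluster
```

With L chosen large, the extension interpolates sharply and the rule approaches the nearest-neighbor classifier, which is consistent with the paper's by-product analysis of nearest-neighbor risk.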
Mathematics Subject Classification:
- Classification and discrimination; cluster analysis (statistical aspects) (62H30)
- Pattern recognition, speech recognition (68T10)
Cited In (12)
- Fully dynamic clustering and diversity maximization in doubling metrics
- Adaptive metric dimensionality reduction
- Robust randomized optimization with \(k\) nearest neighbors
- What Is Known About Vertex Cover Kernelization?
- Semi-Lipschitz functions and machine learning for discrete dynamical systems on graphs
- Active nearest-neighbor learning in metric spaces
- Non-uniform packings
- \(L_{p}\)-norm Sauer-Shelah lemma for margin multi-category classifiers
- On the impossibility of dimension reduction for doubling subsets of \(\ell_{p}\)
- An improved uniform convergence bound with fat-shattering dimension
- Boosting conditional probability estimators
- Universal Bayes consistency in metric spaces