Theory of Classification: A Survey of Some Recent Advances
DOI: 10.1051/ps:2005018
zbMath: 1136.62355
OpenAlex: W2014902932
Wikidata: Q58374465 (Scholia: Q58374465)
MaRDI QID: Q3373749
Stéphane Boucheron, Olivier Bousquet, Gábor Lugosi
Publication date: 9 March 2006
Published in: ESAIM: Probability and Statistics
Full work available at URL: http://www.numdam.org/item?id=PS_2005__9__323_0
Keywords: model selection; empirical processes; statistical learning theory; pattern recognition; concentration inequalities
Classification and discrimination; cluster analysis (statistical aspects) (62H30)
Pattern recognition, speech recognition (68T10)
Related Items
- Sampling and empirical risk minimization
- Neural network approximation
- An empirical classification procedure for nonparametric mixture models
- Classification in general finite dimensional spaces with the \(k\)-nearest neighbor rule
- Classification with reject option
- Efficiency of classification methods based on empirical risk minimization
- PAC-Bayesian high dimensional bipartite ranking
- Model selection by bootstrap penalization for classification
- Guest editorial: Learning theory
- On the kernel rule for function classification
- Fast learning rates in statistical inference through aggregation
- Stability and minimax optimality of tangential Delaunay complexes for manifold reconstruction
- Localization of VC classes: beyond local Rademacher complexities
- On regularization algorithms in learning theory
- Multi-kernel regularized classifiers
- How can we identify the sparsity structure pattern of high-dimensional data: an elementary statistical analysis to interpretable machine learning
- Mathematical methods of randomized machine learning
- Robust statistical learning with Lipschitz and convex loss functions
- Optimal functional supervised classification with separation condition
- Overlaying classifiers: A practical approach to optimal scoring
- A partial overview of the theory of statistics with functional data
- A statistical view of clustering performance through the theory of \(U\)-processes
- Consistency of learning algorithms using Attouch–Wets convergence
- Hold-out estimates of prediction models for Markov processes
- Empirical risk minimization for heavy-tailed losses
- SVRG meets AdaGrad: painless variance reduction
- Adaptive partitioning schemes for bipartite ranking
- Learning noisy linear classifiers via adaptive and selective sampling
- Learning bounds for quantum circuits in the agnostic setting
- Ranking and empirical minimization of \(U\)-statistics
- Local nearest neighbour classification with applications to semi-supervised learning
- Properties of convergence of a fuzzy set estimator of the density function
- Robustness and generalization
- Strong \(L_p\) convergence of wavelet deconvolution density estimators
- Robust classification via MOM minimization
- Risk bounds for CART classifiers under a margin condition
- Upper bounds and aggregation in bipartite ranking
- Concentration Inequalities for Samples without Replacement
- Classification with minimax fast rates for classes of Bayes rules with sparse representation
- Plugin procedure in segmentation and application to hyperspectral image segmentation
- Kullback-Leibler aggregation and misspecified generalized linear models
- Adaptive kernel methods using the balancing principle
- Relative deviation learning bounds and generalization with unbounded loss functions
- An empirical comparison of learning algorithms for nonparametric scoring: the TreeRank algorithm and other methods
- Cover-based combinatorial bounds on probability of overfitting
- Combinatorial bounds of overfitting for threshold classifiers
- Kernel methods in machine learning
- Structure from Randomness in Halfspace Learning with the Zero-One Loss
- Obtaining fast error rates in nonconvex situations
- Simultaneous adaptation to the margin and to complexity in classification
- Classification algorithms using adaptive partitioning
- Concentration inequalities for two-sample rank processes with application to bipartite ranking
- Convergence conditions for the observed mean method in stochastic programming
- Sample Complexity of Classifiers Taking Values in \(\mathbb{R}^Q\), Application to Multi-Class SVMs
- Optimal rates of aggregation in classification under low noise assumption
- Learning by mirror averaging
- Constructing processes with prescribed mixing coefficients
- Reducing mechanism design to algorithm design via machine learning
- Learning sets with separating kernels
- Classification with many classes: challenges and pluses
- A survey of cross-validation procedures for model selection
- Maxisets for model selection
- A high-dimensional Wilks phenomenon
- Supervised Learning by Support Vector Machines
- Testing piecewise functions
- Fast learning rates for plug-in classifiers
- Variance-based regularization with convex objectives
- Sharpness estimation of combinatorial generalization ability bounds for threshold decision rules
- Some properties of Gaussian reproducing kernel Hilbert spaces and their implications for function approximation and learning theory
- Generalized mirror averaging and \(D\)-convex aggregation
- PAC-Bayesian bounds for randomized empirical risk minimizers
- Bandwidth selection in kernel empirical risk minimization via the gradient
- Measuring the Capacity of Sets of Functions in the Analysis of ERM
- Agnostic active learning
- Bayesian approach, theory of empirical risk minimization. Comparative analysis
- Optimal weighted nearest neighbour classifiers
- Optimal survey schemes for stochastic gradient descent with applications to M-estimation
- Robust \(k\)-means clustering for distributions with two moments
- When are epsilon-nets small?
- On signal representations within the Bayes decision framework
- Adaptive Estimation of the Optimal ROC Curve and a Bipartite Ranking Algorithm
- Permutational Rademacher Complexity
- Instability, complexity, and evolution
- Statistical analysis of Mapper for stochastic and multivariate filters
- Percolation centrality via Rademacher Complexity
- Minimax fast rates for discriminant analysis with errors in variables
- Statistical active learning algorithms for noise tolerance and differential privacy
- Statistical learning from biased training samples
- Stochastic Difference-of-Convex-Functions Algorithms for Nonconvex Programming
- For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability
- Depth separations in neural networks: what is actually being separated?
Cites Work
- A new approach to least-squares estimation, with applications
- Polynomial bounds for VC dimension of sigmoidal and general Pfaffian neural networks
- Neural networks with quadratic VC dimension
- The Glivenko-Cantelli problem, ten years later
- Poincaré's inequalities and Talagrand's concentration phenomenon for the exponential distribution
- Some limit theorems for empirical processes (with discussion)
- Uniform and universal Glivenko-Cantelli classes
- Combinatorics of random processes and sections of convex bodies
- Risk bounds for statistical learning
- Concentration inequalities and model selection. École d'Été de Probabilités de Saint-Flour XXXIII -- 2003.
- Estimating a regression function
- Large sample optimality of least squares cross-validation in density estimation
- Universal Donsker classes and metric entropy
- The Glivenko-Cantelli problem
- Some special Vapnik-Chervonenkis classes
- Estimation of dependences based on empirical data. Transl. from the Russian by Samuel Kotz
- Decision theoretic generalizations of the PAC model for neural net and other learning applications
- Exponential inequalities for sums of random vectors
- The densest hemisphere problem
- Existence of submatrices with all possible columns
- Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation
- A finite sample distribution-free performance bound for local discrimination rules
- Central limit theorems for empirical measures
- Bootstrap methods: another look at the jackknife
- Balls in \(\mathbb{R}^k\) do not cut all subsets of \(k+2\) points
- A graph-theoretic generalization of the Sauer-Shelah lemma
- Minimum contrast estimators on sieves: Exponential bounds and rates of convergence
- Risk bounds for model selection via penalization
- A result of Vapnik with applications
- Sharper bounds for Gaussian and empirical processes
- Rates of convergence for minimum contrast estimators
- Predicting \(\{ 0,1\}\)-functions on randomly drawn points
- Sphere packing numbers for subsets of the Boolean \(n\)-cube with bounded Vapnik-Chervonenkis dimension
- On the convexified Sauer-Shelah theorem
- Information inequalities and concentration of measure
- Optimal spatial adaptation to inhomogeneous smoothness: An approach based on kernel estimates with variable bandwidth selectors
- On nonparametric estimation of density level sets
- Erratum to: "A measure concentration inequality for contracting Markov chains"
- A decision-theoretic generalization of on-line learning and an application to boosting
- Strong minimax lower bounds for learning
- Model selection in nonparametric regression
- Entropy and the combinatorial dimension
- PAC-Bayesian stochastic model selection
- Vapnik-Chervonenkis type conditions and uniform Donsker classes of functions
- Concentration inequalities using the entropy method
- Symmetrization approach to concentration inequalities for empirical processes.
- Smooth discrimination analysis
- Adaptive model selection using empirical complexities
- Model selection for regression on a fixed design
- A Bennett concentration inequality and its application to suprema of empirical processes
- Hardness results for neural network approximation problems
- A note on margin-based loss functions in classification
- Moment inequalities for functions of independent random variables
- Arcing classifiers. (With discussion)
- Boosting the margin: a new explanation for the effectiveness of voting methods
- Behavioral and prescriptive explanations of a reverse sunk cost effect
- A general lower bound on the number of examples needed for learning
- On the trace of finite sets
- Density and dimension
- Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)
- The consistency of the BIC Markov order estimator.
- Empirical margin distributions and bounding the generalization error of combined classifiers
- Some extensions of an inequality of Vapnik and Chervonenkis
- Une inégalité de Bennett pour les maxima de processus empiriques (A Bennett-type inequality for maxima of empirical processes)
- About the constants in Talagrand's concentration inequalities for empirical processes.
- Support vector machines are universally consistent
- Concentration for locally acting permutations
- Complexity regularization via localized random penalties
- Generalization bounds for averaged classifiers
- Statistical learning theory and stochastic optimization. École d'Été de Probabilités de Saint-Flour XXXI -- 2001.
- Population theory for boosting ensembles.
- Process consistency for AdaBoost.
- On the Bayes-risk consistency of regularized boosting methods.
- Statistical behavior and consistency of classification methods based on convex risk minimization.
- Optimal aggregation of classifiers in statistical learning.
- Boosting a weak learning algorithm by majority
- Bounding the Vapnik-Chervonenkis dimension of concept classes parameterized by real numbers
- Support-vector networks
- Measuring mass concentrations and estimating density contour clusters -- An excess mass approach
- Concentration of measure and isoperimetric inequalities in product spaces
- Weak convergence and empirical processes. With applications to statistics
- Empirical processes and applications: An overview. (With discussion)
- A new look at independence
- Regularization networks and support vector machines
- Local Rademacher complexities and oracle inequalities in risk minimization. (2004 IMS Medallion Lecture). (With discussions and rejoinder)
- Statistical performance of support vector machines
- The probability problem of pattern recognition learning and the method of potential functions
- The method of potential functions for the problem of restoring the characteristic of a function converter from randomly observed points
- Theoretical foundations of the potential function method in pattern recognition learning
- Weighted sums of certain dependent random variables
- Potential function algorithms for pattern recognition learning machines
- A combinatorial problem; stability and order for models and theories in infinitary languages
- On the density of families of sets
- Bounding \(\bar d\)-distance by informational divergence: A method to prove measure concentration
- Majorizing measures: The generic chaining
- Square root penalty: Adaption to the margin in classification and in edge estimation
- Local Rademacher complexities
- On the mathematical foundations of learning
- DOI: 10.1162/153244302760185252
- DOI: 10.1162/153244303765208368
- PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification
- DOI: 10.1162/153244303768966111
- On Talagrand's deviation inequalities for product measures
- Concentration for Independent Permutations
- Probably Approximate Learning of Sets and Functions
- Probability Inequalities for the Sum of Independent Random Variables
- Learnability and the Vapnik-Chervonenkis dimension
- Density estimation via exponential model selection
- Consistency of Support Vector Machines and Other Regularized Kernel Classifiers
- Capacity of reproducing kernel spaces in learning theory
- A simple proof of the blowing-up lemma (Corresp.)
- Automatic pattern recognition: a study of the probability of error
- Distribution-free performance bounds for potential function rules
- Necessary and Sufficient Conditions for the Uniform Convergence of Means to their Expectations
- Minimum complexity density estimation
- Exponential Bounds for Large Deviations
- Correction to bounds on conditional probabilities with applications
- Distribution-free inequalities for the deleted and holdout error estimates
- Uniform Central Limit Theorems
- Strongly consistent code-based identification and order estimation for constrained finite-state model classes
- Ideal spatial adaptation by wavelet shrinkage
- A Remark on the Szarek–Talagrand Theorem
- Scale-sensitive dimensions, uniform convergence, and learnability
- Boosting with the \(L_2\) loss
- A sharp concentration inequality with applications
- Minimax nonparametric classification. II. Model selection for adaptation
- Minimax nonparametric classification. I. Rates of convergence
- Rademacher penalties and structural risk minimization
- The Generic Chaining
- Large-scale typicality of Markov sample paths and consistency of MDL order estimators
- Improving the sample complexity using global data
- Learning Theory
- Structural risk minimization over data-dependent hierarchies
- The importance of convexity in learning with squared loss
- Learning pattern classification-a survey
- On the infeasibility of training neural networks with small mean-squared error
- DOI: 10.1162/153244302760200650
- DOI: 10.1162/153244302760200704
- DOI: 10.1162/1532443041424319
- DOI: 10.1162/153244303321897681
- DOI: 10.1162/153244303321897690
- DOI: 10.1162/1532443041827925
- Concept learning using complexity regularization
- Neural Network Learning
- Finiteness results for sigmoidal “neural” networks
- Probability Inequalities for Sums of Bounded Random Variables
- An alternative point of view on Lepski's method
- Learning Theory
- Enumeration of Seven-Argument Threshold Functions
- A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines
- On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities
- Some Comments on \(C_P\)
- Convexity, Classification, and Risk Bounds
- Convergence of stochastic processes
- Introduction to nonparametric estimation
- Some applications of concentration inequalities to statistics
- Concentration inequalities for set-indexed empirical processes
- A note on Talagrand's concentration inequality
- Improved bounds on the sample complexity of learning
- The elements of statistical learning. Data mining, inference, and prediction
- Model selection and error estimation
- Logistic regression, AdaBoost and Bregman distances
- New concentration inequalities in product spaces
- A new look at the statistical model identification