Understanding Machine Learning: From Theory to Algorithms

From MaRDI portal
Publication: 5494386

DOI: 10.1017/CBO9781107298019 · zbMath: 1305.68005 · OpenAlex: W4236362309 · MaRDI QID: Q5494386

Shai Shalev-Shwartz, Shai Ben-David

Publication date: 28 July 2014

Full work available at URL: https://doi.org/10.1017/cbo9781107298019

Related Items

Approximation bounds for norm constrained neural networks with applications to regression and GANs, Average Sensitivity of Graph Algorithms, Detection of iterative adversarial attacks via counter attack, A dual-based stochastic inexact algorithm for a class of stochastic nonsmooth convex composite problems, A moment-matching metric for latent variable generative models, Principled deep neural network training through linear programming, Deep empirical risk minimization in finance: Looking into the future, On the approximation of functions by tanh neural networks, Stochastic momentum methods for non-convex learning without bounded assumptions, Fuzzy OWL-Boost: learning fuzzy concept inclusions via real-valued boosting, Attraction-repulsion clustering: a way of promoting diversity linked to demographic parity in fair clustering, Three ways to solve partial differential equations with neural networks — A review, Combining machine learning and domain decomposition methods for the solution of partial differential equations—A review, A class of dimension-free metrics for the convergence of empirical measures, Polynomial‐time universality and limitations of deep learning, A Range Space with Constant VC Dimension for All-pairs Shortest Paths in Graphs, Meta-inductive probability aggregation, Unified analysis of stochastic gradient methods for composite convex and smooth optimization, A stochastic variance reduced gradient using Barzilai-Borwein techniques as second order information, Solving Elliptic Problems with Singular Sources Using Singularity Splitting Deep Ritz Method, Speeding-up one-versus-all training for extreme classification via mean-separating initialization, Learning with risks based on M-location, A mini-batch proximal stochastic recursive gradient algorithm with diagonal Barzilai-Borwein stepsize, A framework of convergence analysis of mini-batch stochastic projected gradient methods, Stochastic chaining and strengthened information-theoretic 
generalization bounds, Universal regular conditional distributions via probabilistic transformers, Spectral clustering with robust self-learning constraints, Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation, Estimating the clustering coefficient using sample complexity analysis, Front transport reduction for complex moving fronts, Newton-MR: inexact Newton method with minimum residual sub-problem solver, Minimax rates for conditional density estimation via empirical entropy, A mathematical perspective of machine learning, Limitations of neural network training due to numerical instability of backpropagation, Randomized Joint Diagonalization of Symmetric Matrices, New results in cooperative adaptive optimal output regulation, IS CAUSAL REASONING HARDER THAN PROBABILISTIC REASONING?, Byzantine-robust loopless stochastic variance-reduced gradient, Unified SVM algorithm based on LS-DC loss, Towards case-optimized hybrid homomorphic encryption. 
Featuring the \textsf{Elisabeth} stream cipher, Optimal Algorithms for Stochastic Complementary Composite Minimization, Quicksort leave-pair-out cross-validation for ROC curve analysis, Model-Based Deep Learning, PAC learning halfspaces in non-interactive local differential privacy model with public unlabeled data, Error analysis of deep Ritz methods for elliptic equations, VC dimensions of group convolutional neural networks, Preconditioning meets biased compression for efficient distributed optimization, Quality measures for the evaluation of machine learning architectures on the quantification of epistemic and aleatoric uncertainties in complex dynamical systems, ToFU: topology functional units for deep learning, On the information complexity for integration in subspaces of the Wiener algebra, TNet: A Model-Constrained Tikhonov Network Approach for Inverse Problems, High-resolution probabilistic load forecasting: a learning ensemble approach, Simple and fast algorithm for binary integer and online linear programming, Neural network approximation and estimation of classifiers with classification boundary in a Barron class, Hessian averaging in stochastic Newton methods achieves superlinear convergence, Convergence of an asynchronous block-coordinate forward-backward algorithm for convex composite optimization, Decentralized personalized federated learning: lower bounds and optimal algorithm for all personalization modes, Strong generalization in quantum neural networks, \(\alpha\)QBoost: an iteratively weighted adiabatic trained classifier, Optimal subgroup selection, Physics-constrained data-driven variational method for discrepancy modeling, A stochastic gradient descent algorithm to maximize power utility of large credit portfolios under Marshall-Olkin dependence, The no-free-lunch theorems of supervised learning, A single timescale stochastic quasi-Newton method for stochastic optimization, Tractability from overparametrization: the example of the negative 
perceptron, Convergence Analysis of the Deep Galerkin Method for Weak Solutions, Adaptive proximal SGD based on new estimating sequences for sparser ERM, Random-reshuffled SARAH does not need full gradient computations, Bayes in action in deep learning and dictionary learning, Recent theoretical advances in decentralized distributed convex optimization, Near-Optimal Decentralized Algorithms for Saddle Point Problems over Time-Varying Networks, Streaming Complexity of SVMs, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, Adaptive metric dimensionality reduction, Unlabeled sample compression schemes and corner peelings for ample and maximum classes, CLIP: cheap Lipschitz training of neural networks, Understanding generalization error of SGD in nonconvex optimization, Physically interpretable machine learning algorithm on multidimensional non-linear fields, Machine learning from a continuous viewpoint. I, Nonsmoothness in machine learning: specific structure, proximal identification, and applications, Prediction of magnetization dynamics in a reduced dimensional feature space setting utilizing a low-rank kernel method, Deep microlocal reconstruction for limited-angle tomography, High-dimensional penalty selection via minimum description length principle, Neural network training using \(\ell_1\)-regularization and bi-fidelity data, Crowdsourcing with unsure option, Optimal distributed stochastic mirror descent for strongly convex optimization, A stochastic extra-step quasi-Newton method for nonsmooth nonconvex optimization, Eigenvalue clustering, control energy, and logarithmic capacity, Solving multiscale steady radiative transfer equation using neural networks with uniform stability, A stochastic subgradient method for distributionally robust non-convex and non-smooth learning, Efficient parameter estimation of truncated Boolean product distributions, Joint feature selection and classification for 
positive unlabelled multi-label data using weighted penalized empirical risk minimization, Frameworks and results in distributionally robust optimization, Training thinner and deeper neural networks: jumpstart regularization, Joint ranking SVM and binary relevance with robust low-rank learning for multi-label classification, The minimax learning rates of normal and Ising undirected graphical models, Convergence analysis of Tikhonov regularization for non-linear statistical inverse problems, Normal approximations for discrete-time occupancy processes, A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics, A quantum-implementable neural network model, Optimal probability aggregation based on generalized Brier scoring, Hierarchical design of fast minimum disagreement algorithms, Generalization bounds for learning weighted automata, Newton-type methods for non-convex optimization under inexact Hessian information, Absolutely no free lunches!, Properties of the sign gradient descent algorithms, Computational complexity of learning algebraic varieties, On weak \(\epsilon\)-nets and the Radon number, Kolmogorov width decay and poor approximators in machine learning: shallow neural networks, random feature models and neural tangent kernels, Stochastic transitivity: axioms and models, Fast generalization rates for distance metric learning. 
Improved theoretical analysis for smooth strongly convex distance metric learning, Fast approximation of betweenness centrality through sampling, Fast approximate simulation of finite long-range spin systems, A statistician teaches deep learning, Networks for nonlinear diffusion problems in imaging, A generalized minimal residual based iterative back propagation algorithm for polynomial nonlinear models, Accelerated gradient sliding for minimizing a sum of functions, Domain adaptation -- can quantity compensate for quality?, Stochastic variance reduced gradient methods using a trust-region-like scheme, Convergence of stochastic proximal gradient algorithm, Multi-fidelity deep neural network surrogate model for aerodynamic shape optimization, (Machine) learning parameter regions, Model selection in utility-maximizing binary prediction, Testing conditional independence in supervised learning algorithms, Stability bounds and almost sure convergence of improved particle swarm optimization methods, A multiple criteria nominal classification method based on the concepts of similarity and dissimilarity, A unified convergence analysis of stochastic Bregman proximal gradient and extragradient methods, Bounds for the tracking error of first-order online optimization methods, Volatility forecasting via SVR-GARCH with mixture of Gaussian kernels, Sharpness estimation of combinatorial generalization ability bounds for threshold decision rules, A selective overview of deep learning, Relative utility bounds for empirically optimal portfolios, BEST : A decision tree algorithm that handles missing values, Fastest rates for stochastic mirror descent methods, Control-based algorithms for high dimensional online learning, Solving the Kolmogorov PDE by means of deep learning, Regularisation of neural networks by enforcing Lipschitz continuity, The VC-dimension of axis-parallel boxes on the torus, Twelve great papers: comments and replies. 
Response to a special issue on logical perspectives on science and cognition -- the philosophy of Gerhard Schurz, Universal Bayes consistency in metric spaces, A hybrid acceleration strategy for nonparallel support vector machine, An exact cutting plane method for \(k\)-submodular function maximization, Solving equations of random convex functions via anchored regression, Superquantiles at work: machine learning applications and efficient subgradient computation, Generalization error of GAN from the discriminator's perspective, Marginal singularity and the benefits of labels in covariate-shift, On sharpness of error bounds for univariate approximation by single hidden layer feedforward neural networks, Deep learning observables in computational fluid dynamics, Machine learning for failure analysis: a mathematical modelling perspective, Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model, On the local convergence of a stochastic semismooth Newton method for nonsmooth nonconvex optimization, Numerical study of reciprocal recommendation with domain matching, Dimension independent excess risk by stochastic gradient descent, Percolation centrality via Rademacher Complexity, Dot products in \(\mathbb{F}_q^3\) and the Vapnik-Chervonenkis dimension, Perturbed iterate SGD for Lipschitz continuous loss functions, Towards convergence rate analysis of random forests for classification, Coupled block diagonal regularization for multi-view subspace clustering, Regret-minimizing Bayesian persuasion, Quantum learning of concentrated Boolean functions, On the robustness of randomized classifiers to adversarial examples, Efficient fair principal component analysis, Computation of invariant sets via immersion for discrete-time nonlinear systems, On the expressive power of message-passing neural networks as global feature map transformers, Stable recovery of entangled weights: towards robust identification of deep neural networks from 
minimal samples, A measure theoretical approach to the mean-field maximum principle for training NeurODEs, Compressive sensing and neural networks from a statistical learning perspective, On the perceptron's compression, Suboptimality of constrained least squares and improvements via non-linear predictors, From inexact optimization to learning via gradient concentration, PAC-learning gains of Turing machines over circuits and neural networks, Depth separations in neural networks: what is actually being separated?, The Barron space and the flow-induced function spaces for neural network models, Robust and resource-efficient identification of two hidden layer neural networks, Fast rates by transferring from auxiliary hypotheses, Turnpike in Lipschitz—nonlinear optimal control, Stationary Density Estimation of Itô Diffusions Using Deep Learning, Regularization via Mass Transportation, Approximate diagonalization of some Toeplitz operators and matrices, On Tackling Explanation Redundancy in Decision Trees, WARPd: A Linearly Convergent First-Order Primal-Dual Algorithm for Inverse Problems with Approximate Sharpness Conditions, Kernel Entropy Discriminant Analysis for Dimension Reduction, Particle dual averaging: optimization of mean field neural network with global convergence rate analysis*, Stochastic approximation versus sample average approximation for Wasserstein barycenters, Mean-field inference methods for neural networks, Graphical Convergence of Subgradients in Nonconvex Optimization and Learning, A Vectorization Scheme for Nonconvex Set Optimization Problems, Imaging conductivity from current density magnitude using neural networks*, Nonlinear Weighted Directed Acyclic Graph and A Priori Estimates for Neural Networks, \(\ell^1\)-analysis minimization and generalized (co-)sparsity: when does recovery succeed?, Full error analysis for the training of deep neural networks, Logarithmic sample bounds for sample average approximation with capacity- or 
budget-constraints, Regularizing conjunctive features for classification, Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization, Real estate price estimation in French cities using geocoding and machine learning, Big data driven order-up-to level model: application of machine learning, Nearest neighbor representations of Boolean functions, Complexity of training ReLU neural network, Improving kernel online learning with a snapshot memory, RidgeSketch: A Fast Sketching Based Solver for Large Scale Ridge Regression, Quantum learning Boolean linear functions w.r.t. product distributions, Empirical risk minimization: probabilistic complexity and stepsize strategy, Recovery of Sobolev functions restricted to iid sampling, Regrets of proximal method of multipliers for online non-convex optimization with long term constraints, Learning MAX-SAT from contextual examples for combinatorial optimisation, A convenient infinite dimensional framework for generative adversarial learning, Learning bounds for quantum circuits in the agnostic setting, Screening Rules and its Complexity for Active Set Identification, Sparse PCA on fixed-rank matrices, Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory, Search for pair-produced vectorlike lepton singlet at the ILC by the XGBoost method, Learning from non-irreducible Markov chains, A theoretical framework for deep transfer learning, Sample Complexity of Sample Average Approximation for Conditional Stochastic Optimization, Deep learning for the generation of heuristics in answer set programming: a case study of graph coloring, On the optimality of averaging in distributed statistical learning, Greedy training algorithms for neural networks and applications to PDEs, Constructing New Weighted \(\ell_1\)-Algorithms for the Sparsest Points of Polyhedral Sets, Generalization Error in Deep 
Learning, Artificial Intelligence-Enabled ECG Big Data Mining for Pervasive Heart Health Monitoring, On the Complexity of Learning a Class Ratio from Unlabeled Data, Parallel Optimization Techniques for Machine Learning, Convergence of Newton-MR under Inexact Hessian Information, Discovery of Dynamics Using Linear Multistep Methods, Machine Learning in Adaptive FETI-DP: Reducing the Effort in Sampling, Active Nearest-Neighbor Learning in Metric Spaces, Community Detection and Stochastic Block Models, Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization, Near-Optimal Algorithms for Online Matrix Prediction, Coreness of cooperative games with truncated submodular profit functions, Forward-Backward-Half Forward Algorithm for Solving Monotone Inclusions, Machine Learning in Adaptive Domain Decomposition Methods---Predicting the Geometric Location of Constraints, Subset Selection in Sparse Matrices, Supervised Deep Learning in High Energy Phenomenology: a Mini Review*, On the Purity and Entropy of Mixed Gaussian States, Solving inverse problems using data-driven models, On Version Space Compression, MONOTONIC SUPPORT VECTOR MACHINES FOR CREDIT RISK RATING, Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent, Hierarchical Design of Fast Minimum Disagreement Algorithms, Information Preserving Dimensionality Reduction, Minimum description length revisited, A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization, Enhancing Accuracy of Deep Learning Algorithms by Training with Low-Discrepancy Sequences, Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures, Structure-preserving deep learning, A multi-level procedure for enhancing accuracy of machine learning algorithms, 
Learning in Repeated Auctions, Generalisation error in learning with random features and the hidden manifold model*, Hausdorff dimension, heavy tails, and generalization in neural networks*, Exact and Approximate Algorithms for Computing Betweenness Centrality in Directed Graphs, For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability