Learning from untrusted data
From MaRDI portal
Publication:4977960
DOI: 10.1145/3055399.3055491
zbMATH Open: 1369.68277
arXiv: 1611.02315
OpenAlex: W2554864439
MaRDI QID: Q4977960
FDO: Q4977960
Authors: Jacob Steinhardt, Gregory Valiant, Moses Charikar
Publication date: 17 August 2017
Published in: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing
Abstract: The vast majority of theoretical results in machine learning and statistics assume that the available training data is a reasonably reliable reflection of the phenomena to be learned or estimated. Similarly, the majority of machine learning and statistical techniques used in practice are brittle to the presence of large amounts of biased or malicious data. In this work we consider two frameworks in which to study estimation, learning, and optimization in the presence of significant fractions of arbitrary data. The first framework, list-decodable learning, asks whether it is possible to return a list of answers, with the guarantee that at least one of them is accurate. For example, given a dataset of points for which an unknown subset of points is drawn from a distribution of interest, and no assumptions are made about the remaining points, is it possible to return a list of answers, one of which is correct? The second framework, which we term the semi-verified learning model, considers the extent to which a small dataset of trusted data (drawn from the distribution in question) can be leveraged to enable the accurate extraction of information from a much larger but untrusted dataset (of which only an α-fraction is drawn from the distribution). We show strong positive results in both settings, and provide an algorithm for robust learning in a very general stochastic optimization setting. This general result has immediate implications for robust estimation in a number of settings, including for robustly estimating the mean of distributions with bounded second moments, robustly learning mixtures of such distributions, and robustly finding planted partitions in random graphs in which significant portions of the graph have been perturbed by an adversary.
Full work available at URL: https://arxiv.org/abs/1611.02315
Cited In (20)
- Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers
- Stronger data poisoning attacks break data sanitization defenses
- Best Arm Identification for Contaminated Bandits
- Learning from Multiple Sources of Inaccurate Data
- DLP learning from uncertain data
- Efficiently learning structured distributions from untrusted batches
- Efficient parameter estimation of truncated Boolean product distributions
- Mean estimation and regression under heavy-tailed distributions: A survey
- Nearly optimal robust secret sharing against rushing adversaries
- Robust Estimators in High-Dimensions Without the Computational Intractability
- A characterization of list learnability
- Algorithms approaching the threshold for semi-random planted clique
- Sampling Correctors
- Distributed statistical estimation and rates of convergence in normal approximation
- Robust supervised learning with coordinate gradient descent
- Corruption-tolerant bandit learning
- Learning under \(p\)-tampering poisoning attacks
This page was built for publication: Learning from untrusted data