Semi-supervised learning with density-ratio estimation
From MaRDI portal
Abstract: In this paper, we study statistical properties of semi-supervised learning, which is regarded as an important problem in the machine learning community. In standard supervised learning, only labeled data are observed; classification and regression problems are formalized as supervised learning. In semi-supervised learning, unlabeled data are obtained in addition to labeled data, so exploiting the unlabeled data is important for improving prediction accuracy. This problem is regarded as a semiparametric estimation problem with missing data. Under discriminative probabilistic models, it had been considered that unlabeled data are useless for improving estimation accuracy. Recently, it was revealed that a weighted estimator using the unlabeled data achieves better prediction accuracy than a learning method using only labeled data, especially when the discriminative probabilistic model is misspecified. That is, improvement under the semiparametric model with missing data is possible when the semiparametric model is misspecified. In this paper, we apply a density-ratio estimator to obtain the weight function in semi-supervised learning. The benefit of our approach is that the proposed estimator does not require well-specified probabilistic models for the probability of the unlabeled data. Based on statistical asymptotic theory, we prove that the estimation accuracy of our method outperforms that of supervised learning using only labeled data. Numerical experiments demonstrate the usefulness of our methods.
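The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes a generic least-squares density-ratio fit with a Gaussian basis, a one-dimensional covariate, and a linear regression model that is deliberately misspecified for a sine-shaped target. The choice of pooled covariates as numerator sample and labeled covariates as denominator sample, and all names (`fit_density_ratio`, `weighted_linear_fit`, `demo`, the centers, width, and ridge values), are assumptions made for illustration.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gaussian_basis(x, centers, width):
    """Evaluate Gaussian bumps at a scalar x (hypothetical basis choice)."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]


def fit_density_ratio(x_num, x_den, centers, width=1.0, ridge=0.1):
    """Ridge-regularized least-squares fit of r(x) ~ p_num(x) / p_den(x)."""
    k = len(centers)
    H = [[0.0] * k for _ in range(k)]  # second moments under the denominator sample
    for x in x_den:
        phi = gaussian_basis(x, centers, width)
        for i in range(k):
            for j in range(k):
                H[i][j] += phi[i] * phi[j] / len(x_den)
    h = [0.0] * k  # first moments under the numerator sample
    for x in x_num:
        phi = gaussian_basis(x, centers, width)
        for i in range(k):
            h[i] += phi[i] / len(x_num)
    for i in range(k):
        H[i][i] += ridge
    alpha = solve_linear(H, h)

    def ratio(x):
        phi = gaussian_basis(x, centers, width)
        # Clip at zero so the result can be used as a weight.
        return max(sum(a * p for a, p in zip(alpha, phi)), 0.0)

    return ratio


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = a + b * x; returns [a, b]."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    return solve_linear([[sw, sx], [sx, sxx]], [sy, sxy])


def demo(seed=0, n_labeled=50, n_unlabeled=500):
    """Synthetic data: linear model fitted to a sine target (misspecified)."""
    rng = random.Random(seed)
    x_lab = [rng.gauss(0, 1) for _ in range(n_labeled)]
    y_lab = [math.sin(x) + 0.1 * rng.gauss(0, 1) for x in x_lab]
    x_unl = [rng.gauss(0, 1) for _ in range(n_unlabeled)]
    # Assumption: numerator = pooled covariates, denominator = labeled covariates.
    ratio = fit_density_ratio(x_lab + x_unl, x_lab, centers=[-2, -1, 0, 1, 2])
    weights = [ratio(x) for x in x_lab]
    return weighted_linear_fit(x_lab, y_lab, weights), weights
```

Calling `demo()` returns the weighted intercept and slope together with the weights assigned to the labeled points. Setting all weights to 1 in `weighted_linear_fit` gives the ordinary supervised fit for comparison; whether the weighted fit is actually more accurate on a given sample is not something this sketch establishes.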
Cites work
- A paradox concerning nuisance parameters and projected estimating functions
- Asymptotic Statistics
- Asymptotic theory for the semiparametric accelerated failure time model with missing data
- Covariate shift adaptation by importance weighted cross validation
- Density ratio estimation in machine learning. Foreword by Thomas G. Dietterich
- Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score
- Elements of Information Theory
- Estimation of Regression Coefficients When Some Regressors Are Not Always Observed
- Importance Sampling Via the Estimated Sampler
- Improving predictive inference under covariate shift by weighting the log-likelihood function
- Inferences for case-control and semiparametric two-sample density ratio models
- Information geometry of estimating functions in semi-parametric statistical models
- Semi-supervised learning on Riemannian manifolds
- Soft margins for AdaBoost
- Statistical analysis of kernel-based least-squares density-ratio estimation
- Text classification from labeled and unlabeled documents using EM
- The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter
Cited in (17)
- Efficient and adaptive linear regression in semi-supervised settings
- The use of unlabeled data in predictive modeling
- Semi-supervised learning of class balance under class-prior change by distribution matching
- On semi-supervised linear regression in covariate shift problems
- Semi-supervised logistic discrimination via labeled data and unlabeled data from different sampling distributions
- Semi-supervised inference: general theory and estimation of means
- Density-sensitive semisupervised inference
- Unbiased generative semi-supervised learning
- Safe semi-supervised learning based on weighted likelihood
- Finite-sample analysis of impacts of unlabeled data and their labeling mechanisms in linear discriminant analysis
- Semi-supervised learning based on high density region estimation
- Asymptotic comparison of semi-supervised and supervised linear discriminant functions for heteroscedastic normal populations
- Semi-Supervised Linear Regression
- Semi-supervised learning via constraints
- A General M-estimation Theory in Semi-Supervised Framework
- A novel semisupervised support vector machine classifier based on active learning and context information
- scientific article; zbMATH DE number 6253975