Optimal sampling designs for multidimensional streaming time series with application to power grid sensor data
Publication:6138630
DOI: 10.1214/23-AOAS1757, arXiv: 2303.08242, OpenAlex: W4388085831, MaRDI QID: Q6138630, FDO: Q6138630
Publication date: 16 January 2024
Published in: The Annals of Applied Statistics
Abstract: Internet of Things (IoT) systems generate massive, high-speed, temporally correlated streaming data and are often coupled with online inference tasks under computational or energy constraints. Online analysis of these streaming time series often faces a trade-off between statistical efficiency and computational cost. One important approach to balancing this trade-off is sampling, where only a small portion of the data is selected for model fitting and updating. Motivated by the demands of dynamic relationship analysis in IoT systems, we study the data-dependent sample selection and online inference problem for a multidimensional streaming time series, aiming to provide low-cost real-time analysis of high-speed power grid electricity consumption data. Inspired by the D-optimality criterion in the design of experiments, we propose a class of online data reduction methods that achieve an optimal sampling criterion and improve the computational efficiency of the online analysis. We show that the optimal solution amounts to a strategy that is a mixture of Bernoulli sampling and leverage score sampling. The leverage score sampling involves auxiliary estimations that have a computational advantage over recursive least squares updates. Theoretical properties of these auxiliary estimations are also discussed. When applied to European power grid consumption data, the proposed leverage-score-based sampling methods outperform the benchmark sampling method in online estimation and prediction. The general applicability of the sampling-assisted online estimation method is assessed via simulation studies.
Full work available at URL: https://arxiv.org/abs/2303.08242
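The abstract describes a selection rule that mixes Bernoulli sampling with leverage score sampling, followed by an online model update on the selected points only. The Python sketch below illustrates that general idea under strong simplifying assumptions and is not the authors' algorithm: the function name `mixture_sampling_rls`, the parameters `rate`, `alpha`, and `ridge`, the way the leverage-type score is rescaled, and the simulated AR(1) covariates are all hypothetical choices made for illustration. In particular, the score here is computed from the recursive least squares (RLS) inverse matrix itself, whereas the paper uses its own auxiliary estimations.

```python
import numpy as np


def mixture_sampling_rls(X, y, rate=0.1, alpha=0.5, ridge=1.0, seed=0):
    """Stream through (X, y), keep each row with a mixture probability,
    and update a recursive least squares (RLS) fit on the kept rows only.

    rate, alpha and ridge are illustrative tuning parameters (assumptions).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    P = np.eye(d) / ridge          # inverse of (ridge * I + sum of kept x x^T)
    beta = np.zeros(d)
    n_kept = 0
    kept = np.zeros(n, dtype=bool)
    for t in range(n):
        x = X[t]
        Px = P @ x
        # Leverage-type score, rescaled so it is roughly 1 on average.
        score = (n_kept + ridge) * (x @ Px) / d
        # Mixture of a constant (Bernoulli) part and a leverage-driven part.
        p = min(1.0, rate * (alpha + (1.0 - alpha) * score))
        if rng.random() < p:
            # Sherman-Morrison rank-one update of the inverse and of beta.
            gain = Px / (1.0 + x @ Px)
            beta = beta + gain * (y[t] - x @ beta)
            P = P - np.outer(gain, Px)
            n_kept += 1
            kept[t] = True
    return beta, kept


# Simulated stream with temporally correlated AR(1) covariates (assumed data).
rng = np.random.default_rng(1)
n, d = 5000, 3
X = np.zeros((n, d))
for t in range(1, n):
    X[t] = 0.5 * X[t - 1] + rng.standard_normal(d)
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat, kept = mixture_sampling_rls(X, y, rate=0.1, alpha=0.5)
print("kept fraction:", kept.mean())
print("estimate:", beta_hat)
```

Setting `alpha=1.0` reduces the rule to plain Bernoulli sampling at rate `rate`, while smaller values of `alpha` shift more of the selection probability toward high-leverage observations.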
Cites Work
- Dynamic Linear Models with R
- Monte Carlo strategies in scientific computing
- Moment bounds for stationary mixing sequences
- Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix
- Bayesian forecasting and dynamic models
- Copula-based semiparametric models for multivariate time series
- Sampling Algorithms and Coresets for $\ell_p$ Regression
- Detection of Influential Observation in Linear Regression
- Turning Big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering
- Optimal Design of Experiments
- Inference of time-varying regression models
- On statistics, computation and scalability
- Principles of Optimal Design
- Some Theorems in Least Squares
- Simultaneous Inference of Linear Models with Time Varying Coefficients
- On the sequential construction of optimum bounded designs
- Information-Based Optimal Subdata Selection for Big Data Linear Regression
- Computational Advertising: Techniques for Targeting Relevant Ads
- Optimal Subsampling for Large Sample Logistic Regression
- Sequential online subsampling for thinning experimental designs
- Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators With Massive Data
- LowCon: A Design-based Subsampling Approach in a Misspecified Linear Model
- Orthogonal subsampling for big data linear regression
- Online Censoring for Large-Scale Regressions with Application to Streaming Big Data