Optimizing regression models for data streams with missing values
Authors / Contributors:
[Indrė Žliobaitė, Jaakko Hollmén]
Place, Publisher, Year:
2015
Contained in:
Machine Learning, 99/1(2015-04-01), 47-73
Format:
Article (online)
Online access:
| LEADER | caa a22 4500 | ||
|---|---|---|---|
| 001 | 605478457 | ||
| 003 | CHVBK | ||
| 005 | 20210128100405.0 | ||
| 007 | cr unu---uuuuu | ||
| 008 | 210128e20150401xx s 000 0 eng | ||
| 024 | 7 | 0 | |a 10.1007/s10994-014-5450-3 |2 doi |
| 035 | |a (NATIONALLICENCE)springer-10.1007/s10994-014-5450-3 | ||
| 245 | 0 | 0 | |a Optimizing regression models for data streams with missing values |h [Electronic data] |c [Indrė Žliobaitė, Jaakko Hollmén] |
| 520 | 3 | |a Automated data acquisition systems, such as wireless sensor networks, surveillance systems, or any system that records data in operating logs, are becoming increasingly common, and provide opportunities for making decisions on data in real or near-real time. In these systems, data is generated continuously, resulting in a stream of data, and predictive models need to be built and updated online with the incoming data. In addition, the predictive models need to be able to output predictions continuously and without delay. Automated data acquisition systems are prone to occasional failures. As a result, missing values may often occur. Nevertheless, predictions need to be made continuously. Hence, predictive models need mechanisms for dealing with missing data in such a way that the loss in accuracy due to occasionally missing values is minimal. In this paper, we theoretically analyze the effects of missing values on the accuracy of linear predictive models. We derive the optimal least squares solution that minimizes the expected mean squared error given an expected rate of missing values. Based on this theoretically optimal solution, we propose a recursive algorithm for producing and updating linear regression online, without accessing historical data. Our experimental evaluation on eight benchmark datasets and a case study in environmental monitoring with streaming data validate the theoretical results and confirm the effectiveness of the proposed strategy. | |
| 540 | |a The Author(s), 2014 | ||
| 690 | 7 | |a Data streams |2 nationallicence | |
| 690 | 7 | |a Missing data |2 nationallicence | |
| 690 | 7 | |a Linear models |2 nationallicence | |
| 690 | 7 | |a Online regression |2 nationallicence | |
| 690 | 7 | |a Regularized recursive regression |2 nationallicence | |
| 700 | 1 | |a Žliobaitė |D Indrė |u Department of Information and Computer Science, Aalto University, Aalto, PO Box 15400, 00076, Espoo, Finland |4 aut | |
| 700 | 1 | |a Hollmén |D Jaakko |u Department of Information and Computer Science, Aalto University, Aalto, PO Box 15400, 00076, Espoo, Finland |4 aut | |
| 773 | 0 | |t Machine Learning |d Springer US; http://www.springer-ny.com |g 99/1(2015-04-01), 47-73 |x 0885-6125 |q 99:1<47 |1 2015 |2 99 |o 10994 | |
| 856 | 4 | 0 | |u https://doi.org/10.1007/s10994-014-5450-3 |q text/html |z Online access via DOI |
| 898 | |a BK010053 |b XK010053 |c XK010000 | ||
| 900 | 7 | |a Metadata rights reserved |b Springer special CC-BY-NC licence |2 nationallicence | |
| 908 | |D 1 |a research-article |2 jats | ||
| 949 | |B NATIONALLICENCE |F NATIONALLICENCE |b NL-springer | ||
| 950 | |B NATIONALLICENCE |P 856 |E 40 |u https://doi.org/10.1007/s10994-014-5450-3 |q text/html |z Online access via DOI | ||
| 950 | |B NATIONALLICENCE |P 700 |E 1- |a Žliobaitė |D Indrė |u Department of Information and Computer Science, Aalto University, Aalto, PO Box 15400, 00076, Espoo, Finland |4 aut | ||
| 950 | |B NATIONALLICENCE |P 700 |E 1- |a Hollmén |D Jaakko |u Department of Information and Computer Science, Aalto University, Aalto, PO Box 15400, 00076, Espoo, Finland |4 aut | ||
| 950 | |B NATIONALLICENCE |P 773 |E 0- |t Machine Learning |d Springer US; http://www.springer-ny.com |g 99/1(2015-04-01), 47-73 |x 0885-6125 |q 99:1<47 |1 2015 |2 99 |o 10994 | ||