Optimizing regression models for data streams with missing values

Authors / Contributors:
[Indrė Žliobaitė, Jaakko Hollmén]
Place, publisher, year:
2015
Contained in:
Machine Learning, 99/1(2015-04-01), 47-73
Format:
Article (online)
ID: 605478457
LEADER caa a22 4500
001 605478457
003 CHVBK
005 20210128100405.0
007 cr unu---uuuuu
008 210128e20150401xx s 000 0 eng
024 7 0 |a 10.1007/s10994-014-5450-3  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s10994-014-5450-3 
245 0 0 |a Optimizing regression models for data streams with missing values  |h [Electronic data]  |c [Indrė Žliobaitė, Jaakko Hollmén] 
520 3 |a Automated data acquisition systems, such as wireless sensor networks, surveillance systems, or any system that records data in operating logs, are becoming increasingly common, and provide opportunities for making decisions based on data in real or near-real time. In these systems, data is generated continuously, resulting in a stream of data, and predictive models need to be built and updated online with the incoming data. In addition, the predictive models need to be able to output predictions continuously and without delay. Automated data acquisition systems are prone to occasional failures. As a result, missing values may often occur. Nevertheless, predictions need to be made continuously. Hence, predictive models need to have mechanisms for dealing with missing data in such a way that the loss in accuracy due to occasionally missing values is minimal. In this paper, we theoretically analyze the effects of missing values on the accuracy of linear predictive models. We derive the optimal least squares solution that minimizes the expected mean squared error given an expected rate of missing values. Based on this theoretically optimal solution, we propose a recursive algorithm for producing and updating linear regression online, without accessing historical data. Our experimental evaluation on eight benchmark datasets and a case study in environmental monitoring with streaming data validate the theoretical results and confirm the effectiveness of the proposed strategy. 
540 |a The Author(s), 2014 
690 7 |a Data streams  |2 nationallicence 
690 7 |a Missing data  |2 nationallicence 
690 7 |a Linear models  |2 nationallicence 
690 7 |a Online regression  |2 nationallicence 
690 7 |a Regularized recursive regression  |2 nationallicence 
700 1 |a Žliobaitė  |D Indrė  |u Department of Information and Computer Science, Aalto University, Aalto, PO Box 15400, 00076, Espoo, Finland  |4 aut 
700 1 |a Hollmén  |D Jaakko  |u Department of Information and Computer Science, Aalto University, Aalto, PO Box 15400, 00076, Espoo, Finland  |4 aut 
773 0 |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 99/1(2015-04-01), 47-73  |x 0885-6125  |q 99:1<47  |1 2015  |2 99  |o 10994 
856 4 0 |u https://doi.org/10.1007/s10994-014-5450-3  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s10994-014-5450-3  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Žliobaitė  |D Indrė  |u Department of Information and Computer Science, Aalto University, Aalto, PO Box 15400, 00076, Espoo, Finland  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Hollmén  |D Jaakko  |u Department of Information and Computer Science, Aalto University, Aalto, PO Box 15400, 00076, Espoo, Finland  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 99/1(2015-04-01), 47-73  |x 0885-6125  |q 99:1<47  |1 2015  |2 99  |o 10994