Improving classification performance through selective instance completion

Verfasser / Beitragende:
[Amit Dhurandhar, Karthik Sankaranarayanan]
Ort, Verlag, Jahr:
2015
Enthalten in:
Machine Learning, 100/2-3(2015-09-01), 425-447
Format:
Artikel (online)
ID: 60547821X
LEADER caa a22 4500
001 60547821X
003 CHVBK
005 20210128100404.0
007 cr unu---uuuuu
008 210128e20150901xx s 000 0 eng
024 7 0 |a 10.1007/s10994-015-5500-5  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s10994-015-5500-5 
245 0 0 |a Improving classification performance through selective instance completion  |h [Elektronische Daten]  |c [Amit Dhurandhar, Karthik Sankaranarayanan] 
520 3 |a In multiple domains, actively acquiring missing input information at a reasonable cost in order to improve our understanding of the input-output relationships is of increasing importance. This problem has gained prominence in healthcare, public policy making, education, and in the targeted advertising industry which tries to best match people to products. In this paper we tackle an important variant of this problem: Instance completion, where we want to choose the best k incomplete instances to query from a much larger universe of N ( $$\gg $$ ≫ k) incomplete instances so as to learn the most accurate classifier. We propose a principled framework which motivates a generally applicable yet efficient meta-technique for choosing k such instances. Since we cannot know a priori the classifier that will result from the completed dataset, i.e. the final classifier, our method chooses the k instances based on a derived upper bound on the expectation of the distance between the next classifier and the final classifier. We additionally derive a sufficient condition for these two solutions to match. We then empirically evaluate the performance of our method relative to the state-of-the-art methods on four UCI datasets as well as three proprietary e-commerce datasets used in previous studies. In these experiments, we also demonstrate how close we are likely to be to the optimal solution, by quantifying the extent to which our sufficient condition is satisfied. Lastly, we show that our method is easily extensible to the setting where we have a non-uniform cost associated with acquiring the missing information. 
540 |a The Author(s), 2015 
690 7 |a Instance completion  |2 nationallicence 
690 7 |a Active feature acquisition  |2 nationallicence 
690 7 |a Missing data  |2 nationallicence 
700 1 |a Dhurandhar  |D Amit  |u IBM T.J. Watson, Yorktown Heights, NY, USA  |4 aut 
700 1 |a Sankaranarayanan  |D Karthik  |u IBM India Research Lab, Bangalore, India  |4 aut 
773 0 |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 100/2-3(2015-09-01), 425-447  |x 0885-6125  |q 100:2-3<425  |1 2015  |2 100  |o 10994 
856 4 0 |u https://doi.org/10.1007/s10994-015-5500-5  |q text/html  |z Onlinezugriff via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s10994-015-5500-5  |q text/html  |z Onlinezugriff via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Dhurandhar  |D Amit  |u IBM T.J. Watson, Yorktown Heights, NY, USA  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Sankaranarayanan  |D Karthik  |u IBM India Research Lab, Bangalore, India  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 100/2-3(2015-09-01), 425-447  |x 0885-6125  |q 100:2-3<425  |1 2015  |2 100  |o 10994