Direct conditional probability density estimation with sparse feature selection

Author / Contributors:
[Motoki Shiga, Voot Tangkaratt, Masashi Sugiyama]
Place, publisher, year:
2015
Contained in:
Machine Learning, 100/2-3(2015-09-01), 161-182
Format:
Article (online)
ID: 605478368
LEADER caa a22 4500
001 605478368
003 CHVBK
005 20210128100405.0
007 cr unu---uuuuu
008 210128e20150901xx s 000 0 eng
024 7 0 |a 10.1007/s10994-014-5472-x  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s10994-014-5472-x 
245 0 0 |a Direct conditional probability density estimation with sparse feature selection  |h [Electronic data]  |c [Motoki Shiga, Voot Tangkaratt, Masashi Sugiyama] 
520 3 |a Regression is a fundamental problem in statistical data analysis, which aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional probability density is multi-modal, asymmetric, and heteroscedastic. To overcome this limitation, various estimators of conditional densities themselves have been developed, and a kernel-based approach called least-squares conditional density estimation (LS-CDE) was demonstrated to be promising. However, LS-CDE still suffers from large estimation error if input contains many irrelevant features. In this paper, we therefore propose an extension of LS-CDE called sparse additive CDE (SA-CDE), which allows automatic feature selection in CDE. SA-CDE applies kernel LS-CDE to each input feature in an additive manner and penalizes the whole solution by a group-sparse regularizer. We also give a subgradient-based optimization method for SA-CDE training that scales well to high-dimensional large data sets. Through experiments with benchmark and humanoid robot transition datasets, we demonstrate the usefulness of SA-CDE in noisy CDE problems. 
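Note: To give a concrete feel for the idea summarized in the abstract, the following is a minimal NumPy sketch of a sparse additive conditional density estimator. It is not the authors' SA-CDE algorithm: the integral over y in the squared-loss (LS-CDE-style) criterion is approximated on a grid rather than computed in closed form, and the group-sparse penalty is handled with proximal-gradient (group soft-thresholding) steps instead of the subgradient method described in the paper. All variable names, kernel widths, and regularization values below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: only feature 0 is relevant; features 1..4 are irrelevant noise.
n, d = 400, 5
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)

# Per-feature Gaussian kernel bases on (x_j, y) pairs, centers drawn from the data.
B = 20          # bases per feature (illustrative choice)
sigma = 0.3     # kernel width (illustrative choice)
idx = rng.choice(n, B, replace=False)
Cx, Cy = X[idx], y[idx]

def design(Xq, yq):
    """Design matrix: column block j holds the B additive bases for feature j."""
    cols = []
    for j in range(d):
        kx = np.exp(-(Xq[:, [j]] - Cx[:, j]) ** 2 / (2 * sigma ** 2))
        ky = np.exp(-(yq[:, None] - Cy) ** 2 / (2 * sigma ** 2))
        cols.append(kx * ky)
    return np.hstack(cols)          # shape (len(Xq), d * B)

# Quadratic part of the squared-loss criterion; the integral over y is
# approximated by a grid sum here (the paper uses a closed form instead).
y_grid = np.linspace(y.min() - 1, y.max() + 1, 100)
dy = y_grid[1] - y_grid[0]
H = np.zeros((d * B, d * B))
for yg in y_grid:
    Pg = design(X, np.full(n, yg))
    H += dy * Pg.T @ Pg / n
h = design(X, y).mean(axis=0)

# Proximal-gradient steps with group soft-thresholding; each group collects
# all bases belonging to one input feature, so whole features can be zeroed out.
lam = 0.02
alpha = np.zeros(d * B)
step = 1.0 / np.linalg.eigvalsh(H).max()
for _ in range(2000):
    a = alpha - step * (H @ alpha - h)
    for j in range(d):
        blk = a[j * B:(j + 1) * B]
        nrm = np.linalg.norm(blk)
        a[j * B:(j + 1) * B] = max(0.0, 1 - step * lam / max(nrm, 1e-12)) * blk
    alpha = np.maximum(a, 0.0)      # simple non-negativity clip (heuristic)

group_norms = [np.linalg.norm(alpha[j * B:(j + 1) * B]) for j in range(d)]
print("per-feature group norms:", np.round(group_norms, 3))
# With a suitable lam, the group norm of feature 0 should stay large while the
# groups of the noise features shrink toward zero, i.e. feature selection.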
540 |a The Author(s), 2015 
690 7 |a Conditional density estimation  |2 nationallicence 
690 7 |a Feature selection  |2 nationallicence 
690 7 |a Sparse structured norm  |2 nationallicence 
700 1 |a Shiga  |D Motoki  |u Gifu University, 1-1 Yanagido, Gifu-city, 501-1193, Gifu, Japan  |4 aut 
700 1 |a Tangkaratt  |D Voot  |u Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, 152-8552, Tokyo, Japan  |4 aut 
700 1 |a Sugiyama  |D Masashi  |u Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, 152-8552, Tokyo, Japan  |4 aut 
773 0 |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 100/2-3(2015-09-01), 161-182  |x 0885-6125  |q 100:2-3<161  |1 2015  |2 100  |o 10994 
856 4 0 |u https://doi.org/10.1007/s10994-014-5472-x  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s10994-014-5472-x  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Shiga  |D Motoki  |u Gifu University, 1-1 Yanagido, Gifu-city, 501-1193, Gifu, Japan  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Tangkaratt  |D Voot  |u Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, 152-8552, Tokyo, Japan  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Sugiyama  |D Masashi  |u Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, 152-8552, Tokyo, Japan  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 100/2-3(2015-09-01), 161-182  |x 0885-6125  |q 100:2-3<161  |1 2015  |2 100  |o 10994