Regularized feature selection in reinforcement learning

Authors / Contributors:
[Dean Wookey, George Konidaris]
Place, Publisher, Year:
2015
Contained in:
Machine Learning, 100/2-3(2015-09-01), 655-676
Format:
Article (online)
ID: 605478236
LEADER caa a22 4500
001 605478236
003 CHVBK
005 20210128100404.0
007 cr unu---uuuuu
008 210128e20150901xx s 000 0 eng
024 7 0 |a 10.1007/s10994-015-5518-8  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s10994-015-5518-8 
245 0 0 |a Regularized feature selection in reinforcement learning  |h [Electronic data]  |c [Dean Wookey, George Konidaris] 
520 3 |a We introduce feature regularization during feature selection for value function approximation. Feature regularization introduces a prior into the selection process, improving function approximation accuracy and reducing overfitting. We show that the smoothness prior is effective in the incremental feature selection setting and present closed-form smoothness regularizers for the Fourier and RBF bases. We present two methods for feature regularization which extend the temporal difference orthogonal matching pursuit (OMP-TD) algorithm and demonstrate the effectiveness of the smoothness prior: smooth Tikhonov OMP-TD and smoothness scaled OMP-TD. We compare these methods against OMP-TD, regularized OMP-TD and least squares TD with random projections, across six benchmark domains using two different types of basis functions. 
540 |a The Author(s), 2015 
690 7 |a Feature selection  |2 nationallicence 
690 7 |a Reinforcement learning  |2 nationallicence 
690 7 |a Function approximation  |2 nationallicence 
690 7 |a Regularization  |2 nationallicence 
690 7 |a Linear function approximation  |2 nationallicence 
690 7 |a OMP-TD  |2 nationallicence 
700 1 |a Wookey  |D Dean  |u School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa  |4 aut 
700 1 |a Konidaris  |D George  |u Department of Computer Science, Duke University, 27708, Durham, NC, USA  |4 aut 
773 0 |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 100/2-3(2015-09-01), 655-676  |x 0885-6125  |q 100:2-3<655  |1 2015  |2 100  |o 10994 
856 4 0 |u https://doi.org/10.1007/s10994-015-5518-8  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s10994-015-5518-8  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Wookey  |D Dean  |u School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Konidaris  |D George  |u Department of Computer Science, Duke University, 27708, Durham, NC, USA  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 100/2-3(2015-09-01), 655-676  |x 0885-6125  |q 100:2-3<655  |1 2015  |2 100  |o 10994