Policy gradient in Lipschitz Markov Decision Processes

Author(s) / Contributors:
[Matteo Pirotta, Marcello Restelli, Luca Bascetta]
Place, publisher, year:
2015
Contained in:
Machine Learning, 100/2-3(2015-09-01), 255-283
Format:
Article (online)
ID: 605478279
LEADER caa a22 4500
001 605478279
003 CHVBK
005 20210128100404.0
007 cr unu---uuuuu
008 210128e20150901xx s 000 0 eng
024 7 0 |a 10.1007/s10994-015-5484-1  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s10994-015-5484-1 
245 0 0 |a Policy gradient in Lipschitz Markov Decision Processes  |h [Electronic data]  |c [Matteo Pirotta, Marcello Restelli, Luca Bascetta] 
520 3 |a This paper is about the exploitation of Lipschitz continuity properties for Markov Decision Processes to safely speed up policy-gradient algorithms. Starting from assumptions about the Lipschitz continuity of the state-transition model, the reward function, and the policies considered in the learning process, we show that both the expected return of a policy and its gradient are Lipschitz continuous w.r.t. policy parameters. By leveraging such properties, we define policy-parameter updates that guarantee a performance improvement at each iteration. The proposed methods are empirically evaluated and compared to other related approaches using different configurations of three popular control scenarios: the linear quadratic regulator, the mass-spring-damper system and the ship-steering control. 
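520 3 |a The core idea summarized above, that a Lipschitz-continuous policy gradient permits step sizes with a guaranteed performance improvement, can be illustrated with a minimal sketch. The function below applies the classical ascent-lemma step size 1/L for an L-Lipschitz gradient; the function name, the toy objective, and the constant L are illustrative assumptions, not the paper's exact update rule.

```python
import numpy as np

def lipschitz_safe_update(theta, grad_fn, L):
    """One gradient-ascent step with the classical safe step size 1/L.

    Assumption (ascent lemma): if the gradient of the expected return J
    is L-Lipschitz in the policy parameters theta, then the step
    alpha = 1/L guarantees J(theta') >= J(theta). This is a generic
    sketch of the idea, not the paper's specific bound.
    """
    g = grad_fn(theta)       # policy gradient at the current parameters
    alpha = 1.0 / L          # safe step size from the Lipschitz constant
    return theta + alpha * g

# Toy example: maximize J(theta) = -||theta - 1||^2; its gradient
# -2 * (theta - 1) is Lipschitz with constant L = 2.
grad = lambda th: -2.0 * (th - np.ones_like(th))
theta = np.zeros(3)
for _ in range(50):
    theta = lipschitz_safe_update(theta, grad, L=2.0)
```

For this quadratic toy objective the safe step 1/L recovers the maximizer; in the paper's setting L is instead derived from the Lipschitz constants of the transition model, reward, and policy class.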
540 |a The Author(s), 2015 
690 7 |a Reinforcement learning  |2 nationallicence 
690 7 |a Markov Decision Process  |2 nationallicence 
690 7 |a Lipschitz continuity  |2 nationallicence 
690 7 |a Policy gradient algorithm  |2 nationallicence 
700 1 |a Pirotta  |D Matteo  |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy  |4 aut 
700 1 |a Restelli  |D Marcello  |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy  |4 aut 
700 1 |a Bascetta  |D Luca  |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy  |4 aut 
773 0 |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 100/2-3(2015-09-01), 255-283  |x 0885-6125  |q 100:2-3<255  |1 2015  |2 100  |o 10994 
856 4 0 |u https://doi.org/10.1007/s10994-015-5484-1  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s10994-015-5484-1  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Pirotta  |D Matteo  |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Restelli  |D Marcello  |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Bascetta  |D Luca  |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 100/2-3(2015-09-01), 255-283  |x 0885-6125  |q 100:2-3<255  |1 2015  |2 100  |o 10994