Policy gradient in Lipschitz Markov Decision Processes
Saved in:
Authors / Contributors:
[Matteo Pirotta, Marcello Restelli, Luca Bascetta]
Place, publisher, year:
2015
Contained in:
Machine Learning, 100/2-3(2015-09-01), 255-283
Format:
Article (online)
Online access:
| LEADER | caa a22 4500 | ||
|---|---|---|---|
| 001 | 605478279 | ||
| 003 | CHVBK | ||
| 005 | 20210128100404.0 | ||
| 007 | cr unu---uuuuu | ||
| 008 | 210128e20150901xx s 000 0 eng | ||
| 024 | 7 | 0 | |a 10.1007/s10994-015-5484-1 |2 doi |
| 035 | |a (NATIONALLICENCE)springer-10.1007/s10994-015-5484-1 | ||
| 245 | 0 | 0 | |a Policy gradient in Lipschitz Markov Decision Processes |h [Electronic data] |c [Matteo Pirotta, Marcello Restelli, Luca Bascetta] |
| 520 | 3 | |a This paper is about the exploitation of Lipschitz continuity properties for Markov Decision Processes to safely speed up policy-gradient algorithms. Starting from assumptions about the Lipschitz continuity of the state-transition model, the reward function, and the policies considered in the learning process, we show that both the expected return of a policy and its gradient are Lipschitz continuous w.r.t. policy parameters. By leveraging such properties, we define policy-parameter updates that guarantee a performance improvement at each iteration. The proposed methods are empirically evaluated and compared to other related approaches using different configurations of three popular control scenarios: the linear quadratic regulator, the mass-spring-damper system and the ship-steering control. | |
| 540 | |a The Author(s), 2015 | ||
| 690 | 7 | |a Reinforcement learning |2 nationallicence | |
| 690 | 7 | |a Markov Decision Process |2 nationallicence | |
| 690 | 7 | |a Lipschitz continuity |2 nationallicence | |
| 690 | 7 | |a Policy gradient algorithm |2 nationallicence | |
| 700 | 1 | |a Pirotta |D Matteo |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy |4 aut | |
| 700 | 1 | |a Restelli |D Marcello |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy |4 aut | |
| 700 | 1 | |a Bascetta |D Luca |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy |4 aut | |
| 773 | 0 | |t Machine Learning |d Springer US; http://www.springer-ny.com |g 100/2-3(2015-09-01), 255-283 |x 0885-6125 |q 100:2-3<255 |1 2015 |2 100 |o 10994 | |
| 856 | 4 | 0 | |u https://doi.org/10.1007/s10994-015-5484-1 |q text/html |z Online access via DOI |
| 898 | |a BK010053 |b XK010053 |c XK010000 | ||
| 900 | 7 | |a Metadata rights reserved |b Springer special CC-BY-NC licence |2 nationallicence | |
| 908 | |D 1 |a research-article |2 jats | ||
| 949 | |B NATIONALLICENCE |F NATIONALLICENCE |b NL-springer | ||
| 950 | |B NATIONALLICENCE |P 856 |E 40 |u https://doi.org/10.1007/s10994-015-5484-1 |q text/html |z Online access via DOI | ||
| 950 | |B NATIONALLICENCE |P 700 |E 1- |a Pirotta |D Matteo |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy |4 aut | ||
| 950 | |B NATIONALLICENCE |P 700 |E 1- |a Restelli |D Marcello |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy |4 aut | ||
| 950 | |B NATIONALLICENCE |P 700 |E 1- |a Bascetta |D Luca |u Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Piazza Leonardo Da Vinci, 32, I-20133, Milan, Italy |4 aut | ||
| 950 | |B NATIONALLICENCE |P 773 |E 0- |t Machine Learning |d Springer US; http://www.springer-ny.com |g 100/2-3(2015-09-01), 255-283 |x 0885-6125 |q 100:2-3<255 |1 2015 |2 100 |o 10994 | ||
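The abstract's core idea can be sketched in a few lines: if the gradient of the expected return J is Lipschitz continuous in the policy parameters with constant L, then a gradient step of size 1/L is guaranteed to improve performance at every iteration. The sketch below is illustrative only, not the paper's algorithm; the objective `J`, its gradient `grad_J`, and the constant `L` are toy stand-ins (a concave quadratic, loosely in the spirit of the LQR scenario the paper evaluates on).

```python
import numpy as np

# Illustrative sketch (not the paper's exact bounds): if grad J is L-Lipschitz,
# the quadratic lower bound
#   J(theta + a*g) >= J(theta) + a*||g||^2 - (L/2) * a^2 * ||g||^2
# is maximized at step size a = 1/L, which guarantees an improvement of
# ||g||^2 / (2L) at each update -- a "safe" policy-gradient step.

L = 4.0  # assumed Lipschitz constant of grad J (problem-dependent)

def J(theta):
    # Toy concave surrogate for an expected return.
    return -2.0 * theta[0] ** 2 - 1.0 * theta[1] ** 2

def grad_J(theta):
    # Exact gradient of the toy objective; it is 4-Lipschitz.
    return np.array([-4.0 * theta[0], -2.0 * theta[1]])

theta = np.array([1.0, -1.0])
for _ in range(20):
    g = grad_J(theta)
    previous = J(theta)
    theta = theta + (1.0 / L) * g        # conservative, improvement-guaranteed step
    assert J(theta) >= previous - 1e-12  # monotonic improvement holds

print(J(theta))  # near the optimum J = 0
```

In the paper this conservative step size is derived from Lipschitz constants of the MDP's transition model, reward, and policy class rather than assumed directly; the trade-off is the same, though: a smaller (safer) step buys a per-iteration improvement guarantee at the cost of slower progress.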