Scalable solutions of interactive POMDPs using generalized and bounded policy iteration

Author / Contributors:
[Ekhlas Sonu, Prashant Doshi]
Place, publisher, year:
2015
Contained in:
Autonomous Agents and Multi-Agent Systems, 29/3(2015-05-01), 455-494
Format:
Article (online)
ID: 605514747
LEADER caa a22 4500
001 605514747
003 CHVBK
005 20210128100706.0
007 cr unu---uuuuu
008 210128e20150501xx s 000 0 eng
024 7 0 |a 10.1007/s10458-014-9261-5  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s10458-014-9261-5 
245 0 0 |a Scalable solutions of interactive POMDPs using generalized and bounded policy iteration  |h [Electronic data]  |c [Ekhlas Sonu, Prashant Doshi] 
520 3 |a Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer quicker convergence than value iteration and the ability to operate directly on the solution, which usually takes the form of a finite state automaton. However, the finite state controller tends to grow quickly in size across iterations, which makes its evaluation and improvement computationally costly. Bounded policy iteration provides a way of keeping the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration, and allow POMDPs to scale. In this article, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how we may perform bounded policy iteration with anytime behavior in settings formalized by the interactive POMDP framework, which generalizes POMDPs to non-stationary contexts shared with multiple other agents. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative. Its novel generalization in this article makes it useful in non-cooperative settings as well. As interactive POMDPs involve modeling other agents sharing the environment, we ascribe controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We show how we may exploit the agent's initial belief, which is often available, to further improve the controller, particularly in large domains, though at the expense of increased computation, for which we compensate. 
We extensively evaluate the approach on multiple problem domains, some of which are significantly large in their dimensions, and in contexts with uncertainty about the other agent's frames and those involving multiple other agents, and demonstrate its properties and scalability. 
540 |a The Author(s), 2014 
690 7 |a Decision making  |2 nationallicence 
690 7 |a Multiagent settings  |2 nationallicence 
690 7 |a Policy iteration  |2 nationallicence 
690 7 |a POMDP  |2 nationallicence 
700 1 |a Sonu  |D Ekhlas  |u Department of Computer Science, University of Georgia, 30602, Athens, GA, USA  |4 aut 
700 1 |a Doshi  |D Prashant  |u Department of Computer Science, University of Georgia, 30602, Athens, GA, USA  |4 aut 
773 0 |t Autonomous Agents and Multi-Agent Systems  |d Springer US; http://www.springer-ny.com  |g 29/3(2015-05-01), 455-494  |x 1387-2532  |q 29:3<455  |1 2015  |2 29  |o 10458 
856 4 0 |u https://doi.org/10.1007/s10458-014-9261-5  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
950 |B NATIONALLICENCE  |P 856  |E 40  |u https://doi.org/10.1007/s10458-014-9261-5  |q text/html  |z Online access via DOI 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Sonu  |D Ekhlas  |u Department of Computer Science, University of Georgia, 30602, Athens, GA, USA  |4 aut 
950 |B NATIONALLICENCE  |P 700  |E 1-  |a Doshi  |D Prashant  |u Department of Computer Science, University of Georgia, 30602, Athens, GA, USA  |4 aut 
950 |B NATIONALLICENCE  |P 773  |E 0-  |t Autonomous Agents and Multi-Agent Systems  |d Springer US; http://www.springer-ny.com  |g 29/3(2015-05-01), 455-494  |x 1387-2532  |q 29:3<455  |1 2015  |2 29  |o 10458