Publication

Cooperative Prioritized Sweeping

Book Contribution - Book Chapter Conference Contribution

We present a novel model-based algorithm, Cooperative Prioritized Sweeping, for sample-efficient learning in large multi-agent Markov decision processes. Our approach leverages domain knowledge about the structure of the problem in the form of a dynamic decision network. Using this information, our method learns a model of the environment and uses it to determine which state-action pairs most likely need to be updated, significantly increasing learning speed. Batch updates can then be performed that efficiently back-propagate knowledge throughout the value function. Our method outperforms the state-of-the-art sparse cooperative Q-learning and QMIX algorithms on the well-known SysAdmin benchmark, on randomized environments, and on a fully observable variation of the well-known firefighter benchmark from the Dec-POMDP literature.
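
To make the idea of prioritizing updates concrete, the following is a minimal sketch of classic single-agent, tabular prioritized sweeping. It is not the cooperative, multi-agent method from the paper, which relies on a dynamic decision network. The function name `prioritized_sweeping`, the deterministic model format `(state, action) -> (reward, next_state)`, and the parameters `gamma`, `theta`, and `max_updates` are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def prioritized_sweeping(transitions, gamma=0.9, theta=1e-4, max_updates=1000):
    """Plan on a learned deterministic model with prioritized backups.

    transitions: dict mapping (state, action) -> (reward, next_state).
    This format is a hypothetical stand-in for a learned model.
    """
    q = defaultdict(float)
    actions = defaultdict(list)      # state -> actions available in it
    predecessors = defaultdict(set)  # state -> pairs that lead into it
    for (s, a), (_, s2) in transitions.items():
        actions[s].append(a)
        predecessors[s2].add((s, a))

    def value(s):
        # Greedy state value; states without actions are terminal (0).
        return max((q[(s, a)] for a in actions[s]), default=0.0)

    def error(s, a):
        # Magnitude of the Bellman error for one state-action pair.
        r, s2 = transitions[(s, a)]
        return abs(r + gamma * value(s2) - q[(s, a)])

    # Seed the queue with every pair whose error exceeds the threshold.
    # heapq is a min-heap, so priorities are stored negated.
    queue = []
    for s, a in transitions:
        p = error(s, a)
        if p > theta:
            heapq.heappush(queue, (-p, (s, a)))

    updates = 0
    while queue and updates < max_updates:
        _, (s, a) = heapq.heappop(queue)
        r, s2 = transitions[(s, a)]
        q[(s, a)] = r + gamma * value(s2)  # full backup from the model
        updates += 1
        # The value of s may have changed, so re-prioritize the pairs
        # that transition into s.
        for ps, pa in predecessors[s]:
            p = error(ps, pa)
            if p > theta:
                heapq.heappush(queue, (-p, (ps, pa)))
    return dict(q)


# A three-step chain with a single action and a reward on the last step.
model = {(0, 0): (0.0, 1), (1, 0): (0.0, 2), (2, 0): (1.0, 3)}
q_values = prioritized_sweeping(model, gamma=0.5)
```

In this chain only the rewarding transition has a nonzero error at the start, so the sweep begins there and propagates the value backward one predecessor at a time instead of updating pairs at random.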

Book: Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021

Series: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

Pages: 160-168

Number of pages: 9

Year of publication: 2021

ORCID: /0000-0003-2824-7200/work/105290159
Scopus Id: 85112236405
ORCID: /0000-0003-3036-617X/work/93856805
Institutional Repository URL: https://cris.vub.be/ws/files/75769165/p160.pdf
Institutional Repository URL: https://www.ifaamas.org/Proceedings/aamas2021/pdfs/p160.pdf
DOI: https://doi.org/10.5555/3463952.3463977

Accessibility: Open
