Publication

The impatient may use limited optimism to minimize regret

Journal contribution - Journal article

Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may realize that, with hindsight, she could have increased her reward by playing differently: this difference in outcomes constitutes her regret value. The agent may thus elect to follow a regret-minimal strategy. In this paper, it is shown that (1) there always exist regret-minimal strategies that are admissible, a strategy being inadmissible if there is another strategy that always performs better; (2) computing the minimum possible regret or checking that a strategy is regret-minimal can be done in [formula missing from this record], disregarding the computational cost of numerical analysis (otherwise, this bound becomes [formula missing from this record]).
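To make the notions of discounted sum and regret in the abstract concrete, here is a minimal sketch under strong simplifying assumptions: plays are finite reward sequences rather than infinite plays on a game graph, the agent and the environment each pick one of finitely many whole-play behaviours, and the outcome table, the discount factor `LAM`, and all function names are hypothetical. The regret of a strategy is taken as the worst case, over environment behaviours, of the gap between the best payoff achievable in hindsight and the payoff actually obtained. This is an illustration of the definition only, not the paper's algorithm or its complexity bounds.

```python
from fractions import Fraction

def discounted_sum(rewards, lam):
    """Return r0 + lam*r1 + lam^2*r2 + ... for a finite reward sequence."""
    total = Fraction(0)
    factor = Fraction(1)
    for r in rewards:
        total += factor * r
        factor *= lam  # later rewards are discounted more heavily
    return total

def regret(strategy, strategies, environments, outcome, lam):
    """Worst-case regret: over every environment behaviour, the gap between
    the best payoff achievable in hindsight and the payoff `strategy` gets."""
    worst = Fraction(0)
    for env in environments:
        own = discounted_sum(outcome(strategy, env), lam)
        best = max(discounted_sum(outcome(s, env), lam) for s in strategies)
        worst = max(worst, best - own)
    return worst

def regret_minimal(strategies, environments, outcome, lam):
    """Return the minimum regret and the strategies that achieve it."""
    values = {s: regret(s, strategies, environments, outcome, lam) for s in strategies}
    best = min(values.values())
    return best, [s for s in strategies if values[s] == best]

# Hypothetical toy instance: the agent collects a reward early or waits;
# waiting pays off only if the environment stays calm.
LAM = Fraction(1, 2)
STRATEGIES = ["early", "late"]
ENVIRONMENTS = ["calm", "storm"]
OUTCOMES = {
    ("early", "calm"): [1, 0, 0],
    ("early", "storm"): [1, 0, 0],
    ("late", "calm"): [0, 0, 12],
    ("late", "storm"): [0, 0, 0],
}

def outcome(strategy, env):
    return OUTCOMES[(strategy, env)]

if __name__ == "__main__":
    best, winners = regret_minimal(STRATEGIES, ENVIRONMENTS, outcome, LAM)
    print(f"min regret {best}, achieved by {winners}")  # → min regret 1, achieved by ['late']
```

Here "early" has regret 2 (in a calm environment, waiting would have paid 3 instead of 1), while "late" has regret 1 (in a storm it gets 0 instead of 1), so "late" is the regret-minimal choice even though it can end with nothing. Exact `Fraction` arithmetic is used so that regret values compare without rounding error.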

Journal: Lecture Notes in Computer Science

ISSN: 0302-9743

Volume: 114

Pages: 133-149

Year of publication: 2019

Keywords: A1 Journal article

WoS Id: 000714952800008
DOI: https://doi.org/10.1007/978-3-030-17127-8_8
Handle: https://hdl.handle.net/10067/1634700151162165141

BOF key label: yes

Authors from: Higher Education

Accessibility: Open
