
Publication

LTLf-based Reward Shaping for Reinforcement Learning

Book contribution - Book chapter, Conference contribution

Reinforcement Learning usually does not scale well to large problems: an agent typically needs many trials before it reaches a satisfactory policy. A main contributing factor is that Reinforcement Learning often learns exclusively by trial and error. Much prior work addresses incorporating domain knowledge into Reinforcement Learning to enable more efficient learning. Reward shaping is a well-established method for incorporating such knowledge by providing the learning agent with a supplementary reward. In this work we propose a novel methodology that automatically generates reward shaping functions from user-provided Linear Temporal Logic formulas. Linear Temporal Logic serves here as a rich yet compact language that allows the user to express domain knowledge with minimal effort; it is also easy to express in natural language, which makes it accessible to non-expert users. We use the flag collection domain to demonstrate empirically the increase in both the convergence speed and the quality of the learned policy, despite the minimal domain knowledge provided.
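
The publication record does not include code; the following is a minimal, self-contained sketch of the general idea, assuming the common potential-based formulation of reward shaping (a shaping term of the form gamma * Phi(s') - Phi(s)) in which the potential is derived from progress through a deterministic automaton assumed to have been compiled from an LTLf formula such as F(flag1) & F(flag2). The DFA, its states (q0, q1, q2), the propositions flag1/flag2, and the helpers dfa_step and shaping_reward are hypothetical names chosen for illustration in the flag collection setting; they are not taken from the paper.

```python
# Minimal sketch (not the authors' implementation): potential-based reward
# shaping where the potential measures progress through a hand-written DFA
# standing in for one compiled from the LTLf formula F(flag1) & F(flag2)
# ("eventually collect flag1 and eventually collect flag2").

GAMMA = 0.99

# Hypothetical 3-state DFA: q0 = nothing collected, q1 = flag1 collected,
# q2 = both flags collected (accepting). Transition labels are the sets of
# propositions observed to be true in the new MDP state.
DFA_TRANSITIONS = {
    ("q0", frozenset({"flag1"})): "q1",
    ("q1", frozenset({"flag2"})): "q2",
}
ACCEPTING = {"q2"}

# Potential function over DFA states: larger means closer to acceptance.
POTENTIAL = {"q0": 0.0, "q1": 1.0, "q2": 2.0}


def dfa_step(q, propositions):
    """Advance the DFA on the propositions true in the new MDP state."""
    return DFA_TRANSITIONS.get((q, frozenset(propositions)), q)


def shaping_reward(q, q_next):
    """Potential-based shaping term F(q, q') = gamma * Phi(q') - Phi(q)."""
    return GAMMA * POTENTIAL[q_next] - POTENTIAL[q]


# Example episode fragment: the agent picks up flag1, then flag2.
q = "q0"
for props in [set(), {"flag1"}, set(), {"flag2"}]:
    q_next = dfa_step(q, props)
    bonus = shaping_reward(q, q_next)
    print(f"{q} --{sorted(props)}--> {q_next}, shaping bonus = {bonus:+.2f}")
    q = q_next
```

The shaping bonus produced this way is simply added to the environment reward at each step; with a potential-based formulation of this kind, the supplementary reward rewards progress toward satisfying the temporal-logic objective without altering which policies are optimal.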
Book: Proceedings of the Adaptive and Learning Agents Workshop 2021 (ALA2021) at AAMAS
Number of pages: 6
Year of publication: 2021
Keywords: Reinforcement Learning, Reward Shaping, Linear Temporal Logic on finite traces
  • ORCID: /0000-0001-9094-4221/work/93245335
  • ORCID: /0000-0002-2235-5115/work/93244180
  • ORCID: /0000-0001-6346-4564/work/93243351
Accessibility: Closed