< Back to previous page

Publication

Epsilon-approximate Pareto optimal set of arms identification in multi-objective multi-armed bandits

Book Contribution - Book Chapter Conference Contribution

Many real-world stochastic environments are inherently multi-objective environments with multiple possibly conflicting objectives.
Techniques from multi-objective optimization are imported into the multi-armed bandits (MAB) problem for efficient exploration/exploitation mechanisms of reward vectors. We introduce the $\varepsilon$-approximate Pareto MAB algorithm that uses the $\varepsilon$-dominance relation such that its upper confidence bound does not depend on the number of best arms, an important feature for environments with relatively many optimal arms. We experimentally show that the $\varepsilon$-approximate Pareto MAB algorithms outperform the performance of the Pareto UCB1 algorithm on a multi-objective Bernoulli problem inspired by a real world control application.
Book: BENELEARN 2014 - 23rd annual Belgian-Dutch Conference on Machine Learning
Pages: 73-80
Number of pages: 8
Publication year:2014
Keywords:Epsilon dominance, Multi-objective optimization, Multi-armed bandits
  • ORCID: /0000-0001-6346-4564/work/65577481