AI Seminar: Efficient exploration in sequential decision making problems


Yasin Abbasi-Yadkori, researcher at VinAI.


I will discuss recent results in designing more adaptive bandit algorithms. Our first approach is based on the bootstrap method and leads to a more efficient, data-dependent algorithm for the multi-armed bandit problem. Our second approach is a model-selection method for bandit problems. For example, when the reward function is largely independent of the contexts, the method automatically converges to the simpler and more efficient non-contextual algorithm.