AI Seminar: Algorithms for Easy and Hard Online Learning Environments – University of Copenhagen



Speaker

Yevgeny Seldin, Associate Professor in the Machine Learning Section at the Department of Computer Science, University of Copenhagen.

Abstract

Online learning is a subfield of machine learning that models situations where data acquisition is interleaved with decision making and acting. Examples include investment in the stock market, personalized advertising, and online ranking. Most existing algorithms for online learning make very specific assumptions about the nature of the environment they operate in. The most common assumptions are i.i.d. and adversarial environments. For example, weather prediction is an i.i.d. environment, whereas spam filtering is an adversarial environment. However, in many real-world scenarios the environment is neither purely i.i.d. nor fully adversarial, but something in between.

Online learning problems also differ in the amount of feedback the algorithm receives at every round of interaction with the environment. The most common forms are full information and limited (a.k.a. bandit) feedback. Under full information the outcomes of all actions are observed, whereas under bandit feedback only the outcome of the action taken is observed. For example, in stock market investment the rates of all stocks are observed [full information], while in medical treatment only the outcome of the applied treatment is observed [bandit feedback]. There are also intermediate forms of feedback, known as "prediction with limited advice", where the outcomes of more than one, but fewer than all, actions are observed.
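The two feedback models differ only in what the learner observes each round. As an illustration (not taken from the talk), here is a minimal sketch contrasting the classical exponential-weights update under full information (Hedge) with its bandit-feedback counterpart (EXP3), which replaces the observed loss vector with an importance-weighted estimate; function names and the fixed learning rate are illustrative choices:

```python
import numpy as np

def hedge(loss_matrix, eta=0.1):
    """Full information: the whole loss vector is revealed each round.
    loss_matrix has shape (rounds, actions), losses in [0, 1]."""
    T, K = loss_matrix.shape
    weights = np.ones(K)
    total = 0.0
    for t in range(T):
        p = weights / weights.sum()               # play the mixed strategy p
        total += p @ loss_matrix[t]               # expected loss this round
        weights *= np.exp(-eta * loss_matrix[t])  # update on ALL observed losses
    return total

def exp3(loss_matrix, eta=0.1, rng=None):
    """Bandit feedback: only the chosen action's loss is observed, so the
    update uses an importance-weighted (unbiased) estimate of the loss vector."""
    rng = rng or np.random.default_rng(0)
    T, K = loss_matrix.shape
    weights = np.ones(K)
    total = 0.0
    for t in range(T):
        p = weights / weights.sum()
        a = rng.choice(K, p=p)                    # sample one action to play
        total += loss_matrix[t, a]                # only this loss is revealed
        est = np.zeros(K)
        est[a] = loss_matrix[t, a] / p[a]         # importance-weighted estimate
        weights *= np.exp(-eta * est)
    return total
```

The only structural difference is the estimate `est`: Hedge updates on the full loss vector, while EXP3 must reconstruct an unbiased estimate of it from the single observed entry, which is the source of the extra regret bandit algorithms pay.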

I will give a quick introduction to online learning, then survey the field of algorithms that are optimal over ranges of problems and present some new results, including (1) an algorithm that is optimal for prediction with limited advice in adversarial environments, in adversarial environments with a small effective outcome range, and in i.i.d. environments, without prior knowledge of the nature of the environment; and (2) an algorithm that is optimal for i.i.d., adversarial, moderately contaminated i.i.d., and stochastically constrained adversarial bandits, without prior knowledge of the nature of the environment.
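The abstract does not name the algorithms behind results (1) and (2). For flavor only: one published algorithm with the "best of both worlds" property described in (2) is Tsallis-INF (Zimmert and Seldin), which runs follow-the-regularized-leader with a 1/2-Tsallis-entropy regularizer over importance-weighted loss estimates; whether this is the algorithm presented in the talk is an assumption. A minimal sketch, with the normalization constant found by bisection:

```python
import numpy as np

def tsallis_probs(cum_loss, eta, iters=60):
    """FTRL with 1/2-Tsallis entropy: p_i = 4 / (eta * (Lhat_i - x))^2,
    where the normalizer x < min(Lhat) is found by bisection so sum(p) = 1."""
    K = len(cum_loss)
    m = cum_loss.min()
    lo = m - 2.0 * np.sqrt(K) / eta    # here sum(p) <= 1
    hi = m - 2.0 / eta                 # here sum(p) >= 1
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        if (4.0 / (eta * (cum_loss - x)) ** 2).sum() < 1.0:
            lo = x
        else:
            hi = x
    p = 4.0 / (eta * (cum_loss - 0.5 * (lo + hi))) ** 2
    return p / p.sum()                 # tidy up residual rounding error

def tsallis_inf(loss_fn, K, T, rng):
    """Bandit loop: play from the Tsallis distribution, observe one loss,
    and update cumulative importance-weighted loss estimates."""
    cum = np.zeros(K)
    total = 0.0
    for t in range(1, T + 1):
        p = tsallis_probs(cum, eta=2.0 / np.sqrt(t))  # anytime learning rate
        a = rng.choice(K, p=p)
        loss = loss_fn(t, a)           # only the chosen action's loss is seen
        total += loss
        cum[a] += loss / p[a]          # unbiased estimate of the loss vector
    return total
```

The appeal of this family is exactly the point of result (2): the same update, with no configuration switch, achieves near-optimal regret in both i.i.d. and adversarial environments.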