AI Seminar: Multilingual multimodal language understanding


Desmond Elliot, Assistant Professor in the Machine Learning Section at the Department of Computer Science, University of Copenhagen.


I will give an overview of my work on grounding natural language in images. This problem can be addressed either as a language generation problem or as a retrieval problem. In the language generation setting of visually-grounded machine translation, I will discuss whether visual representations should be used as an input variable or as a variable that the model learns to predict. In the image-sentence retrieval setting, I will present experiments on when it is useful to train with multilingual annotations rather than monolingual annotations. I will also highlight some recent research on learning common representations between videos, speech, and text, and discuss directions for future research in these areas.