AI Seminar: Multilingual multimodal language understanding
Desmond Elliott, Assistant Professor in the Machine Learning Section at the Department of Computer Science, University of Copenhagen.
I will give an overview of my work on grounding natural languages into images. This problem can be addressed as either a language generation problem or a retrieval problem. In the language generation setting of visually-grounded machine translation, I will discuss whether we should use visual representations as an input variable, or as a variable that the model learns to predict. In the image-sentence retrieval setting, I will present experiments on when it is useful to train with multilingual annotations, as opposed to monolingual annotations. I will also highlight some recent research on learning common representations between videos, speech, and text, and discuss directions for future research in these areas.
This seminar is part of the AI Seminar Series organised by the SCIENCE AI Centre. The series highlights advances and challenges in research within Machine Learning, Data Science, and AI. Like the AI Centre itself, the seminar series has a broad scope, covering new methodological contributions, ground-breaking applications, and impacts on society.