AI Seminar: Data governance and transparency for Large Language Models: lessons from the BigScience Workshop
Join us for a talk by Anna Rogers, Assistant Professor in the Center for Social Data Science at the University of Copenhagen. Everybody is welcome to attend.
Title
Data governance and transparency for Large Language Models: lessons from the BigScience Workshop
Abstract
The continued growth of LLMs and their wide-scale adoption in commercial applications such as ChatGPT make it increasingly important (a) to develop ways to source their training data more transparently, and (b) to investigate that data, both for research purposes and for ethical issues. This talk will discuss the current state of affairs and some data governance lessons learned from BigScience, an open-source effort to train a multilingual LLM, including an ongoing effort to investigate the 1.6 TB multilingual ROOTS corpus.
This seminar is part of the AI Seminar Series organised by the SCIENCE AI Centre. The series highlights advances and challenges in research within Machine Learning, Data Science, and AI. Like the AI Centre itself, the seminar series has a broad scope, covering new methodological contributions, ground-breaking applications, and impacts on society.