AI Seminar: Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords

Join us for a talk by Taelin Karidi from the Hebrew University of Jerusalem on September 2, 2022, at 13:00.

Title

Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords

Abstract

We present a method for exploring regions around individual points in a contextualized vector space (particularly, BERT space), as a way to investigate how these regions correspond to word senses. By inducing a contextualized "pseudoword" as a stand-in for a static embedding in the input layer, and then performing masked prediction of a word in the sentence, we are able to investigate the geometry of the BERT-space in a controlled manner around individual instances. Using our method on a set of carefully constructed sentences targeting ambiguous English words, we find substantial regularity in the contextualized space, with regions that correspond to distinct word senses; but between these regions there are occasionally "sense voids" -- regions that do not correspond to any intelligible sense.

Bio

Taelin Karidi is a second-year PhD student in the School of Computer Science at HUJI under the supervision of Prof. Omri Abend. Her research is interdisciplinary and lies at the intersection of NLP, mathematics (mostly geometry) and cognitive science. Taelin is interested in how languages differ from one another on the linguistic and cognitive levels, and she employs computational tools from mathematics and computer science to investigate this question. Taelin completed a BSc and an MSc in pure mathematics at Tel Aviv University. She spent a year as a guest student in the Department of Mathematics at the California Institute of Technology (Caltech) and was a visiting researcher in the Hasson lab of the Neuroscience Institute at Princeton University, where she is working on an ongoing project that combines neuroscience and NLP.