Characterizing the languages of the world in terms of their structural similarities and differences is one of the fundamental goals of linguistics. We present a new data-driven approach to linguistic typology, in which grammatical differences among languages are encoded in vectors learned from plain text by multilingual neural language models. We then show that it is possible to learn multilingual grammars that are parameterized by these vectors, allowing a single multilingual grammar to account for the structural patterns of a wide variety of languages. Each language’s unique vector determines how the multilingual grammar is applied to that language. This approach to crosslingual language processing creates exciting opportunities for the development of language technologies for languages facing scarcity of datasets and other resources.

The statistics department is hosting an open seminar by Dr. Kenji Sagae from the linguistics department at UC Davis.

Where: Remote. When: Thursday, November 12th, at 4:10pm.

To learn more or to request Zoom access, visit the seminar page here.