no code implementations • NAACL (SIGTYP) 2022 • Christo Kirov, Richard Sproat, Alexander Gutkin
For reflex generation, the missing reflexes are treated as “masked pixels” in an “image” which is a representation of an entire cognate set across a language family.
1 code implementation • 19 May 2023 • Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson, Dmitry Panteleev, Partha Talukdar
We evaluate commonly used models on the benchmark.
no code implementations • 6 Mar 2023 • Elizabeth Nielsen, Christo Kirov, Brian Roark
Using a set of probe words unique to either British or American English, we first establish that training corpora exhibit substantial (though not total) consistency.
no code implementations • Findings (EMNLP) 2021 • Kyle Gorman, Christo Kirov, Brian Roark, Richard Sproat
Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages.
1 code implementation • LREC 2020 • Brian Roark, Lawrence Wolf-Sonkin, Christo Kirov, Sabrina J. Mielke, Cibu Johny, Isin Demirsahin, Keith Hall
This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages.
1 code implementation • WS 2020 • Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.
no code implementations • 11 May 2020 • Lane Schwartz, Francis Tyers, Lori Levin, Christo Kirov, Patrick Littell, Chi-kiu Lo, Emily Prud'hommeaux, Hyunji Hayley Park, Kenneth Steimel, Rebecca Knowles, Jeffrey Micher, Lonny Strunk, Han Liu, Coleman Haley, Katherine J. Zhang, Robbie Jimmerson, Vasilisa Andriyanets, Aldrian Obaja Muis, Naoki Otani, Jong Hyuk Park, Zhisong Zhang
In the literature, languages like Finnish or Turkish are held up as extreme examples of complexity that challenge common modelling assumptions.
no code implementations • LREC 2020 • Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ern{\v{s}}treits, Yuval Pinter, Cass Jacobs, ra L., Ryan Cotterell, Mans Hulden, David Yarowsky
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • WS 2019 • Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.
3 code implementations • LREC 2018 • Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya D. McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden
The Universal Morphology UniMorph project is a collaborative effort to improve how NLP handles complex morphology across the world's languages.
no code implementations • CONLL 2018 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden
Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task.
3 code implementations • TACL 2018 • Christo Kirov, Ryan Cotterell
We suggest that the empirical performance of modern networks warrants a re-examination of their utility in linguistic and cognitive modeling.
no code implementations • TACL 2019 • Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner
We quantify the linguistic complexity of different languages' morphological systems.
no code implementations • NAACL 2018 • Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, Jason Eisner
Lexical ambiguity makes it difficult to compute various useful statistics of a corpus.
no code implementations • 23 Apr 2018 • Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner
Many languages' inflectional morphological systems are replete with irregulars, i. e., words that do not seem to follow standard inflectional rules.
no code implementations • EMNLP 2017 • Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, David Yarowsky
The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task.
no code implementations • CONLL 2017 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden
In sub-task 2, systems were given a lemma and some of its specific inflected forms, and asked to complete the inflectional paradigm by predicting all of the remaining inflected forms.
no code implementations • EACL 2017 • Christo Kirov, John Sylak-Glassman, Rebecca Knowles, Ryan Cotterell, Matt Post
A traditional claim in linguistics is that all human languages are equally expressive{---}able to convey the same wide range of meanings.
no code implementations • EACL 2017 • Ryan Cotterell, John Sylak-Glassman, Christo Kirov
Many of the world{'}s languages contain an abundance of inflected forms for each lexeme.
no code implementations • LREC 2016 • John Sylak-Glassman, Christo Kirov, David Yarowsky
We present methods inspired by linguistic fieldwork for gathering inflectional paradigm data in a machine-readable, interoperable format from remotely-located speakers of any language.
no code implementations • LREC 2016 • Christo Kirov, John Sylak-Glassman, Roger Que, David Yarowsky
Wiktionary is a large-scale resource for cross-lingual lexical information with great potential utility for machine translation (MT) and many other NLP tasks, especially automatic morphological analysis and generation.