Search Results for author: Christo Kirov

Found 24 papers, 5 papers with code

Mockingbird at the SIGTYP 2022 Shared Task: Two Types of Models for the Prediction of Cognate Reflexes

no code implementations • NAACL (SIGTYP) 2022 • Christo Kirov, Richard Sproat, Alexander Gutkin

For reflex generation, the missing reflexes are treated as “masked pixels” in an “image” which is a representation of an entire cognate set across a language family.

Decoder Image Restoration

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

1 code implementation • 19 May 2023 • Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson, Dmitry Panteleev, Partha Talukdar

We evaluate commonly used models on the benchmark.

In-Context Learning Multilingual NLP +3

Spelling convention sensitivity in neural language models

no code implementations • 6 Mar 2023 • Elizabeth Nielsen, Christo Kirov, Brian Roark

Using a set of probe words unique to either British or American English, we first establish that training corpora exhibit substantial (though not total) consistency.

Language Modelling

Structured abbreviation expansion in context

no code implementations • Findings (EMNLP) 2021 • Kyle Gorman, Christo Kirov, Brian Roark, Richard Sproat

Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages.

Spelling Correction

Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset

1 code implementation • LREC 2020 • Brian Roark, Lawrence Wolf-Sonkin, Christo Kirov, Sabrina J. Mielke, Cibu Johny, Isin Demirsahin, Keith Hall

This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages.

Language Modelling Sentence +1

SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

1 code implementation • WS 2020 • Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden

Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.

Hallucination Morphological Inflection

Neural Polysynthetic Language Modelling

no code implementations • 11 May 2020 • Lane Schwartz, Francis Tyers, Lori Levin, Christo Kirov, Patrick Littell, Chi-kiu Lo, Emily Prud'hommeaux, Hyunji Hayley Park, Kenneth Steimel, Rebecca Knowles, Jeffrey Micher, Lonny Strunk, Han Liu, Coleman Haley, Katherine J. Zhang, Robbie Jimmerson, Vasilisa Andriyanets, Aldrian Obaja Muis, Naoki Otani, Jong Hyuk Park, Zhisong Zhang

In the literature, languages like Finnish or Turkish are held up as extreme examples of complexity that challenge common modelling assumptions.

Language Modelling Lemmatization +1

UniMorph 3.0: Universal Morphology

no code implementations • LREC 2020 • Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ernštreits, Yuval Pinter, Cassandra L. Jacobs, Ryan Cotterell, Mans Hulden, David Yarowsky

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

no code implementations • WS 2019 • Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.

Cross-Lingual Transfer Lemmatization +3

UniMorph 2.0: Universal Morphology

3 code implementations • LREC 2018 • Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya D. McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology across the world's languages.

LEMMA

The CoNLL-SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

no code implementations • CONLL 2018 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden

Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task.

LEMMA Task 2

Recurrent Neural Networks in Linguistic Theory: Revisiting Pinker and Prince (1988) and the Past Tense Debate

3 code implementations • TACL 2018 • Christo Kirov, Ryan Cotterell

We suggest that the empirical performance of modern networks warrants a re-examination of their utility in linguistic and cognitive modeling.

Decoder

On the Complexity and Typology of Inflectional Morphological Systems

no code implementations • TACL 2019 • Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner

We quantify the linguistic complexity of different languages' morphological systems.

Unsupervised Disambiguation of Syncretism in Inflected Lexicons

no code implementations • NAACL 2018 • Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, Jason Eisner

Lexical ambiguity makes it difficult to compute various useful statistics of a corpus.

On the Diachronic Stability of Irregularity in Inflectional Morphology

no code implementations • 23 Apr 2018 • Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner

Many languages' inflectional morphological systems are replete with irregulars, i.e., words that do not seem to follow standard inflectional rules.

Relation

Improving Low Resource Machine Translation using Morphological Glosses (Non-archival Extended Abstract)

no code implementations • WS 2018 • Steven Shearing, Christo Kirov, Huda Khayrallah, David Yarowsky

Data Augmentation Machine Translation +1

Paradigm Completion for Derivational Morphology

no code implementations • EMNLP 2017 • Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, David Yarowsky

The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task.

CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages

no code implementations • CONLL 2017 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden

In sub-task 2, systems were given a lemma and some of its specific inflected forms, and asked to complete the inflectional paradigm by predicting all of the remaining inflected forms.

Data Augmentation Inductive Bias +2

A Rich Morphological Tagger for English: Exploring the Cross-Linguistic Tradeoff Between Morphology and Syntax

no code implementations • EACL 2017 • Christo Kirov, John Sylak-Glassman, Rebecca Knowles, Ryan Cotterell, Matt Post

A traditional claim in linguistics is that all human languages are equally expressive, able to convey the same wide range of meanings.

Dependency Parsing Machine Translation +3

Neural Graphical Models over Strings for Principal Parts Morphological Paradigm Completion

no code implementations • EACL 2017 • Ryan Cotterell, John Sylak-Glassman, Christo Kirov

Many of the world's languages contain an abundance of inflected forms for each lexeme.

Morphological Analysis

The SIGMORPHON 2016 Shared Task: Morphological Reinflection

no code implementations • WS 2016 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, David Yarowsky, Jason Eisner, Mans Hulden

Morphological Analysis

Remote Elicitation of Inflectional Paradigms to Seed Morphological Analysis in Low-Resource Languages

no code implementations • LREC 2016 • John Sylak-Glassman, Christo Kirov, David Yarowsky

We present methods inspired by linguistic fieldwork for gathering inflectional paradigm data in a machine-readable, interoperable format from remotely-located speakers of any language.

Morphological Analysis

Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms

no code implementations • LREC 2016 • Christo Kirov, John Sylak-Glassman, Roger Que, David Yarowsky

Wiktionary is a large-scale resource for cross-lingual lexical information with great potential utility for machine translation (MT) and many other NLP tasks, especially automatic morphological analysis and generation.

Machine Translation Morphological Analysis +1

A Language-Independent Feature Schema for Inflectional Morphology

no code implementations • IJCNLP 2015 • John Sylak-Glassman, Christo Kirov, David Yarowsky, Roger Que

Machine Translation Morphological Analysis
