Search Results for author: Richard Sproat

Found 32 papers, 8 papers with code

Boring Problems Are Sometimes the Most Interesting

no code implementations • CL (ACL) 2022 • Richard Sproat

In a recent position paper, Turing Award winners Yoshua Bengio, Geoffrey Hinton, and Yann LeCun make the case that symbolic methods are not needed in AI and that, while there are still many issues to be resolved, AI will be solved using purely neural methods.

Mockingbird at the SIGTYP 2022 Shared Task: Two Types of Models for the Prediction of Cognate Reflexes

no code implementations • NAACL (SIGTYP) 2022 • Christo Kirov, Richard Sproat, Alexander Gutkin

For reflex generation, the missing reflexes are treated as “masked pixels” in an “image” which is a representation of an entire cognate set across a language family.

Image Restoration

The Taxonomy of Writing Systems: How to Measure How Logographic a System Is

no code implementations • CL (ACL) 2021 • Richard Sproat, Alexander Gutkin

Our work provides the first quantifiable measure of the notion of logography that accords with linguistic intuition and, we argue, provides better insight into what this notion means.

BiPhone: Modeling Inter Language Phonetic Influences in Text

no code implementations • 6 Jul 2023 • Abhirut Gupta, Ananya B. Sai, Richard Sproat, Yuri Vasilevski, James S. Ren, Ambarish Jash, Sukhdeep S. Sodhi, Aravindan Raghuveer

To the best of our knowledge, FunGLUE is the first benchmark to introduce L1-L2 interactions in text.

Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency

no code implementations • 7 Jun 2023 • Shigeki Karita, Richard Sproat, Haruko Ishikawa

Word error rate (WER) and character error rate (CER) are standard metrics in automatic speech recognition (ASR), but one problem has always been alternative spellings: if one's system transcribes "adviser" whereas the ground truth has "advisor", this will count as an error even though the two spellings represent the same word.

Machine Translation, Speech Recognition
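
The adviser/advisor case can be illustrated with an ordinary word-level edit distance in which a hypothesis word also counts as a match when it appears in a table of accepted spellings for the reference word. This is a minimal sketch, not the method described in the paper (which targets Japanese): the function `lenient_wer` and the variant table `VARIANTS` are hypothetical, and the table is hand-built here rather than derived from data.

```python
from typing import Dict, List, Set

def lenient_wer(ref: List[str], hyp: List[str], variants: Dict[str, Set[str]]) -> float:
    """Word error rate in which a hypothesis word is also correct if it is a
    listed alternative spelling of the reference word (hypothetical table)."""
    def same(r: str, h: str) -> bool:
        return r == h or h in variants.get(r, set())

    n, m = len(ref), len(hyp)
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if same(ref[i - 1], hyp[j - 1]) else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[n][m] / max(n, 1)

# Hypothetical, hand-written variant table (reference spelling -> accepted variants).
VARIANTS = {"advisor": {"adviser"}}
ref = "the advisor approved it".split()
hyp = "the adviser approved it".split()
print(lenient_wer(ref, hyp, {}))        # strict: 0.25
print(lenient_wer(ref, hyp, VARIANTS))  # lenient: 0.0
```

Under the strict setting the single spelling difference costs one substitution out of four reference words; with the variant table it costs nothing.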

Beyond Arabic: Software for Perso-Arabic Script Manipulation

1 code implementation • 26 Jan 2023 • Alexander Gutkin, Cibu Johny, Raiomond Doctor, Brian Roark, Richard Sproat

This paper presents an open-source software library that provides a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.

Transliteration

Graphemic Normalization of the Perso-Arabic Script

1 code implementation • 21 Oct 2022 • Raiomond Doctor, Alexander Gutkin, Cibu Johny, Brian Roark, Richard Sproat

Since its original appearance in 1991, the Perso-Arabic script representation in Unicode has grown from 169 to over 440 atomic isolated characters spread over several code pages representing standard letters, various diacritics and punctuation for the original Arabic and numerous other regional orthographic traditions.

Language Modelling, Machine Translation
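
One practical consequence of this growth is that visually identical or near-identical letters receive distinct code points, so the same word can be typed in several ways. The sketch below, assuming a hypothetical three-entry mapping (`PERSIAN_MAP`) and a hypothetical helper `normalize_persian`, shows the basic idea of collapsing such variants for Persian. The paper's library implements normalization as finite-state transducers with far broader coverage; this is not its rule set.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping for a Persian-style normalizer:
# Arabic letters replaced by their Persian-preferred counterparts.
# This is NOT the rule set shipped with the paper's library.
PERSIAN_MAP = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}

def normalize_persian(text: str) -> str:
    # Apply Unicode NFC first, then collapse the listed look-alike letters.
    text = unicodedata.normalize("NFC", text)
    return text.translate(str.maketrans(PERSIAN_MAP))

a = "\u0643\u062A\u0627\u0628"  # "ketab" (book) typed with Arabic KAF
b = "\u06A9\u062A\u0627\u0628"  # the same word typed with KEHEH
print(a == b)                                        # False
print(normalize_persian(a) == normalize_persian(b))  # True
```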

Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation

1 code implementation • 18 Oct 2022 • Llion Jones, Richard Sproat, Haruko Ishikawa, Alexander Gutkin

If one sees the place name Houston Mercer Dog Run in New York, how does one know how to pronounce it?

Structured abbreviation expansion in context

no code implementations • Findings (EMNLP) 2021 • Kyle Gorman, Christo Kirov, Brian Roark, Richard Sproat

Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages.

Spelling Correction

Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities

1 code implementation • COLING 2020 • Hao Zhang, Jae Ro, Richard Sproat

Breaking domain names such as "openresearch" into the component words "open" and "research" is important for applications like text-to-speech synthesis and web search.

Chinese Word Segmentation, Speech Synthesis
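
A classic dictionary-based baseline for this task, swapped in here for illustration and not the paper's RNN model, picks the split of the string that maximizes the product of unigram word probabilities using dynamic programming over split points. The function `segment` and the toy counts in `COUNTS` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

def segment(s: str, counts: Dict[str, int]) -> Optional[List[str]]:
    """Split s into vocabulary words, maximizing the product of unigram
    probabilities. Returns None if s cannot be covered by vocabulary words."""
    total = sum(counts.values())
    # best[i] = (log-prob of best segmentation of s[:i], start of its last word)
    best: List[Optional[Tuple[float, int]]] = [None] * (len(s) + 1)
    best[0] = (0.0, -1)
    for end in range(1, len(s) + 1):
        for start in range(end):
            word = s[start:end]
            prev = best[start]
            if prev is None or word not in counts:
                continue
            score = prev[0] + math.log(counts[word] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, start)
    if best[len(s)] is None:
        return None
    # Follow back-pointers from the end of the string to recover the words.
    words: List[str] = []
    end = len(s)
    while end > 0:
        start = best[end][1]
        words.append(s[start:end])
        end = start
    return words[::-1]

# Hypothetical toy counts; a real system would estimate these from a corpus.
COUNTS = {"open": 50, "research": 30, "pen": 5, "o": 1, "re": 2, "search": 20}
print(segment("openresearch", COUNTS))  # ['open', 'research']
```

With these counts, "open" + "research" outscores alternatives such as "o" + "pen" + "research" or "open" + "re" + "search"; a string that cannot be covered by vocabulary words yields None.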

Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities

1 code implementation • 5 Nov 2020 • Hao Zhang, Jae Ro, Richard Sproat

Breaking domain names such as "openresearch" into the component words "open" and "research" is important for applications like text-to-speech synthesis and web search.

Chinese Word Segmentation, Speech Synthesis

Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

1 code implementation • 14 Oct 2020 • Alena Butryna, Shan-Hui Cathy Chu, Isin Demirsahin, Alexander Gutkin, Linne Ha, Fei He, Martin Jansche, Cibu Johny, Anna Katanova, Oddur Kjartansson, Chenfang Li, Tatiana Merkulova, Yin May Oo, Knot Pipatsrisawat, Clara Rivera, Supheakmungkol Sarin, Pasindu De Silva, Keshan Sodimana, Richard Sproat, Theeraphol Wattanavekin, Jaka Aris Eko Wibawa

This paper presents an overview of a program designed to address the growing need for developing freely available speech resources for under-represented languages.

Automatic Speech Recognition (ASR)

NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task

1 code implementation • EMNLP (SIGTYP) 2020 • Alexander Gutkin, Richard Sproat

This paper describes the NEMO submission to the SIGTYP 2020 shared task, which deals with the prediction of linguistic typological features for multiple languages using data derived from the World Atlas of Language Structures (WALS).

Regression

Neural Models of Text Normalization for Speech Applications

no code implementations • CL 2019 • Hao Zhang, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, Brian Roark

One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS).

BIG-bench Machine Learning, Speech Synthesis

Automatic Ambiguity Detection

no code implementations • 28 May 2019 • Richard Sproat, Jan van Santen

Most work on sense disambiguation presumes that one knows beforehand (e.g., from a thesaurus) a set of polysemous terms.

Fast and Accurate Reordering with ITG Transition RNN

no code implementations • COLING 2018 • Hao Zhang, Axel Ng, Richard Sproat

Compared to a strong baseline of an attention-based RNN, our ITG RNN reordering model can reach the same reordering accuracy with only 1/10 of the training data and is 2.5x faster in decoding.

Feature Engineering, Machine Translation
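
An inversion transduction grammar (ITG) reorders only by combining two adjacent blocks in either straight or inverted order, so not every permutation is reachable; the smallest unreachable ones are the patterns 2413 and 3142. The sketch below checks this constraint with a greedy shift-reduce pass over target positions. It illustrates only the ITG constraint, not the paper's RNN model, and the function name `is_itg_reorderable` is hypothetical.

```python
from typing import List, Tuple

def is_itg_reorderable(perm: List[int]) -> bool:
    """Return True if perm (a permutation of 0..n-1) can be built by repeatedly
    combining adjacent blocks in straight or inverted order, as ITG requires."""
    stack: List[Tuple[int, int]] = []  # each entry: (min, max) target span of a block
    for p in perm:
        stack.append((p, p))  # SHIFT the next source word as a one-word block
        # REDUCE while the two topmost blocks together cover a contiguous span.
        while len(stack) >= 2:
            (lo2, hi2), (lo1, hi1) = stack[-2], stack[-1]
            if hi2 + 1 == lo1 or hi1 + 1 == lo2:  # straight or inverted merge
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) <= 1

print(is_itg_reorderable([1, 0, 3, 2]))  # True: two inversions, then a straight merge
print(is_itg_reorderable([1, 3, 0, 2]))  # False: the "2413" pattern
```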

Keynote Lecture 2: Neural (and other Machine Learning) Approaches to Text Normalization

no code implementations • WS 2016 • Richard Sproat

BIG-bench Machine Learning

RNN Approaches to Text Normalization: A Challenge

1 code implementation • 31 Oct 2016 • Richard Sproat, Navdeep Jaitly

Though our conclusions are largely negative on this point, we are not arguing that the text normalization problem is intractable using a pure RNN approach, merely that it is not going to be something that can be solved simply by having huge amounts of annotated text data and feeding that to a general RNN model.

Minimally Supervised Written-to-Spoken Text Normalization

no code implementations • 21 Sep 2016 • Ke Wu, Kyle Gorman, Richard Sproat

In speech applications such as text-to-speech (TTS) or automatic speech recognition (ASR), text normalization refers to the task of converting from a written representation into a representation of how the text is to be spoken.

Automatic Speech Recognition (ASR)
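
The written-to-spoken mapping can be made concrete for one semiotic class, cardinal numbers. This minimal sketch uses hand-written rules in the hypothetical helpers `verbalize` and `normalize`; it is not the paper's minimally supervised method and covers only the numbers 0 to 999.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def verbalize(n: int) -> str:
    """Spoken form of an English cardinal 0 <= n <= 999 (no 'and')."""
    if not 0 <= n <= 999:
        raise ValueError("this toy verbalizer only covers 0-999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + verbalize(rest)

def normalize(text: str) -> str:
    # Rewrite each standalone 1-3 digit number; leave everything else untouched.
    return re.sub(r"\b\d{1,3}\b", lambda m: verbalize(int(m.group())), text)

print(normalize("Turn left after 125 meters"))
# Turn left after one hundred twenty five meters
```

The word-boundary anchors in the regular expression leave longer numbers such as 2016 untouched rather than verbalizing only part of them.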