Search Results for author: Juan Antonio Pérez-Ortiz

Found 10 papers, 4 papers with code

An English-Swahili parallel corpus and its use for neural machine translation in the news domain

no code implementations • EAMT 2020 • Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Mikel L. Forcada, Miquel Esplà-Gomis, Andrew Secker, Susie Coleman, Julie Wall

This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet.

Machine Translation Translation

Paper
Add Code

Ranking suggestions for black-box interactive translation prediction systems with multilayer perceptrons

no code implementations • AMTA 2016 • Daniel Torregrosa, Juan Antonio Pérez-Ortiz, Mikel Forcada

The objective of interactive translation prediction (ITP), a paradigm of computer-aided translation, is to assist professional translators by offering context-based computer-generated suggestions as they type.

Machine Translation Translation

Paper
Add Code

Surprise Language Challenge: Developing a Neural Machine Translation System between Pashto and English in Two Months

no code implementations • MTSummit 2021 • Alexandra Birch, Barry Haddow, Antonio Valerio Miceli Barone, Jindrich Helcl, Jonas Waldendorf, Felipe Sánchez Martínez, Mikel Forcada, Víctor Sánchez Cartagena, Juan Antonio Pérez-Ortiz, Miquel Esplà-Gomis, Wilker Aziz, Lina Murady, Sevi Sariisik, Peggy van der Kreeft, Kay Macquarrie

We find that starting from an existing large model pre-trained on 50languages leads to far better BLEU scores than pretraining on one high-resource language pair with a smaller model.

Machine Translation Transfer Learning +1

Paper
Add Code

MultiTraiNMT: Training Materials to Approach Neural Machine Translation from Scratch

no code implementations • TRITON 2021 • Gema Ramírez-Sánchez, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Caroline Rossi, Dorothy Kenny, Riccardo Superbo, Pilar Sánchez-Gijón, Olga Torres-Hostench

The MultiTraiNMT Erasmus+ project aims at developing an open innovative syllabus in neural machine translation (NMT) for language learners and translators as multilingual citizens.

Machine Translation NMT +1

Paper
Add Code

Curated Datasets and Neural Models for Machine Translation of Informal Registers between Mayan and Spanish Vernaculars

2 code implementations • 11 Apr 2024 • Andrés Lou, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena

The Mayan languages comprise a language family with an ancient history, millions of speakers, and immense cultural value, that, nevertheless, remains severely underrepresented in terms of resources and global exposure.

Machine Translation Translation

259

Paper
Code

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

no code implementations • 29 Jan 2024 • Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

The study covers eight language pairs, different training corpus sizes, two architectures, and three types of annotation: dummy tags (with no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of part of speech and morphological features.

Machine Translation TAG

Paper
Add Code

Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation

1 code implementation • 29 Jan 2024 • Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

When the amount of parallel sentences available to train a neural machine translation is scarce, a common practice is to generate new synthetic training samples from them.

Machine Translation Translation

Paper
Code

Cross-lingual neural fuzzy matching for exploiting target-language monolingual corpora in computer-aided translation

1 code implementation • 16 Jan 2024 • Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

The paper presents an automatic evaluation of these techniques on four language pairs that shows that our approach can successfully exploit monolingual texts in a TM-based CAT environment, increasing the amount of useful translation proposals, and that our neural model for estimating the post-editing effort enables the combination of translation proposals obtained from monolingual corpora and from TMs in the usual way.

Sentence Sentence Embeddings +1

Paper
Code

Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach

1 code implementation • EMNLP 2021 • Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Many DA approaches aim at expanding the support of the empirical data distribution by generating new sentence pairs that contain infrequent words, thus making it closer to the true data distribution of parallel sentences.

Data Augmentation Decoder +4

Paper
Code

Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation

no code implementations • 3 Apr 2020 • Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz, Rafael C. Carrasco

Translation models based on hierarchical phrase-based statistical machine translation (HSMT) have shown better performances than the non-hierarchical phrase-based counterparts for some language pairs.

Clustering Machine Translation +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.