no code implementations • WMT (EMNLP) 2021 • Md Mahfuz ibn Alam, Ivana Kvapilíková, Antonios Anastasopoulos, Laurent Besacier, Georgiana Dinu, Marcello Federico, Matthias Gallé, Kweonwoo Jung, Philipp Koehn, Vassilina Nikoulina
Language domains that require very careful use of terminology are abundant and reflect a significant part of the translation industry.
1 code implementation • EMNLP (WNUT) 2020 • Md Mahfuz ibn Alam, Antonios Anastasopoulos
The performance of neural machine translation (NMT) systems only trained on a single language variant degrades when confronted with even slightly different language variations.
1 code implementation • 4 Mar 2024 • Sina Ahmadi, Daban Q. Jaff, Md Mahfuz ibn Alam, Antonios Anastasopoulos
Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties.
no code implementations • 2 Feb 2024 • Md Mahfuz ibn Alam, Antonios Anastasopoulos
It is relatively easy to mine a large parallel corpus for any machine learning task, such as speech-to-text or speech-to-speech translation.
no code implementations • 2 Feb 2024 • Md Mahfuz ibn Alam, Sina Ahmadi, Antonios Anastasopoulos
In this paper, we propose strategies to synthesize parallel data relying on morpho-syntactic information and using bilingual lexicons along with a small amount of seed parallel data.
no code implementations • 26 May 2023 • Md Mahfuz ibn Alam, Sina Ahmadi, Antonios Anastasopoulos
Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations.
1 code implementation • 26 May 2023 • Claytone Sikasote, Eunice Mukonde, Md Mahfuz ibn Alam, Antonios Anastasopoulos
We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba.
1 code implementation • 23 May 2023 • Milind Agarwal, Md Mahfuz ibn Alam, Antonios Anastasopoulos
Second, we propose a novel misprediction-resolution hierarchical model, LIMIt, for language identification that reduces error by 55% (from 0.71 to 0.32) on our compiled children's stories dataset and by 40% (from 0.23 to 0.14) on the FLORES-200 benchmark.
no code implementations • 25 Apr 2023 • Md Mahfuz ibn Alam, Ruoyu Xie, Fahim Faisal, Antonios Anastasopoulos
This report describes GMU's sentiment analysis system for the SemEval-2023 shared task AfriSenti-SemEval.
1 code implementation • Findings (EMNLP) 2021 • Fahim Faisal, Sharlina Keshava, Md Mahfuz ibn Alam, Antonios Anastasopoulos
Question answering (QA) systems are now available through numerous commercial applications for a wide variety of domains, serving millions of users that interact with them via speech interfaces.
1 code implementation • 22 Jun 2021 • Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, Vassilina Nikoulina
As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies.