1 code implementation • 2 May 2024 • Dawn Lawrie, Efsun Kayi, Eugene Yang, James Mayfield, Douglas W. Oard
PLAID, an efficient implementation of the ColBERT late interaction bi-encoder using pretrained language models for ranking, consistently achieves state-of-the-art performance in monolingual, cross-language, and multilingual retrieval.
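ColBERT's late interaction scores a query–document pair by comparing token-level embeddings: each query token takes its maximum similarity over all document tokens, and those maxima are summed (MaxSim). A minimal NumPy sketch of that scoring rule follows; it is illustrative only, and omits the pruning and compression that make PLAID efficient:

```python
import numpy as np

def late_interaction_score(query_embs, doc_embs):
    """ColBERT-style MaxSim: for each query token embedding, take its
    maximum similarity over all document token embeddings, then sum."""
    # (num_query_tokens, num_doc_tokens) similarity matrix via dot products
    sims = query_embs @ doc_embs.T
    return sims.max(axis=1).sum()

# Toy example: 2 query tokens, 3 document tokens (unit-normalized rows)
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
score = late_interaction_score(q, d)  # 0.8 + 1.0 = 1.8
```

Because each query token matches its best document token independently, late interaction retains fine-grained term-level evidence that single-vector bi-encoders collapse away.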
no code implementations • 2 May 2024 • James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler
Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of users.
1 code implementation • 29 Apr 2024 • Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard, Kevin Duh
Probabilistic Structured Queries (PSQ) is a cross-language information retrieval (CLIR) method that uses translation probabilities statistically derived from aligned corpora.
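The core idea of PSQ is to treat each query-language term as a probability distribution over its document-language translations, so that matching reduces to an expected term frequency. A toy sketch under assumed translation probabilities (the hypothetical `probs` values below are for illustration, not derived from any real aligned corpus):

```python
from collections import Counter

def psq_expected_tf(query_term, doc_tokens, translation_probs):
    """PSQ-style matching: compute the expected term frequency of a
    query-language term in a document-language document, weighting each
    translation's term frequency by its translation probability.

    translation_probs: {query_term: {doc_term: P(doc_term | query_term)}}
    """
    tf = Counter(doc_tokens)
    return sum(p * tf[t] for t, p in translation_probs.get(query_term, {}).items())

# Hypothetical English-to-Spanish translation probabilities
probs = {"house": {"casa": 0.7, "hogar": 0.3}}
doc = ["la", "casa", "es", "una", "casa", "con", "hogar"]
etf = psq_expected_tf("house", doc, probs)  # 0.7*2 + 0.3*1 = 1.7
```

The expected term frequency can then be plugged into any standard term-weighting retrieval model in place of the raw term frequency.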
no code implementations • 11 Apr 2024 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
The principal tasks are ranked retrieval of news in one of the track's three languages, using English topics.
1 code implementation • 9 Jan 2024 • Eugene Yang, Dawn Lawrie, James Mayfield, Douglas W. Oard, Scott Miller
Applying a similar knowledge distillation approach to training an efficient dual-encoder model for Cross-Language Information Retrieval (CLIR) is challenging because sufficiently large training collections are scarce when the query and document languages differ.
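One common way to distill a retrieval teacher into a dual-encoder student is Margin-MSE: penalize the difference between the teacher's and the student's score margins over positive/negative passage pairs. Whether this matches the paper's exact training objective is an assumption; the sketch below shows only the general technique:

```python
import numpy as np

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Margin-MSE distillation (a common choice for training retrieval
    dual encoders; assumed here for illustration): match the student's
    score margin between a positive and a negative passage to the
    teacher's margin for the same pair."""
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return float(np.mean((student_margin - teacher_margin) ** 2))

# Toy scores for a batch of two (query, positive, negative) triples
s_pos, s_neg = np.array([5.0, 3.0]), np.array([2.0, 2.5])
t_pos, t_neg = np.array([6.0, 4.0]), np.array([1.0, 2.0])
loss = margin_mse_loss(s_pos, s_neg, t_pos, t_neg)  # mean of (-2.0)^2 and (-1.5)^2 = 3.125
```

Matching margins rather than absolute scores lets the student learn the teacher's ranking preferences without having to reproduce its score scale.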
1 code implementation • 30 May 2023 • Douglas W. Oard
Despite the plethora of born-digital content, vast troves of important content remain accessible only on physical media such as paper or microfilm.
no code implementations • 24 Apr 2023 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval.
no code implementations • 20 Dec 2022 • Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard
Prior work has shown that combining adapters pretrained on language tasks for a specific language with task-specific adapters yields models that outperform full-model fine-tuning when transferring across languages in various NLP tasks.
1 code implementation • 3 Sep 2022 • Dawn Lawrie, Eugene Yang, Douglas W. Oard, James Mayfield
Providing access to information across languages has been a goal of Information Retrieval (IR) for decades.
no code implementations • 25 Apr 2022 • Eugene Yang, Suraj Nair, Ramraj Chandradevan, Rebecca Iglesias-Flores, Douglas W. Oard
Pretrained language models have improved effectiveness on numerous tasks, including ad-hoc retrieval.
1 code implementation • 20 Jan 2022 • Suraj Nair, Eugene Yang, Dawn Lawrie, Kevin Duh, Paul McNamee, Kenton Murray, James Mayfield, Douglas W. Oard
These models have improved the effectiveness of retrieval systems well beyond that of lexical term matching models such as BM25.
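As a point of reference, BM25 scores a document by combining each matching query term's inverse document frequency with a saturating term-frequency component normalized by document length. A self-contained sketch of the standard Okapi BM25 formula, using hypothetical corpus statistics:

```python
import math

def bm25_score(query_terms, doc_tokens, doc_freqs, num_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.
    doc_freqs: {term: number of documents in the corpus containing term}."""
    score = 0.0
    dl = len(doc_tokens)
    for term in query_terms:
        tf = doc_tokens.count(term)
        if tf == 0:
            continue  # term absent from document contributes nothing
        df = doc_freqs.get(term, 0)
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        # Saturating tf, normalized by document length relative to the average
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

# Hypothetical corpus statistics, for illustration only
doc = ["neural", "retrieval", "models", "for", "retrieval"]
s = bm25_score(["retrieval"], doc, {"retrieval": 3}, num_docs=10, avgdl=5.0)
```

Unlike the neural models discussed above, BM25 matches only exact lexical terms, which is precisely the limitation that dense and late-interaction retrievers address.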
no code implementations • 20 Nov 2021 • Behrooz Mansouri, Douglas W. Oard, Anurag Agarwal, Richard Zanibbi
There are now several test collections for the formula retrieval task, in which a system's goal is to identify useful mathematical formulae to show in response to a query posed as a formula.
no code implementations • 10 Nov 2021 • Petra Galuščáková, Douglas W. Oard, Suraj Nair
Two key assumptions shape the usual view of ranked retrieval: (1) that the searcher can choose words for their query that might appear in the documents that they wish to see, and (2) that ranking retrieved documents will suffice because the searcher will be able to recognize those which they wished to find.
no code implementations • ACL 2021 • Yanda Chen, Chris Kedzie, Suraj Nair, Petra Galuščáková, Rui Zhang, Douglas W. Oard, Kathleen McKeown
This paper proposes an approach to cross-language sentence selection in a low-resource setting.
no code implementations • 2 Feb 2021 • Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages.
no code implementations • 14 Nov 2020 • Jason R. Baron, Mahmoud F. Sayed, Douglas W. Oard
At present, the review process for material that is exempt from disclosure under the Freedom of Information Act (FOIA) in the United States of America, and under many similar government transparency regimes worldwide, is entirely manual.