1 code implementation • 2 May 2024 • Dawn Lawrie, Efsun Kayi, Eugene Yang, James Mayfield, Douglas W. Oard
PLAID, an efficient implementation of the ColBERT late interaction bi-encoder using pretrained language models for ranking, consistently achieves state-of-the-art performance in monolingual, cross-language, and multilingual retrieval.
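ColBERT's late interaction scores a query–document pair by comparing token-level embeddings: each query token takes its maximum similarity over all document tokens, and those maxima are summed (MaxSim). A minimal NumPy sketch of that scoring rule follows; it is illustrative only, and omits the pruning and compression that make PLAID efficient:

```python
import numpy as np

def late_interaction_score(query_embs, doc_embs):
    """ColBERT-style MaxSim: for each query token embedding, take its
    maximum similarity over all document token embeddings, then sum."""
    # (num_query_tokens, num_doc_tokens) similarity matrix via dot products
    sims = query_embs @ doc_embs.T
    return sims.max(axis=1).sum()

# Toy example: 2 query tokens, 3 document tokens (unit-normalized rows)
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
score = late_interaction_score(q, d)  # 0.8 + 1.0 = 1.8
```

Because each query token matches its best document token independently, late interaction retains fine-grained term-level evidence that single-vector bi-encoders collapse away.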
no code implementations • 2 May 2024 • James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler
Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of users.
1 code implementation • 29 Apr 2024 • Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard, Kevin Duh
Probabilistic Structured Queries (PSQ) is a cross-language information retrieval (CLIR) method that uses translation probabilities statistically derived from aligned corpora.
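The core idea of PSQ is to treat each query-language term as a probability distribution over its document-language translations, so that matching reduces to an expected term frequency. A toy sketch under assumed translation probabilities (the hypothetical `probs` values below are for illustration, not derived from any real aligned corpus):

```python
from collections import Counter

def psq_expected_tf(query_term, doc_tokens, translation_probs):
    """PSQ-style matching: compute the expected term frequency of a
    query-language term in a document-language document, weighting each
    translation's term frequency by its translation probability.

    translation_probs: {query_term: {doc_term: P(doc_term | query_term)}}
    """
    tf = Counter(doc_tokens)
    return sum(p * tf[t] for t, p in translation_probs.get(query_term, {}).items())

# Hypothetical English-to-Spanish translation probabilities
probs = {"house": {"casa": 0.7, "hogar": 0.3}}
doc = ["la", "casa", "es", "una", "casa", "con", "hogar"]
etf = psq_expected_tf("house", doc, probs)  # 0.7*2 + 0.3*1 = 1.7
```

The expected term frequency can then be plugged into any standard term-weighting retrieval model in place of the raw term frequency.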
no code implementations • 11 Apr 2024 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
The principal tasks are ranked retrieval of news in one of the track's three languages, using English topics.
1 code implementation • 9 Jan 2024 • Eugene Yang, Dawn Lawrie, James Mayfield, Douglas W. Oard, Scott Miller
Applying a similar knowledge distillation approach to training an efficient dual-encoder model for Cross-Language Information Retrieval (CLIR) is challenging because sufficiently large training collections are scarce when the query and document languages differ.
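One common way to distill a retrieval teacher into a dual-encoder student is Margin-MSE: penalize the difference between the teacher's and the student's score margins over positive/negative passage pairs. Whether this matches the paper's exact training objective is an assumption; the sketch below shows only the general technique:

```python
import numpy as np

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Margin-MSE distillation (a common choice for training retrieval
    dual encoders; assumed here for illustration): match the student's
    score margin between a positive and a negative passage to the
    teacher's margin for the same pair."""
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return float(np.mean((student_margin - teacher_margin) ** 2))

# Toy scores for a batch of two (query, positive, negative) triples
s_pos, s_neg = np.array([5.0, 3.0]), np.array([2.0, 2.5])
t_pos, t_neg = np.array([6.0, 4.0]), np.array([1.0, 2.0])
loss = margin_mse_loss(s_pos, s_neg, t_pos, t_neg)  # mean of (-2.0)^2 and (-1.5)^2 = 3.125
```

Matching margins rather than absolute scores lets the student learn the teacher's ranking preferences without having to reproduce its score scale.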
1 code implementation • 30 May 2023 • Douglas W. Oard
Despite the plethora of born-digital content, vast troves of important content remain accessible only on physical media such as paper or microfilm.
no code implementations • 24 Apr 2023 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval.
no code implementations • 20 Dec 2022 • Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard
Prior work has shown that combining adapters pretrained on language tasks for a specific language with task-specific adapters yields models that outperform full-model fine-tuning when transferring across languages in various NLP tasks.
1 code implementation • 3 Sep 2022 • Dawn Lawrie, Eugene Yang, Douglas W. Oard, James Mayfield
Providing access to information across languages has been a goal of Information Retrieval (IR) for decades.
no code implementations • 25 Apr 2022 • Eugene Yang, Suraj Nair, Ramraj Chandradevan, Rebecca Iglesias-Flores, Douglas W. Oard
Pretrained language models have improved effectiveness on numerous tasks, including ad-hoc retrieval.
1 code implementation • 20 Jan 2022 • Suraj Nair, Eugene Yang, Dawn Lawrie, Kevin Duh, Paul McNamee, Kenton Murray, James Mayfield, Douglas W. Oard
These models have improved the effectiveness of retrieval systems well beyond that of lexical term matching models such as BM25.
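As a point of reference, BM25 scores a document by combining each matching query term's inverse document frequency with a saturating term-frequency component normalized by document length. A self-contained sketch of the standard Okapi BM25 formula, using hypothetical corpus statistics:

```python
import math

def bm25_score(query_terms, doc_tokens, doc_freqs, num_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.
    doc_freqs: {term: number of documents in the corpus containing term}."""
    score = 0.0
    dl = len(doc_tokens)
    for term in query_terms:
        tf = doc_tokens.count(term)
        if tf == 0:
            continue  # term absent from document contributes nothing
        df = doc_freqs.get(term, 0)
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        # Saturating tf, normalized by document length relative to the average
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

# Hypothetical corpus statistics, for illustration only
doc = ["neural", "retrieval", "models", "for", "retrieval"]
s = bm25_score(["retrieval"], doc, {"retrieval": 3}, num_docs=10, avgdl=5.0)
```

Unlike the neural models discussed above, BM25 matches only exact lexical terms, which is precisely the limitation that dense and late-interaction retrievers address.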
no code implementations • 20 Nov 2021 • Behrooz Mansouri, Douglas W. Oard, Anurag Agarwal, Richard Zanibbi
There are now several test collections for the formula retrieval task, in which a system's goal is to identify useful mathematical formulae to show in response to a query posed as a formula.
no code implementations • 10 Nov 2021 • Petra Galuščáková, Douglas W. Oard, Suraj Nair
Two key assumptions shape the usual view of ranked retrieval: (1) that the searcher can choose words for their query that might appear in the documents that they wish to see, and (2) that ranking retrieved documents will suffice because the searcher will be able to recognize those which they wished to find.
no code implementations • ACL 2021 • Yanda Chen, Chris Kedzie, Suraj Nair, Petra Galuščáková, Rui Zhang, Douglas W. Oard, Kathleen McKeown
This paper proposes an approach to cross-language sentence selection in a low-resource setting.
no code implementations • 2 Feb 2021 • Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages.
no code implementations • 14 Nov 2020 • Jason R. Baron, Mahmoud F. Sayed, Douglas W. Oard
At present, the review process for material that is exempt from disclosure under the Freedom of Information Act (FOIA) in the United States of America, and under many similar government transparency regimes worldwide, is entirely manual.