no code implementations • ACL (RepL4NLP) 2021 • Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin
We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model.
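As a minimal sketch of the distillation idea described here, the dense bi-encoder student can be trained to match a ColBERT teacher's score distribution over the passages in a batch; the tensor names, shapes, and temperature below are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_q_emb, student_p_emb, teacher_scores, temperature=1.0):
    """KL-divergence distillation for a dense bi-encoder student.

    student_q_emb: (B, D) query embeddings from the student encoder
    student_p_emb: (B, D) passage embeddings from the student encoder
    teacher_scores: (B, B) ColBERT late-interaction scores for every
        query-passage pair in the batch (precomputed by the teacher)
    """
    student_scores = student_q_emb @ student_p_emb.T                     # (B, B) dot products
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    # The student mimics the teacher's in-batch score distribution.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```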
2 code implementations • 13 Jun 2023 • Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, Jimmy Lin
BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across 18 different domain/task combinations.
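A minimal sketch of loading one BEIR dataset for zero-shot evaluation, assuming the `beir` Python package; the dataset choice and download URL follow the package's published quickstart and are given here only as an example.

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unpack one BEIR dataset (SciFact is one of the smaller ones).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus:  doc_id  -> {"title": ..., "text": ...}
# queries: query_id -> query text
# qrels:   query_id -> {doc_id: relevance}
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
print(len(corpus), len(queries))
```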
2 code implementations • 4 Apr 2023 • Jheng-Hong Yang, Carlos Lassance, Rafael Sampaio de Rezende, Krishna Srinivasan, Miriam Redi, Stéphane Clinchant, Jimmy Lin
This paper presents the AToMiC (Authoring Tools for Multimedia Content) dataset, designed to advance research in image/text cross-modal retrieval.
no code implementations • 3 Apr 2023 • Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang
The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.
1 code implementation • 21 Mar 2022 • Wei Zhong, Jheng-Hong Yang, Yuqing Xie, Jimmy Lin
With the recent success of dense retrieval methods based on bi-encoders, studies have applied this approach to various interesting downstream retrieval tasks with good efficiency and in-domain effectiveness.
Ranked #1 on Math Information Retrieval on ARQMath (using extra training data)
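The bi-encoder approach referenced above reduces retrieval to a dot product between independently encoded queries and passages; a minimal sketch follows, with a generic sentence encoder standing in for the math-specific models used in the paper (the checkpoint name is a placeholder).

```python
import torch
from sentence_transformers import SentenceTransformer

# Any bi-encoder checkpoint works for the illustration; a math-aware model would be used in practice.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

passages = [
    "The derivative of sin(x) is cos(x).",
    "A prime number has exactly two positive divisors.",
]
query = "What is the derivative of sin(x)?"

# Encode once, then score every passage with a single dot product.
p_emb = encoder.encode(passages, convert_to_tensor=True, normalize_embeddings=True)
q_emb = encoder.encode(query, convert_to_tensor=True, normalize_embeddings=True)
scores = q_emb @ p_emb.T
print(passages[int(torch.argmax(scores))])
```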
no code implementations • 17 Dec 2021 • Jheng-Hong Yang, Xueguang Ma, Jimmy Lin
Sparse lexical representation learning has demonstrated much progress in improving passage retrieval effectiveness in recent models such as DeepImpact, uniCOIL, and SPLADE.
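A minimal sketch of SPLADE-style term weighting over a masked-language-model head, included to illustrate the family of sparse lexical models named above; the checkpoint is a generic stand-in (SPLADE fine-tunes such a model), so treat this as an illustration rather than any of the cited systems.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "distilbert-base-uncased"  # placeholder MLM-headed encoder
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

text = "sparse lexical representations for passage retrieval"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                     # (1, seq_len, vocab)

# SPLADE-style weights: log-saturated ReLU, max-pooled over the sequence.
weights = torch.log1p(torch.relu(logits)).max(dim=1).values.squeeze(0)  # (vocab,)
top = torch.topk(weights, k=10)
print(tokenizer.convert_ids_to_tokens(top.indices.tolist()))
```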
no code implementations • 29 Apr 2021 • Jia-Huei Ju, Jheng-Hong Yang, Chuan-Ju Wang
Recently, much progress in natural language processing has been driven by deep contextualized representations pretrained on large corpora.
no code implementations • EMNLP 2021 • Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin
This paper describes a compact and effective model for low-latency passage retrieval in conversational search based on learned dense representations.
4 code implementations • 14 Apr 2021 • Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, Allan Hanbury
A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows.
Ranked #15 on Zero-shot Text Search on BEIR
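This line of work pairs resource-efficient bi-encoder training with distillation from stronger teachers; below is a minimal sketch of a Margin-MSE-style distillation objective, one common choice in this setting, stated as an assumption rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def margin_mse_loss(s_pos, s_neg, t_pos, t_neg):
    """Distill the teacher's score margin into the student.

    s_pos, s_neg: student scores for (query, positive) and (query, negative), shape (batch,)
    t_pos, t_neg: teacher scores for the same pairs, shape (batch,)
    """
    return F.mse_loss(s_pos - s_neg, t_pos - t_neg)

# Dummy scores for a batch of three training triples.
s_pos, s_neg = torch.tensor([5.0, 3.2, 4.1]), torch.tensor([1.0, 2.8, 0.5])
t_pos, t_neg = torch.tensor([9.1, 7.4, 8.0]), torch.tensor([2.0, 6.9, 1.1])
print(margin_mse_loss(s_pos, s_neg, t_pos, t_neg))
```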
1 code implementation • 19 Feb 2021 • Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira
Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture.
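Pyserini's first-stage retrieval can be exercised in a few lines; a minimal sketch assuming a recent Pyserini release with prebuilt MS MARCO indexes (class and index names may differ across versions).

```python
from pyserini.search.lucene import LuceneSearcher

# Download (once) and open a prebuilt BM25 index for the MS MARCO passage corpus.
searcher = LuceneSearcher.from_prebuilt_index("msmarco-v1-passage")
hits = searcher.search("what is a lobster roll?", k=10)

for i, hit in enumerate(hits):
    print(f"{i + 1:2} {hit.docid:10} {hit.score:.4f}")
```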
no code implementations • COLING 2020 • Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
While internalized "implicit knowledge" in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question.
2 code implementations • 22 Oct 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin
We present an approach to ranking with dense representations that applies knowledge distillation to improve the recently proposed late-interaction ColBERT model.
no code implementations • 5 May 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
Conversational search plays a vital role in conversational information seeking.
no code implementations • 4 Apr 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
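A minimal sketch of sequence-to-sequence query reformulation in this spirit: the checkpoint name and the " ||| " turn-separator input format below are illustrative assumptions, and any seq2seq model fine-tuned for rewriting could be substituted.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Checkpoint name and turn separator are assumptions, not the paper's exact setup.
name = "castorini/t5-base-canard"
tokenizer = T5Tokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name)

history = [
    "What is throat cancer?",
    "Is it treatable?",   # ambiguous follow-up: "it" needs resolution
]
inputs = tokenizer(" ||| ".join(history), return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected style of output: "Is throat cancer treatable?"
```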
no code implementations • 18 Mar 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis.
Ranked #17 on Coreference Resolution on Winograd Schema Challenge
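A minimal sketch of the scoring idea described above, assuming an MNLI-style prompt and an off-the-shelf T5 checkpoint (a fine-tuned model would be used in practice); the exact decomposition and prompt format in the paper may differ.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")  # fine-tuned in practice

def entailment_logprob(premise: str, hypothesis: str) -> float:
    # MNLI-style prompt; the format used in the paper is an assumption here.
    text = f"mnli hypothesis: {hypothesis} premise: {premise}"
    inputs = tokenizer(text, return_tensors="pt")
    labels = tokenizer("entailment", return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**inputs, labels=labels).loss   # mean NLL of the target tokens
    return -loss.item()                              # higher = "entailment" more likely

# One Winograd-style example decomposed into two hypotheses, one per candidate.
sentence = "The trophy doesn't fit into the brown suitcase because it is too large."
scores = {
    "trophy": entailment_logprob(sentence, "The trophy is too large."),
    "suitcase": entailment_logprob(sentence, "The suitcase is too large."),
}
print(max(scores, key=scores.get))
```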