no code implementations • 13 Apr 2024 • Yifan Qiao, Shanxiu He, Yingrui Yang, Parker Carlson, Tao Yang
This paper revisits cluster-based retrieval that partitions the inverted index into multiple groups and skips the index partially at cluster and document levels during online inference using a learned sparse representation.
1 code implementation • 20 Jun 2023 • Yifan Qiao, Yingrui Yang, Shanxiu He, Tao Yang
Learned sparse document representations using a transformer-based neural model have been found to be attractive in both relevance effectiveness and time efficiency.
1 code implementation • 2 May 2023 • Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval that uses the learned sparse representation derived by DeepImpact.
no code implementations • 26 Apr 2022 • John Thorpe, Pengzhan Zhao, Jonathan Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, Guoqing Harry Xu
DNN models across many domains continue to grow in size, resulting in high resource requirements for effective training, and unpalatable (and often unaffordable) costs for organizations and research labs across scales.
1 code implementation • 23 Apr 2022 • Yifan Qiao, Yingrui Yang, Haixin Lin, Tianbo Xiong, Xiyue Wang, Tao Yang
This paper proposes a dual skipping guidance scheme with hybrid scoring to accelerate document retrieval that uses learned sparse representations while still delivering good relevance.
no code implementations • ACL 2022 • Yingrui Yang, Yifan Qiao, Tao Yang
Transformer-based re-ranking models can achieve high search relevance through context-aware soft matching of query tokens with document tokens.
1 code implementation • 24 May 2021 • John Thorpe, Yifan Qiao, Jonathan Eyolfson, Shen Teng, Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Netravali, Miryung Kim, Guoqing Harry Xu
Computation separation makes it possible to construct a deep, bounded-asynchronous pipeline where graph and tensor parallel tasks can fully overlap, effectively hiding the network latency incurred by Lambdas.
no code implementations • 11 Mar 2021 • Yingrui Yang, Yifan Qiao, Jinjin Shao, Mayuresh Anand, Xifeng Yan, Tao Yang
By applying token encoding on top of a dual-encoder architecture, BECR separates the attentions between a query and a document while capturing the contextual semantics of a query.
no code implementations • 16 Apr 2019 • Yifan Qiao, Chenyan Xiong, Zheng-Hao Liu, Zhiyuan Liu
This paper studies the performance and behavior of BERT in ranking tasks.