Query Generation with External Knowledge for Dense Retrieval

Dense retrieval aims at searching for the most relevant documents to the given query by encoding texts in the embedding space, requiring a large amount of query-document pairs to train. Since manually constructing such training data is challenging, recent work has proposed to generate synthetic queries from documents and use them to train a dense retriever. However, compared to the manually composed queries, synthetic queries do not generally ask for implicit information, therefore leading to a degraded retrieval performance. In this work, we propose Query Generation with External Knowledge (QGEK), a novel method for generating queries with external information related to the corresponding document. Specifically, we convert a query into a triplet-based template form to accommodate external information and transmit it to a pre-trained language model (PLM). We validate QGEK on both in-domain and out-domain dense retrieval settings. The dense retriever with the queries requiring implicit information is found to make good performance improvement. Also, such queries are similar to manually composed queries, confirmed by both human evaluation and unique & non-unique words distribution.

PDF Abstract
No code implementations yet. Submit your code now

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here