Search Results for author: Chaochen Gao

Found 8 papers, 6 papers with code

Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

no code implementations • 30 May 2024 • Chaochen Gao, Xing Wu, Qi Fu, Songlin Hu

Large language models, initially pre-trained with a limited context length, can better handle longer texts by continuing training on a corpus with extended contexts.

Language Modelling Large Language Model

Paper
Add Code

RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval

1 code implementation • 13 Oct 2022 • Xing Wu, Chaochen Gao, Zijia Lin, Zhongyuan Wang, Jizhong Han, Songlin Hu

Sparse sampling is also likely to miss important frames corresponding to some text portions, resulting in textual redundancy.

Contrastive Learning Retrieval +1

Paper
Code

InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings

2 code implementations • 8 Oct 2022 • Xing Wu, Chaochen Gao, Zijia Lin, Jizhong Han, Zhongyuan Wang, Songlin Hu

Contrastive learning has been extensively studied in sentence embedding learning, which assumes that the embeddings of different views of the same sentence are closer.

Contrastive Learning Language Modelling +5

100

Paper
Code

Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

1 code implementation • ACL 2022 • Xing Wu, Chaochen Gao, Meng Lin, Liangjun Zang, Zhongyuan Wang, Songlin Hu

Before entering the neural network, a token is generally converted to the corresponding one-hot representation, which is a discrete distribution of the vocabulary.

Data Augmentation Language Modelling +3

Paper
Code

DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings

1 code implementation • 10 Dec 2021 • Chaochen Gao, Xing Wu, Peng Wang, Jue Wang, Liangjun Zang, Zhongyuan Wang, Songlin Hu

To tackle that, we propose an effective knowledge distillation framework for contrastive sentence embeddings, termed DistilCSE.

Contrastive Learning Knowledge Distillation +5

100

Paper
Code

TransAug: Translate as Augmentation for Sentence Embeddings

no code implementations • 30 Oct 2021 • Jue Wang, Haofan Wang, Xing Wu, Chaochen Gao, Debing Zhang

In this paper, we present TransAug (Translate as Augmentation), which provide the first exploration of utilizing translated sentence pairs as data augmentation for text, and introduce a two-stage paradigm to advances the state-of-the-art sentence embeddings.

Contrastive Learning Data Augmentation +4

Paper
Add Code

ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding

2 code implementations • COLING 2022 • Xing Wu, Chaochen Gao, Liangjun Zang, Jizhong Han, Zhongyuan Wang, Songlin Hu

Unsup-SimCSE takes dropout as a minimal data augmentation method, and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain the two corresponding embeddings to build a positive pair.

Contrastive Learning Data Augmentation +5

100

Paper
Code

Smoothed Contrastive Learning for Unsupervised Sentence Embedding

2 code implementations • COLING 2022 • Xing Wu, Chaochen Gao, Yipeng Su, Jizhong Han, Zhongyuan Wang, Songlin Hu

Contrastive learning has been gradually applied to learn high-quality unsupervised sentence embedding.

Contrastive Learning Sentence +4

100

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.