Search Results for author: Deli Chen

Found 18 papers, 9 papers with code

Leveraging Word-Formation Knowledge for Chinese Word Sense Disambiguation

1 code implementation • Findings (EMNLP) 2021 • Hua Zheng, Lei Li, Damai Dai, Deli Chen, Tianyu Liu, Xu Sun, Yang Liu

In this paper, we propose to leverage word-formation knowledge to enhance Chinese WSD.

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

1 code implementation • 7 May 2024 • DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, JianZhong Guo, Jiaqi Ni, Jiashi Li, Jin Chen, Jingyang Yuan, Junjie Qiu, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruizhe Pan, Runxin Xu, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Size Zheng, T. Wang, Tian Pei, Tian Yuan, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Liu, Xin Xie, Xingkai Yu, Xinnan Song, Xinyi Zhou, Xinyu Yang, Xuan Lu, Xuecheng Su, Y. Wu, Y. K. Li, Y. X. Wei, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Zheng, Yichao Zhang, Yiliang Xiong, Yilong Zhao, Ying He, Ying Tang, Yishi Piao, Yixin Dong, Yixuan Tan, Yiyuan Liu, Yongji Wang, Yongqiang Guo, Yuchen Zhu, Yuduan Wang, Yuheng Zou, Yukun Zha, Yunxian Ma, Yuting Yan, Yuxiang You, Yuxuan Liu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhewen Hao, Zhihong Shao, Zhiniu Wen, Zhipeng Xu, Zhongyu Zhang, Zhuoshu Li, Zihan Wang, Zihui Gu, Zilin Li, Ziwei Xie

MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.

Language Modelling • Reinforcement Learning (RL)

2,283

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

1 code implementation • 11 Jan 2024 • Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y. Wu, Zhenda Xie, Y. K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, Wenfeng Liang

Subsequently, we scale up DeepSeekMoE to 16B parameters and show that it achieves comparable performance with LLaMA2 7B, with only about 40% of computations.

Language Modelling • Large Language Model

881

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

1 code implementation • 5 Jan 2024 • DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, JianZhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y. K. Li, Wenfeng Liang, Fangyun Lin, A. X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R. X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou

The rapid development of open-source large language models (LLMs) has been truly remarkable.

1,255

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

1 code implementation • 14 Dec 2023 • Peiyi Wang, Lei Li, Zhihong Shao, R. X. Xu, Damai Dai, Yifei Li, Deli Chen, Y. Wu, Zhifang Sui

In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions.

Ranked #14 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning • GSM8K • +2

Towards Codable Watermarking for Injecting Multi-bits Information to LLMs

1 code implementation • 29 Jul 2023 • Lean Wang, Wenkai Yang, Deli Chen, Hao Zhou, Yankai Lin, Fandong Meng, Jie Zhou, Xu Sun

As large language models (LLMs) generate texts with increasing fluency and realism, there is a growing need to identify the source of texts to prevent the abuse of LLMs.

Language Modelling

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning

1 code implementation • 23 May 2023 • Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun

In-context learning (ICL) emerges as a promising capability of large language models (LLMs) by providing them with demonstration examples to perform diverse tasks.

In-Context Learning

120

Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models Caused by Backdoor or Bias

no code implementations • 8 May 2023 • Zhiyuan Zhang, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun

To settle this issue, we propose the Fine-purifying approach, which utilizes the diffusion theory to study the dynamic process of fine-tuning for finding potentially poisonous dimensions.

Integrating Local Real Data with Global Gradient Prototypes for Classifier Re-Balancing in Federated Long-Tailed Learning

no code implementations • 25 Jan 2023 • Wenkai Yang, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun

Federated Learning (FL) has become a popular distributed learning paradigm that involves multiple clients training a global model collaboratively in a data privacy-preserving manner.

Federated Learning • Privacy Preserving

Topology-Imbalance Learning for Semi-Supervised Node Classification

1 code implementation • NeurIPS 2021 • Deli Chen, Yankai Lin, Guangxiang Zhao, Xuancheng Ren, Peng Li, Jie Zhou, Xu Sun

The class imbalance problem, as an important issue in learning node representations, has drawn increasing attention from the community.

Classification • Node Classification

CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade

1 code implementation • Findings (EMNLP) 2021 • Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun

On the other hand, the exiting decisions made by internal classifiers are unreliable, leading to wrongly emitted early predictions.

Knowledge Distillation • Model Selection

Rethinking the Promotion Brought by Contrastive Learning to Semi-Supervised Node Classification

no code implementations • 14 Dec 2020 • Deli Chen, Yankai Lin, Lei Li, Xuancheng Ren, Peng Li, Jie Zhou, Xu Sun

Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC).

Contrastive Learning • Graph Learning • +1

Modeling the Stock Relation with Graph Network for Overnight Stock Movement Prediction

no code implementations • 26 Jun 2020 • Wei Li, Ruihan Bao, Keiko Harimoto, Deli Chen, Jingjing Xu, Qi Su

Further analysis shows that the introduction of the graph enables our model to predict the movement of stocks that are not directly associated with news as well as the whole market, which is not available in most previous methods.

Relation

HighwayGraph: Modelling Long-distance Node Relations for Improving General Graph Neural Network

no code implementations • 10 Nov 2019 • Deli Chen, Xiaoqian Liu, Yankai Lin, Peng Li, Jie Zhou, Qi Su, Xu Sun

To address this issue, we propose to model long-distance node relations by simply relying on shallow GNN architectures with two solutions: (1) implicitly modelling by learning to predict node pair relations; (2) explicitly modelling by adding edges between nodes that potentially have the same label.

General Classification • Node Classification

Group, Extract and Aggregate: Summarizing a Large Amount of Finance News for Forex Movement Prediction

no code implementations • WS 2019 • Deli Chen, Shuming Ma, Keiko Harimoto, Ruihan Bao, Qi Su, Xu Sun

In this work, we propose a BERT-based Hierarchical Aggregation Model to summarize a large amount of finance news to predict forex movement.

Extractive Summarization • Stock Market Prediction

Recursive Graphical Neural Networks for Text Classification

no code implementations • 18 Sep 2019 • Wei Li, Shuheng Li, Shuming Ma, Yancheng He, Deli Chen, Xu Sun

A graph is a natural structure to describe the complicated relations between tokens.

General Classification • text-classification • +1

Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View

no code implementations • 7 Sep 2019 • Deli Chen, Yankai Lin, Wei Li, Peng Li, Jie Zhou, Xu Sun

Graph Neural Networks (GNNs) have achieved promising performance on a wide range of graph-based tasks.

Ranked #52 on Node Classification on Cora

Node Classification

Identifying High-Quality Chinese News Comments Based on Multi-Target Text Matching Model

no code implementations • 22 Aug 2018 • Deli Chen, Shuming Ma, Pengcheng Yang, Xu Sun

In this work, we introduce a novel task: high-quality comment identification (HQCI), which aims to automatically assess the quality of online comments.

Informativeness • Text Matching
