MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-trained models for specific applications. While methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Experts (MoE) models, such as Mixtral 8x7B, demonstrate remarkable performance in multi-task learning while maintaining a reduced parameter count. However, the resource requirements of these MoEs remain challenging, particularly for consumer-grade GPUs with less than 24 GB of memory. To tackle these challenges, we propose MixLoRA, an approach to construct a resource-efficient sparse MoE model based on LoRA. MixLoRA inserts multiple LoRA-based experts within the feed-forward network block of a frozen pre-trained dense model and employs a commonly used top-k router. Unlike other LoRA-based MoE methods, MixLoRA further enhances model performance by using independent LoRA adapters in the attention layers. Additionally, an auxiliary load balance loss is employed to address the imbalance problem of the router. Our evaluations show that MixLoRA improves accuracy by about 9% over state-of-the-art PEFT methods in multi-task learning scenarios. We also propose a new high-throughput framework to alleviate the computation and memory bottlenecks during the training and inference of MoE models. This framework reduces GPU memory consumption by 40% and token computation latency by 30% during both training and inference.
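To make the architecture described in the abstract concrete, the sketch below shows a MixLoRA-style MoE feed-forward block in PyTorch: LoRA experts that share the frozen FFN weights, a top-k softmax router, and a Switch-Transformer-style auxiliary load-balance loss. All names (`FrozenFFN`, `LoRAExpert`, `MixLoRAMoE`, `num_experts`, `top_k`) are illustrative assumptions, not the paper's implementation; the independent attention-layer LoRA adapters and LLaMA's SwiGLU gate are omitted for brevity.

```python
# Minimal sketch of a MixLoRA-style sparse MoE FFN block (assumed names, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenFFN(nn.Module):
    """Simplified stand-in for the pretrained dense FFN (gate projection omitted)."""
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.up = nn.Linear(hidden, inter, bias=False)
        self.down = nn.Linear(inter, hidden, bias=False)
        for p in self.parameters():       # the dense model stays frozen
            p.requires_grad = False


class LoRAExpert(nn.Module):
    """One expert: the shared frozen FFN plus a per-expert low-rank (LoRA) update."""
    def __init__(self, frozen_ffn: FrozenFFN, hidden: int, inter: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.ffn = frozen_ffn             # shared across all experts
        self.scale = alpha / r
        self.up_A = nn.Parameter(torch.randn(hidden, r) * 0.01)   # trainable LoRA factors
        self.up_B = nn.Parameter(torch.zeros(r, inter))
        self.down_A = nn.Parameter(torch.randn(inter, r) * 0.01)
        self.down_B = nn.Parameter(torch.zeros(r, hidden))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ffn.up(x) + self.scale * (x @ self.up_A @ self.up_B)
        h = F.silu(h)
        return self.ffn.down(h) + self.scale * (h @ self.down_A @ self.down_B)


class MixLoRAMoE(nn.Module):
    """Frozen dense FFN turned into a sparse MoE via a top-k router over LoRA experts."""
    def __init__(self, frozen_ffn: FrozenFFN, hidden: int, inter: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [LoRAExpert(frozen_ffn, hidden, inter) for _ in range(num_experts)]
        )
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (tokens, hidden) -- flatten batch and sequence dimensions beforehand.
        logits = self.router(x)
        probs = F.softmax(logits, dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)            # (tokens, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)    # renormalize over chosen experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])

        # Auxiliary load-balance loss: penalize correlation between the fraction of
        # tokens each expert receives and its mean routing probability.
        frac_tokens = F.one_hot(idx, len(self.experts)).float().mean(dim=(0, 1))
        mean_probs = probs.mean(dim=0)
        aux_loss = len(self.experts) * (frac_tokens * mean_probs).sum()
        return out, aux_loss


# Toy usage: add `aux` (suitably scaled) to the task loss during training.
ffn = FrozenFFN(hidden=64, inter=256)
moe = MixLoRAMoE(ffn, hidden=64, inter=256, num_experts=4, top_k=2)
tokens = torch.randn(10, 64)
y, aux = moe(tokens)
```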


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Common Sense Reasoning | ARC (Challenge) | LLaMA-2 7B + MixLoRA | Accuracy | 58.1 | # 23 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA-2 13B + MixLoRA | Accuracy | 69.9 | # 15 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA-3 8B + MixLoRA | Accuracy | 79.9 | # 14 |
| Common Sense Reasoning | ARC (Easy) | LLaMA-2 7B + MixLoRA | Accuracy | 77.7 | # 19 |
| Common Sense Reasoning | ARC (Easy) | LLaMA-2 13B + MixLoRA | Accuracy | 83.5 | # 9 |
| Common Sense Reasoning | ARC (Easy) | LLaMA-3 8B + MixLoRA | Accuracy | 86.5 | # 4 |
| Question Answering | BoolQ | LLaMA-2 7B + MixLoRA | Accuracy | 72.7 | # 38 |
| Question Answering | BoolQ | LLaMA-2 13B + MixLoRA | Accuracy | 77.1 | # 30 |
| Question Answering | BoolQ | LLaMA-3 8B + MixLoRA | Accuracy | 75 | # 35 |
| Sentence Completion | HellaSwag | LLaMA-2 7B + MixLoRA | Accuracy | 93.1 | # 9 |
| Sentence Completion | HellaSwag | LLaMA-2 13B + MixLoRA | Accuracy | 94.7 | # 5 |
| Sentence Completion | HellaSwag | LLaMA-3 8B + MixLoRA | Accuracy | 93.3 | # 8 |
| Question Answering | OpenBookQA | LLaMA-2 7B + MixLoRA | Accuracy | 84.4 | # 16 |
| Question Answering | OpenBookQA | LLaMA-2 13B + MixLoRA | Accuracy | 83 | # 19 |
| Question Answering | OpenBookQA | LLaMA-3 8B + MixLoRA | Accuracy | 84.8 | # 15 |
| Question Answering | PIQA | LLaMA-2 7B + MixLoRA | Accuracy | 83.2 | # 12 |
| Question Answering | PIQA | LLaMA-2 13B + MixLoRA | Accuracy | 86.8 | # 6 |
| Question Answering | PIQA | LLaMA-3 8B + MixLoRA | Accuracy | 87.6 | # 3 |
| Question Answering | SIQA | LLaMA-2 7B + MixLoRA | Accuracy | 78 | # 10 |
| Question Answering | SIQA | LLaMA-2 13B + MixLoRA | Accuracy | 82.5 | # 2 |
| Question Answering | SIQA | LLaMA-3 8B + MixLoRA | Accuracy | 78.8 | # 9 |
| Common Sense Reasoning | WinoGrande | LLaMA-2 7B + MixLoRA | Accuracy | 76.8 | # 23 |
| Common Sense Reasoning | WinoGrande | LLaMA-2 13B + MixLoRA | Accuracy | 86.3 | # 9 |
| Common Sense Reasoning | WinoGrande | LLaMA-3 8B + MixLoRA | Accuracy | 82.1 | # 11 |
