# CA-LoRA

Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices

## Introduction

This repository contains the source code for the paper *CA-LoRA*, accepted at COLM 2024.

Since the open-source community has already contributed many LoRA modules for LLMs, we propose to adapt these existing LoRAs to the compressed versions of those LLMs, and we introduce the Compression-Aware LoRA (CA-LoRA) framework. CA-LoRA incorporates knowledge inheritance and knowledge recovery strategies to compensate for the knowledge lost during model compression. Experimental results demonstrate that CA-LoRA outperforms vanilla LoRA applied to a compressed LLM, and achieves performance comparable to that of the non-compressed LLM equipped with its existing LoRA modules.
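
To make the two strategies concrete, here is a minimal PyTorch-style sketch of a linear layer that combines them. It is an illustration, not the repository's actual implementation: the class name `CALoRALinear`, the `recovery_rank` parameter, and the zero-initialization of the recovery branch are assumptions made for this example. The layer keeps the compressed weight frozen, initializes its task LoRA from the LoRA trained on the non-compressed model (knowledge inheritance), and adds an extra trainable low-rank pair intended to recover compression loss (knowledge recovery).

```python
import torch
import torch.nn as nn

class CALoRALinear(nn.Module):
    """Illustrative sketch (not the repo's API): frozen compressed weight
    + inherited task LoRA + trainable low-rank recovery branch."""

    def __init__(self, compressed_linear, inherited_A, inherited_B,
                 recovery_rank=4, scaling=1.0):
        super().__init__()
        self.base = compressed_linear  # compressed layer, kept frozen
        for p in self.base.parameters():
            p.requires_grad = False

        # Knowledge inheritance: start the task LoRA from the LoRA that
        # was trained on the original (non-compressed) LLM, rather than
        # from scratch.
        self.lora_A = nn.Parameter(inherited_A.detach().clone())  # (r, in_features)
        self.lora_B = nn.Parameter(inherited_B.detach().clone())  # (out_features, r)

        # Knowledge recovery: an extra low-rank pair trained to recover
        # capability lost to compression (the rank here is an assumption).
        in_f, out_f = compressed_linear.in_features, compressed_linear.out_features
        self.rec_A = nn.Parameter(torch.zeros(recovery_rank, in_f))
        self.rec_B = nn.Parameter(torch.zeros(out_f, recovery_rank))
        nn.init.normal_(self.rec_A, std=0.02)
        # rec_B stays zero, so the recovery branch initially adds nothing
        # and the layer starts out equivalent to base + inherited LoRA.
        self.scaling = scaling

    def forward(self, x):
        out = self.base(x)
        out = out + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T  # task LoRA
        out = out + self.scaling * (x @ self.rec_A.T) @ self.rec_B.T   # recovery
        return out
```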

## Repo Content

This repo contains the code to reproduce the experimental results in our paper.

## Citation

If you find our work valuable, please cite our paper:

@article{zhao2024calora,
  title={CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices},
  author={Weilin Zhao and Yuxiang Huang and Xu Han and Zhiyuan Liu and Zhengyan Zhang and Kuai Li and Chen Chen and Tao Yang and Maosong Sun},
  journal={arXiv preprint arXiv:2307.07705},
  year={2024},
}