From 14098d49dd89fd6643bdfbbbd8041d165f914331 Mon Sep 17 00:00:00 2001
From: shibing624
Date: Fri, 26 Jan 2024 18:16:00 +0800
Subject: [PATCH] update moe

---
 README.md               | 15 +++++++++------
 README_EN.md            | 12 ++++++++++++
 docs/training_params.md |  1 +
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 2eb22a0..6a6c94b 100644
--- a/README.md
+++ b/README.md
@@ -30,6 +30,8 @@ Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO(
 - The DPO method comes from the paper [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/pdf/2305.18290.pdf)
 
 ## 🔥 News
+[2024/01/26] v1.8: Added support for fine-tuning the Mixtral mixture-of-experts (MoE) model **[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)**. See [Release-v1.8](https://github.com/shibing624/MedicalGPT/releases/tag/1.8.0) for details
+
 [2024/01/14] v1.7: Added a retrieval-augmented generation (RAG) file-based question-answering feature, [ChatPDF](https://github.com/shibing624/ChatPDF) (code: `chatpdf.py`), which combines a fine-tuned LLM with knowledge-base files to improve domain QA accuracy. See [Release-v1.7](https://github.com/shibing624/MedicalGPT/releases/tag/1.7.0) for details
 
 [2023/10/23] v1.6: Added RoPE interpolation to extend the context length of GPT models; added support for [FlashAttention-2](https://github.com/Dao-AILab/flash-attention) and the **$S^2$-Attn** proposed by [LongLoRA](https://github.com/dvlab-research/LongLoRA) for LLaMA models; added the [NEFTune](https://github.com/neelsjain/NEFTune) noisy-embedding training method. See [Release-v1.6](https://github.com/shibing624/MedicalGPT/releases/tag/1.6.0) for details
@@ -110,12 +112,13 @@ pip install -r requirements.txt --upgrade
 
 #### Hardware Requirement
 
-| Method | Bits | 7B    | 13B   | 30B   | 65B    |
-| ------ | ---- | ----- | ----- | ----- | ------ |
-| Full   | 16   | 160GB | 320GB | 600GB | 1200GB |
-| LoRA   | 16   | 16GB  | 32GB  | 80GB  | 160GB  |
-| QLoRA  | 8    | 10GB  | 16GB  | 40GB  | 80GB   |
-| QLoRA  | 4    | 6GB   | 12GB  | 24GB  | 48GB   |
+
+| Training Method | Bits | 7B    | 13B   | 30B   | 65B    | 8x7B  |
+| --------------- | ---- | ----- | ----- | ----- | ------ | ----- |
+| Full            | 16   | 160GB | 320GB | 600GB | 1200GB | 900GB |
+| LoRA            | 16   | 16GB  | 32GB  | 80GB  | 160GB  | 120GB |
+| QLoRA           | 8    | 10GB  | 16GB  | 40GB  | 80GB   | 80GB  |
+| QLoRA           | 4    | 6GB   | 12GB  | 24GB  | 48GB   | 32GB  |
 
 
 ## 🚀 Training Pipeline
diff --git a/README_EN.md b/README_EN.md
index 789c68e..e82c2e1 100644
--- a/README_EN.md
+++ b/README_EN.md
@@ -50,6 +50,8 @@ Parameter Description:
 
 - `--gpus {gpu_ids}`: Specifies the GPU devices to use; the default is 0. If using multiple GPUs, separate the ids with commas, such as 0,1,2
 
+
+
 ## 🚀 Training Pipeline
 
 ### Stage 1: Continue Pretraining
@@ -114,6 +116,16 @@ sh run_ppo.sh
 ```
 
 [Training Detail wiki](https://github.com/shibing624/MedicalGPT/wiki/Training-Details)
+
+### Hardware Requirement
+
+| Method | Bits | 7B    | 13B   | 30B   | 65B    | 8x7B  |
+| ------ | ---- | ----- | ----- | ----- | ------ | ----- |
+| Full   | 16   | 160GB | 320GB | 600GB | 1200GB | 900GB |
+| LoRA   | 16   | 16GB  | 32GB  | 80GB  | 160GB  | 120GB |
+| QLoRA  | 8    | 10GB  | 16GB  | 40GB  | 80GB   | 80GB  |
+| QLoRA  | 4    | 6GB   | 12GB  | 24GB  | 48GB   | 32GB  |
+
 ## 🔥 Inference
 
 After the training is complete, now we load the trained model to verify the effect of the model generating text.
diff --git a/docs/training_params.md b/docs/training_params.md
index 23e6a37..ab1271d 100644
--- a/docs/training_params.md
+++ b/docs/training_params.md
@@ -26,6 +26,7 @@
 12. Added [FlashAttention-2](https://github.com/Dao-AILab/flash-attention) support for LLaMA models; if you are using an RTX 4090, A100, or H100 GPU, pass the `--flash_attn` flag during SFT to enable FlashAttention-2
 13. Added the **$S^2$-Attn** proposed by [LongLoRA](https://github.com/dvlab-research/LongLoRA), which gives the model long-context capability; pass the `--shift_attn` flag during SFT to enable it
 14. Added the [NEFTune](https://github.com/neelsjain/NEFTune) noisy-embedding SFT training method ([NEFTune paper](https://arxiv.org/abs/2310.05914)); pass the `--neft_alpha` flag during SFT to enable NEFTune, e.g. `--neft_alpha 5`
+15. Added support for fine-tuning the Mixtral mixture-of-experts (MoE) model **[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)**. When fine-tuning with LoRA during SFT, you can enable 4-bit quantization and QLoRA with `--load_in_4bit True --qlora True` to save GPU memory; setting `--target_modules q_proj,k_proj,v_proj,o_proj` is recommended so that the MLP layers of the MoE expert networks are not quantized, since they are sparse and quantizing them degrades performance (a minimal sketch of this setup follows after the patch).
 
 **About PT Training**
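To make the Mixtral recommendation in item 15 above concrete, here is a minimal, hedged Python sketch of the corresponding QLoRA configuration at the transformers/peft level: the base model is loaded in 4-bit and LoRA adapters are attached only to the attention projections (`q_proj,k_proj,v_proj,o_proj`), so the sparse MoE expert MLPs carry no adapters. This is not the repository's own SFT script; the library calls, the LoRA hyperparameter values, and the commented-out skip-module names are assumptions made for illustration only.

```python
# Minimal QLoRA sketch for Mixtral 8x7B (illustration only, not the repo's SFT script).
# Assumptions: transformers, peft, and bitsandbytes are installed, a CUDA GPU is available,
# and the LoRA hyperparameters below are placeholder values, not taken from the patch.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mixtral-8x7B-v0.1"

# 4-bit NF4 quantization of the base model (the role of --load_in_4bit True --qlora True).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    # Assumption: to keep the sparse expert MLPs and router out of quantization entirely,
    # their module names could be listed here, e.g.:
    # llm_int8_skip_modules=["w1", "w2", "w3", "gate"],
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # prepare the quantized model for training

# LoRA adapters only on the attention projections, mirroring
# --target_modules q_proj,k_proj,v_proj,o_proj; the MoE expert MLPs get no adapters.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```

In the repository itself the same intent is expressed through the command-line flags quoted in item 15; the sketch only indicates roughly what that configuration maps to.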