
Commit

update moe
shibing624 committed Jan 26, 2024
1 parent 3845803 commit 14098d4
Showing 3 changed files with 22 additions and 6 deletions.
15 changes: 9 additions & 6 deletions README.md
@@ -30,6 +30,8 @@ Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO(
- The DPO method comes from the paper [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/pdf/2305.18290.pdf)

## 🔥 News
[2024/01/26] v1.8: Added support for fine-tuning the Mixtral mixture-of-experts (MoE) model **[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)**. See [Release-v1.8](https://github.com/shibing624/MedicalGPT/releases/tag/1.8.0) for details.

[2024/01/14] v1.7: Added a retrieval-augmented generation (RAG) file-based Q&A feature, [ChatPDF](https://github.com/shibing624/ChatPDF) (code in `chatpdf.py`), which combines a fine-tuned LLM with a knowledge base of files to improve the accuracy of domain-specific Q&A. See [Release-v1.7](https://github.com/shibing624/MedicalGPT/releases/tag/1.7.0) for details.

[2023/10/23] v1.6: Added RoPE interpolation to extend the context length of GPT models; added support for [FlashAttention-2](https://github.com/Dao-AILab/flash-attention) and the **$S^2$-Attn** proposed by [LongLoRA](https://github.com/dvlab-research/LongLoRA) for LLaMA models; added the [NEFTune](https://github.com/neelsjain/NEFTune) embedding-noise training method. See [Release-v1.6](https://github.com/shibing624/MedicalGPT/releases/tag/1.6.0) for details.
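
A hedged illustration of how these v1.6 options might be combined in one SFT run: the `--flash_attn`, `--shift_attn`, and `--neft_alpha` flags are the ones named in this repository's docs, while the script name `supervised_finetuning.py`, every other argument, and whether the boolean flags take an explicit `True` are assumptions for illustration only.

```bash
# Sketch only: enable FlashAttention-2, S^2-Attn, and NEFTune in a single SFT run.
# Only --flash_attn, --shift_attn, and --neft_alpha are taken from the docs;
# the script name and the remaining arguments are illustrative assumptions.
python supervised_finetuning.py \
    --model_type llama \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --train_file_dir ./data/finetune \
    --flash_attn True \
    --shift_attn True \
    --neft_alpha 5 \
    --output_dir outputs-sft-v1
```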
@@ -110,12 +112,13 @@ pip install -r requirements.txt --upgrade

#### Hardware Requirement

| Method | Bits | 7B | 13B | 30B | 65B |
| ------ | ---- | ----- | ----- | ----- | ------ |
| Full | 16 | 160GB | 320GB | 600GB | 1200GB |
| LoRA | 16 | 16GB | 32GB | 80GB | 160GB |
| QLoRA | 8 | 10GB | 16GB | 40GB | 80GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB |

| Method | Bits | 7B | 13B | 30B | 65B | 8x7B |
| ------ | ---- | ----- | ----- | ----- | ------ | ------ |
| Full | 16 | 160GB | 320GB | 600GB | 1200GB | 900GB |
| LoRA | 16 | 16GB | 32GB | 80GB | 160GB | 120GB |
| QLoRA | 8 | 10GB | 16GB | 40GB | 80GB | 80GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 32GB |

## 🚀 Training Pipeline

12 changes: 12 additions & 0 deletions README_EN.md
@@ -50,6 +50,8 @@ Parameter Description:
- `--gpus {gpu_ids}`: Specifies which GPU devices to use; the default is 0. For multiple GPUs, separate the device ids with commas, e.g. 0,1,2 (a usage sketch follows)
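
A hedged usage sketch for `--gpus`: only the `--gpus` flag itself is taken from the description above; the script name `gradio_demo.py` and the other arguments are assumptions for illustration.

```bash
# Default: run on a single GPU (device id 0).
python gradio_demo.py --model_type llama --base_model ./merged-sft-model --gpus 0

# Multiple GPUs: pass a comma-separated list of device ids.
python gradio_demo.py --model_type llama --base_model ./merged-sft-model --gpus 0,1,2
```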




## 🚀 Training Pipeline

### Stage 1: Continue Pretraining
@@ -114,6 +116,16 @@ sh run_ppo.sh
```
[Training Detail wiki](https://github.com/shibing624/MedicalGPT/wiki/Training-Details)


### Hardware Requirement

| Method | Bits | 7B | 13B | 30B | 65B | 8x7B |
| ------ | ---- | ----- | ----- | ----- | ------ | ------ |
| Full | 16 | 160GB | 320GB | 600GB | 1200GB | 900GB |
| LoRA | 16 | 16GB | 32GB | 80GB | 160GB | 120GB |
| QLoRA | 8 | 10GB | 16GB | 40GB | 80GB | 80GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 32GB |

## 🔥 Inference
After training is complete, we load the trained model and check the quality of the text it generates.

1 change: 1 addition & 0 deletions docs/training_params.md
@@ -26,6 +26,7 @@
12. Added [FlashAttention-2](https://github.com/Dao-AILab/flash-attention) support for LLaMA models; if you are using an RTX 4090, A100, or H100 GPU, pass the `--flash_attn` argument during SFT to enable FlashAttention-2
13. Added the **$S^2$-Attn** proposed by [LongLoRA](https://github.com/dvlab-research/LongLoRA), which gives the model long-context capability; pass the `--shift_attn` argument during SFT to enable it
14. Added the [NEFTune](https://github.com/neelsjain/NEFTune) method ([NEFTune paper](https://arxiv.org/abs/2310.05914)), which adds noise to the embeddings during SFT; pass the `--neft_alpha` argument during SFT to enable NEFTune, e.g. `--neft_alpha 5`
15. Added support for fine-tuning the Mixtral mixture-of-experts (MoE) model **[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)**. When fine-tuning with LoRA during SFT, you can enable 4-bit quantization and QLoRA with `--load_in_4bit True --qlora True` to save GPU memory; it is recommended to set `--target_modules q_proj,k_proj,v_proj,o_proj`, which avoids quantizing the MLP layers of the MoE expert networks, since they are sparse and quantizing them degrades model quality (a launch sketch follows this list).
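
A minimal launch sketch for item 15: the `--load_in_4bit`, `--qlora`, and `--target_modules` flags are quoted from the item above, while the script name `supervised_finetuning.py` and all remaining arguments are assumptions for illustration.

```bash
# Sketch only: QLoRA fine-tuning of Mixtral 8x7B.
# --load_in_4bit, --qlora, and --target_modules are quoted from item 15;
# the script name and the other arguments are illustrative assumptions.
CUDA_VISIBLE_DEVICES=0,1,2,3 python supervised_finetuning.py \
    --model_type mixtral \
    --model_name_or_path mistralai/Mixtral-8x7B-v0.1 \
    --train_file_dir ./data/finetune \
    --use_peft True \
    --load_in_4bit True \
    --qlora True \
    --target_modules q_proj,k_proj,v_proj,o_proj \
    --output_dir outputs-sft-mixtral-qlora
```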

**About PT Training**

