v1.6.0

shibing624 released this 23 Oct 08:01


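v1.6 release:

Added RoPE interpolation to extend the context length of GPT models: using position interpolation, the model is trained on incremental data and gains the ability to handle long text. Train with the --rope_scaling linear argument;
Added FlashAttention-2 support for LLaMA models. If you are using an RTX4090, A100, or H100 GPU, pass --flash_attn to enable FlashAttention-2;
Added $S^2$-Attn, proposed in LongLoRA, which gives the model the ability to handle long text. Pass --shift_attn during SFT to enable it;
Added support for NEFTune, an SFT training method that adds noise to the embeddings (see the NEFTune paper). Pass --neft_alpha to enable NEFTune, for example --neft_alpha 5;
Incremental pre-training (PT) now supports the QLoRA method. If you are using an RTX4090, A100, or H100 GPU, nf4 is supported; pass --qlora True --load_in_kbits 4 to enable QLoRA training.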
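
As a rough illustration of two of the ideas listed above, the following Python sketch shows how linear position interpolation rescales a position index before the RoPE angles are computed, and how NEFTune scales uniform noise by alpha / sqrt(L * d) before adding it to the embeddings. The function names `rope_angles` and `neftune_noise`, the `scaling_factor` parameter, and the base of 10000 are assumptions made for this example; this is not the code used by this project.

```python
import math
import random


def rope_angles(position, dim, base=10000.0, scaling_factor=1.0):
    """Return the RoPE rotation angles for one position.

    Linear position interpolation (assumed form) divides the position
    index by scaling_factor, so longer sequences are squeezed into the
    position range seen during the original training.
    """
    scaled_pos = position / scaling_factor
    return [scaled_pos * base ** (-2 * i / dim) for i in range(dim // 2)]


def neftune_noise(embeddings, alpha, rng):
    """Add NEFTune-style noise to an L x d list of embedding rows.

    Noise is drawn from Uniform(-1, 1) and scaled by alpha / sqrt(L * d).
    It is meant to be applied during training only.
    """
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    magnitude = alpha / math.sqrt(seq_len * dim)
    return [
        [x + rng.uniform(-1.0, 1.0) * magnitude for x in row]
        for row in embeddings
    ]


# Hypothetical usage: a factor of 2 maps position 4096 onto position 2048.
angles_long = rope_angles(4096, dim=8, scaling_factor=2.0)
angles_short = rope_angles(2048, dim=8)

# Hypothetical usage: alpha = 5, matching the --neft_alpha 5 example above.
noisy = neftune_noise([[0.0] * 4 for _ in range(3)], alpha=5, rng=random.Random(0))
```

With a scaling factor of 2, position 4096 receives the same angles as position 2048 does without scaling, which is why a model trained on shorter contexts can address longer ones after some additional training.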

What's Changed

About validation_file_dir by @Billccx in #196
fix similar to issue #194 by @kinghuin in #200
fix lm_head type changed bug by @jiangtann in #215

New Contributors

@Billccx made their first contribution in #196
@kinghuin made their first contribution in #200
@jiangtann made their first contribution in #215

Full Changelog: 1.5.0...1.6.0

Contributors

kinghuin, jiangtann, and Billccx
