
Self-trained zephyr-7b-dpo-qlora MT-Bench score dropped to 1.88 #188

Open
jltchiu opened this issue Aug 5, 2024 · 1 comment

Comments

jltchiu commented Aug 5, 2024

Hi, I just followed recipes/zephyr-7b-beta/dpo/config_qlora.yaml, hoping to replicate the experiments. I trained on a single A10G GPU, and the only modification I made was reducing the train batch size from 4 to 1 (due to memory constraints); see the config excerpt below. However, my output model zephyr-7b-dpo-qlora only gets an MT-Bench score of 1.88. I also ran the MT-Bench benchmark on the downloaded zephyr-7b-sft-qlora, and it scored 6.37 (which seems relatively normal). Does anyone else have difficulty replicating this DPO experiment with QLoRA? Or is the batch size a critical difference for training?
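For reference, a minimal sketch of the change, assuming the handbook config uses the TrainingArguments-style key name (per_device_train_batch_size):

```yaml
# recipes/zephyr-7b-beta/dpo/config_qlora.yaml (excerpt, key name assumed)
per_device_train_batch_size: 1   # was 4; lowered to fit on one A10G
# Everything else was left at the recipe defaults. In particular,
# gradient_accumulation_steps was NOT raised to compensate, so the
# effective batch size per optimizer step is 4x smaller than the recipe's.
```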

jltchiu commented Aug 5, 2024

Update: I used the mt-bench master branch to run the benchmark on 3 models, with GPT-4 as the judge:

| Model | MT-Bench score |
| --- | --- |
| zephyr-7b-sft-qlora (downloaded) | 6.365625 |
| zephyr-7b-dpo-qlora (downloaded) | 4.443038 |
| zephyr-7b-dpo-qlora (trained) | 1.883648 |
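For anyone trying to reproduce the numbers, this is roughly the standard FastChat llm_judge workflow I followed; the model path and ID below are illustrative placeholders, not my literal invocation:

```bash
# Run from FastChat's llm_judge directory on the master branch.
cd FastChat/fastchat/llm_judge

# 1. Generate answers to the MT-Bench questions (repeat per model).
python gen_model_answer.py \
    --model-path ./zephyr-7b-dpo-qlora \
    --model-id zephyr-7b-dpo-qlora-trained

# 2. Judge the answers with GPT-4 (single-answer grading).
export OPENAI_API_KEY=...  # required by the judge step
python gen_judgment.py \
    --model-list zephyr-7b-dpo-qlora-trained \
    --judge-model gpt-4

# 3. Print the average score per model.
python show_result.py --model-list zephyr-7b-dpo-qlora-trained
```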

Even the downloaded QLoRA DPO model is worse than the SFT model; does anyone else observe this?
