
Discussion about LISA #833

Open
caoshuai03 opened this issue May 20, 2024 · 1 comment

@caoshuai03

In the paper, only a comparison of the average weight norm of each layer during LoRA fine-tuning is given.

  1. What if the layers' weights already differ before fine-tuning?
  2. If a weighted averaging method were used instead, could the intermediate iterations behave differently?
@research4pan
Contributor

Thanks for your interest in LMFlow and LISA!

Regarding the first question, we conducted the fine-tuning with the same seed and the same base model, so the initial weights should be identical before fine-tuning.

As for the second question, weighted averaging methods are certainly different from the normal training process. But since they are less frequently adopted in practice for fine-tuning LLMs, we didn't conduct experiments on them. To draw insights from weighted averaging methods, we think at least two sets of experiments are needed, if anyone is interested in this direction:

  • Verify that the weighted averaging method can reproduce the performance of normal fine-tuning techniques
  • After that, check the layer-wise weight norm of the weighted-average-fine-tuned models (see the sketch below)
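
For the second step, here is a minimal sketch of a layer-wise weight norm check. This is not the LISA authors' script; it assumes a LLaMA-style Hugging Face checkpoint whose decoder layer parameters are named `model.layers.<idx>....`, and the model name below is a placeholder to swap for the base or weighted-average-fine-tuned checkpoint you want to inspect:

```python
# Sketch: compute the mean parameter norm of each decoder layer so that a
# fine-tuned model can be compared against its base model, layer by layer.
# Assumes LLaMA-style parameter names "model.layers.<idx>.<module>.weight";
# adjust the prefix for other architectures.
from collections import defaultdict

import torch
from transformers import AutoModelForCausalLM


def layerwise_weight_norms(model_name: str) -> dict:
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
    norms, counts = defaultdict(float), defaultdict(int)
    for name, param in model.named_parameters():
        parts = name.split(".")
        # Parameters of decoder layer i look like "model.layers.i.<module>.weight".
        if len(parts) > 2 and parts[0] == "model" and parts[1] == "layers":
            layer_idx = int(parts[2])
            norms[layer_idx] += param.detach().norm().item()
            counts[layer_idx] += 1
    # Average the norms of all tensors belonging to the same layer.
    return {i: norms[i] / counts[i] for i in sorted(norms)}


if __name__ == "__main__":
    # Placeholder checkpoint name; replace with the model you want to compare.
    for idx, norm in layerwise_weight_norms("meta-llama/Llama-2-7b-hf").items():
        print(f"layer {idx:2d}: mean weight norm = {norm:.4f}")
```

Running this on the base model and on a weighted-average-fine-tuned model, then comparing the two per-layer curves, would give the kind of layer-wise picture the second bullet describes.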

Hope this information can be helpful 😄
