
Update train_lora_flux_24gb.yaml #69

Open
wants to merge 1 commit into main
Conversation

@boopage commented Aug 15, 2024

1e-4 looks too low in my tests; 2e-4 and 2000 steps seem to result in much better resemblance (training with ~20 photos of a person).

Close this if you don't agree; I wasn't sure whether 1e-4 has been tested with the new linear_timesteps: true setting.

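For reference, a minimal sketch of how the proposed values might look in the train: section of train_lora_flux_24gb.yaml (the surrounding key names are assumed from the existing config layout, so treat this as illustrative rather than the exact diff):

train:
  steps: 2000              # proposed in this PR
  lr: 2e-4                 # proposed in this PR; the current default is 1e-4
  linear_timesteps: true   # existing setting mentioned above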
@D-Ogi commented Aug 16, 2024

To be honest, in my case even 1e-4 at 500 steps is sufficient to tune a person-based LoRA.

@boopage (author) commented Aug 16, 2024

I agree that 500 steps already converges almost magically, unlike SDXL, but IMO more is needed to reach a LoRA that is weighted strongly enough. I just think that for people trying it for the first time, 2e-4 might give a more satisfying result.

@WarAnakin

Depending on the optimizer and scheduler you use, you can achieve faster/better results.

@boopage (author) commented Aug 17, 2024

It's mainly about the default config; I'm sure there are lots of parameters to tune. But I'm concerned that people will try it out and then be disappointed with the results. In that case it's better to lean towards slight overtraining than undertraining.

@zethfoxster

> Depending on the optimizer and scheduler you use, you can achieve faster/better results.

Do you have any suggestions?

@WarAnakin

> > Depending on the optimizer and scheduler you use, you can achieve faster/better results.

> Do you have any suggestions?

I just saw your reply, sorry for the delay.
Most of the time I use the polynomial scheduler for training with the adafactor optimizer, but for ostris' scripts you need to modify the scheduler.py file in order to add the polynomial config. I'll see if ostris agrees to have the change merged into his main branch.

@D-Ogi commented Aug 21, 2024

@WarAnakin the polynomial scheduler sounds really promising. Would you be willing to share the related code in a PR or in your forked repository, please?

@WarAnakin

> @WarAnakin the polynomial scheduler sounds really promising. Would you be willing to share the related code in a PR or in your forked repository, please?

Yes, I'll do that now. You only need to update one file.

@WarAnakin commented Aug 21, 2024

> @WarAnakin the polynomial scheduler sounds really promising. Would you be willing to share the related code in a PR or in your forked repository, please?

This one's for you:

https://github.com/WarAnakin/ai-toolkit/blob/main/toolkit/scheduler.py

You don't need the whole thing; just replace the scheduler.py file in the /toolkit folder with mine.

@WarAnakin

@D-Ogi please don't forget to add the following line of code to the config file

[image: screenshot of the config line to add]
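The screenshot is not reproduced in this text, but judging from the follow-up discussion about lr_scheduler_params.power, the line in question is presumably the scheduler selection, along the lines of:

lr_scheduler: "polynomial"   # assumed from context; the original screenshot is not captured here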

@gi0baro commented Aug 24, 2024

> @D-Ogi please don't forget to add the following line of code to the config file

Mind that if you don't specify lr_scheduler_params.power with your code (the default power is 1.0), polynomial is the same as linear.
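For context, assuming WarAnakin's scheduler.py follows the standard polynomial decay (as in PyTorch's PolynomialLR or diffusers' polynomial schedule), the learning rate at step t of T total steps is lr_t = lr * (1 - t/T)^power. With power = 1.0 this reduces exactly to a linear ramp down to 0, so only a power different from 1.0 changes the shape of the curve.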

@D-Ogi commented Aug 24, 2024

Thank you both. I tested the polynomial scheduler rather thoroughly, but it seems to be very weak even with 1e-2. I went from 1e-5 to 1e-1 and from 500 to 2000 steps, but never got anything valuable.

@gi0baro commented Aug 24, 2024

> Thank you both. I tested the polynomial scheduler rather thoroughly, but it seems to be very weak even with 1e-2. I went from 1e-5 to 1e-1 and from 500 to 2000 steps, but never got anything valuable.

That makes sense: a linear progression will start at the LR you specified and linearly decrease to 0 by the end, whereas with the default configuration the LR you specify is constant across the whole run.
A polynomial scheduler should theoretically help reduce overtraining and preserve details across the run. Compared to constant, you probably want to increase the LR and set the power to something in the 0.1-0.4 range, e.g.:

lr: 4e-4
lr_scheduler: "polynomial"
lr_scheduler_params:
  power: 0.4
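Under the same assumed decay formula, power: 0.4 keeps the LR well above a linear schedule for most of the run: at the halfway point the multiplier is 0.5^0.4 ≈ 0.76, so with lr: 4e-4 you would still be at roughly 3.0e-4, versus 2.0e-4 under a linear (power 1.0) decay, and the drop towards 0 only becomes steep near the end of training.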

@WarAnakin

> > Thank you both. I tested the polynomial scheduler rather thoroughly, but it seems to be very weak even with 1e-2. I went from 1e-5 to 1e-1 and from 500 to 2000 steps, but never got anything valuable.

> That makes sense: a linear progression will start at the LR you specified and linearly decrease to 0 by the end, whereas with the default configuration the LR you specify is constant across the whole run. A polynomial scheduler should theoretically help reduce overtraining and preserve details across the run. Compared to constant, you probably want to increase the LR and set the power to something in the 0.1-0.4 range, e.g.:
>
> lr: 4e-4
> lr_scheduler: "polynomial"
> lr_scheduler_params:
>   power: 0.4

Thanks for pointing that out.

Also, you guys might want to know that training the text encoder for LoRAs is now possible using CLIP_L.
I asked ostris to see if he could do the same for ai-toolkit; for now it has been implemented in kohya and simple_tuner.

@AFMSB commented Sep 5, 2024

@WarAnakin I tried using your configuration and did get better results. However, when I restore the fine-tuned checkpoint, my images don’t look the same as they did with the training checkpoints.

@WarAnakin

> @WarAnakin I tried using your configuration and did get better results. However, when I restore the fine-tuned checkpoint, my images don't look the same as they did with the training checkpoints.

I don't understand. What do you mean by restoring a checkpoint?

@AFMSB commented Sep 6, 2024

I have fine-tuned flux.1-schnell with the learning rate set to 0.004. The results did improve in the samples generated during training, but when I load Flux + LoRA to generate images, the result is completely different.
