
[Feature Request] Decouple linear1 and linear2 Flux layers in network_args #1613

Open
EricBCoding opened this issue Sep 19, 2024 · 5 comments

Comments

@EricBCoding

Hi,

There's a popular discussion thread that suggests training the proj_out (linear2) module of single blocks 7 and 20 for Flux LoRAs:

https://old.reddit.com/r/StableDiffusion/comments/1f523bd/good_flux_loras_can_be_less_than_45mb_128_dim/

As far as I can tell, it is not yet possible to isolate linear2 through the sd-scripts network_args flag. Perhaps this is as close as it gets:

--network_args "train_double_block_indices=none" "train_single_block_indices=7,20" "single_mod_dim=0"

I propose replacing the single_dim argument with e.g. single_linear1_dim and single_linear2_dim. That way, we can specify single_linear1_dim=0 to reproduce the training method outlined in the thread above.
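
For illustration, the training method from the Reddit thread could then be reproduced with something like the following (single_linear1_dim here is the proposed, not-yet-existing argument; single_linear2_dim would simply be left at the default dim):

--network_args "train_double_block_indices=none" "train_single_block_indices=7,20" "single_mod_dim=0" "single_linear1_dim=0"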

Or is this already possible with a different set of arguments?

Thanks!

@whmc76

whmc76 commented Sep 19, 2024

+1

@kohya-ss
Owner

This is interesting. Since linear1 and linear2 belong to the same attention, I don't think there is any need to separate them.

I think we can get almost the same effect by training linear1 and linear2 with half the dim (rank). Have you tried it?
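
For example (untested, just reusing the arguments from the first post and halving whatever rank you would otherwise pass, e.g. 64 instead of the 128 mentioned in the Reddit thread):

--network_dim 64 --network_args "train_double_block_indices=none" "train_single_block_indices=7,20" "single_mod_dim=0"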

@EricBCoding
Author

EricBCoding commented Sep 19, 2024

> I think we can get almost the same effect by training linear1 and linear2 with half the dim (rank). Have you tried it?

No, but I'll give it a go! I modified my copy of lora_flux.py to isolate linear2 and have already trained a couple of models that way. Let me try your suggestion and see how the results differ.

@envy-ai

envy-ai commented Sep 21, 2024

> I think we can get almost the same effect by training linear1 and linear2 with half the dim (rank). Have you tried it?

> No, but I'll give it a go! I modified my copy of lora_flux.py to isolate linear2 and have already trained a couple of models that way. Let me try your suggestion and see how the results differ.

Any chance you could post the patch?

@EricBCoding
Author

EricBCoding commented Sep 21, 2024

> Any chance you could post the patch?

My sd-scripts is heavily customized, but here's how you can apply it to yours:

  • Look for ("single_blocks", "linear") in networks/lora_flux.py. It's currently on line 665:

("single_blocks", "linear"),

  • Replace this line with ("linear1"), and save the file.

  • Add "single_dim=0" to your network_args flag (in addition to the other values I provided in the OP). This will now skip the linear1 modules and allow you to isolate linear2 for training.

  • If patched correctly, your console should state that you are targeting only 2 unet modules (see the sketch below).
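
If it helps to see why that ends up at 2 modules, here's a rough, standalone sketch. It is not the actual lora_flux.py code, and the module names are made up to resemble sd-scripts' naming; assume train_single_block_indices=7,20 has already narrowed things down to those two blocks:

def matches(lora_name, pattern):
    # mimics the substring-style check used to decide which dim override applies
    return all(token in lora_name for token in pattern)

# hypothetical module names for single blocks 7 and 20 (the only blocks left after
# train_single_block_indices=7,20)
candidate_names = [
    "lora_unet_single_blocks_7_linear1",
    "lora_unet_single_blocks_7_linear2",
    "lora_unet_single_blocks_20_linear1",
    "lora_unet_single_blocks_20_linear2",
]

# unpatched pattern: single_dim=0 would apply to all four names, skipping linear2 as well
print([n for n in candidate_names if matches(n, ("single_blocks", "linear"))])

# patched pattern: single_dim=0 now only hits linear1, so the two linear2 modules keep
# their default dim and are the only unet modules left to train
print([n for n in candidate_names if matches(n, ("linear1",))])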

Hope that helps.

I'm still in the process of running some tests on training linear1 and linear2 in conjunction, and will report back on that in the next few days.
