About "Guidance distilled diffusion models" #4
Hi! Thanks for your interest.
Thanks for your clear answer! Now I understand what the distilled models are used for.
So what should I do if I want to train the model with a different distillation teacher, like the newly released SD3?
There seems to be no official implementation of the paper "On Distillation of Guided Diffusion Models".
Yes, you are right. There is no official implementation. Probably, we will release the code for guidance distillation when we have free time. However, it is not necessary to do Consistency Distillation on top of a guidance-distilled model. In other words, if you want to play with SD3, you can skip the guidance distillation step and use our CD directly with the SD3 teacher.
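For reference, skipping guidance distillation just means running the teacher twice per step (unconditional and conditional) and combining the two predictions with classifier-free guidance inside the CD step. A minimal sketch of that idea; `teacher` and `guided_teacher_eps` are illustrative names, not code from this repo:

```python
# Sketch (assumption): using a CFG-guided teacher directly inside a
# consistency-distillation step, instead of a guidance-distilled teacher.
# `teacher(x_t, t, cond)` is any epsilon-prediction model; passing
# cond=None denotes the unconditional branch.
def guided_teacher_eps(teacher, x_t, t, cond, w):
    eps_uncond = teacher(x_t, t, None)  # unconditional forward pass
    eps_cond = teacher(x_t, t, cond)    # conditional forward pass
    # Standard classifier-free-guidance combination.
    return eps_uncond + w * (eps_cond - eps_uncond)
```

The cost is two teacher forward passes per distillation step, which a guidance-distilled teacher would avoid.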
Ok, thanks a lot for your kind response!!
Hi. Have you tried generating? How do the metrics behave during training (FID, reconstruction loss)?
The generation results also seem less satisfactory than those from the provided checkpoints.
It looks like the guidance has been turned off. Did you distill the model with guidance (
Yes, I followed the lcm_lora distillation script, setting "w_embedding" to None for both the student and teacher UNet models, and used the standard CFG formulation to estimate the conditional x_0 and eps_0.
OK, I see. We did distillation without the embedding in the initial stages of the project, so there are probably some issues. I'll try to distill it myself tomorrow and get back to you with an answer.
Thanks a lot!
I notice that in the forward/reverse preservation losses, w is explicitly set to zero when there is a w embedding, but this is not done for models without a w embedding. I wonder if this could be a possible reason.
I don't think so. Since the preservation losses only influence reconstruction, and you also have problems with generation. By the way, it's tricky to define preservation losses without a w embedding, since we calculate them only for the unguided process (see Eq. 4 and the corresponding text in the paper). Probably, you should try to create a student with a format similar to the teacher's, to keep the possibility of controlling guidance. Likely, I will release the code for guidance distillation to avoid these issues.
Yes, I understand that the preservation losses are calculated for the unguided process. However, I noticed that when we use the embedding, an embedded vector for "0" is fed to the network, whereas passing "None" to a network without a w-input is not equivalent to the unguided process in the CD formulation.
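This distinction is easy to see if the guidance scale is encoded with a sinusoidal embedding, as in LCM-style training scripts: the embedding of w = 0 is not a zero vector (the cosine half is all ones), so conditioning on embed(0) is genuinely different from passing no embedding at all. A hypothetical sketch of such an embedding, not taken from this repo:

```python
import math

# Sketch (assumption): sinusoidal embedding of the guidance scale w,
# in the style of LCM-type distillation scripts. embedding_dim and
# max_period are illustrative defaults.
def guidance_scale_embedding(w, embedding_dim=8, max_period=10000.0):
    half = embedding_dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # Note: for w = 0 the sine half is all zeros but the cosine half is
    # all ones, so embed(0) is NOT the same as "no embedding".
    return [math.sin(w * f) for f in freqs] + [math.cos(w * f) for f in freqs]
```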
Are there any updates? Thanks a lot!
I'm sorry, I haven't had time yet... I will definitely do it soon!
As I understand it, these checkpoints are distilled SD models that generate with Dynamic, is this right? Moreover, could you provide more training details for such models? Thanks a lot in advance!