About "Guidance distilled diffusion models" #4

Open
lyx0208 opened this issue Jun 26, 2024 · 19 comments
Comments


lyx0208 commented Jun 26, 2024

According to my own understanding, these checkpoints are distilled SD models that generate with dynamic guidance, is this right? Moreover, can you provide more training details for such models? Thanks a lot in advance!

quickjkee (Member) commented:

Hi! Thanks for your interest.
Not really. We just embed the guidance through a new layer. In more detail:

  • the guidance has the following form:
    $\epsilon^{w} (x_{t}, t, c) = \epsilon_{\theta}(x_{t}, t, 0) + w (\epsilon_{\theta}(x_{t}, t, c) - \epsilon_{\theta}(x_{t}, t, 0))$
    As you can see, the neural network is called twice: $\epsilon_{\theta}(x_{t}, t, 0), \epsilon_{\theta}(x_{t}, t, c)$

  • We just distill this guidance into a new layer called an embedding layer. That is, we solve the following task:
    $| \epsilon^{w} (x_{t}, t, c) - \epsilon (x_{t}, t, c, w)|$, where $w$ is fed through a new layer (implemented in the same way as the $t$ embedding). In this way, we have one forward pass instead of two.

  • To get dynamic guidance after distillation, we can make $w$ dependent on $t$. We use the step function $w(t)$: if $t > \tau$, then $w(t)=0$, otherwise $w(t)=w$ (see the sketch below).
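
A minimal PyTorch sketch of this pipeline, assuming hypothetical `teacher`/`student` callables that predict noise (illustrative only, not the repo's actual code):

```python
import torch
import torch.nn as nn

class GuidanceEmbedding(nn.Module):
    """Embeds the scalar guidance scale w, analogous to the timestep embedding."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, w: torch.Tensor) -> torch.Tensor:  # w: (B,)
        return self.mlp(w[:, None])                      # -> (B, dim)

def cfg_teacher_eps(teacher, x_t, t, cond, uncond, w):
    """Classifier-free guidance target: two teacher passes."""
    eps_uncond = teacher(x_t, t, uncond)
    eps_cond = teacher(x_t, t, cond)
    return eps_uncond + w.view(-1, 1, 1, 1) * (eps_cond - eps_uncond)

def guidance_distillation_loss(student, teacher, x_t, t, cond, uncond, w):
    """One student pass conditioned on w is trained to match the two-pass CFG teacher."""
    with torch.no_grad():
        target = cfg_teacher_eps(teacher, x_t, t, cond, uncond, w)
    pred = student(x_t, t, cond, w=w)  # student consumes w through the new embedding
    return torch.mean((pred - target) ** 2)

def dynamic_w(t, w, tau):
    """Step-function dynamic guidance: w(t) = 0 for t > tau, else w (t and w of shape (B,))."""
    return torch.where(t > tau, torch.zeros_like(w), w)
```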


lyx0208 commented Jun 27, 2024

Thanks for your clear answer! Now I understand what the distilled models are for.

lyx0208 closed this as completed Jun 27, 2024
lyx0208 reopened this Jun 27, 2024

lyx0208 commented Jun 27, 2024

So what should I do if I want to train the model with another distillation teacher, like the newly released SD3?


lyx0208 commented Jun 27, 2024

There seems to be no official implementation of the paper "On Distillation of Guided Diffusion Models".

quickjkee (Member) commented:

Yes, you are right. There is no official implementation. Probably, we will release the code for guidance distillation when we have free time. However, it is not necessary to do Consistency Distillation on top of a guidance distilled model. In other words, if you want to play with SD3, you can skip the guidance distillation step and use our CD directly with the SD3 teacher.


lyx0208 commented Jun 28, 2024

Ok, thanks a lot for your kind response!!

lyx0208 closed this as completed Jun 28, 2024
lyx0208 reopened this Jul 9, 2024

lyx0208 commented Jul 9, 2024

Hello! I've tried training the SD1.5-based model with the script you provided, but without the "embed_guidance" flag. The model I get shows poorer performance in both reconstruction accuracy and editability. See the image below:
[Image: test_editing_iCD-SD1.5_woemb]

quickjkee (Member) commented:

Hi. Have you tried generating? How do the metrics behave during training (FID, reconstruction loss)?


lyx0208 commented Jul 9, 2024

The generation results also seem less satisfactory than those of the provided checkpoints.
I don't have visualized loss metrics, since the machine I use cannot access wandb or tensorboard.
By the way, have you tried performing iCD without the w embedding, and what were the corresponding results?


lyx0208 commented Jul 10, 2024

Here is a demo generation result with the prompt "a cute owl with a graduation cap."
[Image: test_generation_iCD-SD1.5]

quickjkee (Member) commented:

It looks like the guidance has been turned off. Do you distill the model with guidance ($w > 0$)?


lyx0208 commented Jul 10, 2024

Yes, I follow the lcm_lora distillation script, setting "w_embedding" to None for both the student and teacher UNet models, and use the normal CFG formulation to estimate the conditional x_0 and eps_0.
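
For context, a rough sketch of that target computation, assuming a diffusers-style teacher UNet, a scalar guidance weight, and a hypothetical `alphas_cumprod` tensor (not the exact lcm_lora script):

```python
import torch

def cfg_x0_eps(teacher_unet, x_t, t, prompt_emb, uncond_emb, w, alphas_cumprod):
    """Two-pass CFG on the teacher, then convert the guided noise prediction
    into the (x_0, eps) pair used as the consistency-distillation target."""
    with torch.no_grad():
        eps_uncond = teacher_unet(x_t, t, encoder_hidden_states=uncond_emb).sample
        eps_cond = teacher_unet(x_t, t, encoder_hidden_states=prompt_emb).sample
    eps_w = eps_uncond + w * (eps_cond - eps_uncond)

    # Invert x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps for x_0.
    alpha_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    x0_pred = (x_t - torch.sqrt(1.0 - alpha_bar) * eps_w) / torch.sqrt(alpha_bar)
    return x0_pred, eps_w
```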

quickjkee (Member) commented:

OK, I see. We did distillation without embedding in the initial stages of the project, so there are probably some issues. I'll try to distill it myself tomorrow and get back to you with an answer.


lyx0208 commented Jul 10, 2024

Thanks a lot!


lyx0208 commented Jul 10, 2024

I notice that in the forward/reverse preservation losses, w is explicitly set to zero if there is a w embedding, while this is not the case for models without the w embedding. I wonder if this could be a possible reason.


quickjkee commented Jul 10, 2024

I don't think so. The preservation losses only influence reconstruction, yet you also have problems with generation.

By the way, it's tricky to define preservation losses without the w embedding, since we calculate them only for the unguided process (see Eq. 4 and the corresponding text in the paper). You should probably try to create a student with a format similar to the teacher's, to retain the ability to control guidance:
$\epsilon^{w} (x_{t}, t, c) = \epsilon_{\theta}(x_{t}, t, 0) + w (\epsilon_{\theta}(x_{t}, t, c) - \epsilon_{\theta}(x_{t}, t, 0))$

Likely, I will release the code for guidance distillation to avoid these issues.
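
A minimal sketch of what such a guidance-controllable student call could look like at inference time, assuming a diffusers-style UNet (illustrative, not the paper's implementation):

```python
import torch

def student_cfg_eps(student_unet, x_t, t, prompt_emb, uncond_emb, w):
    """A student without a w embedding can keep the teacher's two-call CFG format,
    so the guidance scale w remains controllable after distillation."""
    eps_uncond = student_unet(x_t, t, encoder_hidden_states=uncond_emb).sample
    eps_cond = student_unet(x_t, t, encoder_hidden_states=prompt_emb).sample
    return eps_uncond + w * (eps_cond - eps_uncond)
```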


lyx0208 commented Jul 10, 2024

Yes, I understand that the preservation losses are calculated for the unguided process. However, I noticed that when we use the embedding, an embedded vector for w = 0 is fed to the network, whereas passing "None" when the network has no w-input is not equivalent to an unguided process in the CD formulation.
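
Roughly, the distinction being described, with a hypothetical student signature (for illustration only):

```python
import torch

def unguided_eps(student, x_t, t, cond, has_w_embedding: bool):
    """With a w embedding, 'unguided' means passing an explicit w = 0 vector;
    without one, the call is simply a conditional prediction."""
    if has_w_embedding:
        w_zero = torch.zeros(x_t.shape[0], device=x_t.device)
        return student(x_t, t, cond, w=w_zero)  # learned embedding of w = 0
    return student(x_t, t, cond)                # no notion of guidance at all
```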


lyx0208 commented Jul 16, 2024

Are there any updates? Thanks a lot!

quickjkee (Member) commented:

I'm sorry, I haven't had time yet... I will definitely do it soon!
