About "Guidance distilled diffusion models" #4

Open
lyx0208 opened this issue Jun 26, 2024 · 19 comments
Comments


lyx0208 commented Jun 26, 2024

According to my own understanding, these checkpoints are distilled SD models that generate with dynamic guidance, is this right? Moreover, can you provide more training details for such models? Thanks a lot in advance!

quickjkee (Member) commented:

Hi! Thanks for your interest.
Not really. We just embed the guidance through a new layer. In more detail:

  • the guidance has the following form:
    $\epsilon^{w} (x_{t}, t, c) = \epsilon_{\theta}(x_{t}, t, 0) + w (\epsilon_{\theta}(x_{t}, t, c) - \epsilon_{\theta}(x_{t}, t, 0))$
    As you can see, the neural network is called twice: $\epsilon_{\theta}(x_{t}, t, 0), \epsilon_{\theta}(x_{t}, t, c)$

  • We just distill this guidance into a new layer called an embedding layer. That is, we solve the following task:
    $| \epsilon^{w} (x_{t}, t, c) - \epsilon (x_{t}, t, c, w)|$, where $w$ is fed through a new layer (implemented in the same way as the $t$ embedding). In this way, we have one forward pass instead of two.

  • To get dynamic guidance after distillation, we can make $w$ dependent on $t$. We use the step function $w(t)$: if $t > \tau$, then $w(t)=0$, otherwise $w(t)=w$ (see the sketch below).
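
A minimal PyTorch sketch of this pipeline, assuming hypothetical `teacher`/`student` callables that predict noise (illustrative only, not the repo's actual code):

```python
import torch
import torch.nn as nn

class GuidanceEmbedding(nn.Module):
    """Embeds the scalar guidance scale w, analogous to the timestep embedding."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, w: torch.Tensor) -> torch.Tensor:  # w: (B,)
        return self.mlp(w[:, None])                      # -> (B, dim)

def cfg_teacher_eps(teacher, x_t, t, cond, uncond, w):
    """Classifier-free guidance target: two teacher passes."""
    eps_uncond = teacher(x_t, t, uncond)
    eps_cond = teacher(x_t, t, cond)
    return eps_uncond + w.view(-1, 1, 1, 1) * (eps_cond - eps_uncond)

def guidance_distillation_loss(student, teacher, x_t, t, cond, uncond, w):
    """One student pass conditioned on w is trained to match the two-pass CFG teacher."""
    with torch.no_grad():
        target = cfg_teacher_eps(teacher, x_t, t, cond, uncond, w)
    pred = student(x_t, t, cond, w=w)  # student consumes w through the new embedding
    return torch.mean((pred - target) ** 2)

def dynamic_w(t, w, tau):
    """Step-function dynamic guidance: w(t) = 0 for t > tau, else w (t and w of shape (B,))."""
    return torch.where(t > tau, torch.zeros_like(w), w)
```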


lyx0208 commented Jun 27, 2024

Thanks for your clear answer! Now I understand what the distilled models are for.

lyx0208 closed this as completed Jun 27, 2024
lyx0208 reopened this Jun 27, 2024

lyx0208 commented Jun 27, 2024

So what should I do if I want to train the model with another distillation teacher, like the newly released SD3?


lyx0208 commented Jun 27, 2024

There seems to be no official implementation of the paper "On Distillation of Guided Diffusion Models".

quickjkee (Member) commented:

Yes, you are right. There is no official implementation. Probably, we will release the code for guidance distillation when we have free time. However, it is not necessary to do Consistency Distillation on top of a guidance distilled model. In other words, if you want to play with SD3, you can skip the guidance distillation step and use our CD directly with the SD3 teacher.


lyx0208 commented Jun 28, 2024

Ok, thanks a lot for your kind response!!

lyx0208 closed this as completed Jun 28, 2024
lyx0208 reopened this Jul 9, 2024

lyx0208 commented Jul 9, 2024

Hello! I've tried training the SD1.5-based model with the script you provided, but without the "embed_guidance" flag. The model I get shows poorer performance in both reconstruction accuracy and editability. See the image below:
[Image: test_editing_iCD-SD1.5_woemb]

quickjkee (Member) commented:

Hi. Have you tried generating? How do the metrics behave during training (FID, reconstruction loss)?


lyx0208 commented Jul 9, 2024

The generation results also seem less satisfactory than those of the provided checkpoints.
I don't have visualized loss metrics, since the machine I use cannot access wandb or tensorboard.
By the way, have you tried performing iCD without the w embedding, and what were the corresponding results?


lyx0208 commented Jul 10, 2024

Here is a demo generation result with the prompt "a cute owl with a graduation cap."
[Image: test_generation_iCD-SD1.5]

quickjkee (Member) commented:

It looks like the guidance has been turned off. Do you distill the model with guidance ($w > 0$)?


lyx0208 commented Jul 10, 2024

Yes, I follow the lcm_lora distillation script, setting "w_embedding" to None for both the student and teacher UNet models, and use the normal CFG formulation to estimate the conditional x_0 and eps_0.
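
For context, a rough sketch of that target computation, assuming a diffusers-style teacher UNet, a scalar guidance weight, and a hypothetical `alphas_cumprod` tensor (not the exact lcm_lora script):

```python
import torch

def cfg_x0_eps(teacher_unet, x_t, t, prompt_emb, uncond_emb, w, alphas_cumprod):
    """Two-pass CFG on the teacher, then convert the guided noise prediction
    into the (x_0, eps) pair used as the consistency-distillation target."""
    with torch.no_grad():
        eps_uncond = teacher_unet(x_t, t, encoder_hidden_states=uncond_emb).sample
        eps_cond = teacher_unet(x_t, t, encoder_hidden_states=prompt_emb).sample
    eps_w = eps_uncond + w * (eps_cond - eps_uncond)

    # Invert x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps for x_0.
    alpha_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    x0_pred = (x_t - torch.sqrt(1.0 - alpha_bar) * eps_w) / torch.sqrt(alpha_bar)
    return x0_pred, eps_w
```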

quickjkee (Member) commented:

OK, I see. We did distillation without embedding in the initial stages of the project, so there are probably some issues. I'll try to distill it myself tomorrow and get back to you with an answer.


lyx0208 commented Jul 10, 2024

Thanks a lot!


lyx0208 commented Jul 10, 2024

I notice that in the forward/reverse preservation losses, w is explicitly set to zero if there is a w embedding, while this is not the case for models without the w embedding. I wonder if this could be a possible reason.


quickjkee commented Jul 10, 2024

I don't think so. The preservation losses only influence reconstruction, yet you also have problems with generation.

By the way, it's tricky to define preservation losses without the w embedding, since we calculate them only for the unguided process (see Eq. 4 and the corresponding text in the paper). You should probably try to create a student with a format similar to the teacher's, to retain the ability to control guidance:
$\epsilon^{w} (x_{t}, t, c) = \epsilon_{\theta}(x_{t}, t, 0) + w (\epsilon_{\theta}(x_{t}, t, c) - \epsilon_{\theta}(x_{t}, t, 0))$

Likely, I will release the code for guidance distillation to avoid these issues.
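
A minimal sketch of what such a guidance-controllable student call could look like at inference time, assuming a diffusers-style UNet (illustrative, not the paper's implementation):

```python
import torch

def student_cfg_eps(student_unet, x_t, t, prompt_emb, uncond_emb, w):
    """A student without a w embedding can keep the teacher's two-call CFG format,
    so the guidance scale w remains controllable after distillation."""
    eps_uncond = student_unet(x_t, t, encoder_hidden_states=uncond_emb).sample
    eps_cond = student_unet(x_t, t, encoder_hidden_states=prompt_emb).sample
    return eps_uncond + w * (eps_cond - eps_uncond)
```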


lyx0208 commented Jul 10, 2024

Yes, I understand that the preservation losses are calculated for the unguided process. However, I noticed that when we use the embedding, an embedded vector for w = 0 is fed to the network, whereas passing "None" when the network has no w-input is not equivalent to an unguided process in the CD formulation.
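
Roughly, the distinction being described, with a hypothetical student signature (for illustration only):

```python
import torch

def unguided_eps(student, x_t, t, cond, has_w_embedding: bool):
    """With a w embedding, 'unguided' means passing an explicit w = 0 vector;
    without one, the call is simply a conditional prediction."""
    if has_w_embedding:
        w_zero = torch.zeros(x_t.shape[0], device=x_t.device)
        return student(x_t, t, cond, w=w_zero)  # learned embedding of w = 0
    return student(x_t, t, cond)                # no notion of guidance at all
```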


lyx0208 commented Jul 16, 2024

Are there any updates? Thanks a lot!

quickjkee (Member) commented:

I'm sorry, I haven't had time yet... I will definitely do it soon!
