Above 32 GB RAM usage, when loading Flux models in checkpoint version. #4239

Open
JorgeR81 opened this issue Aug 6, 2024 · 7 comments
Labels
Potential Bug: User is reporting a bug. This should be tested.

Comments

@JorgeR81

JorgeR81 commented Aug 6, 2024

Expected Behavior

Keep RAM usage below the 32 GB limit, to avoid paging to the SSD and wearing it down.

Actual Behavior

I have 8 GB VRAM and 32 GB RAM.
I'm on Windows 10.

With the full-size fp16 models, my RAM usage goes above the limit when the models are loaded.
It still works, but the available SSD space goes down (page file growth).

This is normal, I guess, considering the model sizes.

But it also happens with the (fp8) Comfy-Org checkpoint models (17.2 GB).

Steps to Reproduce

I used the default workflow.

More details in this discussion, with Task Manager screenshots: #4226

JorgeR81 added the Potential Bug label Aug 6, 2024
JorgeR81 changed the title from "VRAM usage above 32 GB RAM, when loading Flux models in checkpoint version." to "Above 32 GB RAM usage, when loading Flux models in checkpoint version." Aug 6, 2024
@NoMansPC

NoMansPC commented Aug 6, 2024

Same happening here. The model I'm using is only 17.2 GB, but it tries to fill up all my RAM before it even tries to use the GPU. I'm so tired of requirements increasing exponentially in AI. Feels like it's designed to be used online only, so you're a slave to their GPU clusters.

@RandomGitUser321
Contributor

It's likely doing some kind of casting up to float32 or float16 and then back down to fp8, even if you're using an fp8 version of the model. It might not be the transformer, though; maybe it's doing it for the T5 or something. I haven't actually checked to verify, though.
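
As a rough illustration of why that would spike system RAM (my own sketch, not ComfyUI's actual loader code): casting a state dict to another dtype allocates a second full copy of the weights, and both copies stay resident until the original dict is released.

```python
# Illustrative sketch only (not ComfyUI code): casting a loaded state dict
# to another dtype keeps two full copies of the weights in RAM until the
# original dict is garbage-collected.
import torch

def cast_state_dict(state_dict, dtype):
    # Each .to(dtype) allocates a fresh tensor in system RAM; until
    # `state_dict` is dropped, both the old and new copies are resident.
    return {k: v.to(dtype) for k, v in state_dict.items()}

sd_fp16 = {"w": torch.zeros(1024, 1024, dtype=torch.float16)}   # ~2 MiB
sd_fp32 = cast_state_dict(sd_fp16, torch.float32)               # +~4 MiB while sd_fp16 is still alive
```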

@JorgeR81
Author

JorgeR81 commented Aug 7, 2024

Here is a summary of my observations, in case it helps (a small script for logging these numbers is sketched after the lists below).

When I use the fp16 models (and the T5 also in fp16):

  • When the UNet is loading, I run out of RAM for a moment, but then usage drops below the limit again.
  • Then, when the text encoder is loading, I run out of RAM again, but also only temporarily.
  • When I'm generating, I'm at ~20 GB RAM and ~7.2 GB VRAM usage.
  • When idle, after generating, I'm at about ~26 GB RAM and ~1 GB VRAM usage.
  • But if I change the prompt, I will also run out of RAM, temporarily.

With the Comfy-Org Flux checkpoint:

  • When the checkpoint is loading, I run out of RAM for a moment, but then usage drops below the limit again.
  • When I'm generating, I'm at ~14 GB RAM and ~7.2 GB VRAM usage.
  • When idle, after generating, I'm at about ~20 GB RAM and ~1 GB VRAM usage.
  • I can change the prompt without running out of RAM.
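
For reference, the kind of script I mean (a minimal sketch; it assumes the psutil package and is not part of ComfyUI) just logs system RAM once per second while the workflow runs:

```python
# Minimal RAM-logging sketch (assumes the psutil package; not part of ComfyUI).
# Run it alongside ComfyUI to watch system RAM during model loading and sampling.
import time
import psutil

while True:
    mem = psutil.virtual_memory()
    print(f"RAM used: {mem.used / 2**30:.1f} GiB / {mem.total / 2**30:.1f} GiB ({mem.percent:.0f}%)")
    time.sleep(1)
```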

@JorgeR81
Author

JorgeR81 commented Aug 7, 2024

Here are some observations from other users with more RAM:
#4173 (comment)
#3649 (comment)

@RandomGitUser321
Contributor

Yeah, I think I was on to something about it upcasting:

```python
class Flux(supported_models_base.BASE):
    # ...
    supported_inference_dtypes = [torch.bfloat16, torch.float32]
```
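
If those are the only supported inference dtypes for Flux, the rough parameter math lines up with the numbers reported above (my own back-of-the-envelope sketch; it assumes roughly 12B parameters for the Flux transformer):

```python
# Back-of-the-envelope sketch (assumes ~12B parameters for the Flux transformer):
# the bytes per weight decide whether the in-RAM copy alone fits under 32 GB.
params = 12e9

for name, bytes_per_weight in [("fp8", 1), ("fp16/bf16", 2), ("fp32", 4)]:
    gib = params * bytes_per_weight / 2**30
    print(f"{name}: ~{gib:.1f} GiB")   # fp8 ~11 GiB, fp16/bf16 ~22 GiB, fp32 ~45 GiB
```

So a temporary fp32 copy of the transformer alone would already exceed 32 GB of system RAM, before the T5 encoder or a second cast copy is even counted.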

@JorgeR81
Author

JorgeR81 commented Aug 7, 2024

Even if fp8 is not possible, just supporting / upcasting to fp16 would be a good improvement.
I think it's probably upcasting to fp32 in all cases while loading.

The fp16 model is 23.8 GB.
When the UNet is loading, I start at ~4 GB RAM usage, and I still run out of RAM, even before the text encoder starts loading.
This also happens if I set the weight_dtype to fp8 in the UNet loader node.
And even if I start ComfyUI with --force-fp16.
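
That would match a load-then-cast pattern (my guess at the sequence, not confirmed against ComfyUI's loader): the full fp16 state dict is read into system RAM first and only afterwards cast to the requested weight dtype, so the peak happens before --force-fp16 or an fp8 weight_dtype can make any difference.

```python
# Hypothetical load-then-cast sequence (an assumption, not ComfyUI's actual code).
# Peak RAM is roughly "full-precision copy + cast copy", whatever the target dtype.
import torch
from safetensors.torch import load_file

def load_unet(path, target_dtype=torch.bfloat16):
    full = load_file(path)                                    # entire fp16 state dict in RAM (~23.8 GB here)
    cast = {k: v.to(target_dtype) for k, v in full.items()}   # second copy while `full` is still referenced
    del full                                                  # only now can the first copy be freed
    return cast
```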

@KEDI103

KEDI103 commented Aug 7, 2024

I got this problem too; it blows up my RAM and swap. Even if I don't run python main.py --use-split-cross-attention, it crashes the whole Ubuntu OS. If I do run it, I get stuck at 32 GB of RAM used plus 4 GB of frozen swap, stuck at the VAE stage, and can't generate.
