
High Memory Usage When Loading Flux Model in ComfyUI #4480

Open
Govee-Chan opened this issue Aug 19, 2024 · 45 comments
Labels
Feature A new feature to add to ComfyUI.

Comments

@Govee-Chan

Feature Idea

Hello,

I am experiencing a significant memory usage issue when using the Flux model in ComfyUI. During the model loading phase, the memory consumption spikes to approximately 70GB. This seems excessively high and may not be feasible for many users.

Existing Solutions

No response

Other

No response

Govee-Chan added the Feature label on Aug 19, 2024
@JorgeR81

JorgeR81 commented Aug 19, 2024

Same for me.
I only have 32 GB of RAM, and my system needs to use the page file while loading, even for the FP8 version.
#4239

I hope they can improve this.

The Q8_0 format looks almost as good as FP16, loads faster, and requires less than 32 GB of RAM while loading.
https://github.com/city96/ComfyUI-GGUF

The downside could be less compatibility with other features and fewer model finetunes, if the format does not gain popularity.

@JorgeR81

memory consumption spikes to approximately 70GB

So even users with 64 GB RAM need to use the page file!

By the way, is this with FP16?
How much RAM for FP8?

@DivineOmega

DivineOmega commented Aug 19, 2024

Reverting to commit 3e52e03 seems to have resolved the issue for me.

git checkout 3e52e0364cf81764f58e5aa4f53f0b702f4d4a81

@JorgeR81

Reverting to commit 3e52e03 seems to have resolved the issue for me.

How much RAM do you need now?

I needed more than 32 GB, even before this commit.

@DivineOmega

DivineOmega commented Aug 19, 2024

Reverting to commit 3e52e03 seems to have resolved the issue for me.

How much RAM do you need now?

I needed more than 32 GB, even before this commit.

I've not done exact measurements, but I have a 16 GB GeForce RTX 3060, and at that commit I am able to run Flux dev FP8 with at least one LoRA with no issues.

Edit: I have 32 GB RAM (realised you were asking about RAM, not VRAM)

@JorgeR81

Edit: I have 32 GB RAM (realised you were asking about RAM, not VRAM)

Yes, ComfyUI will use most of your available VRAM while generating (8 GB in my case), and the rest is offloaded to RAM.
So while the KSampler is running, I use about 15 GB of RAM in FP8 mode.

The problem is that when the Flux model is loading, it uses a lot of RAM.
With "only" 32 GB of RAM, if you don't get an OOM error, it's because your system uses the page file.

You can monitor RAM / VRAM usage in the Task Manager while generating an image.
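If you prefer exact numbers over the Task Manager graphs, the same figures can be read from inside the process. A minimal sketch (assuming psutil is installed and a CUDA build of torch, e.g. called from a small script or custom node):

    import psutil
    import torch

    def report_memory(tag: str = "") -> None:
        # RAM currently held by this Python process (roughly what Task Manager shows)
        ram_gb = psutil.Process().memory_info().rss / 1024**3
        # VRAM allocated by torch, plus what its caching allocator has reserved
        vram_alloc_gb = torch.cuda.memory_allocated() / 1024**3
        vram_reserved_gb = torch.cuda.memory_reserved() / 1024**3
        print(f"{tag} RAM {ram_gb:.1f} GB | VRAM {vram_alloc_gb:.1f} GB allocated, {vram_reserved_gb:.1f} GB reserved")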

@DivineOmega

It's also working fine for me at 83f3431.

@DivineOmega

Okay. I've done some checks at different commits.

For me, the last commit which works is 14af129.

Commits beyond this (starting at bb222ce) cause out of memory issues (torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory).

If I'm understanding correctly, the issue may be caused by the changes to the memory management code here: comfy/model_patcher.py. However, I'm not familiar with the code base, so I might be looking at this wrong.

@JorgeR81

For me, the last commit which works is 14af129

So, you have a specific memory error and you can't generate images?

I think @Govee-Chan was just talking about Flux requiring a lot of RAM while loading, but still being able to generate images.

@DivineOmega

For me, the last commit which works is 14af129

So, you have a specific memory error and you can't generate images?

I think @Govee-Chan was just talking about Flux requiring a lot of RAM while loading, but still being able to generate images.

Yes, beyond the commit I mentioned I get a standard CUDA out of memory error every other generation when using Flux. For full transparency, I'm using ComfyUI via SwarmUI.

@JorgeR81

I'm on the latest commit, with ComfyUI portable, and I don't have any errors.
Maybe it's a SwarmUI issue?

Also, you mention your error happens "every other generation", so that means the model was already loaded.

But I think @Govee-Chan refers to when the model is loaded into RAM for the first time (on the first generation).

@YureP

YureP commented Aug 19, 2024

I have OOMs too after yesterday's commits. Not only with Flux but, strangely, even using SD 1.5 checkpoints. I have an RTX 3060 with 12 GB VRAM and 80 GB system RAM, on Linux.
Now I'm using a commit from two days ago and have no problem generating with the 22.2 GB Flux-DEV (FP16 etc.) plus a LoRA.

@D-Ogi

D-Ogi commented Aug 19, 2024

Same here. I use flux only. First generation is successful. Second fails even for 512x512 images. Third is successful again and so on. RTX 4090, 64GB RAM.

@Chryseus

Chryseus commented Aug 19, 2024

Getting OOMs now after a few generations using the Q8 quant; it worked just fine a few days ago. 64 GB RAM, 4060 Ti 16 GB.
Python 3.10.11, Windows 10, PyTorch 2.4.0 cu124, xformers 0.0.27.post2

@YureP

YureP commented Aug 19, 2024

Just updated and tested, but I'm still always getting OOMs (tested only Flux). Returned to 14af129, which is working well.

@comfyanonymous
Owner

Can you check if you still have those OOM issues on the latest commit?

@dan4ik94

dan4ik94 commented Aug 19, 2024

Can you check if you still have those OOM issues on the latest commit?

I still have OOM problems every 2-3 generations. It happens mostly when I change the prompt: it becomes very slow, as if I'm loading the checkpoint for the first time, and then OOM. (Flux Schnell, RTX 3060 12 GB, 64 GB RAM)

File "D:\ComfyUI_windows_portable\ComfyUI\comfy\float.py", line 57, in stochastic_rounding
    return manual_stochastic_round_to_float8(value, dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\float.py", line 40, in manual_stochastic_round_to_float8
    sign * (2.0 ** (exponent - EXPONENT_BIAS)) * (1.0 + mantissa),
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
torch.cuda.OutOfMemoryError: Allocation on device
Got an OOM, unloading all loaded models.
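For context, the traceback above points at the FP8 stochastic-rounding path in comfy/float.py, which appears to build several weight-sized temporaries on the GPU during the cast. A rough, simplified sketch of that kind of rounding (illustrative only, not the actual comfy/float.py code, and ignoring FP8 range clamping):

    import torch

    def stochastic_round_to_float8(value: torch.Tensor,
                                   dtype: torch.dtype = torch.float8_e4m3fn) -> torch.Tensor:
        # Spacing of the FP8 grid around each value: 2 ** (floor(log2|x|) - mantissa_bits)
        mantissa_bits = 3 if dtype == torch.float8_e4m3fn else 2
        _, exponent = torch.frexp(value)
        ulp = torch.ldexp(torch.ones_like(value), exponent - 1 - mantissa_bits)

        lower = torch.floor(value / ulp) * ulp            # round toward -inf onto the grid
        prob_up = (value - lower) / ulp                   # chance of rounding up instead
        rounded = lower + ulp * (torch.rand_like(value) < prob_up)

        # frexp, ldexp, floor and rand_like each allocate a full weight-sized temporary,
        # which is why this cast needs spare VRAM on top of the weights themselves.
        return rounded.to(dtype)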

@comfyanonymous
Owner

--reserve-vram 0.6

If you add this argument, does it fix it? If it doesn't, try increasing it by 0.1 until it works, then tell me what the value is.

@Govee-Chan
Author

Can you check if you still have those OOM issues on the latest commit?

The latest commit seems to solve my problem; the Comfy process occupied 20% of RAM at the peak (I've got 64 GB in total, so 13 GB seems normal), but I haven't tried it on my AWS instance, where I found the OOM originally. I suspect that the issue is due to my instance having too little memory (16 GB), but theoretically, 16 GB should be sufficient to run it, right?

Thanks anyway, I will try --reserve-vram 0.6 on my instance and see if it works.

@Govee-Chan
Author

--reserve-vram 0.6

If you add this argument, does it fix it? If it doesn't, try increasing it by 0.1 until it works, then tell me what the value is.

I have no problem with the VRAM; I suspect there might be an issue during the transfer from RAM to GPU memory while loading the model.

@Foul-Tarnished

Is it related to PyTorch 2.4?
I tried sdwebui-forge with PyTorch 2.4, and it also spiked to ~70 GB RAM usage.

@Chryseus

I'm using PyTorch 2.4. RAM usage loading FP8 spikes to 38 GB; switching models after this goes up to 58 GB, so maybe there is something that can be done to improve model switching. The latest updates seem to have fixed the OOM issue, although I find it interesting how the VRAM usage creeps up over the first few runs of the text encoder. Maybe something is not getting unloaded properly, or maybe this is intended behaviour.

@JorgeR81

switching model after this goes up to 58GB

When you switch, is it to the Flux FP16 version?
I think the FP16 one requires more RAM while loading.

@Chryseus

Chryseus commented Aug 20, 2024

When you switch, is it to the Flux FP16 version? I think the FP16 one requires more RAM while loading.

I've tried switching between FP8 and the Q8 quant which are fairly similar on VRAM usage, Q8 is very slightly higher.

@JorgeR81

When I use Q8, I don't have RAM spikes while loading.
It never goes above 32 GB.

But I never tried to use it after FP8.

@SchrodingersCatwalk

--reserve-vram 0.6

If you add this argument, does it fix it? If it doesn't, try increasing it by 0.1 until it works, then tell me what the value is.

OOM errors resolved at 0.7, NVIDIA GeForce RTX 3080 Laptop GPU, 16GB, Linux, normal VRAM mode

@DivineOmega

The latest updates mostly worked fine for me, but after trying to use Flux with >= 1 Lora, I was receiving OOM errors. Setting --reserve-vram to 0.7 resolved this.

@YureP

YureP commented Aug 20, 2024

OK, for me on commit d1a6bd6: at 0.6 I can generate using the full flux-dev model, but I get an OOM when using a LoRA (realism LoRA), and the same at 0.7. At 0.8 I can generate everything. I did a little stress test, generating several times with Flux, then with XL, back to Flux, alternating generations with the full model and the Q8, and so on, and had no OOMs.
The max VRAM load is 11.99/12 GB, and the max system RAM load is 46/80 GB.

@screan

screan commented Aug 20, 2024

Updated Comfy and am now getting OOMs with a LoRA as well today; it worked fine yesterday.

The first generation works fine, then OOM after.

@Foul-Tarnished

Foul-Tarnished commented Aug 20, 2024

Q6_K is not even 0.4% worse than Q8 (for perplexity of 13B LLMs)
And you gain about 1 GB of VRAM.

@ErixStrong

ErixStrong commented Aug 21, 2024

Just updated and tested, but I'm still always getting OOMs (tested only Flux). Returned to 14af129, which is working well.

How do I return to an older commit?

OK, I found out how!

@dan4ik94

dan4ik94 commented Aug 21, 2024

--reserve-vram 0.6

If you add this argument, does it fix it? If it doesn't, try increasing it by 0.1 until it works, then tell me what the value is.

I can confirm that reserving a portion of VRAM (0.7-1.0) helps; after 20 generations with 3 LoRAs, no more OOMs on the 3060.
🌞

@RedDeltas

I had the same issue, and the --reserve-vram flag didn't work for me; I tried values 0.6-1.0 and it didn't resolve the issue. Reverting back to 14af129 did fix it for me, though.

@ltdrdata
Collaborator

I had the same issue, and the --reserve-vram flag didn't work for me; I tried values 0.6-1.0 and it didn't resolve the issue. Reverting back to 14af129 did fix it for me, though.

try --disable-smart-memory

@tobias-varden

I also got OOMs with c6812947e98eb384250575d94108d9eb747765d9, so I had to revert back to 6ab1e6fd4a2f7cc5945310f0ecfc11617aa9a2cb, which fixed the issue. I am using Flux FP8 together with two LoRAs.

@btln

btln commented Aug 27, 2024

--disable-smart-memory fixed it for me. Thanks!

@CasualDev242

Same issue. I have 64 GB of RAM, which ought to be plenty, and as of the recent updates the RAM usage has skyrocketed to the point where ComfyUI uses up to 70-80% of my RAM and I have to shut off the app to prevent issues.

@JorgeR81

I think the full model is being upcast to FP32 while loading, so this would be about 45 GB (without the T5 encoder).

Would it be possible to upcast the Flux model block by block (instead of all at once), keeping RAM usage lower?
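A hypothetical sketch of that idea (illustrative only, not ComfyUI's actual loader): converting the state dict one tensor at a time means only a single tensor ever exists in both precisions at once, instead of keeping a full second copy of the model alive during the cast.

    import torch

    def upcast_state_dict_in_place(sd: dict, dtype: torch.dtype = torch.float32) -> dict:
        # Replace each tensor with its upcast copy immediately; the old tensor is
        # freed as soon as it is overwritten, so the peak extra RAM is roughly one
        # tensor rather than a whole duplicate of the model.
        for key in sd:
            sd[key] = sd[key].to(dtype)
        return sd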

@CasualDev242

Why is this marked as "feature" and not "bug"? I had to revert to an earlier commit, and can now use ComfyUI. I can't use current versions due to the absurd RAM usage.

@JorgeR81

JorgeR81 commented Sep 2, 2024

Why is this marked as "feature" and not "bug"?

I actually opened this as a bug a while back, but it's still not fixed.
#4239

@JorgeR81

JorgeR81 commented Sep 2, 2024

I had to revert to an earlier commit, and can now use ComfyUI

As far as I know, the high RAM usage (above 32 GB) while loading has been a problem since the beginning.

@CasualDev242, you may have a different issue. Do you always have high RAM usage, or only while loading the Flux model?

@CasualDev242

CasualDev242 commented Sep 2, 2024

I had to revert to an earlier commit, and can now use ComfyUI

As far as I know, the high RAM usage (above 32 GB) while loading has been a problem since the beginning.

@CasualDev242, you may have a different issue. Do you always have high RAM usage, or only while loading the Flux model?

Like I mentioned, the bug only appears with recent commits, and yes, it's with Flux. I did not have high RAM usage prior to these commits, so it hasn't been a problem since the beginning for me: an earlier commit fixes it, and it didn't occur before. Loading the same Flux model and LoRAs with an earlier commit doesn't cause the absurd RAM usage (remember, I have 64 GB of RAM, and ComfyUI is using 70%+ of it; how is that not an issue with the code?).

@comfyanonymous
Owner

If you are on Windows, it's perfectly normal for it to use up to 2x the memory of your largest safetensors file when loading. If you use the 22 GB file + the 10 GB T5, it might peak at 22 * 2 + 10, so 54 GB of RAM usage while loading, then drop down to 32 GB.

For the FP8 checkpoint, it's going to peak at 17.2 * 2, so ~35 GB.

That's an issue with the safetensors library on Windows. Linux doesn't have this issue.
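One possible mitigation sketch, under the assumption that the doubling comes from materializing the whole file at once: reading tensors one by one through safetensors' safe_open keeps roughly one copy of the weights in RAM at a time. Whether this actually sidesteps the Windows-specific behaviour described above is untested here; it only illustrates the idea, and the function name is made up for the example.

    from safetensors import safe_open

    def load_safetensors_incrementally(path: str, device: str = "cpu") -> dict:
        # Pull tensors out of the file one key at a time instead of loading the
        # whole state dict in a single call, so peak RAM stays near 1x the file size.
        sd = {}
        with safe_open(path, framework="pt", device=device) as f:
            for key in f.keys():
                sd[key] = f.get_tensor(key)
        return sd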

@JorgeR81

JorgeR81 commented Sep 2, 2024

If you are on Windows, it's perfectly normal for it to use up to 2x the memory of your largest safetensors file when loading. If you use the 22 GB file + the 10 GB T5, it might peak at 22 * 2 + 10, so 54 GB of RAM usage while loading, then drop down to 32 GB.

For the FP8 checkpoint, it's going to peak at 17.2 * 2, so ~35 GB.

That's an issue with the safetensors library on Windows. Linux doesn't have this issue.

So this is an issue with the safetensors file type.

I'm on Windows 10.
This issue does not happen with the GGUF models (e.g. flux1-dev-Q8_0.gguf):
https://huggingface.co/city96/FLUX.1-dev-gguf/tree/main

But there is also a full-quality version there: flux1-dev-F16.gguf (22 GB).
Do you think using this flux1-dev-F16.gguf model could fix the problem?


EDIT: Apparently not. With flux1-dev-F16.gguf, my RAM usage still goes from 3.8 GB to above 32 GB.

I also tried a native FP8 Flux model (11 GB), but it also requires more than 32 GB of RAM while loading.

@keyvez

keyvez commented Sep 14, 2024

reserve

Thanks, --reserve-vram 0.6 worked for me, but I can't figure out why, since I was previously able to run these workflows without issues. I even generated 200 images for over an hour on 24 GB of VRAM, and then it broke in the middle of it. To be fair, I was putting all those images into one large image (it would have been 400 1024x1024 images), which I think could have been the main cause. What I can't understand is how doing that broke the entire ComfyUI installation, such that now I can't even generate one image with Flux.

I also installed Crystool in the middle of that large image generation, but hadn't restarted the server and was waiting to restart once the task was over.
