
ComfyUI/Flux memory utilization when loading model ? #4318

Open
snoil411 opened this issue Aug 11, 2024 · 19 comments
Labels
User Support: A user needs help with something, probably not a bug.

Comments

@snoil411

Your question

First time ComfyUI user coming from Automatic1111. I've had no issues using SD, SDXL and SD3 with ComfyUI, but I haven't managed to get Flux working due to memory issues. I've read about a lot of people having similar issues but am confused about the following.

I have 32 GB RAM and 16 GB VRAM (AMD card).
I started with flux1Dev_v10.safetensors and t5xxl_fp16.safetensors, as I read many people were successful with the same hardware I have.

The model sits there 'loading' for roughly 5 minutes; during that time RAM completely fills to 99%, and after that HDD utilization sits at 100% until it loads. After it loads the clips and VAE, during sampling it spits out the 'not enough VRAM' error that I've seen many people get but so far haven't seen a solution for.

I tried using the Schnell model instead + t5xxl_fp8_e4m3fn (i.e. half the size) but get the same thing. It takes 500+ seconds to load the 11 GB model, my RAM usage goes to 100%, and the HDD again sits at 100% utilization. Then I'm told again there's not enough VRAM. QUESTION: why does the Schnell model utilize so much RAM? By comparison, I load SD3 (stableDiffusion3SD3_sd3MediumInclT5XXL), which is roughly the same size at 10.5 GB, in less than a minute and it renders fine.

Cheers....

Logs

No response

Other

No response

snoil411 added the User Support label on Aug 11, 2024
@ltdrdata
Collaborator

FLUX fp8 model size is 11 GB without T5.
FLUX fp16 model size is 24 GB without T5.

And the T5 fp8 is 4 GB+; fp16 is 9 GB+.

Make sure you are using fp8.
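A rough back-of-envelope check of these sizes, assuming ~12B parameters for the Flux transformer and ~4.7B for the T5-XXL encoder (approximate figures, not stated in this thread), is simply parameter count times bytes per weight:

```python
# Approximate checkpoint size = parameter count x bytes per weight.
# Parameter counts below are approximations, not exact figures.
def approx_size_gb(n_params: float, bytes_per_weight: float) -> float:
    return n_params * bytes_per_weight / 1e9

for name, n_params in [("FLUX (~12B)", 12e9), ("T5-XXL (~4.7B)", 4.7e9)]:
    print(f"{name}: fp16 ~{approx_size_gb(n_params, 2):.0f} GB, fp8 ~{approx_size_gb(n_params, 1):.0f} GB")
# FLUX (~12B): fp16 ~24 GB, fp8 ~12 GB
# T5-XXL (~4.7B): fp16 ~9 GB, fp8 ~5 GB
```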

@JorgeR81

JorgeR81 commented Aug 12, 2024

I have 32 GB RAM and I can use Flux.
But I also run out of RAM when the model is loading, so my system needs to use the page file, leading to SSD activity.

When VRAM is full, the system offloads to RAM.
And when RAM is full, it offloads to the page file (on your SSD).

Maybe you need to change some settings on your page file.
#4226

Even the fp8 version of Flux needs more than 32 GB, because it's upcast to fp32 during loading and then back down to fp8.
#4239

Recently there was a commit that adds support for Flux to be upcast to fp16 instead.
But it still needs more than 32 GB (not sure why?)
(and it also slows down my inference speeds, unfortunately).
8115d8c

Also, NF4 support was added today to ComfyUI, via a custom node, so we can use even smaller NF4 model versions.
I haven't tried it yet, but it seems it still needs the same amount of VRAM.
comfyanonymous/ComfyUI_bitsandbytes_NF4#9 (comment)

But I think support for these formats will probably be optimized in the future, so that we'll need fewer system resources to use Flux.
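A minimal sketch (not ComfyUI's actual loading code) of why upcasting an fp8 checkpoint through fp32 during loading spikes RAM well above the size of the file on disk:

```python
# Hypothetical illustration: converting fp8 weights via an fp32 intermediate
# temporarily holds extra copies in RAM (4 bytes/weight for the fp32 temp).
# Requires a recent PyTorch with float8 dtypes (2.1+).
import torch

def upcast_then_downcast(state_dict: dict) -> dict:
    out = {}
    for name, tensor in state_dict.items():
        fp32_tmp = tensor.to(torch.float32)            # temporary fp32 copy
        out[name] = fp32_tmp.to(torch.float8_e4m3fn)   # final fp8 copy
        # peak RAM per tensor ~ original + fp32 temp + fp8 result
    return out
```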

@kakachiex2

I have 6 GB of VRAM (RTX260) and I can use Flux, but at the cost of waiting 1 hour for the model to load. This behavior started when I updated ComfyUI; before, it took 20 to 30 minutes to load.

@ltdrdata
Collaborator

ltdrdata commented Aug 12, 2024

I have 32 GB RAM and I can use Flux. But I also run out of RAM when the model is loading, so my system needs to use the page file, leading to SSD activity.

When VRAM is full, the system offloads to RAM. And when RAM is full, it offloads to the page file (on your SSD).

Maybe you need to change some settings on your page file. #4226

Even the fp8 version of Flux needs more than 32 GB, because it's upcast to fp32 during loading and then back down to fp8. #4239

Recently there was a commit that adds support for Flux to be upcast to fp16 instead. But it still needs more than 32 GB (not sure why?) (and it also slows down my inference speeds, unfortunately). 8115d8c

Also, NF4 support was added today to ComfyUI, via a custom node, so we can use even smaller NF4 model versions. I haven't tried it yet, but it seems it still needs the same amount of VRAM. comfyanonymous/ComfyUI_bitsandbytes_NF4#9 (comment)

But I think support for these formats will probably be optimized in the future, so that we'll need fewer system resources to use Flux.

The issue is said to be a limitation of the safetensors library.
To address this, an update has been made to load the TextEncoder into VRAM when there is sufficient VRAM available.
This is expected to alleviate the temporary RAM shortage that causes swapping.
Try updating ComfyUI and testing again.

5c69cde
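For reference, the safetensors library can load a file straight onto a CUDA device, which is the general idea behind loading the text encoder into VRAM instead of staging it fully in system RAM. A hedged sketch (the 10 GB threshold and file name are just examples; "cuda" applies to NVIDIA/ROCm builds of PyTorch, not DirectML):

```python
import torch
from safetensors.torch import load_file

# Crude "is there enough VRAM?" check; the 10 GB threshold is an arbitrary example.
free_bytes, total_bytes = torch.cuda.mem_get_info()
device = "cuda" if free_bytes > 10 * 1024**3 else "cpu"

# Loading directly to the GPU avoids holding a full extra copy in system RAM.
state_dict = load_file("t5xxl_fp8_e4m3fn.safetensors", device=device)
```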

@JorgeR81

JorgeR81 commented Aug 12, 2024

The issue is said to be a limitation of the safetensors library.
To address this, an update has been made to load the TextEncoder into VRAM when there is sufficient VRAM available.
This is expected to alleviate the temporary RAM shortage that causes swapping.
Try updating ComfyUI and testing again.

5c69cde

Still the same.
The page file is still needed when the model is loading (I tried the fp8 checkpoint version).
And about the same loading and sampling (sec/it) times.

I never ran out of RAM when changing the prompt while using the fp8 models.
I only had that issue with the fp16 models.

Also, when using the UNET loader in fp8 mode, I run out of RAM even before it starts loading the t5 encoder.

@JorgeR81

I have 6 GB of VRAM (RTX260) and I can use Flux, but at the cost of waiting 1 hour for the model to load. This behavior started when I updated ComfyUI; before, it took 20 to 30 minutes to load.

You have some sort of issue here.

My loading time is about 1 minute.
I have a SATA SSD.

@snoil411
Author

FLUX fp8 model size is 11 GB without T5. FLUX fp16 model size is 24 GB without T5.
And the T5 fp8 is 4 GB+; fp16 is 9 GB+.
Make sure you are using fp8.

As said in the OP, I first tried fp16 then fp8, but as someone else said below, fp8 is upcast to fp32 during loading, so I assume this is what causes my issue.

Even the fp8 version of Flux needs more than 32 GB, because it's upcast to fp32 during loading and then back down to fp8.
#4239

I didn't know this, so I'm guessing this is the issue for the RAM filling up.

My other question then is what could cause the VRAM issue(s) (or how/what can I do to identify the issue)? After it is downcast back to fp8 and the clip models and VAE are loaded, I still get 'not enough VRAM' issues (I have 16 GB). The guy above me has only 6 GB of VRAM (RTX260), compared to my 16 GB, and is able to use Flux after the model finally loads.

@JorgeR81

My other question then is what could cause the VRAM issue(s) (or how/what can I do to identify the issue)? After it is downcast back to fp8 and the clip models and VAE are loaded, I still get 'not enough VRAM' issues (I have 16 GB). The guy above me has only 6 GB of VRAM (RTX260), compared to my 16 GB, and is able to use Flux after the model finally loads.

Yeah, that may be a different issue.
I can also run Flux with 8 GB VRAM (GTX 1070).
When VRAM is full, the system should offload to RAM.
And once the Flux model is done loading, it should be in fp8, so you should have RAM available.
Have you looked at your task manager when you try to generate an image?
#4226 (comment)
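If watching Task Manager is awkward, a small helper along these lines can log RAM and VRAM at key points (assumes psutil is installed; torch.cuda only reports VRAM on NVIDIA/ROCm builds, not DirectML):

```python
import psutil
import torch

def memory_snapshot(tag: str = "") -> None:
    # System RAM via psutil, VRAM via torch.cuda (when available).
    ram = psutil.virtual_memory()
    line = f"[{tag}] RAM: {ram.used / 1024**3:.1f} / {ram.total / 1024**3:.1f} GB"
    if torch.cuda.is_available():
        free_b, total_b = torch.cuda.mem_get_info()
        line += f" | VRAM: {(total_b - free_b) / 1024**3:.1f} / {total_b / 1024**3:.1f} GB"
    print(line)

memory_snapshot("after model load")
memory_snapshot("during sampling")
```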

@snoil411
Author

snoil411 commented Aug 12, 2024

And once the Flux model is done loading, it should be in fp8, so you should have RAM available.

According to task manager, RAM stays full once the model is loaded? Despite it being in fp8?

When VRAM is full, the system should offload to RAM.

I'd assume that's my problem then? My RAM remains full even though there should be some available after the model is downcast to fp8, so VRAM now cannot be offloaded to RAM?

That still leaves the question as to why RAM remains full after the model is loaded and downcast.

Out of curiosity, how much VRAM would the Schnell and Dev models occupy, respectively, if you had unlimited VRAM to play around with?

EDIT** I was thinking of trying another large non-Flux model to see if I could get the same results/errors. I'm only familiar with SD, and the biggest is SD3 (works flawlessly). Are there any other LARGE 20 GB+ models that are likely to fill up my RAM and VRAM?

[attached screenshot: comfymem]

@JorgeR81

I'd assume that's my problem then? My RAM remains full even though there should be some available after the model is downcast to fp8, so VRAM now cannot be offloaded to RAM?

Yeah, I also think that's the problem.
My RAM usage drops to about 20 GB after the fp8 model is done loading.

Out of curiosity, how much VRAM would the Schnell and Dev models occupy, respectively, if you had unlimited VRAM to play around with?

The Schnell and Dev models are the same size.
The difference is between the fp8 and fp16 versions of them.
The T5 encoder also has fp8 and fp16 versions.

FLUX fp8 model size is 11 GB without T5.
FLUX fp16 model size is 24 GB without T5.

And the T5 fp8 is 4 GB+; fp16 is 9 GB+.

@snoil411
Author

My RAM usage drops to about 20 GB after the fp8 model is done loading.

What is your VRAM usage when this happens? And is this with the fp16 or fp8 models?

Just trying to sum up the requirements and see how it varies from person to person.
If anyone else wants to include their RAM + VRAM usage once all the models are loaded and an image is generating, please do.

I was grasping at straws, and though I doubt it will have an effect on memory (I have no idea :) ), I thought I'd use ROCm instead of DirectML. But PyTorch + ROCm 6 is only available on Linux, so I'm off to do an Ubuntu install on an empty SSD.
Even if I can't get Flux working, SD speeds are supposed to be quite a bit faster on Linux.

The Schnell and Dev models are the same size.

Yeah, I got a bit confused; I downloaded different fp8 Schnell models from different sources. One was 11 GB, one was 17 GB.

@JorgeR81

What is your VRAM usage when this happens? And is this with the fp16 or fp8 models?

  • With fp8, while generating (KSampler), I use about 14+7 GB of RAM+VRAM. When idle, it's about 20+1 GB.

  • With fp16, while generating, I use about 20+7 GB of RAM+VRAM. When idle, it's about 26+1 GB.

So VRAM usage is the same with fp8 or fp16.
I think the VRAM is filled as much as possible during generation (8 GB in my case), and the rest goes to RAM.
Ideally the whole model should fit in VRAM.
The more you need to keep in RAM, the slower the generation will be.

so I'm off to do an Ubuntu install on an empty SSD

Are you on Linux or Windows right now?
If you can try ComfyUI on Linux, you probably should.
I have read here that ComfyUI has better support for AMD GPUs on Linux.

One was 11 GB, one was 17 GB

These are both fp8.
The 11 GB one uses the UNet loader node, and the t5 encoder is loaded from another node.
The 17 GB one includes the t5 encoder and uses the default checkpoint loader node.

The workflows are in the example images:
https://comfyanonymous.github.io/ComfyUI_examples/flux/
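A toy illustration of that split (not ComfyUI's actual allocator): as much of the model as fits goes to VRAM, minus some headroom for activations, and the remainder stays in RAM:

```python
# Sizes in GB are approximate; reserved_gb is a guessed headroom for activations.
def split_model(model_gb: float, vram_gb: float, reserved_gb: float = 1.0) -> tuple[float, float]:
    in_vram = min(model_gb, max(vram_gb - reserved_gb, 0.0))
    in_ram = model_gb - in_vram
    return in_vram, in_ram

print(split_model(11.0, 8.0))   # fp8 Flux on an 8 GB card  -> (7.0, 4.0)
print(split_model(11.0, 16.0))  # fp8 Flux on a 16 GB card -> (11.0, 0.0)
```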

@snoil411
Author

Are you on Linux or Windows right now?
If you can try ComfyUI on Linux, you probably should.
I have read here that ComfyUI has better support for AMD GPUs on Linux.

All my above issues were on Windows, but I had it set up on Linux in about 30 minutes this time, with ROCm and PyTorch. I had SOME unrelated issues getting ROCm to work with an integrated GPU (7900XT), but it was an easy fix.

fp16 Flux is now working. It's amazing how much faster the models load. A 1024x1024 image generates in less than 2 minutes, including loading the models.

Still a mystery as to why I had/have issues on Windows, but I guess I'll stick to Linux for AI.

Thanks for the help and explanations.

@BigBanje

My 3090 can no longer do anything with Flux, as I experience extreme bottlenecking after some recent update to ComfyUI.

I can't even use fp8 anymore... I was previously using LoRAs with fp16 + hires fix, no issue. Now I can't make a single creation at all.

Everything is updated, so I'm just going to (once again) go through the process of downgrading my ComfyUI until I find a version that isn't broken...


@JorgeR81

@BigBanje, I think you meant to post on the larger thread. 
This issue is probably related.
But @comfyanonymous asked yesterday for users to post their system specs and exact workflow, if the issue persists.

#4271 (comment)

@ltdrdata
Collaborator

My 3090 can no longer do anything with Flux, as I experience extreme bottlenecking after some recent update to ComfyUI.

I can't even use fp8 anymore... I was previously using LoRAs with fp16 + hires fix, no issue. Now I can't make a single creation at all.

Everything is updated, so I'm just going to (once again) go through the process of downgrading my ComfyUI until I find a version that isn't broken...

If you could identify which commit started causing this issue, it would be helpful for debugging.

@jslegers

My 3090 can no longer do anything with Flux, as I experience extreme bottlenecking after some recent update to ComfyUI.
I can't even use fp8 anymore... I was previously using LoRAs with fp16 + hires fix, no issue. Now I can't make a single creation at all.

I noticed that UNETLoader.load_unet takes a lot more memory since the most recent changes when loading a FLUX transformer UNet with weight_dtype fp8_e4m3fn.

Before the changes I could stay under 12 GB total VRAM usage when loading an fp8_e4m3fn version of flux1-schnell after first loading the t5xxl text encoder (given a minor tweak to unet_offload_device; see #4319).

After the changes, I run into the 16 GB memory limit when the FLUX transformer UNet is loaded.

See also #4341, #4343 & #4338
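Outside of ComfyUI's own offload logic, the plain-PyTorch version of this idea is to move the text encoder to the CPU and release cached VRAM before the large transformer goes onto the GPU. A hedged sketch (the function name is hypothetical, not ComfyUI's unet_offload_device mechanism):

```python
import torch

def make_room_for_unet(text_encoder: torch.nn.Module) -> None:
    # Keep the text encoder resident in system RAM for the next prompt,
    # then hand freed VRAM back to the allocator before loading the UNet.
    text_encoder.to("cpu")
    torch.cuda.empty_cache()
```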

@Francklin9999

I have an i7-12700H CPU with 16 GB RAM and a 3070 Ti with 8 GB VRAM (16 GB shared memory), and I run out of memory with both my CPU and GPU. When running on my GPU I get a 'ran out of memory' error, and on my CPU it just gets killed. Does anyone know how I can still run it so I can generate images? I don't care if it takes an hour.

@JorgeR81

JorgeR81 commented Aug 20, 2024

I have an i7-12700H CPU with 16 GB RAM and a 3070 Ti with 8 GB VRAM (16 GB shared memory), and I run out of memory with both my CPU and GPU. When running on my GPU I get a 'ran out of memory' error, and on my CPU it just gets killed. Does anyone know how I can still run it so I can generate images? I don't care if it takes an hour.

This custom node allows you to use Flux and the t5 text encoder in smaller formats.
https://github.com/city96/ComfyUI-GGUF

They will use less RAM.
I'm able to stay below my 32 GB RAM limit.
Not sure about 16 GB RAM though.

The Q4 formats are below 8 GB in size.
I tried Q4_1, but the new Q4_K should have better quality.
I haven't tried the smaller text encoder formats yet, because those have just been added.
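As a rough size estimate for these quantized variants, assuming ~12B parameters and approximate effective bits per weight for each GGUF quant type:

```python
# Effective bits-per-weight values below are approximations for GGUF quant types.
N_PARAMS = 12e9

for fmt, bits in [("fp16", 16), ("fp8", 8), ("Q8_0", 8.5), ("Q4_K", 4.5)]:
    print(f"{fmt}: ~{N_PARAMS * bits / 8 / 1e9:.1f} GB")
# fp16: ~24.0 GB, fp8: ~12.0 GB, Q8_0: ~12.8 GB, Q4_K: ~6.8 GB
```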
