ComfyUI/Flux memory utilization when loading model? #4318
Comments
The FLUX fp8 model is 11 GB without T5. The T5 fp8 encoder is 4 GB+; the fp16 version is 9 GB+. Make sure you are using fp8.
I have 32 GB RAM and I can use Flux. When VRAM is full, the system offloads to RAM; maybe you need to change some settings for your page file. Even the fp8 version of Flux needs more than 32 GB, because it's upcast to fp32 during loading and then back down to fp8. Recently there was a commit that adds support for upcasting Flux to fp16 instead. Also, NF4 support was added to ComfyUI today via a custom node, so we can use even smaller NF4 model versions. I expect handling of these formats will be optimized in the future, so that we'll need fewer system resources to run Flux.
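To make the upcast cost concrete, here is a back-of-the-envelope sketch; the ~12B parameter count is an assumption inferred from the ~11-12 GB fp8 file size, not a figure from the thread:

```python
# Rough memory math for holding the Flux weights at different precisions.
# Assumption: ~12B transformer parameters (inferred from the ~12 GB fp8 file).
params = 12e9
for name, bytes_per_param in [("fp8", 1), ("fp16", 2), ("fp32", 4)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")
# fp8: ~12 GB, fp16: ~24 GB, fp32: ~48 GB -- the fp32 upcast alone
# exceeds a 32 GB system, matching the behavior described above.
```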
I have 6 GB of VRAM (RTX 2060) and I can use Flux, but at the cost of waiting an hour for the model to load. This behavior started when I updated ComfyUI; before, it took 20 to 30 minutes to load.
The issue is said to be a limitation of the safetensors library. |
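For context, safetensors does expose a lazy, mmap-based API; a minimal sketch (the path is a placeholder) for inspecting a checkpoint without materializing every tensor:

```python
# Peek at tensor metadata without copying weights into RAM.
from safetensors import safe_open

path = "flux1-dev-fp8.safetensors"  # placeholder path
with safe_open(path, framework="pt", device="cpu") as f:
    for name in list(f.keys())[:5]:   # first few entries only
        sl = f.get_slice(name)        # lazy view; nothing is loaded yet
        print(name, sl.get_dtype(), sl.get_shape())
```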
Still the same. I never ran out of RAM when changing the prompt while using the fp8 models. However, when using the UNET loader in fp8 mode, I run out of RAM even before the t5 encoder starts loading.
You have some sort of issue here. My loading time is about 1 minute.
As said in the OP, I first tried fp16 and then fp8, but as someone else said below, fp8 is upcast to fp32 during loading, so I assume this is what causes my issue.
I didn't know this, so I'm guessing it explains the RAM filling up. My other question, then, is what could cause the VRAM issue(s), or how can I identify the cause? After the model is downcast back to fp8 and the clip models and VAE are loaded, I still get 'not enough VRAM' errors (I have 16 GB). The commenter above has only 6 GB (RTX 2060) compared to my 16 GB and is able to use FLUX after the model finally loads.
Yeah, that may be a different issue.
According to Task Manager, RAM stays full once the model is loaded, despite it being in fp8?
I'd assume that's my problem then? My RAM remains full even though some should be freed after the downcast to fp8, so VRAM can no longer be offloaded to RAM? That still leaves the question of why RAM remains full after the model is loaded and downcast. Out of curiosity, how much VRAM would the Schnell and Dev models occupy, respectively, if you had unlimited VRAM to play around with? EDIT: I was thinking of trying another large non-Flux model to see if I could reproduce the same results/errors. I'm only familiar with SD, and the biggest is SD3 (works flawlessly). Are there any other large 20 GB+ models that are likely to fill up my RAM and VRAM?
Yeah, I also think that's the problem.
The Schnell and Dev models are the same size.
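One way to check whether RAM is actually released after the downcast is to watch system-wide memory while ComfyUI loads; a minimal sketch using psutil, run in a separate terminal:

```python
# Poll system RAM usage while ComfyUI loads the model.
import time
import psutil  # pip install psutil

while True:
    vm = psutil.virtual_memory()
    print(f"RAM used: {vm.used / 1024**3:.1f} / {vm.total / 1024**3:.1f} GB")
    time.sleep(5)
```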
What is your VRAM usage when this happens? And is this with the fp16 or fp8 models? I'm just trying to sum up the requirements and see how they vary from person to person. I was grasping at straws, and though I doubt it will have an effect on memory (no idea, really), I thought I'd use ROCm instead of DirectML. But PyTorch + ROCm 6 is only available on Linux, so I'm off to do an Ubuntu install on an empty SSD.
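For what it's worth, a ROCm build of PyTorch surfaces the GPU through the torch.cuda namespace, so a quick sanity check after the Ubuntu install might look like this (assuming a ROCm wheel is installed):

```python
import torch

print(torch.version.hip)          # non-None on a ROCm build
print(torch.cuda.is_available())  # True if the GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```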
Yeah, I got a bit confused; I downloaded different fp8 schnell models from different sources. One was 11 GB, one was 17 GB.
So VRAM usage is the same with fp8 or fp16.
Are you on Linux or Windows right now?
These are both fp8. The workflows are in the example images:
All my issues above were on Windows, but I had it set up on Linux in about 30 minutes this time with ROCm and PyTorch. I had some unrelated issues getting ROCm to work with an integrated GPU (my card is a 7900 XT), but it was an easy fix. fp16 Flux is now working. It's amazing how much faster the models load: a 1024x1024 image generates in less than 2 minutes, including loading the models. It's still a mystery why I had/have issues on Windows, but I guess I'll stick to Linux for AI. Thanks for the help and explanations.
My 3090 can no longer do anything with Flux; I experience extreme bottlenecking after some recent update to ComfyUI. I can't even use fp8 anymore... I was previously using LoRAs with fp16 + hires fix with no issue. Now I can't make a single creation at all. Everything is updated, so I'm just going to (once again) go through the process of downgrading my ComfyUI until I find a version that isn't broken...
@BigBanje, I think you meant to post on the larger thread.
If you could identify which commit started causing this issue, it would be helpful for debugging. |
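git bisect can automate that search. Below is a hypothetical helper script for `git bisect run`; the `--quick-test-for-ci` flag is an assumption and stands in for whatever command reproduces the failure on your machine:

```python
#!/usr/bin/env python3
# Hypothetical bisect helper: exit 0 ("good") if this ComfyUI revision
# starts cleanly, non-zero ("bad") otherwise. Invoke from the repo root:
#   git bisect start; git bisect bad HEAD; git bisect good <old-commit>
#   git bisect run python bisect_test.py
import subprocess
import sys

try:
    result = subprocess.run(
        [sys.executable, "main.py", "--quick-test-for-ci"],  # substitute your repro
        timeout=600,
    )
except subprocess.TimeoutExpired:
    sys.exit(1)  # treat a hang as "bad"
sys.exit(0 if result.returncode == 0 else 1)
```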
I noticed the same regression. Before the changes, I could stay under 12 GB of total VRAM usage when loading a FLUX model; after the changes, I run into the 16 GB memory limit when the FLUX transformer unet is loaded.
I have an i7-12700H CPU with 16 GB RAM and a 3070 Ti with 8 GB VRAM (16 GB shared memory), and I run out of memory on both CPU and GPU. When running on my GPU I get an out-of-memory error, and on my CPU the process just gets killed. Does anyone know how I can still run it so I can generate images? I don't care if it takes an hour.
This custom node allows you to use Flux and the t5 text encoder in smaller formats, which use less RAM. The Q4 format is below 8 GB in size.
Your question
First-time ComfyUI user coming from Automatic1111 here. I've had no issues using SD, SDXL, and SD3 with ComfyUI, but I haven't managed to get Flux working due to memory issues. I've read about a lot of people having similar issues, but I'm confused about the following.
I have 32 GB RAM and 16 GB VRAM (AMD card).
I started with flux1Dev_v10.safetensors together with t5xxl_fp16.safetensors, as I read many people were successful with the same hardware I have.
The model sits there 'loading' for roughly 5 minutes; during that time RAM completely fills to 99%, and after that HDD utilization sits at 100% until it loads. After the clips and VAE load, sampling spits out the 'not enough VRAM' error that I've seen many people get but so far haven't seen a solution for.
I tried using the schnell model instead, plus t5xxl_fp8_e4m3fn (i.e. half the size), but get the same thing. It takes 500+ seconds to load the 11 GB model, my RAM usage goes to 100%, and the HDD again sits at 100% utilization. Then I'm told again there's not enough VRAM. QUESTION: why does the schnell model utilize so much RAM? By comparison, I load SD3 (stableDiffusion3SD3_sd3MediumInclT5XXL), which is roughly the same size at 10.5 GB, in less than a minute, and it renders fine.
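One crude way to separate raw disk time from any conversion overhead would be to time a bare safetensors load of each file outside ComfyUI; a sketch with placeholder filenames:

```python
# Time a plain safetensors load (no upcasting, no ComfyUI) for comparison.
import time
from safetensors.torch import load_file  # pip install safetensors

for path in ["sd3_medium_incl_t5xxl.safetensors",   # placeholder names
             "flux1-schnell-fp8.safetensors"]:
    t0 = time.time()
    sd = load_file(path)  # reads every tensor into CPU RAM
    print(f"{path}: {time.time() - t0:.1f}s, {len(sd)} tensors")
    del sd
```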
Cheers....
Logs
No response
Other
No response