Memory optimizations to allow bigger images #2015
Closed
I realized that my hardware could generate bigger images than it was actually able to. With these changes I went from a limit of 1024x1024 to 2184x2184 on my 2GB VRAM GPU, which is ~4x bigger.
Now that we have the LCM and only need 5 steps, it is reasonable to generate bigger images, and the only limitation was OOM errors.
List of changes:
- **TAESD PREVIEW** - The TAESD preview uses the GPU by default, and an OOM in the preview stops the generation even when there is enough memory for the generation process itself. The first change is to ignore an OOM in the preview and simply not show the preview in that case. The second change is the command line option `--preview-cpu`, which runs the preview on the CPU, so you can get TAESD previews even with low VRAM and big sizes.
- **SPLIT ATTENTION** - Split attention allows bigger images, but it limited the number of steps to a fixed amount that was not enough in many cases. Another issue is that the step counts were powers of 2, while a step count must be a divisor of `q.shape[1]`. For sizes like 512x512, 1024x1024, or 2048x2048 a power of 2 is fine, but for other sizes the divisors of `q.shape[1]` are not powers of two, so the code skipped the correct divisor and hit an OOM even though a different number of steps would have worked. In other situations the estimated memory requirement is bigger than the correct divisor allows, which also skips it. To solve this I added the command line option `--memory-estimation-multiplier`, so we can adjust the value of the `modifier` that already exists to fit our needs, and even disable the estimation by setting it to zero. The other change is to compute the next step count by finding the next divisor instead of multiplying by 2. With these changes it is unlikely anyone will get an OOM in this part of the code again if they adjust the memory multiplier properly when needed; based on my tests, you will hit an OOM error in other parts of the code first, in places that cannot be split anyway.
- **VAE ENCODE/DECODE** - The last thing limiting generation was the VAE encoding/decoding. It already has a tiled fallback in case of OOM, but it runs with only the default tile size, and it often still gets an OOM even when a smaller tile size would solve the problem. Now it first tries the default tile size as before, and on OOM during the tiled process it retries with a smaller tile size. I set the lower limit to the smallest value possible; it is unlikely to get there unless someone is generating an extreme batch size for their hardware, but if it does, it will still work. Even this lowest tile size has reasonable quality for such extreme tiling, which is better than an OOM that makes you lose all the images.
The default values keep the same behavior as before, so for those who don't have OOM issues everything works the same way.