Memory optimizations to allow bigger images #2015

Closed
wants to merge 5 commits

Conversation

jn-jairo
Contributor

I realized that my hardware could generate bigger images than I had been able to produce. With these changes I went from a limit of 1024x1024 to 2184x2184 on my 2GB VRAM GPU, roughly 4x bigger.

Now that we have LCM and only need about 5 steps, it is reasonable to generate bigger images; the only remaining limitation was OOM errors.

List of changes:

  • TAESD PREVIEW - The TAESD preview uses the GPU by default, and it can hit an OOM that stops the generation even when there is enough memory for the generation itself. The first change is to ignore an OOM in the preview and simply not show the preview when it happens. The second change is the command line option --preview-cpu, which runs the preview on the CPU, so you can still get TAESD previews with low VRAM and big sizes (see the preview sketch after this list).

  • SPLIT ATTENTION - Split attention allows bigger images, but it limited the number of steps to a fixed amount that was not enough in many cases. Another issue is that the step counts were powers of 2, while a step count must be a divisor of q.shape[1]. For sizes like 512x512, 1024x1024, or 2048x2048 a power of 2 works, but for other sizes the divisors of q.shape[1] are not powers of two, so the code skipped the correct divisor and hit an OOM even though a different step count would have worked; in other situations the estimated memory requirement is larger than necessary, which also skips a usable divisor. To solve this I added the command line option --memory-estimation-multiplier, so the existing estimation modifier can be adjusted to fit your hardware or disabled entirely by setting it to zero, and I changed the step search to find the next divisor instead of multiplying by 2 (see the slicing sketch after this list). With these changes it is unlikely anyone will hit an OOM in this part of the code again if they adjust the memory multiplier properly when needed; based on my tests, you will hit an OOM in other parts of the code first, in places that cannot be split anyway.

  • VAE ENCODE/DECODE - The last thing limiting generation was the VAE encoding/decoding. It already has a tiled fallback in case of OOM, but that fallback only runs with the default tile size and often hits an OOM that a smaller tile size would avoid. Now it first tries the default tile size as before, and if the tiled pass itself hits an OOM, it retries with a smaller tile size (see the tiled-decode sketch after this list). I set the lower limit to the smallest possible tile size; it is unlikely to get that far unless someone is generating an extreme batch size for their hardware, but if it does, it still works, and even the smallest tile size gives reasonable quality for such extreme tiling, which is better than an OOM that loses all the images.

The default values keep the same behavior as before, so for those who don't have OOM issues everything works the same way.
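For illustration, a minimal sketch of the preview change, assuming a TAESD-style decoder object with a decode() method and a latent tensor; the helper name and arguments are hypothetical, and only the --preview-cpu behavior comes from this PR:

```python
import torch

def taesd_preview(taesd, latent, preview_on_cpu=False):
    """Decode a low-cost preview; never let a preview OOM abort the generation."""
    try:
        if preview_on_cpu:
            # --preview-cpu: run the preview decoder on the CPU so it does not
            # compete with the sampler for VRAM.
            taesd = taesd.to("cpu")
            latent = latent.to("cpu")
        return taesd.decode(latent)
    except torch.cuda.OutOfMemoryError:
        # The preview is cosmetic: on OOM, skip it instead of failing the run.
        # (Older torch versions raise a plain RuntimeError instead.)
        return None
```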
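A sketch of the slicing change, assuming the attention code chooses a slice count from an estimated memory requirement; the function names and the free-memory argument are hypothetical, while the next-divisor search over q.shape[1] and the --memory-estimation-multiplier behavior (zero disables the estimate) follow the description above:

```python
def next_divisor(length, current):
    """Smallest divisor of `length` strictly greater than `current`."""
    for candidate in range(current + 1, length + 1):
        if length % candidate == 0:
            return candidate
    return length  # only reached if current >= length

def pick_attention_slices(q_tokens, estimated_mem, free_mem, multiplier=1.0):
    """Pick how many slices to split attention into.

    `q_tokens` plays the role of q.shape[1]. A multiplier of 0 disables the
    estimate and starts unsliced, relying on OOM retries to raise the count.
    """
    steps = 1
    if multiplier > 0:
        required = estimated_mem * multiplier
        while required / steps > free_mem and steps < q_tokens:
            # Walk to the next exact divisor instead of doubling, so sizes
            # whose divisors are not powers of two are not skipped over.
            steps = next_divisor(q_tokens, steps)
    return steps
```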
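And a sketch of the tiled-decode fallback, assuming a decode_tiled() helper that accepts tile dimensions; the parameter names, default tile size, and minimum are illustrative rather than the exact values used in the PR:

```python
import torch

def decode_with_shrinking_tiles(vae, latent, tile_size=512, min_tile_size=64):
    """Tiled VAE decode that halves the tile size on OOM instead of giving up."""
    while True:
        try:
            return vae.decode_tiled(latent, tile_x=tile_size, tile_y=tile_size)
        except torch.cuda.OutOfMemoryError:
            if tile_size <= min_tile_size:
                raise  # nothing smaller left to try
            torch.cuda.empty_cache()
            tile_size //= 2  # retry the whole tiled pass with smaller tiles
```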

@mcmonkey4eva
Collaborator

A variety of upgrades to memory handling have been pushed since this PR was submitted. If there are still relevant improvements, please rebuild and resubmit against modern Comfy.

@jn-jairo
Contributor Author

Let's be honest, no one said anything about this PR before, and no one will care about it if I redo the PR now, so I won't waste my time.

@mcmonkey4eva
Collaborator

Your PR was submitted when ComfyUI was a massive project run by one person, and hundreds of random PRs stacked up that said one person did not have enough free time or energy to review. ComfyUI is now supported by Comfy Org, an entire team dedicated to improving the project.
That's why I'm going through all these old PRs to check them over and see what's worth bringing back vs. what needs to be closed - we want to work towards actually getting the PR and Issues lists on this repo fully cleared up.

That all said, if you just don't care to bother anymore, that's completely understandable and okay.
