
resolved issue with black and white images in nodes_mask.py #4110

Closed
wants to merge 1 commit

Conversation

@shawnington (Contributor) commented Jul 25, 2024

Nodes that use the composite function accept both bw 1:w:h and rgb b:3:h:w images as input. Both formats display correctly in the image preview node; however, the formats have transposed H/W and incompatible C dimensions, which throws an error in nodes that attempt to composite them.

This is the normal format in which PIL creates an image from a single-channel numpy array (such as a mask) if .convert("RGB") is not used. This is a fairly common omission, as it does not impact preview behavior, and compositing a mask preview over an RGB image was not previously a common workflow task.

However, with the introduction of UnionControlnet, compositing mask previews is now a common operation, as it is part of creating a conditioning image for repaint union operations.

The resultant image will pass through most image nodes without issue when converted back into a tensor, and will display properly in preview nodes, but will not composite properly with an "RGB" image. This can make it confusing for users to determine where the error originates and what causes it.

This change introduces a function that checks whether a tensor is in the PIL b&w shape 1:w:h and converts it to the RGB shape b:3:h:w, so that composite operations do not unexpectedly fail while the image preview still displays the image successfully.

An alternative solution is to make it so that image inputs only accept tensors of shape b:3:h:w, instead of also accepting 1:w:h and b:1:w:h images.
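The shape-normalization helper described above might look roughly like the following sketch. The function name `tensor_to_rgb` appears in the actual diff, but the exact shape checks here are an assumption based on the description, not the merged code:

```python
import torch

def tensor_to_rgb(img: torch.Tensor) -> torch.Tensor:
    # Sketch of the described fix: detect a single-channel PIL-style
    # tensor and expand it to a batched 3-channel tensor so composite
    # nodes (which index C at dim 1) do not fail.
    if img.ndim == 3 and img.shape[0] == 1:
        img = img.unsqueeze(0)        # (1, W, H) -> (1, 1, W, H)
    if img.ndim == 4 and img.shape[1] == 1:
        img = img.repeat(1, 3, 1, 1)  # repeat the channel -> (B, 3, W, H)
    return img
```

Tensors already in a batched 3-channel shape pass through unchanged, so the helper is safe to call unconditionally.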

@guill (Contributor) left a comment

I'll recap some discussion we had on the #backend-development Discord.

Ultimately, I think that the root cause of the issue here is the fact that a node is returning a MASK (i.e. a BHW tensor) when it says it's returning an IMAGE (i.e. a BHWC tensor). Unfortunately, as @shawnington said, this displays just fine in the Preview Image node which confuses end-users and makes people think it's a valid image.

In my opinion, the right long-term fix is to prevent nodes from returning malformed types. The back-end could perform validation on built-in types (e.g. for IMAGE, ensuring that they are BHWC tensors) and throw an error on any nodes that fail this. This would force the node authors not to ship buggy nodes like this.

In the shorter term (to maintain backward-compatibility), we could throw warnings in the console to encourage node authors to fix their bugs.

A band-aid fix like this PR certainly doesn't hurt anything and would improve the user experience, so I think it's a reasonable stop-gap solution, but it really is just a band-aid. While this change would make this one node work when given a MASK that's masquerading as an IMAGE, many other nodes will still fail with it.

If we start finding ourselves adding this logic to more nodes, I really think we should be making the more foundational fix.
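The back-end validation described above could be sketched roughly as follows. The helper name and error text here are hypothetical; the real back-end would hook such a check into node output handling:

```python
import torch

def validate_image_output(tensor: torch.Tensor) -> None:
    # Hypothetical back-end check: an IMAGE is expected to be a BHWC
    # tensor with 3 channels. A malformed output (e.g. a BHW MASK
    # masquerading as an IMAGE) raises immediately; for backward
    # compatibility this could be downgraded to a console warning.
    if tensor.ndim != 4 or tensor.shape[-1] != 3:
        raise ValueError(
            f"IMAGE output must be BHWC with C=3, got shape {tuple(tensor.shape)}"
        )
```

Failing fast at the node that produced the malformed tensor would give the traceback that is currently missing when a downstream composite fails.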

@@ -5,7 +5,19 @@

from nodes import MAX_RESOLUTION

def tensor_to_rgb(img):
# convert from bw to rgb : cwh -> bchw
Contributor

I don't think this is correct. I'm fairly certain that masks in ComfyUI are BHW and images are BHWC. Your logic here works out when the batch size is 1, but will break if there is a batch of more than one mask.
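A minimal illustration of the point, assuming the standard shapes described here (MASK is BHW):

```python
import torch

# A batch of two masks in ComfyUI's MASK layout (B, H, W).
masks = torch.zeros(2, 64, 48)

# Inserting a dim at position 0 only looks right when B == 1;
# with B > 1 the batch size lands where the channel is expected.
as_batch   = masks.unsqueeze(0)  # (1, 2, 64, 48) -- 2 "channels"
as_channel = masks.unsqueeze(1)  # (2, 1, 64, 48) -- channel dim at index 1
```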

@guill (Contributor) commented Jul 29, 2024

Ah, I see -- .movedim(-1, 1) is called before calling this function for the RGB case -- so it would be BCHW. Still, I think you want to insert the channel dimension rather than the batch dimension.

@shawnington (Contributor, Author) commented Jul 29, 2024

> Ah, I see -- .movedim(-1, 1) is called before calling this function for the RGB case -- so it would be BCHW. Still, I think you want to insert the channel dimension rather than the batch dimension.

Yeah, I am not sure why it wants BCHW when the LoadImage node outputs BHWC, but the composite nodes in nodes_mask.py definitely require this format to perform their operations, and it is also the format they return.

The batch dimension is needed because the composite node accesses the channel dimension by index position: C is expected at index 1, so HWC would also fail without a batch dimension being added. The batch dimension is only added if one is not already present; otherwise the channel is repeated at index 1.

I'm guessing all of this results from there being no enforced format for what an image is throughout the pipeline.

So forcing conformance to either BCHW or BHWC would fix a lot of confusion with things like this.
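The layout mismatch under discussion can be shown in a couple of lines (illustrative shapes only):

```python
import torch

# LoadImage-style output: BHWC.
img_bhwc = torch.zeros(2, 64, 48, 3)

# The composite code in nodes_mask.py indexes the channel at dim 1,
# so callers move it there first with movedim(-1, 1): BHWC -> BCHW.
img_bchw = img_bhwc.movedim(-1, 1)
```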

@shawnington (Contributor, Author) commented Jul 29, 2024

> I'll recap some discussion we had on the #backend-development Discord.
>
> Ultimately, I think that the root cause of the issue here is the fact that a node is returning a MASK (i.e. a BHW tensor) when it says it's returning an IMAGE (i.e. a BHWC tensor). Unfortunately, as @shawnington said, this displays just fine in the Preview Image node which confuses end-users and makes people think it's a valid image.
>
> In my opinion, the right long-term fix is to prevent nodes from returning malformed types. The back-end could perform validation on built-in types (e.g. for IMAGE, ensuring that they are BHWC tensors) and throw an error on any nodes that fail this. This would force the node authors not to ship buggy nodes like this.
>
> In the shorter term (to maintain backward-compatibility), we could throw warnings in the console to encourage node authors to fix their bugs.
>
> A band-aid fix like this PR certainly doesn't hurt anything and would improve the user experience, so I think it's a reasonable stop-gap solution, but it really is just a band-aid. While this change would make this one node work when given a MASK that's masquerading as an IMAGE, many other nodes will still fail with it.
>
> If we start finding ourselves adding this logic to more nodes, I really think we should be making the more foundational fix.

I agree with this 100%; there is one slight error: masks are passed as numpy arrays, not tensors. The result of this "bug" (or what many users would consider a bug, since preview displays it properly) is converting a mask into an image by converting it from a numpy array into a tensor, either directly or with PIL, as follows.

image = torch.from_numpy(np.array(mask).astype(np.float32) / 255.0)

or

mask = Image.fromarray(mask, mode="L")
torch.from_numpy(np.array(mask).astype(np.float32) / 255.0)

Neither looks problematic at first glance, and both display 100% correctly in the image preview. However, both result in malformed images that merely preview correctly.

However, the correct way is to use .convert("RGB"):

mask = Image.fromarray(mask, mode="L").convert("RGB")
torch.from_numpy(np.array(mask).astype(np.float32) / 255.0)

Typically you would squeeze the mask down to HW, then after .convert("RGB") unsqueeze at dim 0 to get the BCHW format this way. (Although you are correct that PIL produces the BHWC format.)
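Putting the two snippets side by side makes the difference concrete. This is a sketch; the mask here is assumed to be an HW uint8 numpy array:

```python
import numpy as np
import torch
from PIL import Image

mask = np.zeros((64, 48), dtype=np.uint8)  # assumed HW uint8 mask

# Missing .convert("RGB"): the tensor stays HW -- it previews fine
# but will not composite with a real BHWC image.
bad = torch.from_numpy(np.array(Image.fromarray(mask, mode="L")).astype(np.float32) / 255.0)

# With .convert("RGB") plus unsqueeze(0), we get the expected BHWC image.
rgb = Image.fromarray(mask, mode="L").convert("RGB")
good = torch.from_numpy(np.array(rgb).astype(np.float32) / 255.0).unsqueeze(0)
```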

What can further confuse users is that any image nodes that convert the tensor to PIL to use PIL image operations, such as:

image = image.filter(ImageFilter.GaussianBlur(blur_radius))

will work perfectly fine with the malformed black and white tensor once it is converted to a PIL image. It will also convert back to a tensor and preview without issue. The user will have successfully passed a malformed image through a node that performs operations on it, without error, and the actual source of the problem is now obfuscated: when an error is finally thrown, it has no traceback to where the original malformed tensor was created.
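For example (illustrative only), a blur node's PIL round-trip quietly preserves the bad shape:

```python
import numpy as np
import torch
from PIL import Image, ImageFilter

# A malformed "IMAGE": a bare HW tensor rather than BHWC.
t = torch.rand(64, 48)

# The PIL round-trip succeeds anyway, so the bad shape survives
# the node and only fails later, far from its origin.
pil = Image.fromarray((t.numpy() * 255).astype(np.uint8))
blurred = pil.filter(ImageFilter.GaussianBlur(2))
back = torch.from_numpy(np.array(blurred).astype(np.float32) / 255.0)
```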

The rest I agree with; this is a quick patch that only deals with nodes_mask.py.

A longer-term solution should enforce a proper type for images. Perhaps a function similar to what I proposed, provided either as a node_helper shown in the examples for creating a custom node, or applied as validation on all image outputs (without the node author having a say) to ensure conformance to the expected BHWC format, would be best.

A third solution could be to prevent the preview node from displaying malformed images, instead showing an error message describing how the image is malformed.

@shawnington (Contributor, Author) commented
superseded by #4149
