AddressSanitizer: allocator is out of memory #3952
It's out of memory when trying to allocate 16GB. Could it just be that your machine didn't have enough free RAM?
But I think the point is that it only looks like a 16GB image because it's a corrupted file with an absurd resolution. There is an attribute, `limits:imagesize_MB`, intended as a sanity check against absurdly large images; it currently defaults to 32GB.

Question 1: Should it be a lower default? It was hard to know the tradeoff between catching allocation errors from corrupt files versus "false negative" cases of rejecting legitimately huge images. But maybe the 32GB default (for a single image, remember) is still drastically too big considering how commonly people will run on VMs with lower memory limits than this. (For comparison, a 32k x 32k RGBA uint16 image is 8GB uncompressed.)

Question 2: This limit attribute is informational only; it's up to each format reader to check the allocation size against this limit. PNG currently does not! So that should be fixed -- but note that even if we fixed it today, it would happily let the allocation happen if the default is 32GB.

Crazy idea: What if instead of defaulting this limit to a flat 32GB, we set it to some fraction of the machine's physical memory?
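For concreteness, a quick check of the uncompressed-footprint arithmetic used as the reference point above (the 32k x 32k RGBA uint16 example), as a standalone C++ sketch:

```cpp
#include <cstdint>
#include <iostream>

int main()
{
    // Uncompressed footprint of a 32k x 32k RGBA uint16 image.
    const uint64_t width = 32768, height = 32768;
    const uint64_t channels = 4;           // RGBA
    const uint64_t bytes_per_channel = 2;  // uint16
    uint64_t bytes = width * height * channels * bytes_per_channel;
    std::cout << bytes / (1024.0 * 1024.0 * 1024.0) << " GB\n";  // prints 8 GB
}
```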
A sanity check so that there's a clean early out is nice, especially when the machine does have TBs of RAM but we don't want to wait hours for it to try to process a corrupt image. But I don't think we can realistically guard every allocation. Maybe we need to be strategic and find the few places where we are likely to allocate GBs of data (anywhere that allocates all the pixels?) and check whether the allocation succeeded?
May be relevant: https://stackoverflow.com/questions/18684951/how-and-why-an-allocation-memory-can-fail

I think the intent of `limits:imagesize_MB` is actually not to catch or recover from bad allocations (which is both difficult to catch everywhere in the code base and, as noted in the link above, may not work the way you think on modern systems). Rather, it's intended to catch corrupted or malicious files that have implausible resolutions. Like you said, even if a fishy allocation "succeeds", it may be so large that it's effectively a DOS attack. Ideally, every reader should do this check. In practice, I implemented it for TIFF (adding `limits:imagesize_MB` in the process) but only fixed the one error I was facing at the time (very similar to this one -- the result of fuzz testing TIFF files) and never quite got around to revisiting all the other readers.
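For illustration, here is a minimal sketch of the per-reader plausibility check being described, written against OIIO-style names (`ImageSpec::image_bytes()`, `OIIO::get_int_attribute()`); treat the exact signatures and error wording as assumptions rather than the shipped TIFF implementation:

```cpp
#include <cstdint>
#include <string>

#include <OpenImageIO/imageio.h>

// Return false (with an error message) if the resolution declared in the
// header implies an uncompressed image larger than limits:imagesize_MB.
bool plausible_image_size(const OIIO::ImageSpec& spec, std::string& err)
{
    int limit_mb = OIIO::get_int_attribute("limits:imagesize_MB", 0);
    if (limit_mb <= 0)
        return true;  // zero/unset means "no limit"
    uint64_t uncompressed_bytes = spec.image_bytes();
    if (uncompressed_bytes > uint64_t(limit_mb) * 1024 * 1024) {
        err = "Possible corrupt header: image size exceeds limits:imagesize_MB "
              "(raise the attribute if this resolution is intentional)";
        return false;  // reject before any pixel allocation happens
    }
    return true;
}
```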
Side note: the POC file here is 21 bytes, so this is not in fact a large image; it is a corrupted/malicious header masquerading as an image file, and the resolution it describes is implausibly large.
Yeah, I've always interpreted that limit as just a safety net for "zip bombs" (a small file, 21 bytes here, expanding out to several GB of memory): https://en.wikipedia.org/wiki/Zip_bomb Writing software that properly handles every possible allocation failure is a much harder problem.
On my laptop, with plenty of memory for a 16GB allocation, it still fails very quickly with the generic "Read error: hit end of file" message.
So at least it does detect in other ways that the file is bad before getting into too much trouble... unless the allocation fails. It's hard to tell from this error message exactly which reader caused the problem (because there is no file extension, and the file isn't valid anyway, it tries them all). Turns out the bit pattern of the file looks like a Targa file, and that's the reader that hits the EOF while reading the pixels.
That error message I get is not a crash, by the way. It's a graceful exit with an error message.
Notice how the stack trace from address sanitizer says it failed INSIDE operator new. Thus the application (a) never got back a null pointer, and (b) never even had a bad_alloc exception to catch. It just dies inside new, so there is nothing we can do in our code to handle the problem if the allocation fails, because control never returns to us. We have to detect the allocation size as implausible beforehand and just say no.

Back to my suggestions: I feel like lowering the default is an independent question -- it's valid to ask whether 32GB is too high a default, but at the same time, if we lowered it to 16GB, there certainly exists a VM somewhere where even that can fail. (We have to ask ourselves, though, if we care.) I still kind of like my suggestion of lowering the default to be at most some fraction of the physical memory size, to try to catch it regardless of how constrained a system it is.
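A sketch of the default being suggested here: cap the limit at the smaller of 32GB and a fraction of installed RAM. The physical-memory value could come from something like OIIO's `Sysutil::physical_memory()`, but the exact source of that number is an assumption in this snippet:

```cpp
#include <algorithm>
#include <cstdint>

// Compute a default for limits:imagesize_MB (value in megabytes):
// never more than 32GB, and never more than `fraction` of physical RAM.
uint64_t default_imagesize_limit_mb(uint64_t physical_memory_bytes,
                                    double fraction = 0.5)
{
    const uint64_t hard_cap_mb = 32ULL * 1024;  // 32 GB expressed in MB
    uint64_t mem_cap_mb =
        uint64_t(double(physical_memory_bytes) * fraction) / (1024 * 1024);
    return std::min(hard_cap_mb, mem_cap_mb);
}
```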
I have a patch to suggest, coming shortly.
Proposed fix: #3954
That error message is a graceful exit when running under address sanitizer because address sanitizer has instrumented the code to look for a nullptr returned from new and added code to gracefully exit. The real OIIO code would likely have crashed once it tried to access this null pointer, unless it already handles this case. I still think there's nothing we can really do for this particular issue short of adding null checks after the allocation, which sometimes will help and other times won't, for the reasons lg linked to already. Machines with enough memory, like lg's, will be able to do the allocation. Machines without enough memory will crash. Tweaking `limits:imagesize_MB` doesn't fundamentally change that.
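For reference, the two standard C++ ways of "checking after the allocation" that are being weighed here look roughly like this; note that neither path is reached if the sanitizer's allocator aborts inside operator new, which is the crux of the discussion below:

```cpp
#include <cstddef>
#include <cstdio>
#include <new>

// Option A: let new throw and translate the exception into an error return.
unsigned char* allocate_pixels(size_t bytes)
{
    try {
        return new unsigned char[bytes];
    } catch (const std::bad_alloc&) {
        std::fprintf(stderr, "allocation of %zu bytes failed\n", bytes);
        return nullptr;
    }
}

// Option B: non-throwing new, which reports failure as nullptr.
unsigned char* allocate_pixels_nothrow(size_t bytes)
{
    unsigned char* p = new (std::nothrow) unsigned char[bytes];
    if (!p)
        std::fprintf(stderr, "allocation of %zu bytes failed\n", bytes);
    return p;  // caller must check for nullptr
}
```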
As it turns out, it wouldn't have crashed outside the sanitizer (I think???), because I found the exact site of the failing allocation, and it does appear to handle the failure gracefully.

You are right, the physical_memory/2 trick is a little hokey and can backfire on people with 4GB of RAM, as you say. But how often is that really the case -- the combination of huge image, tiny memory, and a user/app that doesn't proactively raise the attribute to reflect the size of image they expect? If you or others don't think that changing the default based on the physical memory is a good idea, I can back that aspect out of the PR. But I'm not sure what the best thing to do is in this case.

I hear what you're saying about the crash in new being an artifact of address sanitizer. The dilemma, though, is that there are people out there running fuzz tests against the software with the sanitizer, and they will report these crashes here (or file CVEs) when these failures occur, unaware that they are seeing a crash that can only happen under the address sanitizer. Yet each one will still require us to track it down and verify that without the sanitizer in place, we correctly catch the failed allocation. So I still want a way, even in the sanitizer case, to find a likely-to-be-mistaken image size ahead of the allocation, in order to stem the tide of these pseudo-bugs being reported.
I can imagine the user app being oiiotool or maketx. Do these tools proactively raise the attribute? If it requires a special flag, that's extra complexity, and chances are a user who doesn't have huge amounts of RAM is probably not a power user and so will have an extra hard time knowing how to pass the appropriate flags to overcome this. Doubly so if these tools are being wrapped by a script/GUI.

As for user/machine, it's not that rare for non-professional users, such as students, to have a modest machine. Until recently, 8GB of RAM was the default for most Mac laptops. So half of that is 4GB. I'll admit 4GB is still a pretty big and rare image, so perhaps the majority of users would still be fine with this limitation. So I won't fight against the divide-by-2 proposal if you think it's needed.

For the fuzz tests, it would not surprise me if eventually we get people running them on VMs that have a few GBs of RAM, no swap, and much less than half of their RAM still free. They'll report issues which none of us will be able to repro on our production hardware, and we'll spend extra time trying to figure out what is going on.

I do like your suggestion in the PR of checking if address sanitizer is running. I suggest, though, getting very aggressive: don't divide by 2, divide by 8. You can do this with a compile-time check for the sanitizer (see the sketch below).
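A sketch of what compile-time sanitizer detection conventionally looks like -- `__has_feature(address_sanitizer)` is the Clang spelling and `__SANITIZE_ADDRESS__` the GCC/MSVC spelling. The macro name and the particular fractions below are illustrative, not necessarily what the PR does:

```cpp
// Detect whether this build is instrumented with AddressSanitizer.
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define OIIO_EXAMPLE_UNDER_ASAN 1
#  endif
#elif defined(__SANITIZE_ADDRESS__)
#  define OIIO_EXAMPLE_UNDER_ASAN 1
#endif
#ifndef OIIO_EXAMPLE_UNDER_ASAN
#  define OIIO_EXAMPLE_UNDER_ASAN 0
#endif

// Example: pick a much more conservative memory fraction when sanitized.
constexpr double kImageSizeLimitFraction
    = OIIO_EXAMPLE_UNDER_ASAN ? 0.125   // divide by 8 under the sanitizer
                              : 0.5;    // divide by 2 otherwise
```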
I was thinking of "don't divide by 2" also, but by just using the physical memory size itself. The rationale is that it gives more tolerance for users with small-memory computers, and it still signifies an important boundary -- if they try to read images larger than this into memory, at the very least they will see swapping and a huge reduction in performance. The error message itself SAYS why it's rejecting the file and the name of the attribute to set to make the error go away, if they really do intend to have files that large.

I'd like to think (but could obviously be wrong) that the combo of { small machine, not a power user } might be fairly unlikely to have multi-GB images that aren't mistakes. Remember, for reference, that a 16k x 16k x RGBA x half image is still only 2 GB. The GitHub runners have 8GB VMs, BTW.

So I think the choices we have before us are:
Allocations that "succeed" but lead to terrible performance aren't really desirable anyway. I still think there's merit to tying the default sanity-check size to the physical memory somehow, but maybe 0.5 was too aggressive?
Option 2 might prevent useful fuzzing tests where we want to detect bugs when exceeding 32-bit counters. If the machine has enough memory, they should be allowed to test that. So option 3 or 4 are the ones I vote for, since they avoid the bulk of the out-of-memory cases while still letting those with lots of RAM use OIIO without noticeable limitations. For 4, I would use the 0.5 multiplier or smaller. For 3 (real users), I would maybe suggest 1.0, so that we're a little bit more lenient.
I'm currently leaning toward 3, but raising the fraction to 1, so that the only cases where we might have a false positive for small-memory users are ones where they would end up swapping anyway. We can always revisit if this is causing problems for people.
I'm making that change and will update the PR shortly.

Just to reiterate a very subtle point: just because a full uncompressed image is larger than available memory doesn't necessarily mean that it couldn't be processed in certain ways by OIIO (consider the case of carefully reading, processing, and writing one scanline at a time, so the working set can be tiny). This limit is not "how big an image is too big to process," but rather "how big is implausible for a 'real' image rather than a corrupted one." There is no correct answer to that question; it's a heuristic that we are using to catch many corrupt images while having very few false positives. And we're assuming that users are unlikely to be encountering legit images that exceed the physical memory of the machine they have chosen for the task, unless they are aware that they are doing something unusual and raise the limit.
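For an app that really does expect enormous images, raising the limit up front is a one-liner (assuming the usual integer overload of `OIIO::attribute()`; the 128GB figure is just an example):

```cpp
#include <OpenImageIO/imageio.h>

// Allow images up to 128 GB uncompressed before the plausibility check trips.
// The attribute value is expressed in megabytes.
void expect_huge_images()
{
    OIIO::attribute("limits:imagesize_MB", 128 * 1024);
}
```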
Fixes #3952

Several interrelated fixes to guard against corrupted files:

* Extend the `limits:imagesize_MB` sanity check to Targa files to address #3952 specifically, and check the failing example file into the testsuite. (N.B.: Only a couple of readers use this, TIFF and now Targa. It should be extended to the others. Mackerel!)
* Improve the error message used for the above, both for Targa and TIFF.
* For low-memory situations, perhaps the default of 32GB is too large. Make the default the minimum of 32GB and 1/2 the size of physical memory on the machine. (An app that legitimately expects humongous files is expected to raise the limit appropriately.)
* The generic "Read error: hit end of file" we sometimes issued from `ImageInput::ioread()` made it hard to discern which reader it came from, in cases where the file extension was absent or misleading and a succession of format readers were tried. Changing this error message to say which format reader triggered the error should make these errors easier to track down in the future. This also changed the reference output of a few tests involving truncated or corrupted files.

---------

Signed-off-by: Larry Gritz <[email protected]>
version: 2.4.12.0

command: `./bin/iconvert crash_000122 1.png`

poc: crash_000122.zip