-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incomplete fix of Issue #206 makes use-after-free still possible #248
Comments
|
Hello @pete4abw, thank you for the quick reaction! I want to mention that this is a continuation of #206 (a concurrent UAF), so to stably reproduce the issue we need to follow the same procedure as in #206, that is:
Another potential obstacle is that you might see "fatal exit" due to "MD5 CHECK FAILED" (as also shown in your above log):
This "fatal exit" happens at this line, resulting in a program termination before runzip_fd() can return and clear_rulist() can be executed. But this error can be bypassed in the following ways; Method 1. The "Stored" MD5 value is at the end of the input archive (i.e., POC.lrz), to avoid the MD5 check failure we can just replace the stored MD5 value to the "correct" value ( Actually I have already replaced the MD5 value when preparing POC.lrz, based on the Method 2. Though I haven't tested it, I envision that we might also be able to turn off the MD5 check (e.g., NO_MD5 at this line), which is controlled by some flags in the "control" structure. Method 3. Alternatively, I also observed that fatal_return() in lrzip will only terminate the program (e.g., exit()) when Following the above procedures (I used Method 1 to avoid MD5 failure), I can reliably reproduce the concurrency UAF bug locally. After viewing your comments, I have also tried to re-run the same procedure on a different machine with the same POC.lrz (w/o even modifying the MD5 value), and triggered the UAF again. I attached the detailed command log below:
IMHO the root cause of the issue seems quite clear from the code audit as described previously, I'm happy to provide more details and try my best to help fix the issue if you think it's necessary (I personally think we'd better fix the race condition to avoid this whole class of concurrency UAF issues). Thank you for your help! |
@bugbot97 . If you spent as much time solving bugs instead of creating artificial ones, your efforts would be appreciated. You are injecting a bug into the program and causing it to abend. The program fails as it should when compiled and executed without mods. If you examine the verbose output,
|
Hello @pete4abw , I want to stress again that I have not "artificially" injected any bugs during the whole process. As explained before, the injected delays (i.e., sleep(1)) merely intend to help reproduce the bug more efficiently, and delay injection is an extremely widely used approach to detect concurrency bugs. Please consider the fact that during its execution, one thread could be preempted at any time and paused for an uncertain amount of time, creating many different possible thread interleavings - some of which could trigger bugs (e.g., even we do not inject any delays, the thread could be preempted and paused for 1s before the said two locations - it's just that this may not happen every time, for example in your above run). The delay injection is just a method to reliably reproduce a specific bug-triggering thread interleaving (otherwise we may need to try many times to hit a specific thread interleaving to trigger a bug). In other words, given the popularity of lrzip, there are so many users using it on different machines every day, it's almost for sure that they will encounter all different kinds of thread interleaving, including those ones that can trigger the UAF bug as mentioned before. Yes, in your above experiment the UAF is not triggered (because the execution happens to not hit the required thread interleaving, which is also why there are error messages like "Unable to decompress in parallel"), but it doesn't mean that other executions of lrzip will not trigger the bug by running into a different thread interleaving. Shouldn't a widely-used utility like lrzip be error-proof for all possible thread interleaving that users may encounter? I'm also trying to solving bugs, but before solving any bugs, the first step is to understand the bugs and confirm with you, right? I spent much time in understanding the nature of these issues, which is unfortunately accused by you of "creating artificial bugs". But I still think that the bug is there, harmful thread interleavings are not fundamentally prevented by design in lrzip, and the UAF (or other concurrency issues) will eventually happen. I also tried to propose a potential fix in the first post (e.g., introduce pthread_join() in clear_rulist()), but as mentioned, before being able to solve bugs, the bug itself is questioned. |
Dear developers,
Issue #206 reported a UAF bug triggered with the following flow:
Then the previous fix of #206 is basically to postpone the invocation of clear_rulist() until runzip_fd() returns. This fix works for that specific POC provided in #206, because after the 1st invocation of runzip_chunk() by runzip_fd(), that POC will lead to the 2nd invocation of runzip_chunk() at the same site within the do...while loop, and during this 2nd invocation that POC causes a fatal_return , so that the program terminates before runzip_fd() gets a chance to return and execute clear_rulist().
But I want to mention that such a fix could be incomplete, because we can modify the original POC in a way that after the 1st invocation of runzip_chunk(), the runzip_fd() will exit the do...while loop (e.g., by modifying the "expected_size") and go ahead to return to its caller site and execute the now postponed clear_rulist(), at this point, if those delayed stream-reading threads resume, the UAF will happen similarly as before. I tried to modify the original POC as aforementioned and triggered the UAF again in the latest 465afe8:
The updated POC: POC.lrz
The reproduction instructions are the same as in #206
The ASAN report I got:
I have read your previous comments and understand your concerns about these bug reports related to malformed program input (e.g., POC), so I want to make some additional comments to help fix the issue:
Though we can employ some validation procedures about the input archive (e.g., ps2lrz), we may not expect every lrzip user to perform these additional checks, and the possibility is that a malicious actor may lure a normal user to uncompress some malformed packets to perform the attack (this lies behind many real-world attacks). So I think it might still be better to fundamentally fix the race condition here - this brings my considerations for potential fixes in the following point:
From my understanding, the root cause of these issues is that, when we try to free "ucthreads" in clear_rulist(), there is no guarantee that all the previously created threads have finished. So I want to propose to add pthread_join() for all threads recorded in control->pthreads before the free of "ucthreads" in clear_rulist() at here, but I'm not sure whether this is a proper fix (e.g., may lead to performance concerns), so I haven't made a PR about it.
Thank you for your time!
The text was updated successfully, but these errors were encountered: