
Incomplete fix of Issue #206 makes use-after-free still possible #248

Open
bugbot97 opened this issue Oct 1, 2023 · 4 comments

Comments

@bugbot97

bugbot97 commented Oct 1, 2023

Dear developers,

Issue #206 reported a UAF bug triggered with the following flow:

  1. runzip_fd() calls runzip_chunk() for the first time (here); the latter then creates multiple threads to read the stream. Some of these threads may be delayed, because fill_buffer() only waits for the completion of the immediate "unext_thread", not for all of the created threads (see here).
  2. Before the first invocation of runzip_chunk() returns, it calls clear_rulist(), which frees the shared "ucthreads". If the delayed threads from step 1 resume at this point, they access the now-freed "ucthreads", causing a use-after-free.

The fix for #206 essentially postpones the invocation of clear_rulist() until runzip_fd() returns. This works for the specific POC provided in #206: after the first invocation of runzip_chunk() by runzip_fd(), that POC leads to a second invocation of runzip_chunk() at the same site within the do...while loop, and during this second invocation the POC causes a fatal_return, so the program terminates before runzip_fd() gets a chance to return and execute clear_rulist().

However, this fix may be incomplete: the original POC can be modified so that, after the first invocation of runzip_chunk(), runzip_fd() exits the do...while loop (e.g., by modifying the "expected_size"), returns to its caller, and executes the now-postponed clear_rulist(). If the delayed stream-reading threads resume at this point, the UAF happens just as before. I modified the original POC in this way and triggered the UAF again on the latest 465afe8:

The updated POC: POC.lrz
The reproduction instructions are the same as in #206
The ASAN report I got:

==2111279==ERROR: AddressSanitizer: heap-use-after-free on address 0x6100000000a8 at pc 0x5559426ff61d bp 0x7f7cc32fdd60 sp 0x7f7cc32fdd50
READ of size 8 at 0x6100000000a8 thread T3
    #0 0x5559426ff61c in zpaq_decompress_buf ./lrzip/stream.c:433
    #1 0x555942708c82 in ucompthread ./lrzip/stream.c:1566
    #2 0x7f7cc749c608 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477
    #3 0x7f7cc7073132 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f132)

0x6100000000a8 is located 104 bytes inside of 192-byte region [0x610000000040,0x610000000100)
freed by thread T0 here:
    #0 0x7f7cc763940f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122
    #1 0x5559426e2421 in clear_rulist ./lrzip/lrzip.c:786
    #2 0x5559426e42a0 in decompress_file ./lrzip/lrzip.c:958
    #3 0x5559426db193 in main ./lrzip/main.c:720
    #4 0x7f7cc6f78082 in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7f7cc7639a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
    #1 0x5559427041b8 in open_stream_in ./lrzip/stream.c:1101
    #2 0x5559426f9e4a in runzip_chunk ./lrzip/runzip.c:309
    #3 0x5559426fa8eb in runzip_fd ./lrzip/runzip.c:387
    #4 0x5559426e426c in decompress_file ./lrzip/lrzip.c:951
    #5 0x5559426db193 in main ./lrzip/main.c:720
    #6 0x7f7cc6f78082 in __libc_start_main ../csu/libc-start.c:308

Thread T3 created by T0 here:
    #0 0x7f7cc7566815 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cc:208
    #1 0x5559426fcfcf in create_pthread ./lrzip/stream.c:125
    #2 0x55594270ad78 in fill_buffer ./lrzip/stream.c:1725
    #3 0x55594270bca5 in read_stream ./lrzip/stream.c:1812
    #4 0x5559426f8b22 in unzip_literal ./lrzip/runzip.c:162
    #5 0x5559426fa056 in runzip_chunk ./lrzip/runzip.c:325
    #6 0x5559426fa8eb in runzip_fd ./lrzip/runzip.c:387
    #7 0x5559426e426c in decompress_file ./lrzip/lrzip.c:951
    #8 0x5559426db193 in main ./lrzip/main.c:720
    #9 0x7f7cc6f78082 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free ./lrzip/stream.c:433 in zpaq_decompress_buf
Shadow bytes around the buggy address:
  0x0c207fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c207fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c207fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c207fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c207fff8000: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x0c207fff8010: fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd
  0x0c207fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c207fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c207fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c207fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c207fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==2111279==ABORTING

I have read your previous comments and understand your concerns about bug reports involving malformed program input (e.g., POCs), so I want to add some comments to help fix the issue:

  1. Though we can apply validation procedures to the input archive (e.g., ps2lrz), we cannot expect every lrzip user to perform these additional checks, and a malicious actor may lure a normal user into decompressing a malformed archive to carry out the attack (this is behind many real-world attacks). So I think it is still better to fundamentally fix the race condition here, which brings me to my considerations for potential fixes in the following point:

  2. From my understanding, the root cause of these issues is that when "ucthreads" is freed in clear_rulist(), there is no guarantee that all previously created threads have finished. So I propose adding pthread_join() for all threads recorded in control->pthreads before the free of "ucthreads" in clear_rulist() (here), but I'm not sure whether this is a proper fix (e.g., it may raise performance concerns), so I haven't opened a PR for it.

Thank you for your time!

@pete4abw
Contributor

pete4abw commented Oct 2, 2023

$ lrzip -t POC.lrz
Decompressing...
Inconsistent length after decompression. Got 0 bytes, expected 205600%  
Inconsistent length after decompression. Got 0 bytes, expected 2056     6:100%  
Inconsistent length after decompression. Got 0 bytes, expected 2056
Inconsistent length after decompression. Got 0 bytes, expected 2056
89932%  196952.00 /    219.00 
Average DeCompression Speed:  0.000MB/s
MD5 CHECK FAILED.
Stored:fc98427843f47beb2510386d42d7fb64
Output file:0428bb8c2dcc7f1abc10f714729caf4c
Fatal error - exiting

@bugbot97
Author

bugbot97 commented Oct 2, 2023

Hello @pete4abw, thank you for the quick response! I want to mention that this is a continuation of #206 (a concurrency UAF), so to reproduce the issue stably we need to follow the same procedure as in #206, that is:

  1. Inject delays (e.g., sleep(1)) at two locations in the program (one in clear_rulist(), the free site, and another in zpaq_decompress_buf(), the use site; details in #206) to make the vulnerable thread interleaving more likely to occur. Note that the delay injection simply makes this concurrency UAF reproduce more stably; without it, the bug can still be triggered, just less reliably (e.g., once in every ten runs, since thread scheduling and interleaving are nondeterministic).

  2. Compile the program and decompress the POC with multiple threads (since this is a concurrency UAF), as done in #206:

./autogen.sh
CC=gcc CXX=g++ CFLAGS="-g -O0 -fsanitize=address" CXXFLAGS="-g -O0 -fsanitize=address" ./configure --enable-static-bin
make
./lrzip -t -p2 POC.lrz

Another potential obstacle is that you might see a fatal exit due to "MD5 CHECK FAILED" (as also shown in your log above):

MD5 CHECK FAILED.
Stored:fc98427843f47beb2510386d42d7fb64
Output file:0428bb8c2dcc7f1abc10f714729caf4c
Fatal error - exiting

This fatal exit happens at this line, terminating the program before runzip_fd() can return and clear_rulist() can execute. The error can be bypassed in the following ways:

Method 1. The stored MD5 value sits at the end of the input archive (i.e., POC.lrz). To avoid the MD5 check failure, we can replace the stored MD5 value with the "correct" value (Output file: 0428bb8c2dcc7f1abc10f714729caf4c as shown above) using a hex editor such as "xxd".

I had actually already replaced the MD5 value when preparing POC.lrz, based on the output-file hash I observed on my machine (e.g., Stored: fc98427843f47beb2510386d42d7fb64), but I suspect there is some nondeterminism in the decompression process, so on your machine the decompressed output of POC.lrz differed from mine, resulting in a different MD5 value.

Method 2. Though I haven't tested it, we might also be able to turn off the MD5 check (e.g., NO_MD5 at this line), which is controlled by flags in the "control" structure.

Method 3. Alternatively, fatal_return() in lrzip only terminates the program (e.g., via exit()) when control->library_mode is not set (see this line); otherwise it simply returns -1 without terminating. In scenarios where library_mode is set (presumably when lrzip is embedded as a library in a host program), every fatal_return() becomes "return -1", so runzip_fd() returns and triggers clear_rulist().

Following the above procedure (I used Method 1 to avoid the MD5 failure), I can reliably reproduce the concurrency UAF locally. After reading your comments, I also re-ran the same procedure on a different machine with the same POC.lrz (without even modifying the MD5 value) and triggered the UAF again. The detailed command log is attached below:

bugbot97@matrix:/data/lrzip$ ./lrzip -t -vvv -p2 POC.lrz
Threading is ENABLED. Number of CPUs detected: 2
Detected 67501146112 bytes ram
Compression level 7
Nice Value: 19
Show Progress
Max Verbose
Test file integrity
Temporary Directory set as: ./
Created temporary outfile ./lrzipout.0x3nRr
Detected lrzip version 0.6 file.
MD5 being used for integrity testing.
Decompressing...
Reading chunk_bytes at 24
Expected size: 219
Chunk byte width: 2
Reading eof flag at 25
EOF: 1
Reading expected chunksize at 26
Chunk size: 53248
Reading stream 0 header at 29
Reading stream 1 header at 36
Reading ucomp header at 43
Fill_buffer stream 0 c_len 26 u_len 65493 last_head 60
Starting thread 0 to decompress 26 bytes from stream 0
Thread 0 decompressed 65493 bytes from stream 0
Taking decompressed data from thread 0
Reading ucomp header at 43
Fill_buffer stream 1 c_len 26 u_len 65493 last_head 60
Starting thread 1 to decompress 26 bytes from stream 1
Reading ucomp header at 89
Thread 1 decompressed 65493 bytes from stream 1
Fill_buffer stream 1 c_len 2056 u_len 2056 last_head 2032
Starting thread 2 to decompress 2056 bytes from stream 1
Reading ucomp header at 2061
Fill_buffer stream 1 c_len 2056 u_len 2056 last_head 2056
Starting thread 3 to decompress 2056 bytes from stream 1
Taking decompressed data from thread 1
Closing stream at 4123, want to seek to 4234

Average DeCompression Speed:  0.000MB/s
MD5: fc98427843f47beb2510386d42d7fb64
=================================================================
==1516389==ERROR: AddressSanitizer: heap-use-after-free on address 0x6100000000a8 at pc 0x555555596332 bp 0x7ffff33fdd60 sp 0x7ffff33fdd50
(this part is the same ASAN report as I attached before)

IMHO the root cause of the issue seems quite clear from the code audit described above. I'm happy to provide more details and do my best to help fix the issue if you think it's necessary (personally, I think we should fix the race condition so as to rule out this whole class of concurrency UAF issues). Thank you for your help!

@pete4abw
Contributor

pete4abw commented Oct 2, 2023

@bugbot97 . If you spent as much time solving bugs instead of creating artificial ones, your efforts would be appreciated. You are injecting a bug into the program and causing it to abend. The program fails as it should when compiled and executed without mods. If you examine the verbose output, lrzip and lrzip-next both complain that data cannot be decompressed in parallel. I don't see the problem, and certainly would not entertain spending any time on one that is artificial.

Reading expected chunksize at 26
Chunk size: 53,248
Reading stream 0 header at 29
Reading stream 1 header at 36
Reading ucomp header at 43
Fill_buffer stream 0 c_len 26 u_len 65,493 last_head 60
Starting thread 0 to decompress 26 bytes from stream 0
Thread 0 decompressed 65,493 bytes from stream 0
Taking decompressed data from thread 0
Reading ucomp header at 43
Fill_buffer stream 1 c_len 26 u_len 65,493 last_head 60
Starting thread 1 to decompress 26 bytes from stream 1
Reading ucomp header at 89
Fill_buffer stream 1 c_len 2,056 u_len 2,056 last_head 2,032
Starting thread 2 to decompress 2,056 bytes from stream 1
Thread 1 decompressed 65,493 bytes from stream 1
Reading ucomp header at 2,061
Fill_buffer stream 1 c_len 2,056 u_len 2,056 last_head 2,056
Starting thread 3 to decompress 2,056 bytes from stream 1
Reading ucomp header at 2,085
Fill_buffer stream 1 c_len 2,056 u_len 2,056 last_head 2,072
Inconsistent length after decompression. Got 0 bytes, expected 2,056
Unable to decompress in parallel, waiting for previous thread to complete before trying again
Starting thread 4 to decompress 2,056 bytes from stream 1
Reading ucomp header at 2,101
Fill_buffer stream 1 c_len 2,056 u_len 2,056 last_head 0
Starting thread 5 to decompress 2,056 bytes from stream 1
Taking decompressed data from thread 1
Inconsistent length after decompression. Got 0 bytes, expected 2,056
Unable to decompress in parallel, waiting for previous thread to complete before trying again
Inconsistent length after decompression. Got 0 bytes, expected 2,056
Unable to decompress in parallel, waiting for previous thread to complete before trying again
Inconsistent length after decompression. Got 0 bytes, expected 2,056
Unable to decompress in parallel, waiting for previous thread to complete before trying again

Closing stream at 4,163, want to seek to 8,360
Average DeCompression Speed: 0.000MB/s
MD5 CHECK FAILED.
Stored: fc98427843f47beb2510386d42d7fb64
Decompressed hash: 0428bb8c2dcc7f1abc10f714729caf4c
Corrupt Decompression.
Fatal error - exiting

@bugbot97
Author

bugbot97 commented Oct 3, 2023

Hello @pete4abw, I want to stress again that I have not "artificially" injected any bugs at any point. As explained before, the injected delays (i.e., sleep(1)) merely help reproduce the bug more efficiently, and delay injection is a widely used technique for detecting concurrency bugs. Please consider that during execution, a thread can be preempted at any time and paused for an arbitrary amount of time, creating many possible thread interleavings, some of which can trigger bugs (even without any injected delays, a thread could be preempted and paused for a second before the said two locations; it just may not happen on every run, as in your run above).

Delay injection is simply a method to reliably reproduce a specific bug-triggering thread interleaving (otherwise we might need many attempts to hit it). In other words, given lrzip's popularity, with so many users running it on different machines every day, it is almost certain that all kinds of thread interleavings will occur in the wild, including the ones that trigger the UAF described above. Yes, in your experiment the UAF was not triggered (the execution happened not to hit the required interleaving, which is also why you saw messages like "Unable to decompress in parallel"), but that does not mean other executions of lrzip will not trigger the bug under a different interleaving. Shouldn't a widely used utility like lrzip be robust against every thread interleaving users may encounter?

I am also trying to solve bugs, but the first step in solving any bug is to understand it and confirm it with you, right? I spent a lot of time understanding the nature of these issues, only to be accused of "creating artificial bugs". I still think the bug is real: harmful thread interleavings are not prevented by design in lrzip, and the UAF (or other concurrency issues) will eventually happen. I also proposed a potential fix in the first post (e.g., introducing pthread_join() in clear_rulist()), but as mentioned, before a bug can be fixed, its existence must be acknowledged.
