Support raw Zlib Compressed data #139

Closed
T-RN-R opened this issue Nov 20, 2023 · 9 comments
Labels: accepted (This issue was accepted, we will work on this at some point), enhancement (New feature or request), service-frankenstrings

Comments

@T-RN-R

T-RN-R commented Nov 20, 2023

Describe the bug
Typically, when data is compressed with GZip, it is packaged into an archive format that AssemblyLine can easily detect. However, it is also possible to use Zlib directly to compress data into a raw stream, without a GZip or other archive container to fingerprint it. AssemblyLine cannot detect raw Zlib compression and does not inflate it for further analysis.

To Reproduce
Steps to reproduce the behavior:

  1. Go to CyberChef
  2. Enter the EICAR Test string into the "Input" box.
  3. Drag the "Raw Deflate" operation into the "Recipe" box
  4. Save the output to a file.
  5. Upload the file to AssemblyLine
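
An equivalent sample can also be produced without CyberChef. The following is a minimal sketch using Python's standard zlib module; the helper name `raw_deflate` and the placeholder payload are assumptions made for illustration, and the EICAR test string would be substituted to reproduce the report exactly.

```python
import zlib


def raw_deflate(data: bytes, level: int = 9) -> bytes:
    """Compress data into a header-less raw DEFLATE stream (hypothetical helper)."""
    # A negative wbits value tells zlib to omit the zlib header and the
    # Adler-32 trailer, leaving only the DEFLATE data itself.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


# Placeholder payload; substitute the EICAR test string to reproduce the report.
payload = b"placeholder payload standing in for the test string " * 4
sample = raw_deflate(payload)

# The stream starts directly with DEFLATE block data, so there is no
# magic value at the start of the file to identify it.
print(len(sample), sample[:4].hex())
```

Writing `sample` to a file in binary mode gives a file comparable to the CyberChef "Raw Deflate" output.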

Expected behavior
AssemblyLine should be able to detect this as "Malicious" due to it being a compressed version of the EICAR string. Currently, the only tool within AL that detects this is Kaspersky. AssemblyLine itself is unable to inflate the file for further analysis, and instead relies solely on Kaspersky.

Screenshots
EICAR

Environment (please complete the following information if pertinent):

  • Assemblyline Version: 4.4.0.78 (whatever version is running in prod)

Additional context
Zlib raw compression is used in many places, including in Python and .NET. Unpacking raw compressed data would enable further detection of malicious activity. Given that AL currently classifies files based on magic values, among other things, maybe the change could involve taking any application/octet-stream that still has an unknown al_type after the existing classifiers have run, and then attempting to raw inflate it with Zlib.

@T-RN-R T-RN-R added the assess (We still haven't decided if this will be worked on or not) and bug (Something isn't working) labels Nov 20, 2023
@gdesmar

gdesmar commented Nov 20, 2023

I'm sorry to be the bearer of bad news, but I don't think this can be fixed. This is very probably a problem with CyberChef, and more specifically this issue: gchq/CyberChef#671.
If you use the "zip" operation in CyberChef, you will receive a valid zip file, but if you password-protect that zip file using CyberChef, no tool that I've tried is able to read it. I'm assuming the same zlib.js dependency is causing the issue here.
On the upside, we already have support for zlib streams in Assemblyline. A file should be identified as archive/zlib whenever its mimetype is identified as application/zlib. Here is an example from VirusTotal: https://www.virustotal.com/gui/file/3df85e947d6a7f05a4ca9008e44db6c814722378b178531d2bf8b29ebfce3bff where the VirusTotal File Type is zlib. Do you have the sha256 of the zlib'd EICAR string that CyberChef gave you, and if so, are you able to find it in VirusTotal? I would be curious to see how they identify that file.

I tried encoding a different string in CyberChef and reading it with the Python zlib library (as I'm doing with that previous sample), and it errors out. If you can use any tool other than CyberChef to read the resulting zlib stream, I would be glad to investigate the difference.

I will close this issue for the moment, but we can still discuss it, and if you do have another tool that reads it, you should definitely re-open it!! :)

@gdesmar gdesmar closed this as completed Nov 20, 2023
@T-RN-R
Author

T-RN-R commented Nov 20, 2023

Sorry, I wasn’t entirely clear, mostly because I was confusing the formats at hand. This StackOverflow post details what I’m talking about nicely:

https://stackoverflow.com/questions/3122145/zlib-error-error-3-while-decompressing-incorrect-header-check

In this case, I’m talking about the “deflate” format, not the Zlib or Gzip formats. I got them all mixed up. AssemblyLine cannot detect the deflate format because it is header-less. You need to actually try to inflate it to detect it properly. There are several libraries that support this format, including .NET’s System.IO.Compression namespace and Python’s zlib (see StackOverflow post).
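
To make the distinction between the three formats concrete, here is a small sketch using only Python's standard library. The sample data and the helper name `fails_with_default_wbits` are assumptions for illustration. The same DEFLATE data is wrapped as a zlib stream, as a gzip stream, and left raw; only the first two carry bytes that a magic-based identifier can match.

```python
import gzip
import zlib

data = b"the same bytes, wrapped three different ways " * 4

zlib_stream = zlib.compress(data)   # 2-byte header + DEFLATE data + Adler-32
gzip_stream = gzip.compress(data)   # gzip header + DEFLATE data + CRC-32 and size
compressor = zlib.compressobj(wbits=-15)
raw_stream = compressor.compress(data) + compressor.flush()  # DEFLATE data only

print("zlib:", zlib_stream[:2].hex())   # starts with 0x78
print("gzip:", gzip_stream[:2].hex())   # starts with 0x1f 0x8b
print("raw: ", raw_stream[:2].hex())    # no fixed signature


def fails_with_default_wbits(blob: bytes) -> bool:
    """Return True if zlib.decompress rejects blob when expecting a zlib header."""
    try:
        zlib.decompress(blob)
    except zlib.error:
        return True
    return False


# The wbits argument selects which framing the decompressor expects:
# 15 for zlib, 31 for gzip, -15 for raw DEFLATE.
recovered = [
    zlib.decompress(zlib_stream, 15),
    zlib.decompress(gzip_stream, 31),
    zlib.decompress(raw_stream, -15),
]
```

Calling `zlib.decompress` on the raw stream with the default `wbits` fails with the kind of error described in the StackOverflow post, because the decompressor looks for a zlib header that is not there.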

This issue has nothing to do with CyberChef, I was just using it as a quick example for producing an EICAR case, but you can produce it with the Python library or in .NET no problem.

The output from CyberChef should still be a valid test regardless, as I don’t think that dependency is affecting this case.

@T-RN-R
Author

T-RN-R commented Nov 21, 2023

It turns out the MIME type for this format is ‘application/x-deflate’.

@gdesmar gdesmar reopened this Nov 21, 2023
@gdesmar

gdesmar commented Nov 21, 2023

Could you point me toward a sample that has a mimetype of application/x-deflate?
I generated a sample from CyberChef and it is an application/octet-stream, as your first message said.

@gdesmar

gdesmar commented Nov 21, 2023

I was able to recover the original content with zlib.decompress(data, -15), but considering that we can get very large random blobs of bytes, it would probably be a bad idea to try each application/octet-stream just in case. Having application/x-deflate as the mimetype would be a clear solution for Identification.
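
One way to limit the cost of a speculative inflate attempt is to bound the output. The following is a minimal sketch, not Assemblyline code; the function name `inflate_raw_bounded` and the 10 MiB default cap are assumptions. It uses `zlib.decompressobj` with a `max_length` so that a large or hostile blob cannot expand without limit, and it only accepts input that forms a complete DEFLATE stream.

```python
import zlib
from typing import Optional


def inflate_raw_bounded(blob: bytes, max_output: int = 10 * 1024 * 1024) -> Optional[bytes]:
    """Inflate a raw DEFLATE stream, or return None if it is not one.

    None is also returned if the stream is truncated or would expand
    beyond max_output bytes (hypothetical helper, assumed limits).
    """
    decompressor = zlib.decompressobj(-15)  # negative wbits: raw DEFLATE
    try:
        # Ask for one byte more than the cap so an oversized result is detectable.
        out = decompressor.decompress(blob, max_output + 1)
    except zlib.error:
        return None  # not valid DEFLATE data
    if len(out) > max_output:
        return None  # expands past the cap
    if not decompressor.eof:
        return None  # ran out of input before the end of the stream
    return out
```

Random bytes usually fail within the first block, so most failed attempts are cheap; the cap mainly protects against valid streams that expand enormously.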

@T-RN-R
Author

T-RN-R commented Nov 21, 2023

application/x-deflate is something that an application has to declare. For example, an HTTP server could tell an HTTP client that it is using application/x-deflate. Malicious software would not come declared as x-deflate. You would need a way to "fingerprint" it, like AL already does with ZIP, GZIP and other archive formats. Since there are no headers, there is no way to know a priori that it is x-deflate without trying to inflate it. It's irrelevant whether a sample is marked as application/x-deflate, as the MIME type is purely a friendly way for servers and clients to indicate the format of their communications. The data you inflated with zlib.decompress(data, -15) is application/x-deflate; it just doesn't have the MIME type attached. That MIME type is the proper way (by spec) to refer to this format, but I have seen no instances of it actually being used in the wild.

Instead of inflating every application/octet-stream, maybe some form of heuristic could be applied, perhaps an entropy-based one? Or maybe a size cap could be placed on the attempts? I'm not sure how frequently AssemblyLine comes across unknown application/octet-stream samples, but this is a huge gap in analysis capability, given that this is a common feature in some malware strains.
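
As a rough illustration of what such a pre-filter could look like, here is a sketch that combines a size cap, a byte-entropy check, and a structural check on the first DEFLATE block header. All names and thresholds (`MAX_INPUT_SIZE`, `MIN_ENTROPY`, `worth_trying_inflate`) are assumptions chosen for the example, not values from Assemblyline, and passing the filter only means an inflate attempt may be worthwhile, not that the blob is DEFLATE data.

```python
import math
from collections import Counter

MAX_INPUT_SIZE = 1024 * 1024  # assumed cap on blobs worth attempting
MIN_ENTROPY = 6.0             # assumed threshold, in bits per byte


def shannon_entropy(blob: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not blob:
        return 0.0
    total = len(blob)
    return -sum((n / total) * math.log2(n / total) for n in Counter(blob).values())


def first_block_type_is_valid(blob: bytes) -> bool:
    """Check the first DEFLATE block header.

    Per RFC 1951, bit 0 of the first byte is BFINAL and bits 1-2 are BTYPE;
    BTYPE value 3 is reserved, so such data cannot be a DEFLATE stream.
    """
    return bool(blob) and ((blob[0] >> 1) & 0b11) != 3


def worth_trying_inflate(blob: bytes) -> bool:
    """Cheap pre-filter to run before an actual inflate attempt."""
    if not blob or len(blob) > MAX_INPUT_SIZE:
        return False
    if not first_block_type_is_valid(blob):
        return False
    return shannon_entropy(blob) >= MIN_ENTROPY
```

One caveat with the entropy check: a short stream cannot reach a high byte entropy (64 bytes can carry at most 6 bits per byte), so a real filter would likely scale the threshold with length or skip the check for small inputs.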

@gdesmar

gdesmar commented Nov 21, 2023

Could you point out a sample that does use this? Is this file dropped by a JavaScript file? Is it contained in an ISO image?

@gdesmar

gdesmar commented Nov 21, 2023

We've discussed this offline and figured out that the stream comes out of a base64-encoded blob already extracted by FrankenStrings. FrankenStrings should test whether the content of the blob is deflated and, if so, decompress it before it is re-uploaded.
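
The idea can be sketched as follows. This is only an illustration of the approach described above, not the FrankenStrings implementation; the function name `unpack_base64_blob` is hypothetical.

```python
import base64
import binascii
import zlib
from typing import Optional


def unpack_base64_blob(b64_text: bytes) -> Optional[bytes]:
    """Decode a base64 blob and inflate it if it holds a raw DEFLATE stream.

    Returns the inflated bytes when inflation succeeds, the decoded bytes
    when it does not, and None when the input is not valid base64.
    """
    try:
        decoded = base64.b64decode(b64_text, validate=True)
    except binascii.Error:
        return None
    try:
        # Negative wbits: expect raw DEFLATE with no zlib or gzip framing.
        return zlib.decompress(decoded, -15)
    except zlib.error:
        return decoded


# Example: a raw DEFLATE stream hidden inside a base64 string.
compressor = zlib.compressobj(wbits=-15)
inner = b"inner payload " * 10
blob = base64.b64encode(compressor.compress(inner) + compressor.flush())
result = unpack_base64_blob(blob)
```

Because the candidate data here is a decoded blob of limited size rather than every unidentified file, attempting inflation is much cheaper than trying it on all application/octet-stream submissions. A production version would probably also bound the inflated output size.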

@gdesmar gdesmar assigned cccs-jh and unassigned gdesmar Nov 21, 2023
@gdesmar gdesmar added the enhancement (New feature or request), service-frankenstrings and accepted (This issue was accepted, we will work on this at some point) labels and removed the bug (Something isn't working) and assess (We still haven't decided if this will be worked on or not) labels Nov 21, 2023
@cccs-jh

cccs-jh commented Dec 6, 2023

This is now implemented in FrankenStrings v4.4.0.stable22.

@cccs-jh cccs-jh closed this as completed Dec 6, 2023