initial integration of jq. #4980

Merged: 8 commits merged on Jul 11, 2023

Conversation

DavidKorczynski
Collaborator

@DavidKorczynski DavidKorczynski commented Jan 15, 2021

Initial integration of jq. jq is a lightweight and flexible command-line JSON processor. It's a widely used command-line tool for JSON processing and has more than 25K stars on GitHub. I am not sure which corporations use it (though I would assume many), but I use it frequently and, from a user perspective, would place it in a position similar to Binutils. It is, in essence, the sed of the JSON world.


@jonathanmetzman @oliverchang this one is ready for review!

Signed-off-by: David Korczynski <[email protected]>
@nicowilliams

Do you need to specify that the inputs are JSON? Do you need to specify an initial test corpus?

Signed-off-by: David Korczynski <[email protected]>
Signed-off-by: David Korczynski <[email protected]>
Signed-off-by: David Korczynski <[email protected]>
@nicowilliams

How does this know what sorts of inputs to start with?

@nicowilliams

Also, I sent you gmail addresses as well.

Signed-off-by: David Korczynski <[email protected]>
Signed-off-by: David Korczynski <[email protected]>
@DavidKorczynski
Collaborator Author

Do you need to specify that the inputs are JSON? Do you need to specify an initial test corpus?

We don't need it for now. I would prefer to leave it as is and let the fuzzer engine explore the codebase.

How does this know what sorts of inputs to start with?

In short: It starts with basically a NULL byte. The fuzzing engine then mutates the seed and adds arbitrary bytes to it, repeatedly running each mutated input against the target and measuring code coverage through instrumentation. If an input triggers new code coverage, it is saved in a seed database. The seed database thus ends up holding a set of inputs that each trigger distinct code paths. In this sense, it's a genetic, mutation-based algorithm.
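
The loop described above can be sketched roughly as follows. This is a toy illustration in Python, not the actual implementation of any fuzzing engine: `toy_target`, its branch names, and the three mutation operators are assumptions made up for the example, and real engines obtain coverage from compiler instrumentation rather than from the target's return value.

```python
import random


def toy_target(data: bytes) -> set:
    """Hypothetical target: returns the set of 'branches' the input reached."""
    covered = {"entry"}
    if data[:1] == b"{":
        covered.add("object_start")
        if b'"' in data:
            covered.add("string")
            if data.endswith(b"}"):
                covered.add("object_end")
    return covered


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply one random mutation: overwrite, insert, or delete a byte."""
    data = bytearray(seed)
    choice = rng.randrange(3)
    if choice == 0 and data:
        data[rng.randrange(len(data))] = rng.randrange(256)
    elif choice == 1:
        data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
    elif data:
        del data[rng.randrange(len(data))]
    return bytes(data)


def fuzz(iterations: int, rng_seed: int = 0):
    """Run the mutate/execute/measure loop and return (corpus, coverage)."""
    rng = random.Random(rng_seed)
    corpus = [b"\x00"]                 # start from a single NULL byte
    seen = set(toy_target(corpus[0]))  # coverage reached so far
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = toy_target(candidate)
        if not coverage <= seen:       # candidate reached something new
            seen |= coverage
            corpus.append(candidate)   # keep it as a future seed
    return corpus, seen
```

Calling `fuzz(100_000)` returns the saved inputs together with the branches reached. Because an input is only kept when it adds coverage, the corpus can never hold more entries than there are covered branches.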

After running the fuzzer for 180 seconds, this is the code coverage achieved:
[Image: Screenshot 2023-07-10 210535]

OSS-Fuzz will run it for a lot longer than 180 seconds, and will make code coverage reports available as well. As such, we can assess after a few days which code is still not covered and adjust accordingly, e.g. by (1) adding a corpus; (2) adding a dictionary; (3) adding a new fuzzer; or (4) modifying the existing fuzzer.
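
As a rough illustration of options (1) and (2), the sketch below builds a small JSON seed corpus archive and a dictionary in the quoted-token format that libFuzzer and AFL read. The token list, the seed inputs, and the `seed_N.json` file names are assumptions chosen for the example; nothing here is taken from the jq integration itself.

```python
import io
import zipfile

# Illustrative JSON tokens and seed inputs (assumptions, not from the PR).
JSON_TOKENS = ["{", "}", "[", "]", ":", ",", "true", "false", "null", '"']
SEEDS = [b"null", b"[1,2,3]", b'{"a":{"b":[true,false]}}']


def dict_entry(token: str) -> str:
    """Quote one token as a dictionary line, escaping backslashes and quotes."""
    escaped = token.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_dictionary(tokens) -> str:
    """Return the dictionary file contents, one quoted token per line."""
    return "\n".join(dict_entry(t) for t in tokens) + "\n"


def build_seed_corpus_zip(seeds) -> bytes:
    """Return an in-memory zip archive with one file per seed input."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for i, seed in enumerate(seeds):
            zf.writestr(f"seed_{i}.json", seed)
    return buf.getvalue()
```

In an OSS-Fuzz project the two outputs would typically be written next to the fuzzer binary as `<fuzzer_name>.dict` and `<fuzzer_name>_seed_corpus.zip`, where `<fuzzer_name>` stands for whatever the fuzzer is actually called.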

Signed-off-by: David Korczynski <[email protected]>
@DavidKorczynski DavidKorczynski marked this pull request as ready for review July 10, 2023 20:16
@nicowilliams

nicowilliams commented Jul 10, 2023

Do you need to specify that the inputs are JSON? Do you need to specify an initial test corpus?

We don't need it for now. I would prefer to leave it as is and let the fuzzer engine explore the codebase.

+1

How does this know what sorts of inputs to start with?

In short: It starts with basically a NULL byte. [...]

I imagine that starting with some test corpus might help find certain bugs faster than starting with one byte and going from there, but starting from one byte (or even no bytes) makes a lot of sense if you have lots of cycles to spare.

After running the fuzzer for 180 seconds, this is the code coverage achieved: [...]

That's quite good!

OSS-Fuzz will run it for a lot longer than 180 seconds, and will make code coverage reports available as well. As such, we can assess after a few days which code is still not covered and adjust accordingly, e.g. by (1) adding a corpus; (2) adding a dictionary; (3) adding a new fuzzer; or (4) modifying the existing fuzzer.

Great! Thanks for the info!

I'll see about making more fuzzer interface functions available for fuzzing the language too, not just the JSON parser, as well as for fuzzing the streaming JSON parser. Should these be differently named source files?

@nicowilliams

@DavidKorczynski how would one fuzz things that need authentication? I'd like to write fuzzer functions for Heimdal, but much of that codebase deals in cryptographic network protocols (mainly Kerberos, but also PKI). One idea I have is that the fuzzer interface can just create credentials as needed and create an envelope with credentials around the payload provided by the fuzzer, but this will reduce coverage. Anyways, there must be examples of codebases like that that are in OSS-Fuzz.
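
The envelope idea might look something like the sketch below. It is a toy stand-in, not Heimdal or Kerberos code: the wire format (an HMAC-SHA256 tag followed by the payload), `handle_message`, and `TEST_KEY` are all hypothetical. The harness computes a valid tag for whatever bytes the fuzzer supplies, so every input passes authentication and reaches the payload handling.

```python
import hashlib
import hmac

TEST_KEY = b"fuzz-only-test-key"  # hypothetical fixed credential for the harness
TAG_LEN = hashlib.sha256().digest_size


def handle_message(envelope: bytes, key: bytes) -> str:
    """Hypothetical target: verify the MAC, then process the payload."""
    tag, payload = envelope[:TAG_LEN], envelope[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return "rejected"
    return "empty" if not payload else f"processed {len(payload)} bytes"


def wrap(payload: bytes, key: bytes = TEST_KEY) -> bytes:
    """Build a valid envelope: MAC tag followed by the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def fuzz_one_input(data: bytes) -> str:
    """Harness entry point: authenticate the fuzzer's bytes, then call the target."""
    return handle_message(wrap(data), TEST_KEY)
```

The coverage cost is visible here: since the harness always produces a valid tag, the `"rejected"` branch and any malformed-envelope handling are never exercised through `fuzz_one_input`.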

@nicowilliams

Also, is there a link for a dashboard to check the fuzzer's progress?

@DavidKorczynski
Collaborator Author

I'll see about making more fuzzer interface functions available for fuzzing the language too, not just the JSON parser, as well as for fuzzing the streaming JSON parser. Should these be differently named source files?

That would be great! Feel free to take over and adjust things however you like. I'm also happy to continue contributing fuzzers upstream.

@DavidKorczynski how would one fuzz things that need authentication? I'd like to write fuzzer functions for Heimdal, but much of that codebase deals in cryptographic network protocols (mainly Kerberos, but also PKI). One idea I have is that the fuzzer interface can just create credentials as needed and create an envelope with credentials around the payload provided by the fuzzer, but this will reduce coverage. Anyways, there must be examples of codebases like that that are in OSS-Fuzz.

Am giving Heimdal a look now and will get back on this.

Also, is there a link for a dashboard to check the fuzzer's progress?

Yes, once this PR is merged you should be able to track things on https://oss-fuzz.com as well as https://introspector.oss-fuzz.com. See https://introspector.oss-fuzz.com/project-profile?project=liblouis for an example of how progress can be tracked, as well as further links to code coverage reports and Fuzz Introspector reports.

@nicowilliams

Am giving Heimdal a look now and will get back on this.

You could start with one of the Heimdal ASN.1 compiler's READMEs where I document how I've fuzzed it with AFL.

ASN.1 of course doesn't have the credentials problem I mentioned above -- it's as easy to fuzz as jq's JSON parser.

@nicowilliams

nicowilliams commented Jul 10, 2023

There are also things like MIT Kerberos (ping @greghudson) (EDIT: MIT Kerberos is already enrolled), and all sorts of JWT, OAuth, and other implementations out there. I think in general the best thing to do may be to factor out all the crypto and fuzz just the payloads, and also the envelopes in cases where the fuzz interface will ignore credentials.
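
One possible shape for factoring out the crypto is to pass the verification step in as a parameter, so that a fuzz build can substitute a verifier that ignores credentials. The sketch below is only an assumption about how that could be structured: the envelope layout (a 1-byte credential length, the credentials, then the payload) and every function name are hypothetical.

```python
from typing import Callable, Optional

Verifier = Callable[[bytes, bytes], bool]


def accept_all(credentials: bytes, payload: bytes) -> bool:
    """Fuzz-only verifier: ignores the credentials entirely."""
    return True


def parse_envelope(data: bytes):
    """Hypothetical wire format: 1-byte credential length, credentials, payload."""
    if not data:
        raise ValueError("empty envelope")
    cred_len = data[0]
    if len(data) < 1 + cred_len:
        raise ValueError("truncated credentials")
    return data[1:1 + cred_len], data[1 + cred_len:]


def handle(data: bytes, verify: Verifier) -> Optional[bytes]:
    """Parse the envelope, check credentials via `verify`, process the payload."""
    credentials, payload = parse_envelope(data)
    if not verify(credentials, payload):
        return None
    return payload.upper()  # stand-in for the real payload processing


def fuzz_one_input(data: bytes) -> None:
    """Harness entry point: fuzz envelope parsing and payload handling only."""
    try:
        handle(data, accept_all)
    except ValueError:
        pass  # malformed envelopes are expected, not bugs
```

With this split the fuzzer controls the whole envelope, so envelope parsing is exercised too, while the real cryptographic check would need separate testing.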

@nicowilliams

Looking at https://github.com/google/oss-fuzz/tree/master/projects/krb5 it looks like doing fuzz testing for cryptographic and/or stateful protocols is just hard.

Contributor

@jonathanmetzman jonathanmetzman left a comment

lgtm

@jonathanmetzman jonathanmetzman enabled auto-merge (squash) July 11, 2023 03:33
@jonathanmetzman jonathanmetzman merged commit 330cc0c into google:master Jul 11, 2023
15 checks passed
3 participants