-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack overflow due to excessive recursion when using DomainBuilder #174
Comments
I came across fuzztest grammars and noticed that they support a generation budget to limit recursion: fuzztest/fuzztest/internal/domains/in_grammar_impl.h Lines 197 to 213 in 895ea48
Grammars don't seem to be documented as far as I can tell. Here's what I found:
A quick attempt at using this feature for my problem failed because the bazel macro can't be used outside of the fuzztest workspace:
Bazel tries to resolve the //fuzztest reference to a package within my workspace instead of @com_google_fuzztest//fuzztest
I fixed it with diff --git a/build_defs/cc_fuzztest_grammar_library.bzl b/build_defs/cc_fuzztest_grammar_library.bzl
index ea8de65..657958c 100644
--- a/build_defs/cc_fuzztest_grammar_library.bzl
+++ b/build_defs/cc_fuzztest_grammar_library.bzl
@@ -33,7 +33,7 @@ def cc_fuzztest_grammar_library(name, srcs, top_level_rule = None):
"""
output_file_name = name + ".h"
- cmd = "$(location //tools:grammar_domain_code_generator)" + \
+ cmd = "$(location @com_google_fuzztest//tools:grammar_domain_code_generator)" + \
" --output_header_file_path " + "$(@D)/" + output_file_name + \
" --input_grammar_files " + "`echo $(SRCS) | tr ' ' ','`"
if top_level_rule:
@@ -45,11 +45,11 @@ def cc_fuzztest_grammar_library(name, srcs, top_level_rule = None):
outs = [output_file_name],
cmd = cmd,
heuristic_label_expansion = False,
- tools = ["//tools:grammar_domain_code_generator"],
+ tools = ["@com_google_fuzztest//tools:grammar_domain_code_generator"],
)
native.cc_library(
name = name,
hdrs = [output_file_name],
- deps = ["//fuzztest:domain"],
+ deps = ["@com_google_fuzztest//fuzztest:domain"],
) Then I created the following grammar: // regex.g4
grammar REGEX_GRAMMAR;
E : E '|' E | T ;
T : T T | F ;
F : F '*' | F '?' | G ;
G : '(' E ')' | '\\' H | [a-zA-Z0-9_] ;
H : '\\' | '|' | '*' | '(' | ')' ;
WSPACE : [ \t\n\r]+ -> skip; Used # some/package/BUILD
load("@com_google_fuzztest//build_defs:cc_fuzztest_grammar_library.bzl", "cc_fuzztest_grammar_library")
cc_fuzztest_grammar_library(
name = "regex_grammar",
srcs = ["regex.g4"],
top_level_rule = "E",
)
cc_test(
name = "fuzztest",
size = "small",
srcs = ["fuzztest.cpp"],
deps = [
# ...
"@com_google_fuzztest//fuzztest",
"@com_google_fuzztest//fuzztest:fuzztest_gtest_main",
"@com_google_googletest//:gtest",
":regex_grammar",
],
) And finally /// @file fuzztest.cpp
#include "some/package/regex_grammar.h"
// ...
FUZZ_TEST(Fixture, Test).WithDomains(fuzztest::InEGrammar()); And it works 🙂 With
I need to investigate more when I find the time. I won't close this issue because using |
I didn't get to investigating the stack smashing yet, but I found other things: My understanding of antlr grammars was flawed: The above only has "lexer rules" but no "parser rules". There were some more issues which at least http://lab.antlr.org/ didn't accept. Now I use grammar REGEX_GRAMMAR;
e : t ( '|' t )* ;
t : f f* ;
f : f '*' | f '?' | g ;
g : '(' e ')' | '\\' h | CHAR ;
h : '\\' | '|' | '*' | '(' | ')';
CHAR : [a-zA-Z0-9_] ; which is accepted by both http://lab.antlr.org/ and fuzztest - at least after some fixes: |
Using https://stackoverflow.com/a/51897264/8857097 I was able to narrow down the stack smashing problem. ABSL_SWISSTABLE_ENABLE_GENERATIONS has some influence on the problem: It is only set in sanitizer builds and determines whether CommonFields inherits from an empty base class (without sanitizers) or CommonFieldsGenerationInfoEnabled (with sanitizers).
Totalling 24 Bytes. Ignoring the base class,
Totalling 32 Bytes. But I see in the assembly that always this code is generated in the 0x00007ffff7b36630 <+0>: push rbp
0x00007ffff7b36631 <+1>: mov rbp,rsp
0x00007ffff7b36634 <+4>: sub rsp,0x70
0x00007ffff7b36638 <+8>: mov rax,QWORD PTR fs:0x28
0x00007ffff7b36641 <+17>: mov QWORD PTR [rbp-0x8],rax
...
0x00007ffff7b3665c <+44>: lea rdi,[rbp-0x30]
0x00007ffff7b36660 <+48>: call 0x7ffff7b1c080 <absl::container_internal::CommonFields::CommonFields()@plt> At the start we see the preamble, where the stack canary is stored at The With empty base class:
We see that And the other case: Dump of assembler code for function _ZN4absl18container_internal12CommonFieldsC2Ev:
...
0x000055555571b51e <+30>: mov QWORD PTR [rbp-0x20],rdi
0x000055555571b522 <+34>: call 0x55555571b6a0 <absl::container_internal::CommonFieldsGenerationInfoEnabled::CommonFieldsGenerationInfoEnabled()>
0x000055555571b527 <+39>: mov rax,QWORD PTR [rbp-0x20]
0x000055555571b52b <+43>: add rax,0x18
0x000055555571b52f <+47>: mov QWORD PTR [rbp-0x18],rax
0x000055555571b533 <+51>: call 0x55555571b760 <absl::container_internal::EmptyGroup()>
0x000055555571b538 <+56>: mov rcx,rax
...
0x000055555571b53f <+63>: mov QWORD PTR [rbp-0x10],rcx
...
0x000055555571b55d <+93>: mov rax,QWORD PTR [rbp-0x20]
0x000055555571b561 <+97>: mov rcx,QWORD PTR [rbp-0x18]
0x000055555571b565 <+101>: mov rdx,QWORD PTR [rbp-0x10]
0x000055555571b569 <+105>: mov QWORD PTR [rcx],rdx
0x000055555571b56c <+108>: add rax,0x20
0x000055555571b570 <+112>: mov QWORD PTR [rbp-0x28],rax
...
0x000055555571b58e <+142>: mov rax,QWORD PTR [rbp-0x20]
0x000055555571b592 <+146>: mov rcx,QWORD PTR [rbp-0x28]
0x000055555571b596 <+150>: mov QWORD PTR [rcx],0x0
0x000055555571b59d <+157>: add rax,0x28
0x000055555571b5a1 <+161>: mov QWORD PTR [rbp-0x30],rax
...
0x000055555571b5bf <+191>: mov rax,QWORD PTR [rbp-0x20]
0x000055555571b5c3 <+195>: mov rcx,QWORD PTR [rbp-0x30]
0x000055555571b5c7 <+199>: mov QWORD PTR [rcx],0x0
0x000055555571b5ce <+206>: add rax,0x30
0x000055555571b5d2 <+210>: mov QWORD PTR [rbp-0x38],rax
...
0x000055555571b5f0 <+240>: mov rax,QWORD PTR [rbp-0x38]
0x000055555571b5f4 <+244>: mov QWORD PTR [rax],0x0 It's very convoluted due to the asan checks which I have omitted here, but we see that Interestingly, the bug occurs in
but not in
In the second case very different assembly is generated and I'm having a tough time figuring out what is going on . Various asan references are noticable, which are not found in the broken case 😕 Smells like an ODR violation: Somehow a library that includes |
I figured that it could only be related to the missing I reran So now I have no problem with grammar fuzzing anymore 🙂 |
I experimented with workarounds for Limited Strict RecursionThe basic idea in all alternatives is that we provide separate, recursively defined domains, i.e functions returning domains. This allows us, for example to pass a size argument to the function that we can decrease in recursive calls. By providing a recursion base, we can avoid infinite recursion. And by choosing the size argument appropriately, we can avoid stack overflows: namespace strict
{
fuzztest::Domain<std::string> ArbitraryE(std::size_t size);
fuzztest::Domain<char> ArbitraryH() { return ElementOf({'\\', '|', '*', '(', ')'}); }
fuzztest::Domain<std::string> ArbitraryG(const std::size_t size)
{
if (size == 0)
{
return OneOf(Map([](const char h) { return std::string{"\\"} + h; }, ArbitraryH()),
StringOf(AlphaNumericChar()).WithSize(1));
}
return OneOf(Map([](std::string&& e) { return '(' + std::move(e) + ')'; }, ArbitraryE(size - 1)),
Map([](const char h) { return std::string{"\\"} + h; }, ArbitraryH()),
StringOf(AlphaNumericChar()).WithSize(1));
}
fuzztest::Domain<std::string> ArbitraryF(const std::size_t size)
{
if (size == 0)
{
return OneOf(ArbitraryG(size));
}
return OneOf(Map([](std::string&& f) { return std::move(f) + '*'; }, ArbitraryF(size - 1)),
Map([](std::string&& f) { return std::move(f) + '?'; }, ArbitraryF(size - 1)),
ArbitraryG(size));
}
fuzztest::Domain<std::string> ArbitraryT(const std::size_t size)
{
if (size == 0)
{
return OneOf(ArbitraryF(size));
}
return OneOf(Map([](std::string&& e1, std::string&& e2) { return std::move(e1) + std::move(e2); },
ArbitraryT(size - 1),
ArbitraryT(size - 1)),
ArbitraryF(size));
}
fuzztest::Domain<std::string> ArbitraryE(const std::size_t size)
{
if (size == 0)
{
return OneOf(ArbitraryT(size));
}
return OneOf(Map([](std::string&& e1, std::string&& e2) { return std::move(e1) + '|' + std::move(e2); },
ArbitraryE(size - 1),
ArbitraryE(size - 1)),
ArbitraryT(size));
}
} // namespace strict
FUZZ_TEST(Foo, Bar).WithDomains(strict::ArbitraryE(10)); A downside of this version is that the construction of the domain is "strict", not "lazy": Before fuzzing can even begin fuzzing, just the initial call to Limited Lazy RecursionWe can improve by creating the domains lazily. Here FlatMap comes in handy: Calling namespace lazy
{
fuzztest::Domain<std::string> ArbitraryE(std::size_t size);
fuzztest::Domain<char> ArbitraryH() { return ElementOf({'\\', '|', '*', '(', ')'}); }
fuzztest::Domain<std::string> ArbitraryG(const std::size_t s)
{
return FlatMap(
[](const std::size_t size) -> fuzztest::Domain<std::string> {
if (size == 0)
{
return OneOf(Map([](const char h) { return std::string{"\\"} + h; }, ArbitraryH()),
StringOf(AlphaNumericChar()).WithSize(1));
}
return OneOf(Map([](std::string&& e) { return '(' + std::move(e) + ')'; }, ArbitraryE(size - 1)),
Map([](const char h) { return std::string{"\\"} + h; }, ArbitraryH()),
StringOf(AlphaNumericChar()).WithSize(1));
},
Just(s));
}
fuzztest::Domain<std::string> ArbitraryF(const std::size_t s)
{
return FlatMap(
[](const std::size_t size) -> fuzztest::Domain<std::string> {
if (size == 0)
{
return OneOf(ArbitraryG(size));
}
return OneOf(Map([](std::string&& f) { return std::move(f) + '*'; }, ArbitraryF(size - 1)),
Map([](std::string&& f) { return std::move(f) + '?'; }, ArbitraryF(size - 1)),
ArbitraryG(size));
},
Just(s));
}
fuzztest::Domain<std::string> ArbitraryT(const std::size_t s)
{
return FlatMap(
[](const std::size_t size) -> fuzztest::Domain<std::string> {
if (size == 0)
{
return OneOf(ArbitraryF(size));
}
return OneOf(Map([](std::string&& e1, std::string&& e2) { return std::move(e1) + std::move(e2); },
ArbitraryT(size - 1),
ArbitraryT(size - 1)),
ArbitraryF(size));
},
Just(s));
}
fuzztest::Domain<std::string> ArbitraryE(const std::size_t s)
{
return FlatMap(
[](const std::size_t size) -> fuzztest::Domain<std::string> {
if (size == 0)
{
return OneOf(ArbitraryT(size));
}
return OneOf(Map([](std::string&& e1, std::string&& e2) { return std::move(e1) + '|' + std::move(e2); },
ArbitraryE(size - 1),
ArbitraryE(size - 1)),
ArbitraryT(size));
},
Just(s));
}
} // namespace lazy This lazy version is actually usable. Looking at the Unlimited Lazy RecursionThis is not an actual suggestion, just a theoretical observation. Since we now have a lazy version, we don't actually need the limit anymore just to avoid infinite recursion. By using namespace lazy_no_limit
{
fuzztest::Domain<std::string> ArbitraryE();
fuzztest::Domain<char> ArbitraryH() { return ElementOf({'\\', '|', '*', '(', ')'}); }
fuzztest::Domain<std::string> ArbitraryG()
{
return FlatMap([] {
return OneOf(Map([](std::string&& e) { return '(' + std::move(e) + ')'; }, ArbitraryE()),
Map([](const char h) { return std::string{"\\"} + h; }, ArbitraryH()),
StringOf(AlphaNumericChar()).WithSize(1));
});
}
fuzztest::Domain<std::string> ArbitraryF()
{
return FlatMap([] {
return OneOf(Map([](std::string&& f) { return std::move(f) + '*'; }, ArbitraryF()),
Map([](std::string&& f) { return std::move(f) + '?'; }, ArbitraryF()),
ArbitraryG());
});
}
fuzztest::Domain<std::string> ArbitraryT()
{
return FlatMap([] {
return OneOf(Map([](std::string&& e1, std::string&& e2) { return std::move(e1) + std::move(e2); },
ArbitraryT(),
ArbitraryT()),
ArbitraryF());
});
}
fuzztest::Domain<std::string> ArbitraryE()
{
return FlatMap([] {
return OneOf(Map([](std::string&& e1, std::string&& e2) { return std::move(e1) + '|' + std::move(e2); },
ArbitraryE(),
ArbitraryE()),
ArbitraryT());
});
}
} // namespace lazy_no_limit Unfortunately and not unexpected, we also get the same result as with |
A warning about this has been added: #1007 |
I was trying to define a domain of somewhat simple regexes:
The recursion in this domain blows up even 100MB of stack (
ulimit -s 100000
) almost immediately after starting fuzzing runs. Here's the bottom of the stack trace:I tried to see if simplifying the domain definition helps, and it does. The following takes a few seconds more to blow up:
And the following doesn't run into stack overflows at all:
I'm thinking about how I could limit the recursion by changing the domain definition (e.g. introduce a "max depth" parameter. But I can't see how, because I can't pass additional parameters to
DomainBuilder::Get
.If
Get
took additional parameters andSet
could take a function (instead of the domain) which in turn takes these parameters and returns a domain, then I could consciously limit recursion, I think 🤔The text was updated successfully, but these errors were encountered: