Fix segmentations failure in error.c gumbo_caret_diagnostic_to_string #371

DmitryBochkarev · 2016-12-15T13:24:36Z

Without this patch, find_last_newline returns a pointer greater than the one returned by find_next_newline, so in the line `original_line.length = line_end - line_start;` the difference is negative and the unsigned length overflows.
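To make the failure concrete, here is a minimal sketch in C. The helpers are simplified stand-ins for find_last_newline and find_next_newline, not the exact gumbo source, and the names `last_newline_unpatched`, `next_newline`, and `unpatched_line_length` are invented for this illustration. It assumes the length is stored in a `size_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the unpatched find_last_newline: walk back to the previous
   '\n' (or the start of the text) and return the first byte of the line. */
static const char* last_newline_unpatched(const char* text, const char* loc) {
  assert(loc >= text);
  const char* c = loc;
  while (c != text && *c != '\n') --c;
  return c == text ? c : c + 1;
}

/* Stand-in for find_next_newline: walk forward to '\n' or the NUL byte. */
static const char* next_newline(const char* loc) {
  const char* c = loc;
  while (*c && *c != '\n') ++c;
  return c;
}

/* Hypothetical helper: length of the line containing text[offset], computed
   the same way as original_line.length = line_end - line_start. */
static size_t unpatched_line_length(const char* text, size_t offset) {
  const char* line_start = last_newline_unpatched(text, text + offset);
  const char* line_end = next_newline(text + offset);
  return (size_t)(line_end - line_start);
}
```

With the text `"<\n"` and an error offset of 1 (the newline itself), the backward search stops immediately and returns the byte after the newline, while the forward search returns the newline. The start is one past the end, and the computed length wraps to the largest `size_t` value.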

Before the change, the newly added test failed with a segmentation fault:

./test-driver: line 107: 12171 Segmentation fault      (core dumped)
"$@" > $log_file 2>&1

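This is a slightly modified copy of the code used in the nokogumbo gem: [Link](https://github.com/rubys/nokogumbo/blob/8b4446847dea5c614759684ebcae4c580c47f4ad/ext/nokogumboc/nokogumbo.c#L230)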

googlebot · 2016-12-15T13:24:39Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please let us know the company's name.

DmitryBochkarev · 2016-12-15T13:28:44Z

I signed it!

googlebot · 2016-12-15T13:28:45Z

CLAs look good, thanks!

stevecheckoway · 2017-02-23T22:41:05Z

src/error.c

@@ -140,7 +140,7 @@ static const char* find_last_newline(
    // There may be an error at EOF, which would be a nul byte.
    assert(*c || c == error_location);
  }
-  return c == original_text ? c : c + 1;
+  return c == original_text || c == error_location ? c : c + 1;


Is this actually the correct fix? If *error_location is \n, shouldn't this move to the previous \n (if any)? For example, given the source text "<\n", that line is the one with the error. With your patch, wouldn't original_line in gumbo_caret_diagnostic_to_string() point to the new line and have length 0?
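The question can be checked with a small sketch. The functions below are simplified stand-ins rather than the exact gumbo source; `LineSpan` and `patched_line_span` are hypothetical names used only here. The return statement in `last_newline_patched` is the one from the diff above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for find_last_newline with the patched return statement. */
static const char* last_newline_patched(const char* text, const char* loc) {
  assert(loc >= text);
  const char* c = loc;
  while (c != text && *c != '\n') --c;
  return (c == text || c == loc) ? c : c + 1;
}

/* Stand-in for find_next_newline: walk forward to '\n' or the NUL byte. */
static const char* next_newline(const char* loc) {
  const char* c = loc;
  while (*c && *c != '\n') ++c;
  return c;
}

/* Hypothetical result type: where the reported line starts and how long it is. */
typedef struct {
  size_t start;
  size_t length;
} LineSpan;

/* Hypothetical helper: the span the caller would compute for text[offset]. */
static LineSpan patched_line_span(const char* text, size_t offset) {
  const char* line_start = last_newline_patched(text, text + offset);
  const char* line_end = next_newline(text + offset);
  LineSpan span = {(size_t)(line_start - text), (size_t)(line_end - line_start)};
  return span;
}
```

For `"<\n"` with the error at offset 1, this sketch yields a span that starts at the newline and has length 0, so the crash is gone but the `<` that triggered the error is not part of the reported line.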

DmitryBochkarev · 2017-02-24

Maybe you're right. I was not focused on the error message and did not think about that. But I think this is an edge case, and it should be handled in the calling function.

kevinhendricks · 2017-02-24T18:12:55Z

FWIW, here is how we are fixing it in our fork of gumbo used inside Sigil:

diff --git a/src/error.c b/src/error.c
index 4e124a9..4b081d0 100644
--- a/src/error.c
+++ b/src/error.c
@@ -137,6 +137,9 @@ static const char* find_last_newline(
     const char* original_text, const char* error_location) {
   assert(error_location >= original_text);
   const char* c = error_location;
+  // if the error location itself is a newline then start searching for 
+  // the preceding newline one character earlier
+  if (*error_location == '\n') --c;
   for (; c != original_text && *c != '\n'; --c) {
     // There may be an error at EOF, which would be a nul byte.
     assert(*c || c == error_location);

Hope this helps.

kevinhendricks · 2017-02-24T18:28:27Z

The initial loop test prevents the special-case decrement from ever being a problem. Tested with our own well_formed test:

sigil-gumbo kbhend$ ./well_formed ~/Desktop/junk.html
<!DOCTYPE html>
<html>
<head><title>test</title></head>
<body>
<
</body>
</html>

--------
line: 5 col: 2 type 10 @5:2: Tokenizer error with an unimplemented error message.
@5:2: Tokenizer error with an unimplemented error message.
<
 ^

stevecheckoway · 2017-02-24T18:30:08Z

@kevinhendricks That looks about how I worked around the bug for nokogumbo. rubys/nokogumbo@bd62355

I'm not sure if it's possible for an error to occur at a newline that is the first byte in original_text or not. But if so, then decrementing c in that case causes the loop to look for a newline in whatever memory happens to be before original_text. (I believe there are also issues with undefined behavior if error_location points to the beginning of an object as --c would cause c to point to a byte before the object.)

All of which is to say that I think

if (*c == '\n' && c != original_text)
  --c;

is a better fix.
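As a sketch of how the guarded decrement behaves, the following uses simplified stand-ins for the two helpers, not the exact gumbo source. `LineSpan` and `guarded_line_span` are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for find_last_newline with the guarded decrement. */
static const char* last_newline_guarded(const char* text, const char* loc) {
  assert(loc >= text);
  const char* c = loc;
  /* If the error sits on a newline, start the search one byte earlier,
     but never step before the start of the buffer. */
  if (*c == '\n' && c != text)
    --c;
  while (c != text && *c != '\n') --c;
  return c == text ? c : c + 1;
}

/* Stand-in for find_next_newline: walk forward to '\n' or the NUL byte. */
static const char* next_newline(const char* loc) {
  const char* c = loc;
  while (*c && *c != '\n') ++c;
  return c;
}

/* Hypothetical result type: where the reported line starts and how long it is. */
typedef struct {
  size_t start;
  size_t length;
} LineSpan;

/* Hypothetical helper: the span the caller would compute for text[offset]. */
static LineSpan guarded_line_span(const char* text, size_t offset) {
  const char* line_start = last_newline_guarded(text, text + offset);
  const char* line_end = next_newline(text + offset);
  LineSpan span = {(size_t)(line_start - text), (size_t)(line_end - line_start)};
  return span;
}
```

In this sketch, `"<\n"` with the error at offset 1 gives a span starting at 0 with length 1, which covers the `<`. A text that is just `"\n"` with the error at offset 0 gives an empty span at offset 0 without reading before the buffer.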

kevinhendricks · 2017-02-24T18:38:56Z

@stevecheckoway
agreed, your way is better.

Not sure if upstream (this) gumbo is still being actively maintained, but there is an error in tag name from original text that can be nasty in a parse - serialize loop. You may want to pull that fix in as well.

thanks

stevecheckoway · 2017-02-24T22:05:44Z

@kevinhendricks Do you have a pointer to the bug/fix? I'm not a maintainer for nokogumbo, merely a user, but they might want to do something about it. (Right now, it uses a git submodule to pull in gumbo-parser, but if this project isn't maintained any more, a local version or a fork might make more sense.)

kevinhendricks · 2017-02-24T22:09:40Z

See #375. The bug is only really dangerous if you parse and then re-serialize the resulting tree.

DmitryBochkarev force-pushed the segmentation_failure_fix branch from 5da65e7 to ef94f4b on December 15, 2016 13:31

DmitryBochkarev mentioned this pull request Dec 16, 2016

Fix segmentations failure in error.c gumbo_caret_diagnostic_to_string abak-press/gumbo-parser#1

Merged

srjacobs mentioned this pull request Jan 11, 2017

Travis configuration tries to get googletest archive from incorrect url #373

Closed

stevecheckoway mentioned this pull request Feb 23, 2017

Segfault on parse using 1.4.10 rubys/nokogumbo#50

Closed

stevecheckoway reviewed Feb 23, 2017

stevecheckoway mentioned this pull request Mar 2, 2017

Fork gumbo-parser? rubys/nokogumbo#52

Closed
