Skip to content
This repository has been archived by the owner on Feb 15, 2023. It is now read-only.

Fix segmentations failure in error.c gumbo_caret_diagnostic_to_string #371

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

DmitryBochkarev
Copy link

Without this patch method find_last_newline returns value bigger than
find_next_newline and in line original_line.length = line_end - line_start;
overflow happens.

Before changes newly added test failed with segmentation failure:

./test-driver: line 107: 12171 Segmentation fault      (core dumped)
"$@" > $log_file 2>&1

This slightly changed copy of code used in nokogumbo gem.
Link

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@DmitryBochkarev
Copy link
Author

I signed it!

@googlebot
Copy link

CLAs look good, thanks!

Without this patch method find_last_newline returns value bigger than
find_next_newline and in line `original_line.length = line_end -
line_start;`
overflow happens.

Before changes newly added test failed with segmentation failure:
```
./test-driver: line 107: 12171 Segmentation fault      (core dumped)
"$@" > $log_file 2>&1
```

This slightly changed copy of code used in nokogumbo gem.
[Link](https://github.com/rubys/nokogumbo/blob/8b4446847dea5c614759684ebcae4c580c47f4ad/ext/nokogumboc/nokogumbo.c#L230)
@@ -140,7 +140,7 @@ static const char* find_last_newline(
// There may be an error at EOF, which would be a nul byte.
assert(*c || c == error_location);
}
return c == original_text ? c : c + 1;
return c == original_text || c == error_location ? c : c + 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually the correct fix? If *error_location is \n, shouldn't this move to the previous \n (if any)? For example, given the source text "<\n", that line is the one with the error. With your patch, wouldn't original_line in gumbo_caret_diagnostic_to_string() point to the new line and have length 0?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you're right. Actually i did not care about error message and did not thinking about that. But i think that is are edge case and handling of that case should be done in caller function.

@kevinhendricks
Copy link
Contributor

FWIW, here is how we are fixing it in our fork of gumbo used inside Sigil:

diff --git a/src/error.c b/src/error.c
index 4e124a9..4b081d0 100644
--- a/src/error.c
+++ b/src/error.c
@@ -137,6 +137,9 @@ static const char* find_last_newline(
     const char* original_text, const char* error_location) {
   assert(error_location >= original_text);
   const char* c = error_location;
+  // if the error location itself is a newline then start searching for 
+  // the preceding newline one character earlier
+  if (*error_location == '\n') --c;
   for (; c != original_text && *c != '\n'; --c) {
     // There may be an error at EOF, which would be a nul byte.
     assert(*c || c == error_location);

Hope this helps.

@kevinhendricks
Copy link
Contributor

Initial loop test prevents special case decrement from ever being a problem. Tested with our own well_formed test:

sigil-gumbo kbhend$ ./well_formed ~/Desktop/junk.html
<!DOCTYPE html>
<html>
<head><title>test</title></head>
<body>
<
</body>
</html>

--------
line: 5 col: 2 type 10 @5:2: Tokenizer error with an unimplemented error message.
@5:2: Tokenizer error with an unimplemented error message.
<
 ^

@stevecheckoway
Copy link

@kevinhendricks That looks about how I worked around the bug for nokogumbo. rubys/nokogumbo@bd62355

I'm not sure if it's possible for an error to occur at a newline that is the first byte in original_text or not. But if so, then decrementing c in that case causes the loop to look for a newline in whatever memory happens to be before original_text. (I believe there are also issues with undefined behavior if error_location points to the beginning of an object as --c would cause c to point to a byte before the object.)

All of which is to say that I think

if (*c == '\n' && c != original_text)
  --c;

is a better fix.

@kevinhendricks
Copy link
Contributor

@stevecheckoway
agreed, your way is better.

Not sure if upstream (this) gumbo is still being actively maintained, but there is an error in tag name from original text that can be nasty in a parse - serialize loop. You may want to pull that fix in as well.

thanks

@stevecheckoway
Copy link

@kevinhendricks Do you have a pointer to the bug/fix? I'm not a maintainer for nokogumbo, merely a user, but they might want to do something about it. (Right now, it uses a git submodule to pull in gumbo-parser, but if this project isn't maintained any more, a local version or a fork might make more sense.)

@kevinhendricks
Copy link
Contributor

kevinhendricks commented Feb 24, 2017 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants