Update tests for relaxed <select> parser #178

Open
wants to merge 6 commits into
base: master

Conversation

josepharhar

@josepharhar josepharhar commented Oct 11, 2024

This PR updates the tree-construction dat files for the HTML change which will allow additional tags within <select>:
whatwg/html#10557

@josepharhar
Author

I'm not sure what the best practice is for rebaselining errors, but for now I removed all errors from affected tests. There are probably errors in tests I didn't change which may need to be rebaselined as well.

@annevk
Contributor

annevk commented Oct 14, 2024

Will there be a separate PR for new tests?

@flavorjones
Contributor

FWIW when CI workflows are enabled, Nokogiri (downstream) tests will fail. I've started working on a branch with the proposed changes from whatwg/html#10557

@josepharhar
Author

> Will there be a separate PR for new tests?

I added new test cases to webkit02 including:

@flavorjones
Contributor

Nokogiri work-in-progress at sparklemotion/nokogiri#3317

flavorjones added a commit to sparklemotion/nokogiri that referenced this pull request Oct 16, 2024
flavorjones added a commit to flavorjones/html5lib-tests that referenced this pull request Oct 16, 2024
@flavorjones
Contributor

@josepharhar I've got a question about two tests that are very similar. Zooming in on this one from tree-construction/tests1.dat:

#data
<select><b><option><select><option></b></select>
#errors
#document
| <html>
|   <head>
|   <body>
|     <select>
|       <b>
|         <option>
|     <b>
|     <select>
|       <b>
|         <option>

Nokogiri is constructing a different tree:

<body>
  <select>
    <b>
      <option>
  <b>
    <select>
      <option>

and I wanted to ask for a double-check that the test's assertion is correct, before I dive into Nokogiri's parser. Thank you!
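For readers unfamiliar with the tree-construction format quoted above, here is a minimal Python sketch of how one record could be split into its sections. The `parse_dat_record` helper and the `KNOWN_HEADERS` set are hypothetical names for illustration, not part of html5lib-tests, and only the three headers that appear in this record are handled.

```python
# Hypothetical helper: only the headers seen in the record above are recognised.
KNOWN_HEADERS = {"#data", "#errors", "#document"}


def parse_dat_record(text):
    """Split one tree-construction record into {section name: [lines]}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line in KNOWN_HEADERS:
            # Start a new section, keyed by the header without its '#'.
            current = line[1:]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


RECORD = """\
#data
<select><b><option><select><option></b></select>
#errors
#document
| <html>
|   <head>
|   <body>
|     <select>
|       <b>
|         <option>
|     <b>
|     <select>
|       <b>
|         <option>
"""

sections = parse_dat_record(RECORD)
print(sections["data"][0])
print(len(sections["errors"]), "expected errors listed")
print("\n".join(sections["document"]))
```

The empty `errors` list reflects that the errors were removed from affected tests earlier in this PR.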

@josepharhar
Author

Thanks for asking!

It looks like Nokogiri is nesting one <select> inside the other, and my spec/chromium change intentionally fully closes the outer select before inserting the new one, so yeah I think the test's assertion is correct.

Here's the relevant part of the spec PR: https://whatpr.org/html/10557/parsing.html#:~:text=A%20start%20tag%20whose%20tag%20name%20is%20%22select%22

As for the <b> in between the <select> and the <option>, I think there is a mismatch between my spec PR and the chromium implementation. Specifically, the chromium implementation stops in this case before reconstructing formatting contexts, which isn't in the spec: https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/html/parser/html_tree_builder.cc;l=951;drc=9776ce6636ab1d68d2b1c4f1f719eb71becf39a3

I'll take a closer look at which we should do and get back to you on that. Thanks!

@flavorjones
Contributor

flavorjones commented Oct 17, 2024

Thanks for replying so quickly.

> It looks like Nokogiri is nesting one select inside the other

No, sorry, unless I'm misunderstanding your comment, this is not a correct description of what Nokogiri's parser is doing. Here's a more graphical representation of the tree from my previous comment:

body
├── select
│   └── b
│       └── option
└── b
    └── select
        └── option

i.e. <body><select><b><option></option></b></select><b><select><option></option></select></b></body>.

The select tags are not nested. Just wanted to clarify. (And if it's helpful context, Nokogiri is maintaining a fork of libgumbo.)
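To make the difference between the two trees concrete, here is a small Python sketch that serialises a nested `(name, children)` structure into the `| ` indented format used in the test's `#document` section. The `dump` function and the tuple representation are assumptions made for illustration; neither Nokogiri nor html5lib-tests exposes them.

```python
def dump(node, depth=0):
    """Serialise a (name, children) tuple as '| ' lines, two spaces per level."""
    name, children = node
    lines = ["| " + "  " * depth + "<" + name + ">"]
    for child in children:
        lines.extend(dump(child, depth + 1))
    return lines


# Tree that Nokogiri builds, as described in the comment above.
nokogiri_body = ("body", [
    ("select", [("b", [("option", [])])]),
    ("b", [("select", [("option", [])])]),
])

# Tree that the test in tests1.dat expects at this point in the thread.
expected_body = ("body", [
    ("select", [("b", [("option", [])])]),
    ("b", []),
    ("select", [("b", [("option", [])])]),
])

# <body> sits one level below <html>, hence depth=1.
print("\n".join(dump(nokogiri_body, depth=1)))
print(dump(nokogiri_body, depth=1) == dump(expected_body, depth=1))
```

In neither tree is a select a descendant of another select; the two differ only in whether the second `<b>` wraps the second select or sits inside it.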

@josepharhar
Author

Ah whoops, I failed to read the tree properly 😅

I looked into why chromium is putting the b inside the select instead of the other way around, and I found that the </b> tag in the test is what's making the difference: without it I am getting the same output as Nokogiri.

This is happening due to this call to the adoption agency algorithm when </b> is parsed: https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody:adoption-agency-algorithm-4

I verified that chromium does the same thing as Nokogiri when I comment out that call to the adoption agency algorithm.

There are a lot of steps in that algorithm and I'm having a hard time wrapping my head around it, but does Nokogiri have that call to the adoption agency algorithm too? Or perhaps the implementations of the algorithm are different?

@flavorjones
Contributor

@josepharhar Thanks again for your kind reply!

Yes, Nokogiri's libgumbo has implemented the adoption agency algorithm, and I have confirmed that in these tests we are invoking it. It's helpful to know this is likely the source of the behavior difference, so I'll focus my efforts on making sure it matches the current spec (though I'll note it passes every other test in this suite ... 🤷).

@flavorjones
Contributor

flavorjones commented Oct 18, 2024

OK, I think I know what's going on here. I think Chromium has missed this change:

image

https://github.com/whatwg/html/pull/10557/files#diff-41cf6794ba4200b839c53531555f0f3998df4cbb01a4d5cb0b94e3ca5e23947dR124618

If select is not in the list for default scope tags, then this adoption agency step:

If formattingElement is in the stack of open elements, but the element is not in scope, then this is a parse error; return.

will not trigger a parse-error-and-return, and the algorithm will continue. But select is in the list now, and so the formatting element b is not in scope.

If I remove the select tag from the default scope tags in Nokogiri's libgumbo, then behavior matches the test (and presumably chromium).

Can you check to see if my hunch is right?
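The scope check described above can be sketched in Python as follows. This is a simplified illustration, not libgumbo's or Chromium's code: elements are compared by tag name rather than by node identity, the MathML and SVG entries of the default scope list are omitted, and the `stack` value is an assumed state of the stack of open elements at the moment `</b>` is processed.

```python
# HTML-namespace entries of the default scope list (MathML/SVG entries omitted).
BASE_SCOPE_MARKERS = frozenset({
    "applet", "caption", "html", "table", "td", "th",
    "marquee", "object", "template",
})


def has_element_in_scope(open_elements, target, markers):
    """Walk from the current node upward; a scope marker hides anything above it."""
    for name in reversed(open_elements):
        if name == target:
            return True
        if name in markers:
            return False
    return False


# Assumed stack of open elements when </b> is seen (current node is last).
stack = ["html", "body", "b", "select", "option"]

# Without select in the list, <b> is in scope and the adoption agency continues.
print(has_element_in_scope(stack, "b", BASE_SCOPE_MARKERS))

# With select added, <b> is hidden behind it: parse error, return.
print(has_element_in_scope(stack, "b", BASE_SCOPE_MARKERS | {"select"}))
```

Under these assumptions the first call finds `b` in scope and the second does not, which is the behaviour difference the hunch above relies on.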

@josepharhar
Author

Thanks so much for figuring this out! Yeah I totally missed that in the chromium implementation but I just added it and updated the tests here.

@flavorjones
Contributor

@josepharhar That's great! Thank you!

I've got a patch to get the error messages to a point where libgumbo is passing, would you mind taking a look and potentially applying to this PR? (Renamed to .txt to get by the github attachment filter.)

select-test-errors.patch.txt

@josepharhar
Author

Looks good, thanks! I applied it.

Sometimes I wonder why chromium throws away parse errors.

aarongable pushed a commit to chromium/chromium that referenced this pull request Oct 30, 2024
This change was included in the parser spec changes but I forgot to add
it to the implementation. It was identified here:
html5lib/html5lib-tests#178

Change-Id: I13f7ba11dc2dda814e488829a05fe4ee7c670d52
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5948083
Commit-Queue: Joey Arhar <[email protected]>
Reviewed-by: David Baron <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1375607}
untitaker added a commit to untitaker/html5ever that referenced this pull request Oct 30, 2024
Neither the proposal nor the test changes are merged yet. Also, the
tests are still failing.

html5lib/html5lib-tests#178
whatwg/html#10557
@untitaker
Contributor

untitaker commented Nov 1, 2024

The changes to #errors make webkit02 highly inconsistent and break existing norms in e.g. parse5. See also whatwg/html#1339

untitaker added a commit to untitaker/parse5 that referenced this pull request Nov 1, 2024
html5lib/html5lib-tests#178

The proposal isn't merged yet, and the error codes are off. The forked
html5lib-tests makes it more complicated. I recommend ditching the fork
of html5lib-tests and giving up on standardizing errors.
Co-authored-by: Markus Unterwaditzer <[email protected]>
Comment on lines +456 to +474
| <select>
| <option>
| "B"
| <select>
| <option>
| "C"
| <select>
| <option>
| "D"
| <select>
| <option>
| "E"
| <select>
| <option>
| "F"
| <select>
| <option>
| "G"
| <select>
Contributor


This test expectation should be reverted per whatwg/html#10557 (comment)

@@ -438,34 +439,34 @@ eof-in-math
| <select>
| <optgroup>
| <option>
| <hr>
| <hr>
Contributor


This should be reverted per whatwg/html#10557 (comment)

Comment on lines +570 to +571
| <option>
| <i>
Contributor


<i> should be the parent per whatwg/html#10557 (comment)

5 participants