HTML API: Fix splitting single text node.
When `next_token()` was introduced, it brought a subtle bug: on encountering a `<` in the HTML stream that did not begin a tag, comment, or other token, the Tag Processor treated the text up to that point as one text node and the following span as a second text node.

The entire span should be one text node.

In this patch the Tag Processor properly detects this scenario and combines the spans into one text node.

Follow-up to [57348]

Props jonsurrell
Fixes #60385



git-svn-id: https://develop.svn.wordpress.org/trunk@57489 602fd350-edb4-49c9-b593-d223f7449a82
dmsnell committed Jan 30, 2024
1 parent a172e31 commit 0b800d7
Showing 2 changed files with 43 additions and 10 deletions.
41 changes: 31 additions & 10 deletions src/wp-includes/html-api/class-wp-html-tag-processor.php
@@ -1512,16 +1512,6 @@ private function parse_next_tag() {
 		while ( false !== $at && $at < $doc_length ) {
 			$at = strpos( $html, '<', $at );

-			if ( $at > $was_at ) {
-				$this->parser_state = self::STATE_TEXT_NODE;
-				$this->token_starts_at = $was_at;
-				$this->token_length = $at - $was_at;
-				$this->text_starts_at = $was_at;
-				$this->text_length = $this->token_length;
-				$this->bytes_already_parsed = $at;
-				return true;
-			}
-
 			/*
 			 * This does not imply an incomplete parse; it indicates that there
 			 * can be nothing left in the document other than a #text node.
@@ -1536,6 +1526,37 @@ private function parse_next_tag() {
 				return true;
 			}

+			if ( $at > $was_at ) {
+				/*
+				 * A "<" has been found in the document. That may be the start of another node, or
+				 * it may be an "invalid-first-character-of-tag-name" error. If this is not the start
+				 * of another node the "<" should be included in this text node and another
+				 * termination point should be found for the text node.
+				 *
+				 * @see https://html.spec.whatwg.org/#tag-open-state
+				 */
+				if ( strlen( $html ) > $at + 1 ) {
+					$next_character = $html[ $at + 1 ];
+					$at_another_node =
+						'!' === $next_character ||
+						'/' === $next_character ||
+						'?' === $next_character ||
+						( 'A' <= $next_character && $next_character <= 'z' );
+					if ( ! $at_another_node ) {
+						++$at;
+						continue;
+					}
+				}
+
+				$this->parser_state = self::STATE_TEXT_NODE;
+				$this->token_starts_at = $was_at;
+				$this->token_length = $at - $was_at;
+				$this->text_starts_at = $was_at;
+				$this->text_length = $this->token_length;
+				$this->bytes_already_parsed = $at;
+				return true;
+			}
+
 			$this->token_starts_at = $at;

 			if ( $at + 1 < $doc_length && '/' === $this->html[ $at + 1 ] ) {
12 changes: 12 additions & 0 deletions tests/phpunit/tests/html-api/wpHtmlTagProcessor.php
@@ -2715,4 +2715,16 @@ public function test_handles_malformed_taglike_close_short_html() {
 		$result = $p->next_tag();
 		$this->assertFalse( $result, 'Did not handle "</ " html properly.' );
 	}
+
+	/**
+	 * Ensures that non-tag syntax starting with `<` is consumed inside a text node.
+	 *
+	 * @ticket 60385
+	 */
+	public function test_single_text_node_with_taglike_text() {
+		$p = new WP_HTML_Tag_Processor( 'test< /A>' );
+		$p->next_token();
+		$this->assertSame( '#text', $p->get_token_type(), 'Did not find text node.' );
+		$this->assertSame( 'test< /A>', $p->get_modifiable_text(), 'Did not find complete text node.' );
+	}
 }
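
The boundary check added in `parse_next_tag()` can be tried in isolation with a small standalone sketch. The function `find_text_node_end()` below is hypothetical and not part of WordPress; it only mirrors the look-ahead logic from the patch, assuming plain byte offsets into the HTML string. Unlike the real method, it applies the check even when the `<` sits at the starting offset.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): returns the byte offset at which
 * the text node beginning at $was_at ends, using the same look-ahead as the patch.
 */
function find_text_node_end( string $html, int $was_at ): int {
	$doc_length = strlen( $html );
	$at         = $was_at;

	while ( $at < $doc_length ) {
		$at = strpos( $html, '<', $at );

		// No further "<": everything up to the end of the document is text.
		if ( false === $at ) {
			return $doc_length;
		}

		// A "<" as the final byte cannot be classified; end the text node before it.
		if ( $at + 1 >= $doc_length ) {
			return $at;
		}

		// Same comparison as in the patch.
		$next_character  = $html[ $at + 1 ];
		$at_another_node =
			'!' === $next_character ||
			'/' === $next_character ||
			'?' === $next_character ||
			( 'A' <= $next_character && $next_character <= 'z' );

		if ( $at_another_node ) {
			return $at;
		}

		// Not the start of a token: keep the "<" in the text node and search again.
		++$at;
	}

	return $doc_length;
}
```

With this sketch, `find_text_node_end( 'test< /A>', 0 )` returns `9`, so the whole input is one text node, while `find_text_node_end( 'test</a>', 0 )` returns `4`, ending the text node where the closing tag begins. Note that the range comparison, as written, also accepts the six ASCII characters that fall between `Z` and `a` (`[`, `\`, `]`, `^`, `_`, and the backtick); a letters-only test would need two separate ranges.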
