-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process inner html of blocks when escaping text content #719
Process inner html of blocks when escaping text content #719
Conversation
bdf39dd
to
9f43b73
Compare
Update remaining tests with inner markup. Try and fix tests. Format tests.
807653d
to
194764c
Compare
$tokens[] = "</{$token_name}>"; | ||
} else { | ||
// Depending on the HTML tag, we may need to process attributes so they are correctly added to the placeholder. | ||
switch ( $token_name ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I would like feedback on is whether this is too fragile, complicated, or tedious an approach to maintain.
I plan to add a few more unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're good to rely on the output of get_token_name()
, so based on that, I believe this is a fine way to parse these tags. We have a limited scope since we're currently only dealing with a
, img
and mark
, and otherwise we default to the original tag.
That's only my opinion though 😅 Was there anything you were specifically concerned about with this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just needing to add cases for specific tags and if the attributes changes, it may break.
But the options for adding inline HTML / formatting inside blocks are rather limited, so maybe it's okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to deconstruct the attributes and reassemble them? I'm worried about dropping attributes or other parts of the markup that might be added by plugins or filters.
For example, could we replace <a href="https://wordpress.org">WordPress</a>
with something like
<a href="<?php echo esc_url( 'https://wordpress.org' ) >?"><?php echo esc_html__( 'WordPress' ) ?></a>
but maintain any other attributes added to the <a>
tag?
I'm not really worried about escaping every possible attribute, I think the main thing is to make sure the translated strings are escaped as a form of user input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about dropping attributes or other parts of the markup that might be added by plugins or filters.
That's a good point. I could not find a way with the HTML Tag Processor to just carry over the whole tag token and all its attributes. But I refactored the approach so that all the attributes should be carried over, rather than processing the attributes based on the type of tag: 678a8ef#diff-fc11ad66397ed68725a8fe74f3031f104368f850b5b94370123482ec9e719e1bR54-R70
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works! But I encountered following scenario where multiple individually translated strings now becomes the single translation.
I guess this is because the template in the editor comes with a single string (after translations) and now it is seen as a single string.
I don't think this should be blocking this change. It is to be addressed separately by reading original pattern and current template together may be?
Original
<h2 class="wp-block-heading" id="botswana-new-zealand-south-korea-japan-madagascar" style="font-size:80px;font-style:normal;font-weight:900;line-height:1.1;text-transform:uppercase"><a href=""><?php echo esc_html__( 'ITALY', 'adventurer' ); ?></a><br><a href=""><?php echo esc_html__( 'COSTA RICA', 'adventurer' ); ?></a><br><a href=""><?php echo esc_html__( 'CANADA', 'adventurer' ); ?></a><br><a href=""><?php echo esc_html__( 'LAOS', 'adventurer' ); ?></a><br><a href=""><?php echo esc_html__( 'TURKEY', 'adventurer' ); ?></a></h2>
After CBT localization
<h2 class="wp-block-heading" id="botswana-new-zealand-south-korea-japan-madagascar" style="font-size:80px;font-style:normal;font-weight:900;line-height:1.1;text-transform:uppercase"><?php /* Translators: %s are html tags */ echo sprintf( esc_html__( '%sITALY%s%s%sCOSTA RICA%s%s%sCANADA%s%s%sLAOS%s%s%sTURKEY%s', 'adventurer' ), '<a href="">', '</a>', '<br>', '<a href="">', '</a>', '<br>', '<a href="">', '</a>', '<br>', '<a href="">', '</a>', '<br>', '<a href="">', '</a>' ); ?></h2>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for handling this @jffng! I can see this will fix the issues we've been seeing with translations ✨
@creativecoder, if you get a chance it would be great to hear your thoughts on this.
$tokens[] = "</{$token_name}>"; | ||
} else { | ||
// Depending on the HTML tag, we may need to process attributes so they are correctly added to the placeholder. | ||
switch ( $token_name ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're good to rely on the output of get_token_name()
, so based on that, I believe this is a fine way to parse these tags. We have a limited scope since we're currently only dealing with a
, img
and mark
, and otherwise we default to the original tag.
That's only my opinion though 😅 Was there anything you were specifically concerned about with this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for trying this; it looks like a very good start!
$tokens[] = "</{$token_name}>"; | ||
} else { | ||
// Depending on the HTML tag, we may need to process attributes so they are correctly added to the placeholder. | ||
switch ( $token_name ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to deconstruct the attributes and reassemble them? I'm worried about dropping attributes or other parts of the markup that might be added by plugins or filters.
For example, could we replace <a href="https://wordpress.org">WordPress</a>
with something like
<a href="<?php echo esc_url( 'https://wordpress.org' ) >?"><?php echo esc_html__( 'WordPress' ) ?></a>
but maintain any other attributes added to the <a>
tag?
I'm not really worried about escaping every possible attribute, I think the main thing is to make sure the translated strings are escaped as a form of user input.
Escape at the end. Provide better translation note.
I think I addressed all the feedback:
Ready for another review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @jffng! I've tested the latest changes out and I believe everything is working as described. Great job 👏
Here are my test results, using the Adventurer theme as my test theme, and editing the Blog Home template (which uses a list of headings with links):
trunk | This PR |
---|---|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working well at solving the problem in my testing! 🎉
Here's the block markup I used:
<!-- wp:paragraph {"align":"center"} -->
<p class="has-text-align-center"><strong>Études</strong> is a pioneering firm that seamlessly merges creativity<br>and functionality to <a href="https://wordpress.org" data-type="link" data-id="https://wordpress.org">redefine <em>architectural</em> excellence</a>.</p>
<!-- /wp:paragraph -->
And here's the output in the .php pattern file:
<?php /* Translators: %1$s is the start of a 'strong' HTML element, %2$s is the end of a 'strong' HTML element, %3$s is the start of a 'br' HTML element, %4$s is the start of a 'a' HTML element, %5$s is the start of a 'em' HTML element, %6$s is the end of a 'em' HTML element, %7$s is the end of a 'a' HTML element */ echo sprintf( esc_html__( '%1$sÉtudes%2$s is a pioneering firm that seamlessly merges creativity%3$sand functionality to %4$sredefine %5$sarchitectural%6$s excellence%7$s.', 'twentytwentyfour' ), '<strong>', '</strong>', '<br>', '<a href="' . esc_url( 'https://wordpress.org' ) . '" data-type="link" data-id="https://wordpress.org">', '<em>', '</em>', '</a>' ); ?>
A few changes I think we need to make this output correct
- I believe the
/* translators:
comment needs to be on a separate line, immediately above the call to the translation function (but I'm not 100% sure) - When there are multiple placeholders in a string, the comment should be numbered, e.g.
/* Translators: 1: start of 'strong' HTML element, 2: ...
(see plugin docs) - If there's a way to differentiate between tags that are self closing (like
<br>
), it would be nice to modify the comment (1: 'br' HTML element, 2: ...
), so it's not confusing when there's no corresponding closing tag
Thanks for all the feedback on this one! I think I've addressed this feedback:
Done in 1e317ce.
Done in c665286.
Done in 6b5fe21. Would appreciate any reviews on this so we can hopefully land this soon 🙇 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is testing fine for me 👍
I think it's good enough to solve the translation needs of theme builders.
Let's merge.
Thank you all so much for getting this done! |
Nice! Thank you so much for the fix. |
What
This PR enables HTML markup inside block content to be processed while still allowing the text content to be safely escaped.
Kapture.2024-09-11.at.15.22.11.mp4
719.after_reduced.mp4
How
Updates the
CBT_Theme_Locale::escape_text_content
function to handle nested HTML markup.Uses
WP_HTML_Tag_Processor
to process every token of the blocks content, and generates a string that can be formatted and then translated.Why
Fixes #573 #682 and #691.
To test