Skip to content
This repository has been archived by the owner on Sep 27, 2024. It is now read-only.

Use Jsoup instead of TagSoup for HTML parsing. #1019

Merged
merged 3 commits into from
Jul 17, 2024

Conversation

jmartinesp
Copy link
Contributor

Jsoup, being way more flexible, allows us to simplify the parsing a lot and do checks we couldn't do before like one which lets us know when extra line breaks between inline nodes and block ones are needed.

Being way more flexible, it allows us to simplify the parsing *a lot* and do checks we couldn't do before, like one which was needed to know when extra line breaks between inline nodes and block ones are needed.
@codecov-commenter
Copy link

codecov-commenter commented Jul 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.53%. Comparing base (1e975b1) to head (5cfb891).
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1019      +/-   ##
==========================================
- Coverage   90.28%   89.53%   -0.75%     
==========================================
  Files         179      104      -75     
  Lines       21474    17996    -3478     
  Branches      280      280              
==========================================
- Hits        19387    16113    -3274     
+ Misses       2084     1880     -204     
  Partials        3        3              
Flag Coverage Δ
uitests ?
uitests-ios ?
unittests 89.53% <ø> (+0.29%) ⬆️
unittests-ios ?
unittests-react 88.66% <ø> (ø)
unittests-rust 89.67% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -57,7 +58,7 @@ class InterceptInputConnectionIntegrationTest {
textView.text.dumpSpans(), equalTo(
listOf(
"hello: android.widget.TextView.ChangeWatcher (0-5) fl=#6553618",
"hello: android.text.style.StyleSpan (0-5) fl=#33",
"hello: android.text.style.StyleSpan (0-5) fl=#17",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are caused by using inSpans, but they shouldn't affect the functionality of the editor.

Copy link
Contributor

@bmarty bmarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks!

@@ -81,6 +81,9 @@ class MainActivity : ComponentActivity() {
var linkDialogAction by remember { mutableStateOf<LinkAction?>(null) }
val coroutineScope = rememberCoroutineScope()

LaunchedEffect(state.messageHtml) {
Timber.d("Message HTML: '${state.messageHtml}'")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you want to remove this log (privacy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, this is in the compose integration example only 😅 .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, I read it a bit fast. I am always paranoid regarding what app and libraries are logging out.

"pre" -> parseCodeBlock(element)
"blockquote" -> parseQuote(element)
"p" -> parseParagraph(element)
"br" -> parseLineBreak(element)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a log for un-handled tag names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! In fact, I added a clean step where we remove any tags and attributes we don't recognise.

InlineFormat.Italic -> StyleSpan(Typeface.ITALIC)
InlineFormat.Underline -> UnderlineSpan()
InlineFormat.StrikeThrough -> StrikethroughSpan()
InlineFormat.InlineCode -> return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return statement is strange to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments to clarify this.

Copy link

sonarcloud bot commented Jul 17, 2024

@jmartinesp jmartinesp requested a review from bmarty July 17, 2024 07:59
@jmartinesp
Copy link
Contributor Author

I had to add some extra processing for whitespaces because Jsoup apparently doesn't directly strip it as Tagsoup did, and this could potentially cause index issues in the future.

Copy link
Contributor

@bmarty bmarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

@jmartinesp jmartinesp merged commit 6f982ac into main Jul 17, 2024
7 checks passed
@jmartinesp jmartinesp deleted the android/use-jsoup-for-html-parsing-fixes-line-breaks branch July 17, 2024 12:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants