MV3: Noscript tags with img breaking DOMParser in translator sandbox #477

Open
adomasven opened this issue Jun 5, 2024 · 2 comments

@adomasven
Member

Discovered in: zotero/translators#3311 (comment)
Problem page: https://journals.ametsoc.org/view/journals/phoc/53/1/JPO-D-22-0001.1.xml

The EM translator is not detected because no meta tags are found. I've discovered that

document.head.children.length // 191
new DOMParser().parseFromString(document.head.outerHTML, 'text/html').head.children.length // 20

This is because the 21st tag is

<noscript id="page_tag"><img alt="" vspace="0" hspace="0" border="0" width="1" height="1" language="//pftag.scholarlyiq.com/siqpagetag.gif?js=0"/></noscript>

Apparently, an img inside a noscript in the head is invalid: the parser treats the head element as terminated at that point and starts the body element, so every later tag ends up in body. It seems this page is intentionally preventing crawlers and the like from reading the meta tags in the head element, or something like that.

Anyway, as a proposed solution, I think we should strip all <noscript> tags from <head> in MV3 before parsing.
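
A rough sketch of what that stripping step could look like, operating on the serialized head string rather than on the live page. The helper name stripNoscript and the regex approach are assumptions for illustration only, not the Connector's actual code, and a regex like this will miss edge cases such as a ">" inside an attribute value.

```javascript
// Hypothetical helper, not taken from the Zotero Connector source.
// Removes every <noscript>...</noscript> element from an HTML string so that
// a later DOMParser pass does not end <head> early at an invalid child.
function stripNoscript(html) {
  return html.replace(/<noscript\b[^>]*>[\s\S]*?<\/noscript\s*>/gi, '');
}

// Browser-only usage; skipped where DOMParser or document is unavailable.
if (typeof DOMParser !== 'undefined' && typeof document !== 'undefined') {
  const cleaned = stripNoscript(document.head.outerHTML);
  const doc = new DOMParser().parseFromString(cleaned, 'text/html');
  // With the noscript elements gone, the remaining head children should
  // stay inside <head> instead of spilling into <body>.
  console.log(doc.head.children.length);
}
```

Working on a string copy leaves the live page untouched, which matters if the same code path can run against the page's own DOM.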

dstillman added a commit that referenced this issue Jun 5, 2024
…e. See #477"

I thought we could just strip `<noscript>` and save a couple cycles, but
I didn't realize this is operating on the live page, so better to just
remove what's invalid and leave the rest (for all the people running the
Zotero Connector with JavaScript disabled, which may or may not even be
possible...).

This reverts commit f1be738.
@dstillman
Member

I'm still not getting EM on this page. The main translator is fixed, so it's less of an issue here, but this is likely preventing EM from working elsewhere.

(4)(+0000028): Translate: Binding sandbox to https://journals.ametsoc.org/view/journals/phoc/53/1/JPO-D-22-0001.1.xml

debug.js:87 (4)(+0000003): Translate: Parsing code for PubFactory Journals (8d1fb775-df6d-4069-8830-1dfe8e8387dd, 2024-06-04 18:20:00)

debug.js:87 (4)(+0000014): Translate: Parsing code for unAPI (e7e01cac-1e37-4da6-b078-a0e8343b0e98, 2019-06-10 23:11:21)

debug.js:87 (4)(+0000002): Translate: Parsing code for COinS (05d07af9-105a-4572-99f6-a8e231c0daef, 2021-06-01 17:38:46)

debug.js:87 (4)(+0000004): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2024-03-27 20:15:00)

debug.js:87 (3)(+0000000): Translate: Prefix 'og' => 'http://ogp.me/ns#'

debug.js:87 (3)(+0000000): Translate: Prefix 'fb' => 'http://ogp.me/ns/fb#'

debug.js:87 (3)(+0000000): Translate: Prefix 'article' => 'http://ogp.me/ns/article#'

debug.js:87 (3)(+0000000): Translate: Embedded Metadata: found 0 meta tags.

debug.js:87 (4)(+0000013): Translate: Parsing code for DOI (c159dcfe-8a53-4301-a499-30f6549c340d, 2024-05-17 20:25:00)

debug.js:87 (3)(+0000000): Translate: All translator detect calls and RPC calls complete:

debug.js:87 (3)(+0000001): 	PubFactory Journals: 200

debug.js:87 (3)(+0000000): 	DOI: 400

dstillman reopened this Jun 10, 2024
@adomasven
Member Author

I cannot reproduce this in a new profile with the current release build.
