-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
3723 – Fix Kwai parser #387
Conversation
2c8cc15
to
0c88a92
Compare
The parser stopped working, the classes we were using to parse the doc don't seem to be there anymore. There doesn't seem to be something we could directly replace them. But there is a ldjson available, so we are now getting our data from it.
right now I'm keeping the old behavior and adding the new one as a fallback. not too sure about this yet.
the title we were getting from the ld+json could be the same for multiple videos from the same author, so we are setting the title to the url instead and since what we get for the title and description might vary, in the integration test we only check if those two are strings
0c88a92
to
473706b
Compare
The url fallback is handled by Media. I wanted to test that, and the Kwai parser directly, so I made two tests: test "assigns description and username to hash from the json+ld" "assigns values to hash from the json+ld and falls back to url as title" The second test had worked locally, but failed on Travis, then failed in weird ways locally, but now seems to be successful. I'll run it on Travis again and see how this behaves.
Code Climate has analyzed commit bd60776 and detected 0 issues on this pull request. The test coverage on the diff in this pull request is 100.0% (50% is the threshold). This pull request will bring the total coverage in the repository to 98.6%. View more on Code Climate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It aligns with what we discussed in the ticket.
Suspect IssuesThis pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎 |
Description
Our Kwai test has been failing for a while, since it had been flaky before we assumed it was just flaky again. However, it seems that the parsing is indeed broken: we search the doc for '.info .title' and for '.name', and both of those are returning nil.
We can get data from the jsonld script. Besides the
name
(we can get that fromcreator name
), it doesn't return the same exact data as before. We can usedescription
ortranscription
fordata['description']
. Fortitle
we don't have a good substitute, so we are just using the default fallback fromMedia
(url
).We are leaving the old behavior and adding the data from the jsonld script as a fallback. Kwai has changed their HTML and changed it back again before. So we think it's worth keeping the old behavior, at least for now.
References: 3723, 3716
How has this been tested?
I updated the integration test to use the new values we are fetching (
rails test test/models/parser/kwai_test.rb:4
), and added two unit tests:rails test test/models/parser/kwai_test.rb:50
)rails test test/models/parser/kwai_test.rb:61
)