Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3723 – Fix Kwai parser #387

Merged
merged 5 commits into from
Sep 11, 2023
Merged

3723 – Fix Kwai parser #387

merged 5 commits into from
Sep 11, 2023

Conversation

vasconsaurus
Copy link
Contributor

@vasconsaurus vasconsaurus commented Sep 7, 2023

Description

Our Kwai test has been failing for a while, since it had been flaky before we assumed it was just flaky again. However, it seems that the parsing is indeed broken: we search the doc for '.info .title' and for '.name', and both of those are returning nil.

We can get data from the jsonld script. Besides the name(we can get that from creator name), it doesn't return the same exact data as before. We can use description or transcription for data['description']. For title we don't have a good substitute, so we are just using the default fallback from Media (url).

We are leaving the old behavior and adding the data from the jsonld script as a fallback. Kwai has changed their HTML and changed it back again before. So we think it's worth keeping the old behavior, at least for now.

References: 3723, 3716

How has this been tested?

I updated the integration test to use the new values we are fetching (rails test test/models/parser/kwai_test.rb:4), and added two unit tests:

  • make sure we are getting author_name and description from json+ld successfully (rails test test/models/parser/kwai_test.rb:50)
  • make sure the title uses the url as a fallback (rails test test/models/parser/kwai_test.rb:61)

The parser stopped working, the classes we were using to parse the doc
don't seem to be there anymore. There doesn't seem to be something we could
directly replace them.

But there is a ldjson available, so we are now getting our data from it.
right now I'm keeping the old behavior and adding the new one as a fallback.
not too sure about this yet.
the title we were getting from the ld+json could be the same for
multiple videos from the same author, so we are setting the title to
the url instead

and since what we get for the title and description might vary, in the
integration test we only check if those two are strings
The url fallback is handled by Media. I wanted to test that, and the Kwai
parser directly, so I made two tests:
test "assigns description and username to hash from the json+ld"
"assigns values to hash from the json+ld and falls back to url as title"

The second test had worked locally, but failed on Travis, then failed in
weird ways locally, but now seems to be successful. I'll run it on Travis
again and see how this behaves.
@codeclimate
Copy link

codeclimate bot commented Sep 11, 2023

Code Climate has analyzed commit bd60776 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 98.6%.

View more on Code Climate.

@vasconsaurus vasconsaurus changed the title wip: 3723 – Fix Kwai parser 3723 – Fix Kwai parser Sep 11, 2023
@vasconsaurus vasconsaurus marked this pull request as ready for review September 11, 2023 16:16
Copy link
Contributor

@caiosba caiosba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It aligns with what we discussed in the ticket.

@vasconsaurus vasconsaurus merged commit 2d6dd8c into develop Sep 11, 2023
4 checks passed
@vasconsaurus vasconsaurus deleted the fix-3723-fix-kwai-parsing branch September 11, 2023 17:17
@sentry-io
Copy link

sentry-io bot commented Sep 11, 2023

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ **NoMethodError: undefined method strip' for nil:NilClass** Api::V1::MediasController#index` View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants