Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parser for twitter search URLs #420

Merged

Conversation

jayjay-w
Copy link
Contributor

@jayjay-w jayjay-w commented Dec 5, 2023

Description

Add parser for twitter search URLs. Currently, these URLs are failing and throwing errors.

Related to: CV2-3708

How has this been tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can verify the changes. Please describe whether or not you implemented automated tests.

Things to pay attention to during code review

Please describe parts of the change that require extra attention during code review, for example:

  • File FFFF, line LL: This refactoring does this and this. Is it consistent with how it’s implemented elsewhere?
  • Etc.

Checklist

  • I have performed a self-review of my own code
  • I have added unit and feature tests, if the PR implements a new feature or otherwise would benefit from additional testing
  • I have added regression tests, if the PR fixes a bug
  • I have added logging, exception reporting, and custom tracing with any additional information required for debugging
  • I considered secure coding practices when writing this code. Any security concerns are noted above.
  • I have commented my code in hard-to-understand areas, if any
  • I have made needed changes to the README
  • My changes generate no new warnings
  • If I added a third party module, I included a rationale for doing so and followed our current guidelines

@jayjay-w jayjay-w force-pushed the CV2-3708-Look-into-and-add-search-endpoint-to-Twitter-parser branch from 8d72005 to 77334de Compare December 5, 2023 18:52
Add parser for twitter search URLs. Currently, these URLs are failing and throwing errors.

Related to: CV2-3708
@jayjay-w jayjay-w force-pushed the CV2-3708-Look-into-and-add-search-endpoint-to-Twitter-parser branch from 77334de to bd72351 Compare December 5, 2023 19:38
@jayjay-w jayjay-w marked this pull request as ready for review December 5, 2023 20:41
Copy link
Contributor

@caiosba caiosba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on your first feature PR, @jayjay-w ! :)

I tested locally and works great! I made some suggestions and a couple of requests :)

Captura de tela de 2023-12-05 23-36-54

class TwitterSearchItem < Base
include ProviderTwitter

TWITTER_SEARCH_ITEM_URL = /https:\/\/twitter\.com\/search[^\s"]*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request: I suggest we make this regular expression broader:

  • We should always expect at least the q parameter
  • We should support HTTP
  • We need to be sure it starts with http
  • We should support other subdomains (www, mobile, etc.)

That said, I suggest to use the host part we have here: https://github.com/meedan/pender/blob/develop/app/models/parser/twitter_item.rb#L5

Suggestion: We could extract the host part of that regular expression into a TWITTER_HOST_URL constant defined in ProviderTwitter and reused by the other Twitter modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, these were great suggestions. I have implemented everything except the last suggestion, which I will move later.

app/models/parser/twitter_search_item.rb Show resolved Hide resolved
title: search_term,
search_term: search_term,
src: src,
description: "Twitter search for #{search_term}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request: Testing locally, I got two items with different URLs to match because of this description, so I suggest we keep it simple and just return the URL (unescaped, for better readability) here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as well. The description is now just the decoded URL.

# Main function for class
def parse_data_for_parser(_doc, _original_url, _jsonld_array)
handle_exceptions(StandardError) do
params = URI.parse(url).query.split('&').map { |pair| pair.split('=') }.to_h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request: Testing locally, I noticed that in some cases the URLs we receive contain escaped characters, so I suggest we unescape when parsing the URL here.

Captura de tela de 2023-12-05 23-35-13

test/models/parser/twitter_item_test.rb Show resolved Hide resolved
Add reviewer feedback:
 - Escape URLs sent to parser
 - Handle different domains (m.twitter.com and mobile.twitter.com)
 - Move src to api
Copy link

codeclimate bot commented Dec 8, 2023

Code Climate has analyzed commit 4b1e2a9 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 98.6% (0.0% change).

View more on Code Climate.

@jayjay-w
Copy link
Contributor Author

jayjay-w commented Dec 8, 2023

@caiosba this is ready for another review

@jayjay-w jayjay-w requested a review from caiosba December 8, 2023 11:55
Copy link
Contributor

@caiosba caiosba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, @jayjay-w ! Thanks for addressing the comments from the previous review. I also tested it locally and it works well.

@jayjay-w
Copy link
Contributor Author

jayjay-w commented Dec 8, 2023

Thank you!

@jayjay-w jayjay-w merged commit e60b15d into develop Dec 8, 2023
4 checks passed
@jayjay-w jayjay-w deleted the CV2-3708-Look-into-and-add-search-endpoint-to-Twitter-parser branch December 8, 2023 16:12
Copy link

sentry-io bot commented Dec 9, 2023

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ Errno::ENOENT: No such file or directory - @Space_Sstationsdfdsf Api::V1::MediasController#index View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants