Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3653 - extract username correctly when there is a query #385

Merged
merged 3 commits into from
Sep 6, 2023

Conversation

vasconsaurus
Copy link
Contributor

Description

Sometimes a profile url might come with a query, e.g: https://twitter.com/username?ref_src=blah.
We need to make sure we are getting only the username to make the request to the twitter api endpoint. To do that I updated the regex inside twiter_profile.rb.

References: 3653

How has this been tested?

I added two new tests to the test "matches known URL patterns, and returns instance on success" (rails test test/models/parser/twitter_profile_test.rb:41).

I also added a new test that checks for this specific scenario "should parse tweet profile with a query on the url" (rails test test/models/parser/twitter_profile_test.rb:203).

sometimes a profile url might come with a query, we need to make sure
we are getting only the username to make the request to the twitter api
endpoint
test/models/parser/twitter_profile_test.rb Dismissed Show dismissed Hide dismissed
it was missing the final character when there wasn't a query
trying to fit everything into two patterns didn't work, it ended up
matching the twitter item as well. So I broke it into two more patterns,
and ordered them from the two that are more specific to the less.
@vasconsaurus
Copy link
Contributor Author

The travis errors are the expected ones 🥲 (kwai and facebook, still need to deal with them)

@vasconsaurus vasconsaurus changed the title wip - 3653 - remove query after username 3653 - remove query after username Sep 5, 2023
@vasconsaurus vasconsaurus changed the title 3653 - remove query after username 3653 - extract username correctly when there is a query Sep 5, 2023
@vasconsaurus vasconsaurus marked this pull request as ready for review September 5, 2023 19:59
Comment on lines +203 to +210
test "should parse tweet profile with a query on the url" do
stub_profile_lookup.returns(twitter_profile_response_success)

data = Parser::TwitterProfile.new('https://www.twitter.com/fake_user?ref_src=twsrc%5Etfw').parse_data(empty_doc)

assert_equal 'fake_user', data['external_id']
assert_equal '@fake_user', data['username']
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

Comment on lines 14 to 15
/^https?:\/\/(www\.)?twitter\.com\/(?<username>[^\/]+)$/,
/^https?:\/\/(0|m|mobile)\.twitter\.com\/(?<username>[^\/]+)$/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: These two regular expressions are still too broad and will keep matching things we don't want, right? But it can be a separate ticket, where we follow this: https://help.twitter.com/en/managing-your-account/twitter-username-rules#error

Your username cannot be longer than 15 characters. Your name can be longer (50 characters) or shorter than 4 characters, but usernames are kept shorter for the sake of ease.

A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I kept it as it was from our previous twitter parser (that used v1.1). Anything that has twitter and isn't a tweet will match this one. Do we want to keep it this broad or make it more specific? If someone mistyped, e.g: twitter.com/fake user, it would still try to parse it as a twitter link (though, at this moment, it would fail)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it as is for now, and refine as we notice those cases (like we're handling in this ticket).

@vasconsaurus vasconsaurus merged commit 80195d0 into develop Sep 6, 2023
3 of 4 checks passed
@vasconsaurus vasconsaurus deleted the fix-3653-remove-query-after-username branch September 6, 2023 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants