-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
3653 - extract username correctly when there is a query #385
Conversation
sometimes a profile url might come with a query, we need to make sure we are getting only the username to make the request to the twitter api endpoint
trying to fit everything into two patterns didn't work, it ended up matching the twitter item as well. So I broke it into two more patterns, and ordered them from the two that are more specific to the less.
The travis errors are the expected ones 🥲 (kwai and facebook, still need to deal with them) |
test "should parse tweet profile with a query on the url" do | ||
stub_profile_lookup.returns(twitter_profile_response_success) | ||
|
||
data = Parser::TwitterProfile.new('https://www.twitter.com/fake_user?ref_src=twsrc%5Etfw').parse_data(empty_doc) | ||
|
||
assert_equal 'fake_user', data['external_id'] | ||
assert_equal '@fake_user', data['username'] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test!
/^https?:\/\/(www\.)?twitter\.com\/(?<username>[^\/]+)$/, | ||
/^https?:\/\/(0|m|mobile)\.twitter\.com\/(?<username>[^\/]+)$/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: These two regular expressions are still too broad and will keep matching things we don't want, right? But it can be a separate ticket, where we follow this: https://help.twitter.com/en/managing-your-account/twitter-username-rules#error
Your username cannot be longer than 15 characters. Your name can be longer (50 characters) or shorter than 4 characters, but usernames are kept shorter for the sake of ease.
A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I kept it as it was from our previous twitter parser (that used v1.1). Anything that has twitter and isn't a tweet will match this one. Do we want to keep it this broad or make it more specific? If someone mistyped, e.g: twitter.com/fake user
, it would still try to parse it as a twitter link (though, at this moment, it would fail)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave it as is for now, and refine as we notice those cases (like we're handling in this ticket).
Description
Sometimes a profile url might come with a query, e.g:
https://twitter.com/username?ref_src=blah
.We need to make sure we are getting only the username to make the request to the twitter api endpoint. To do that I updated the regex inside
twiter_profile.rb
.References: 3653
How has this been tested?
I added two new tests to the test "matches known URL patterns, and returns instance on success" (
rails test test/models/parser/twitter_profile_test.rb:41
).I also added a new test that checks for this specific scenario "should parse tweet profile with a query on the url" (
rails test test/models/parser/twitter_profile_test.rb:203
).