3309 - Update pender to use twitter's v2 API (#371)
* add a TwitterClient class to deal with requests to the v2 api

- I used Rack::Utils.build_nested_query(params) instead of Rails' to_query
because the former keeps the hash's key order when it builds the query string, while to_query sorts the keys. I think that will make requests easier to read when debugging.

- We have a provider_instagram that also does the api call, and it's very similar to what we were doing in the class. So I followed the structure we used for that provider and made it a Concern instead of adding a new Class.

- The url the new api returns is the url linked in the profile, not the twitter profile url, so for now I'm just combining the twitter url with the username to get the author url.
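A minimal sketch of how the tweet-lookup request URI can be assembled, assuming only the Ruby standard library. It uses `URI.encode_www_form` as a stand-in for the Rack::Utils helper mentioned above: like that helper, it emits pairs in the hash's insertion order, whereas ActiveSupport's `Hash#to_query` sorts them. The method name `tweet_lookup_uri` is hypothetical; the base URL and field lists are copied from the `ProviderTwitter` diff.

```ruby
require 'uri'

# Same base URL as ProviderTwitter::BASE_URI in this commit.
BASE_URI = 'https://api.twitter.com/2/'

# Hypothetical helper: builds the GET /2/tweets lookup URI for one tweet.
# URI.encode_www_form emits pairs in the hash's insertion order, so the
# query string mirrors the params literal (easier to scan in logs).
def tweet_lookup_uri(tweet_id)
  params = {
    'ids' => tweet_id,
    'tweet.fields' => 'author_id,created_at,text',
    'expansions' => 'author_id,attachments.media_keys',
    'user.fields' => 'profile_image_url,username,url',
    'media.fields' => 'url'
  }
  # URI.join resolves "tweets" against the trailing-slash base: /2/tweets
  uri = URI.join(BASE_URI, 'tweets')
  uri.query = URI.encode_www_form(params)
  uri
end
```

For example, `tweet_lookup_uri('123').to_s` starts with `https://api.twitter.com/2/tweets?ids=123&tweet.fields=author_id%2Ccreated_at%2Ctext`, with commas percent-encoded and the keys in the order they were written.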

* move twitter parser integration test to a dedicated folder

- We want to separate the integration tests from the unit tests, so that
in the future it will be easier for us to choose when to run them. We want to do this for all parsers.

* update twitter item tests

- We don't need that many integration tests, so we are moving some of them back to test/models/parser/twitter_item_test.rb and updating them so that they stub requests instead of making live ones.

- I added tests to check the basic request functionality that we now have,
and removed "assigns values to hash from the API response", since that is already being tested in "it makes a get request to the tweet lookup endpoint successfully".

- "should decode html entities" was removed because that happens
inside Media and is not done by the individual parser, which means
the test actually fails (as it should)

- fake_tweet and fake_twitter_user were removed, since they used
methods from the old Twitter gem. Now we are stubbing a response from
our new method: tweet_lookup

- added .squish to parsed_data['raw']['api']['data'][0]['text'] to clean up
line breaks in the title and description. Our test was failing because they were
not being removed. Also, since title and description are the same, I just
set the description to be the same as the title instead of parsing twice.

- removed the test for truncated text: that behavior is no longer present
in the v2 api. Only retweets might be truncated (we don't fetch those),
and the way to deal with it is different; the v2 api does not take
truncated as a query param.

- removed the test for storing oembed because that happens inside Media and not
the twitter profile parser

- removed old error handling behavior tests and added new ones
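A stdlib-only sketch of the stubbing approach described above: the lookup method is replaced on a single object with a canned v2-style response, so no live request is made, and the tweet text is whitespace-collapsed the way `squish` does. `FakeTwitterItemParser` and the sample payload are hypothetical stand-ins, not Pender's `Parser::TwitterItem`; the project's own tests would more likely use a mocking library (mocha is listed in its Gemfile.lock).

```ruby
# Hypothetical stand-in for the parser; only the pieces needed to show stubbing.
class FakeTwitterItemParser
  # In Pender this would call GET /2/tweets; here it refuses, so a test
  # fails loudly if the stub was forgotten.
  def tweet_lookup(_tweet_id)
    raise 'live request attempted'
  end

  def parse(tweet_id)
    raw = tweet_lookup(tweet_id)
    # dig returns nil instead of raising when 'data' is absent.
    text = raw.dig('data', 0, 'text')
    # Plain-Ruby equivalent of ActiveSupport's String#squish:
    # collapse whitespace runs (including newlines) and trim the ends.
    title = text&.gsub(/[[:space:]]+/, ' ')&.strip
    # Title and description come from the same text, so reuse the value.
    { title: title, description: title }
  end
end

parser = FakeTwitterItemParser.new
canned = { 'data' => [{ 'id' => '1', 'text' => "Hello\n\n  world " }] }
# Stub only this instance: return the canned response instead of calling the API.
parser.define_singleton_method(:tweet_lookup) { |_tweet_id| canned }

result = parser.parse('1')
```

Here `result[:title]` is `"Hello world"`: the line breaks and trailing space from the canned text are gone, which is the behavior the squish change was meant to restore.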

* remove twitter spec

I think this was relying on the twitter gem error handling,
so I don't think it makes sense to keep this for now.

* update twitter config on config.example

* remove twitter gem

* update archiver_worker_test

Now they work with the twitter links, but since twitter is a bit
unstable regarding changes, we should probably avoid using twitter links
where they aren't absolutely needed.

* update according to Christa's review

main notes:

- instead of re-raising the error inside the provider, we are notifying
sentry and returning an errors hash.
#371 (comment)

- we re-wrote the parsers according to what we feel is 'safer' moving
forward: using merge! to set the defaults.

- I had to update some of page_item_tests, so I used this as an opportunity
to move the integration tests to their own file. (we are working on moving
all the integration tests to their own files, separated from the unit ones)

- Updated the error tests, now we test 3 scenarios:
200 response with error in json, non-200 response, exception in Net::HTTP

- Updated to use dig, i.e. parsed_data.dig('raw','api','data',0). This
returns nil if data is missing; before, it would raise an error on invalid access.
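The error-handling change can be illustrated with a small stdlib sketch. It assumes the HTTP call is injected as a callable returning `[status_code, body]`, so the three tested scenarios listed above (a 200 response with errors in the JSON, a non-200 response, and an exception raised during the request) can be exercised without a network. `safe_get` and `summarize` are hypothetical names; the commit's `get` uses Net::HTTP and also notifies Sentry, which is omitted here. Results are read with `dig`, so a missing `data` key yields `nil` instead of an error.

```ruby
require 'json'

class ApiError < StandardError; end

# Hypothetical sketch of the provider's `get`: instead of re-raising, any
# failure becomes an errors hash. `fetch` returns [status_code, body].
def safe_get(fetch)
  body = nil
  begin
    code, body = fetch.call
    raise ApiError, "#{code} - request failed" unless code.to_i < 400
    JSON.parse(body)
  rescue StandardError => e
    # The real code also reports the exception to Sentry at this point.
    { 'errors' => [{ 'title' => "#{e.class} - #{e.message}", 'detail' => body }] }
  end
end

# Reads a response defensively: dig returns nil when a key is missing.
def summarize(response)
  return "error: #{response.dig('errors', 0, 'title')}" if response['errors']
  "text: #{response.dig('data', 0, 'text')}"
end

ok        = safe_get(-> { [200, '{"data":[{"text":"hi"}]}'] })
api_error = safe_get(-> { [200, '{"errors":[{"title":"Not Found Error"}]}'] })
http_500  = safe_get(-> { [500, 'Internal Server Error'] })
exploded  = safe_get(-> { raise Errno::ECONNREFUSED })
```

Because errors returned by the API and errors produced locally share the `errors` key, the caller needs a single check, which is what lets the parsers branch on `parsed_data[:error]`.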

* update test shouldn't error when cannot get twitter author url

When we call get_twitter_metadata in the base parser, we check whether there is
a twitter username. If there is a username, we get the twitter author_url.
If there isn't a username, there won't be an author_url, and it shouldn't
error when that happens.
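A sketch of the guard described above, assuming a simple blank check in place of the parser's `bad_username?` (whose body is not part of this diff). With no username the helper returns `nil` instead of raising; otherwise it builds the profile URL from the username, as the updated `twitter_author_url` does. `twitter_metadata` here is a hypothetical caller, not Pender's `get_twitter_metadata`.

```ruby
# Sketch of twitter_author_url with a hypothetical blank-username guard
# standing in for Pender's bad_username? check.
def twitter_author_url(username)
  return nil if username.nil? || username.strip.empty?
  # Drop a leading "@" so "@meedan" and "meedan" map to the same URL.
  'https://twitter.com/' + username.delete_prefix('@')
end

# Hypothetical caller: a missing username simply yields no author_url key.
def twitter_metadata(username)
  { username: username, author_url: twitter_author_url(username) }.compact
end
```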

* update errors hash and error testing

I think it's better if we return the same keys inside errors as the
twitter api, it will make it easier to test.

* update how we deal with picture inside twitter item

If there is no picture, it should be an empty string,
but it doesn't always make sense to test for its presence, because
if we get an error, it will be set to a string inside Media and
not the parser.
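A minimal sketch of the picture rule, mirroring `get_twitter_item_picture` from the diff. It returns `nil` when the response has no `includes` key, returns an empty string when `includes` exists but carries no media, and otherwise returns the first media item's `url`. Per the commit message, on errors the picture is set inside Media rather than by the parser. The sample payloads are made up for illustration.

```ruby
# Mirrors get_twitter_item_picture: nil when `includes` is absent,
# '' when there is no media, otherwise the first media item's url.
def twitter_item_picture(response)
  return nil unless response.dig('includes')
  item_media = response.dig('includes', 'media')
  item_media ? item_media.dig(0, 'url') : ''
end

with_photo = { 'includes' => { 'media' => [{ 'url' => 'https://pbs.twimg.com/media/example.jpg' }] } }
text_only  = { 'includes' => { 'users' => [{ 'name' => 'Example' }] } }
failed     = { 'errors' => [{ 'title' => 'Not Found Error' }] }
```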

---------

Co-authored-by: Caio Almeida <[email protected]>
vasconsaurus and caiosba authored Aug 14, 2023
1 parent 5ffd289 commit 293c11c
Showing 24 changed files with 654 additions and 931 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -1,6 +1,6 @@
language: minimal
before_install:
- openssl aes-256-cbc -K $encrypted_8cec9149bf7a_key -iv $encrypted_8cec9149bf7a_iv -in config/config.yml.enc -out config/config.yml -d
- openssl aes-256-cbc -K $encrypted_3491736328a2_key -iv $encrypted_3491736328a2_iv -in config/config.yml.enc -out config/config.yml -d
- cp config/database.yml.example config/database.yml
- cp config/sidekiq.yml.example config/sidekiq.yml
- echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
1 change: 0 additions & 1 deletion Gemfile
@@ -39,7 +39,6 @@ gem 'yt', '~> 0.25.5'
gem 'rswag-api'
gem 'rswag-ui'
gem 'sass-rails'
gem 'twitter'
gem 'open_uri_redirections', require: false
gem 'postrank-uri', git: 'https://github.com/postrank-labs/postrank-uri.git', ref: '485ac46', require: false # Ruby 3.0 support, as of 2/6/23 no gem relaease
gem 'retryable'
39 changes: 0 additions & 39 deletions Gemfile.lock
@@ -102,7 +102,6 @@ GEM
aws-eventstream (~> 1, >= 1.0.2)
benchmark-ips (2.12.0)
bindex (0.8.1)
buftok (0.3.0)
builder (3.2.4)
byebug (11.1.3)
codeclimate-test-reporter (1.0.8)
@@ -130,14 +129,8 @@ GEM
thor (>= 0.19, < 2)
diff-lcs (1.5.0)
docile (1.1.5)
domain_name (0.5.20190701)
unf (>= 0.0.5, < 1.0.0)
equalizer (0.0.11)
erubi (1.12.0)
ffi (1.15.5)
ffi-compiler (1.0.1)
ffi (>= 1.0.0)
rake
gem-licenses (0.2.2)
get_process_mem (0.2.7)
ffi (~> 1.0)
@@ -150,23 +143,12 @@ GEM
heapy (0.2.0)
thor
htmlentities (4.3.4)
http (5.1.1)
addressable (~> 2.8)
http-cookie (~> 1.0)
http-form_data (~> 2.2)
llhttp-ffi (~> 0.4.0)
http-cookie (1.0.5)
domain_name (~> 0.5)
http-form_data (2.3.0)
i18n (1.14.1)
concurrent-ruby (~> 1.0)
jmespath (1.6.2)
json (2.6.3)
json-schema (2.8.1)
addressable (>= 2.4)
llhttp-ffi (0.4.0)
ffi-compiler (~> 1.0)
rake (~> 13.0)
lograge (0.12.0)
actionpack (>= 4)
activesupport (>= 4)
@@ -184,8 +166,6 @@ GEM
net-pop
net-smtp
marcel (1.0.2)
memoizable (0.4.2)
thread_safe (~> 0.3, >= 0.3.1)
memory_profiler (1.0.1)
method_source (1.0.0)
mini_histogram (0.3.1)
@@ -195,8 +175,6 @@ GEM
minitest-retry (0.2.2)
minitest (>= 5.0)
mocha (1.14.0)
multipart-post (2.3.0)
naught (1.1.0)
net-http (0.3.2)
uri
net-imap (0.3.6)
@@ -388,7 +366,6 @@ GEM
connection_pool (>= 2.2.2)
rack (~> 2.0)
redis (>= 4.2.0)
simple_oauth (0.3.1)
simplecov (0.13.0)
docile (~> 1.1.0)
json (>= 1.8, < 3)
@@ -409,25 +386,10 @@ GEM
terminal-table (3.0.2)
unicode-display_width (>= 1.1.1, < 3)
thor (1.2.2)
thread_safe (0.3.6)
tilt (2.2.0)
timeout (0.4.0)
twitter (8.0.0)
addressable (~> 2.3)
buftok (~> 0.3.0)
equalizer (~> 0.0.11)
http (~> 5.1)
http-form_data (~> 2.3)
llhttp-ffi (~> 0.4.0)
memoizable (~> 0.4.0)
multipart-post (~> 2.0)
naught (~> 1.0)
simple_oauth (~> 0.3.0)
tzinfo (2.0.6)
concurrent-ruby (~> 1.0)
unf (0.1.4)
unf_ext
unf_ext (0.0.8.2)
unicode-display_width (2.4.2)
uri (0.12.2)
web-console (3.5.1)
@@ -513,7 +475,6 @@ DEPENDENCIES
simplecov-console
spring
sprockets (= 3.7.2)
twitter
web-console (~> 3.5.1)
webmock
yt (~> 0.25.5)
57 changes: 48 additions & 9 deletions app/models/concerns/provider_twitter.rb
@@ -3,22 +3,61 @@
module ProviderTwitter
extend ActiveSupport::Concern

class ApiError < StandardError; end

BASE_URI = "https://api.twitter.com/2/"

def oembed_url(_ = nil)
"https://publish.twitter.com/oembed?url=#{self.url}"
end

def tweet_lookup(tweet_id)
params = {
"ids": tweet_id,
"tweet.fields": "author_id,created_at,text",
"expansions": "author_id,attachments.media_keys",
"user.fields": "profile_image_url,username,url",
"media.fields": "url",
}

get "tweets", params
end

def user_lookup_by_username(username)
params = {
"usernames": username,
"user.fields": "profile_image_url,name,username,description,created_at,url",
}

get "users/by", params
end

private

def handle_twitter_exceptions
def get(path, params)
uri = URI(URI.join(BASE_URI, path))
uri.query = Rack::Utils.build_query(params)

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

headers = {
"Authorization": "Bearer #{PenderConfig.get('twitter_bearer_token')}",
}

request = Net::HTTP::Get.new(uri.request_uri, headers)

begin
yield
rescue Twitter::Error::TooManyRequests => e
raise Pender::Exception::ApiLimitReached.new(e.rate_limit.reset_in)
rescue Twitter::Error => error
PenderSentry.notify(error, url: url)
@parsed_data[:raw][:api] = { error: { message: "#{error.class}: #{error.code} #{error.message}", code: Lapis::ErrorCodes::const_get('INVALID_VALUE') }}
Rails.logger.warn level: 'WARN', message: "[Parser] #{error.message}", url: url, code: error.code, error_class: error.class
return
response = http.request(request)
raise ApiError.new("#{response.code} - #{response.message}") unless response.code.to_i < 400
JSON.parse(response.body)
rescue StandardError => e
PenderSentry.notify(e, url: url, response_body: response&.body)
{ 'errors' => [{
title: "#{e&.class} - #{e&.message}",
detail: response&.body
}]
}
end
end

17 changes: 1 addition & 16 deletions app/models/parser/base.rb
@@ -73,15 +73,6 @@ def parse_data_for_parser(doc, original_url, jsonld_array)
raise NotImplementedError.new("Parser subclasses must implement parse_data_for_parser")
end

def twitter_client
@twitter_client ||= Twitter::REST::Client.new do |config|
config.consumer_key = PenderConfig.get('twitter_consumer_key')
config.consumer_secret = PenderConfig.get('twitter_consumer_secret')
config.access_token = PenderConfig.get('twitter_access_token')
config.access_token_secret = PenderConfig.get('twitter_access_token_secret')
end
end

def ignore_url?(url)
self.ignored_urls.each do |item|
if url.match?(item[:pattern])
@@ -166,13 +157,7 @@ def get_twitter_metadata

def twitter_author_url(username)
return if bad_username?(username)
begin
twitter_client.user(username)&.url&.to_s
rescue Twitter::Error => e
PenderSentry.notify(e, url: url, username: username)
Rails.logger.warn level: 'WARN', message: "[Parser] #{e.message}", username: username, error_class: e.class
nil
end
"https://twitter.com/" + username.gsub("@","")
end

def bad_username?(value)
76 changes: 42 additions & 34 deletions app/models/parser/twitter_item.rb
@@ -18,48 +18,56 @@ def patterns

# Main function for class
def parse_data_for_parser(_doc, _original_url, _jsonld_array)
@url.gsub!(/(%23|#)!\//, '')
@url = replace_subdomain_pattern(url)
parts = url.match(TWITTER_ITEM_URL)
user, id = parts['user'], parts['id']

@parsed_data['raw']['api'] = {}
handle_twitter_exceptions do
@parsed_data['raw']['api'] = twitter_client.status(id, tweet_mode: 'extended').as_json
handle_exceptions(StandardError) do
@url.gsub!(/(%23|#)!\//, '')
@url.gsub!(/\s/, '')
@url = replace_subdomain_pattern(url)

parts = url.match(TWITTER_ITEM_URL)
user, id = parts['user'], parts['id']

@parsed_data.merge!(
external_id: id,
username: '@' + user,
author_url: get_author_url(user)
)

@parsed_data['raw']['api'] = tweet_lookup(id)
@parsed_data[:error] = parsed_data.dig('raw', 'api', 'errors')

if @parsed_data[:error]
@parsed_data.merge!(
author_name: user,
)
elsif @parsed_data[:error].nil?
raw_data = parsed_data.dig('raw','api','data',0)
raw_user_data = parsed_data.dig('raw','api','includes','users',0)

@parsed_data.merge!({
picture: get_twitter_item_picture(parsed_data),
title: raw_data['text'].squish,
description: raw_data['text'].squish,
author_picture: raw_user_data['profile_image_url'].gsub('_normal', ''),
published_at: raw_data['created_at'],
html: html_for_twitter_item(url),
author_name: raw_user_data['name'],
})
end
end
@parsed_data[:error] = parsed_data.dig(:raw, :api, :error)
@parsed_data.merge!({
external_id: id,
username: '@' + user,
title: stripped_title(parsed_data),
description: parsed_data.dig('raw', 'api', 'text') || parsed_data.dig('raw', 'api', 'full_text'),
picture: picture_url(parsed_data),
author_picture: author_picture_url(parsed_data),
published_at: parsed_data.dig('raw', 'api', 'created_at'),
html: html_for_twitter_item(parsed_data, url),
author_name: parsed_data.dig('raw', 'api', 'user', 'name'),
author_url: twitter_author_url(user) || RequestHelper.top_url(url)
})
parsed_data
end

def stripped_title(data)
title = (data.dig('raw', 'api', 'text') || data.dig('raw', 'api', 'full_text'))
title.gsub(/\s+/, ' ') if title
def get_author_url(user)
'https://twitter.com/' + user
end

def author_picture_url(data)
picture_url = data.dig('raw', 'api', 'user', 'profile_image_url_https')
picture_url.gsub('_normal', '') if picture_url
def get_twitter_item_picture(parsed_data)
return unless parsed_data.dig('raw', 'api', 'includes')
item_media = parsed_data.dig('raw', 'api', 'includes', 'media')
item_media ? item_media.dig(0, 'url') : ''
end

def picture_url(data)
item_media = data.dig('raw', 'api', 'entities', 'media')
(item_media.dig(0, 'media_url_https') || item_media.dig(0, 'media_url')) if item_media
end

def html_for_twitter_item(data, url)
return '' unless data.dig(:raw, :api, :error).blank?
def html_for_twitter_item(url)
'<blockquote class="twitter-tweet">' +
'<a href="' + url + '"></a>' +
'</blockquote>' +
51 changes: 30 additions & 21 deletions app/models/parser/twitter_profile.rb
@@ -9,8 +9,8 @@ def type

def patterns
[
/^https?:\/\/(www\.)?twitter\.com\/([^\/]+)$/,
/^https?:\/\/(0|m|mobile)\.twitter\.com\/([^\/]+)$/
/^https?:\/\/(www\.)?twitter\.com\/(?<username>[^\/]+)$/,
/^https?:\/\/(0|m|mobile)\.twitter\.com\/(?<username>[^\/]+)$/
]
end
end
@@ -19,26 +19,35 @@ def patterns

# Main function for class
def parse_data_for_parser(doc, _original_url, _jsonld_array)
@url = replace_subdomain_pattern(url)
username = url.match(/^https?:\/\/(www\.)?twitter\.com\/([^\/]+)$/)[2]
handle_exceptions(StandardError) do
@url.gsub!(/\s/, '')
@url = replace_subdomain_pattern(url)
username = compare_patterns(@url, self.patterns, 'username')

@parsed_data.merge!(
url: url,
external_id: username,
username: '@' + username,
title: username,
)

@parsed_data[:raw][:api] = user_lookup_by_username(username)
@parsed_data[:error] = parsed_data.dig('raw', 'api', 'errors')

if @parsed_data[:error]
@parsed_data.merge!(author_name: username)
elsif @parsed_data[:error].nil?
raw_data = parsed_data.dig('raw', 'api', 'data', 0)

@parsed_data[:raw][:api] = {}
handle_twitter_exceptions do
@parsed_data[:raw][:api] = twitter_client.user(username).as_json
picture_url = parsed_data[:raw][:api][:profile_image_url_https].gsub('_normal', '')
set_data_field('picture', picture_url)
set_data_field('author_picture', picture_url)
end
@parsed_data[:error] = parsed_data.dig(:raw, :api, :error)
set_data_field('title', parsed_data.dig(:raw, :api, :name), username)
@parsed_data.merge!({
url: url,
external_id: username,
username: '@' + username,
author_name: parsed_data[:title],
description: parsed_data.dig(:raw, :api, :description),
published_at: parsed_data.dig(:raw, :api, :created_at),
})
@parsed_data.merge!({
picture: raw_data['profile_image_url'].gsub('_normal', ''),
author_name: raw_data['name'],
author_picture: raw_data['profile_image_url'].gsub('_normal', ''),
description: raw_data['description'].squish,
published_at: raw_data['created_at']
})
end
end
parsed_data
end
end
Binary file modified config/config.yml.enc
Binary file not shown.
5 changes: 1 addition & 4 deletions config/config.yml.example
@@ -59,10 +59,7 @@ development: &default
#
# REQUIRED for Twitter posts
#
twitter_consumer_key: # '<TWITTER APP CONSUMER KEY>'
twitter_consumer_secret: # '<TWITTER APP CONSUMER SECRET>'
twitter_access_token: # '<TWITTER APP ACCESS TOKEN>'
twitter_access_token_secret: # '<TWITTER APP ACCESS TOKEN SECRET>'
twitter_bearer_token: # '<TWITTER API BEARER TOKEN>'

# Facebook API
#