Fetching status blocks for Tweets2011 - hits twitter api limit #13

nigel-v-thomas · 2013-02-02T12:04:50Z

I have been trying to get this tool to download the blocks from the Tweets2011 collection. Unfortunately, the current implementation hits the Twitter API limit each time.

The Twitter limit on the read API is 180 hits per hour for authenticated requests (see http://twitter4j.org/en/api-support.html) and 150 for unauthenticated requests.

I have tried:

- creating authenticated requests to the Twitter 1.1 API (since the older API is deprecated, and possibly removed from March 2013 onwards)
- parsing the content out of the web pages directly (a brittle solution!); however, this doesn't work with protected accounts and missing pages

Given the number of requests generated by this solution, I am not sure how to build the Tweets2011 corpus.
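
As a rough illustration of the scale problem, the arithmetic looks like the sketch below. It assumes one API request per status, and the corpus size is a placeholder passed on the command line (the default of 1,000,000 is hypothetical, not the real Tweets2011 count). The 180 per hour figure is the limit quoted above. `CrawlEstimate` and `hoursNeeded` are illustrative names, not part of this tool.

```java
public class CrawlEstimate {
    /** Hours needed to issue `requests` calls at `perHour` calls per hour, rounded up. */
    static long hoursNeeded(long requests, long perHour) {
        if (perHour <= 0) {
            throw new IllegalArgumentException("perHour must be positive");
        }
        // Integer ceiling division: a partial hour still counts as a full window.
        return (requests + perHour - 1) / perHour;
    }

    public static void main(String[] args) {
        // Hypothetical default size; pass the real number of statuses as the first argument.
        long requests = args.length > 0 ? Long.parseLong(args[0]) : 1_000_000L;
        long hours = hoursNeeded(requests, 180); // 180/hour as quoted in this issue
        System.out.printf("%d requests at 180/hour: %d hours (~%.1f days)%n",
                requests, hours, hours / 24.0);
    }
}
```

Under those assumptions, even one million statuses would need 5556 hours, or roughly 231 days, of continuous fetching from a single set of credentials.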

isoboroff · 2013-02-21T18:13:21Z

This is a longstanding issue that the API-based crawler can't fix. The answer is to use the HTML-based crawlers, which need updating to Twitter's current HTML layout.

nigel-v-thomas · 2013-02-21T18:36:50Z

Thank you for the feedback.

I have tried the HTML-based crawler, and found that it will not work for users who have closed their accounts or made their tweets private since the original data was gathered.

I am resigned to the belief that fetching status blocks, whether from the API or directly from HTML, will no longer work, due to API limits and expired or private accounts and tweets.

isoboroff · 2013-02-21T21:48:01Z

We are working on updating the HTML crawler. Stay tuned, or alternatively, patches are accepted ;-)

nigel-v-thomas · 2013-02-21T22:02:09Z

I have submitted pull request #12, which uses the API to fetch.
I have not committed the code that fetches from HTML; it is a simple hack and suffers from the problems described earlier, i.e. expired or private tweets not being available. I can commit the code if needed.

andrewyates · 2013-02-26T16:48:35Z

Could you commit the HTML scraping code somewhere for reference?

From what I understand, there's no way around the fact that private tweets and deleted tweets aren't available.

nigel-v-thomas · 2013-02-26T21:21:37Z

@andrewyates @isoboroff
FYI, below is the quick hack to scrape HTML content:
nigel-v-thomas@5689ff7

shakirak · 2013-05-23T04:37:41Z

Hi Nigel,

I was trying the link https://github.com/nigel-v-thomas/twitter-tools and am still getting the error "Unable to parse text from this, possible change in format". I am using HTML scraping.
Am I missing something?

nigel-v-thomas · 2013-05-23T12:03:13Z

Hi Shakirak, it is quite likely that the HTML markup has changed since I updated the code. As I said, this solution is far from ideal; it would be much better to use the standard API.
Depending on what your end goal is:

- If it is to reconstruct the 2011 corpus, then I am not sure this solution will help; I was not able to rebuild it.
- If it is to work around the API limit, then one way you could proceed is to tweak the code for parsing HTML markup in this file (see lines 65 to 67):
  https://github.com/nigel-v-thomas/twitter-tools/blob/5689ff79705f63fc84cdaa40c239ed85c06825a2/src/main/java/cc/twittertools/corpus/data/Status.java

Another solution is to use the Streaming API to work around the API limit.
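
If you do tweak the HTML parsing, a sketch along the following lines may make format changes easier to handle. It is not the code in `Status.java`: the `tweet-text` class name, the `TweetTextExtractor` class, and the `extractText` method are all assumptions for illustration, and the real markup has to be checked against a current page. It uses only the standard library, returns an empty `Optional` instead of failing when the expected element is missing, and does not decode HTML entities.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TweetTextExtractor {
    // Hypothetical selector: a <p> whose class attribute contains "tweet-text".
    // Update this pattern whenever the page layout changes.
    private static final Pattern TWEET_TEXT = Pattern.compile(
            "<p[^>]*class=\"[^\"]*\\btweet-text\\b[^\"]*\"[^>]*>(.*?)</p>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Matches any remaining inline tags (links, bold, etc.) inside the tweet body.
    private static final Pattern TAGS = Pattern.compile("<[^>]+>");

    /** Returns the tweet text, or empty if the expected element is not on the page. */
    static Optional<String> extractText(String html) {
        if (html == null) {
            return Optional.empty();
        }
        Matcher m = TWEET_TEXT.matcher(html);
        if (!m.find()) {
            return Optional.empty(); // deleted, protected, or changed layout
        }
        String text = TAGS.matcher(m.group(1)).replaceAll("").trim();
        return text.isEmpty() ? Optional.empty() : Optional.of(text);
    }

    public static void main(String[] args) {
        String page = "<div><p class=\"js-tweet-text tweet-text\">Hello <b>world</b></p></div>";
        System.out.println(extractText(page).orElse("(no tweet text found)"));
        System.out.println(extractText("<html>No such page</html>").orElse("(no tweet text found)"));
    }
}
```

Returning an empty result lets the caller log and skip a status rather than abort the whole block, which matters when many statuses are deleted or protected. A regular expression is still a brittle way to read HTML, so this only narrows the place that needs editing when the markup changes.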
