Assorted

Assorted/Bitly

The limit on tweet length compels users to shorten the URLs in their tweets. A Python script was used to resolve short URLs to their expanded form; nearly 2 million URLs were parsed with it. The purpose of short-URL expansion is twofold: [1] to find which short URLs point to the same target page (expanded URL), which in turn lets us infer tweet cascades, and [2] to study how topics spread temporally and spatially.
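A minimal sketch of the expansion step, assuming the `requests` library and a plain-text file of short URLs (this is not the original script, and the file names are placeholders):

```python
# Expand short URLs by following HTTP redirects and recording the final URL.
import csv
import requests

def expand_url(short_url, timeout=10):
    """Return the final URL a short URL redirects to, or None on failure."""
    try:
        resp = requests.head(short_url, allow_redirects=True, timeout=timeout)
        return resp.url
    except requests.RequestException:
        return None

if __name__ == "__main__":
    # "short_urls.txt" (one URL per line) and "expanded_urls.csv" are assumed names.
    with open("short_urls.txt") as infile, \
         open("expanded_urls.csv", "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["short_url", "expanded_url"])
        for line in infile:
            short = line.strip()
            if short:
                writer.writerow([short, expand_url(short)])
```

Grouping the output rows by expanded URL then gives the sets of short URLs that point to the same target page.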

Assorted/Elsevier

This folder contains several source files used to study and contrast the behavior of celebrity users against regular users on Twitter. Nearly 468 million tweets posted by 26 million users are studied in this project. The work is published in the Elsevier Computer Communications (COMCOM) journal.

Title: "The rich and middle classes on Twitter: Are popular users indeed different from regular users?"

URL: http://www.sciencedirect.com/science/article/pii/S0140366415002625

Assorted/Hadoop

To get the frequency of every topic in an 18 GB dataset, I installed and ran Hadoop on a cluster of two Linux machines and retrieved the topic frequencies in a single run. This significantly reduced both the coding effort and the processing time.
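Counting topic frequencies on Hadoop reduces to a standard word-count job. The sketch below uses Hadoop Streaming with a Python mapper and reducer; the assumed input layout (one line of tab-separated topics per tweet) and the file names are illustrative, not the original job:

```python
# mapper.py -- emit "<topic>\t1" for every topic on an input line
# (each line is assumed to hold the tab-separated topics of one tweet).
import sys

for line in sys.stdin:
    for topic in line.strip().split("\t"):
        if topic:
            print(f"{topic}\t1")
```

```python
# reducer.py -- sum the counts per topic; Hadoop Streaming sorts the mapper
# output by key, so identical topics arrive contiguously.
import sys

current_topic, count = None, 0
for line in sys.stdin:
    topic, value = line.rstrip("\n").split("\t", 1)
    if topic == current_topic:
        count += int(value)
    else:
        if current_topic is not None:
            print(f"{current_topic}\t{count}")
        current_topic, count = topic, int(value)
if current_topic is not None:
    print(f"{current_topic}\t{count}")
```

The job is launched with the Hadoop Streaming jar, roughly `hadoop jar hadoop-streaming.jar -input topics/ -output counts/ -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py` (paths are placeholders).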

Assorted/Yahoo

We observed that only 61% of the 7.39 million users in our dataset supplied location information, and no common format was used. We therefore first geocoded all extracted locations into a common format, a pair of latitude and longitude coordinates, and then reverse-geocoded the coordinates into a triplet of city, state, and country. We considered the Yahoo! PlaceFinder, Google Geocoding, and MapQuest Geocoding APIs for geocoding the extracted locations. Since Yahoo! offered the highest free rate limit of 50,000 requests per IP address per day, we used it to geocode all locations.
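A hypothetical sketch of the two-step flow (forward geocode, then reverse geocode) is shown below; the endpoint URLs, parameter names, and response fields are placeholders for illustration and do not reproduce the retired Yahoo! PlaceFinder API:

```python
# Two-step geocoding: free-text location -> (lat, lon) -> (city, state, country).
# The endpoints, parameter names, and response fields below are placeholders.
import requests

GEOCODE_URL = "https://geocoder.example.com/forward"   # placeholder endpoint
REVERSE_URL = "https://geocoder.example.com/reverse"   # placeholder endpoint

def geocode(free_text_location):
    """Convert a free-text user location into a (latitude, longitude) pair."""
    resp = requests.get(GEOCODE_URL, params={"q": free_text_location}, timeout=10)
    resp.raise_for_status()
    result = resp.json()                                # assumed JSON response shape
    return result["latitude"], result["longitude"]

def reverse_geocode(latitude, longitude):
    """Convert coordinates back into a (city, state, country) triplet."""
    resp = requests.get(REVERSE_URL,
                        params={"lat": latitude, "lon": longitude}, timeout=10)
    resp.raise_for_status()
    result = resp.json()                                # assumed JSON response shape
    return result["city"], result["state"], result["country"]
```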

Assorted/NLP

Manual identification of topics in a dataset of 196 million tweets is infeasible. We therefore used the Reuters OpenCalais web service to automatically extract topics (entities and tags) from all tweets. Since OpenCalais is rate limited, tweets were grouped into optimally sized bundles before being queried. The RDF responses from OpenCalais were parsed to obtain the topics in each tweet. We obtained 39 million URLs and 7.5 million unique topics from OpenCalais.
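A hedged sketch of the bundling-and-querying step is shown below; the OpenCalais endpoint, header names, and JSON handling are assumptions for illustration, and the original pipeline parsed RDF responses instead:

```python
# Batch tweets into bundles and submit each bundle to a tagging service.
# The endpoint URL, header names, and output format are placeholders, not a
# verified description of the OpenCalais API.
import time
import requests

CALAIS_URL = "https://opencalais.example.com/tag"        # placeholder endpoint
API_TOKEN = "YOUR_OPENCALAIS_TOKEN"                      # placeholder credential

def bundles(tweets, bundle_size=50):
    """Group tweets into fixed-size bundles to respect the rate limit."""
    for i in range(0, len(tweets), bundle_size):
        yield "\n".join(tweets[i:i + bundle_size])

def extract_topics(text_bundle):
    """Send one bundle of tweet text and return the service's topic annotations."""
    headers = {
        "X-AG-Access-Token": API_TOKEN,        # assumed header name
        "Content-Type": "text/raw",            # assumed content type
        "outputFormat": "application/json",    # assumed; original parsed RDF output
    }
    resp = requests.post(CALAIS_URL, data=text_bundle.encode("utf-8"),
                         headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    tweets = ["example tweet one", "example tweet two"]  # stand-in data
    for bundle in bundles(tweets):
        print(extract_topics(bundle))
        time.sleep(1)                                    # crude rate-limit back-off
```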
