sentiment analysis + newsfeed scraping + great visualization

tylermalin/storylinenews

 
 

DESCRIPTION

Storylinenews is a web mining utility that scrapes major newsfeeds to discover
the day's most important topics and determine the overall sentiment of 
coverage on those topics. It includes a log converter compatible with the
excellent 'storylines' visualization technique.

DEPENDENCIES

The project requires version 1.5 of Pattern:
http://www.clips.ua.ac.be/pages/pattern

and the SentiWordNet polarity lexicon (sign up for a research license):
http://sentiwordnet.isti.cnr.it/


INSTRUCTIONS

First install SentiWordNet. It goes in $(PATTERNDIR)/en/wordnet.

Run scraper.py with the location of a feed list as the first argument.

> python scraper.py feeds.json

Let it run for as long as you like; it can run indefinitely. At any
time, you can run converter.py to generate an XML file compatible
with Michael Ogawa's Processing implementation of the 'storylines'
cluster graphing algorithm. The 'news.config' file provided in the
data directory enables the generation of sentiment-displaying
topical storylines.

For more information and the download, visit the storylines website:
http://www.michaelogawa.com/research/storylines/


FEED LIST FORMAT

The feed list is a JSON dump of a dict of the form

{"FEEDNAME": "feedurl"}

where feedurl points to an RSS or Atom feed.
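
As a minimal sketch of producing and checking such a file, the snippet
below writes a dict of feeds to disk with Python's standard json module
and reads it back. The function names, feed names, and URLs are
hypothetical placeholders, not part of this project; scraper.py may load
the file differently.

```python
import json
import os
import tempfile


def write_feed_list(feeds, path):
    """Serialize a {name: url} dict to a JSON file."""
    with open(path, "w") as handle:
        json.dump(feeds, handle, indent=2)


def read_feed_list(path):
    """Load a feed list and check that it is a JSON object (a dict)."""
    with open(path) as handle:
        feeds = json.load(handle)
    if not isinstance(feeds, dict):
        raise ValueError("feed list must be a JSON object mapping names to URLs")
    return feeds


if __name__ == "__main__":
    # Hypothetical feed names and URLs, for illustration only.
    example_feeds = {
        "EXAMPLE_RSS": "http://example.com/rss.xml",
        "EXAMPLE_ATOM": "http://example.org/atom.xml",
    }
    path = os.path.join(tempfile.mkdtemp(), "feeds.json")
    write_feed_list(example_feeds, path)
    print(read_feed_list(path))
```

Writing the file with json.dump rather than by hand ensures that names
and URLs are double-quoted, as JSON requires.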
