forked from gtank/storylinenews
tylermalin/storylinenews
DESCRIPTION

Storylinenews is a web mining utility that scrapes major newsfeeds to discover the day's most important topics and determine the overall sentiment of coverage on those topics. It includes a log converter compatible with the excellent 'storylines' visualization technique.

DEPENDENCIES

The project requires version 1.5 of Pattern:
http://www.clips.ua.ac.be/pages/pattern

and the SentiWordNet polarity lexicon (sign up for a research license):
http://sentiwordnet.isti.cnr.it/

INSTRUCTIONS

First install SentiWordNet. It goes in $(PATTERNDIR)/en/wordnet.

Run scraper.py with the location of a feed list as the first argument.

> python scraper.py feeds.json

Let it run for as long as you'd like, or indefinitely. At any time you can run converter.py to generate an XML file compatible with Michael Ogawa's Processing implementation of the 'storyline' cluster graphing algorithm. The 'news.config' file provided in the data directory will allow the generation of sentiment-displaying topical storylines. For more information and the download, visit the storylines website:
http://www.michaelogawa.com/research/storylines/

FEED LIST FORMAT

The feed list is a JSON string dump of a dict of the form {'FEEDNAME':'feedurl'} where feedurl points to an RSS or Atom feed.
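
As a minimal sketch of the feed list format described above, the snippet below writes a dict of feed names and URLs to a JSON file using only the standard library. The helper names (write_feed_list, read_feed_list) and the example.com / example.org URLs are placeholders invented for illustration and are not part of this project. It also assumes scraper.py reads the file with a standard JSON parser, which the README does not state. Note that the file on disk uses double quotes, as JSON requires, even though the dict is shown above with single quotes.

```python
import json


def write_feed_list(feeds, path):
    """Dump a {FEEDNAME: feedurl} dict to `path` as JSON (hypothetical helper)."""
    with open(path, "w") as handle:
        json.dump(feeds, handle, indent=2, sort_keys=True)


def read_feed_list(path):
    """Load the feed list back into a dict, e.g. to check it before scraping."""
    with open(path) as handle:
        return json.load(handle)


# Placeholder entries: replace with real RSS or Atom feed URLs.
EXAMPLE_FEEDS = {
    "EXAMPLE_RSS": "http://example.com/rss.xml",
    "EXAMPLE_ATOM": "http://example.org/atom.xml",
}

if __name__ == "__main__":
    write_feed_list(EXAMPLE_FEEDS, "feeds.json")
    print(sorted(read_feed_list("feeds.json")))
```

The resulting feeds.json can then be passed as the first argument to scraper.py, as shown in the instructions.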
About
sentiment analysis + newsfeed scraping + great visualization