Merge branch 'c/vagrany-spacy-fix'
m1cr0man committed Mar 14, 2016
2 parents 126659f + 2f0e957 commit 31e84e4
Showing 9 changed files with 261 additions and 9 deletions.
14 changes: 7 additions & 7 deletions Vagrantfile
@@ -74,6 +74,7 @@ Vagrant.configure(2) do |config|
sudo apt-get install -y python3
sudo apt-get install -y python-pip
sudo apt-get install -y python-dev
sudo apt-get install -y python-sklearn
sudo apt-get install -y gearman-job-server
sudo apt-get install -y git
sudo apt-get install -y golang
@@ -83,22 +84,18 @@ Vagrant.configure(2) do |config|
sudo apt-get install -y scala
sudo apt-get install -y ruby
sudo apt-get upgrade -y
sudo apt-get autoremove
sudo apt-get clean
sudo pip install -r /vagrant/script/requirements.txt
sudo apt-get install build-essential python-dev
sudo pip install spacy
sudo python -m spacy.en.download
sudo su -c "gem install sass"
cd /vagrant/server
npm install -y
npm dedupe
npm cache clean
sudo apt-get autoremove
sudo apt-get clean
echo "export GOPATH=/home/vagrant/.go" > /home/vagrant/.profile
echo "export PATH=/vagrant/server/node_modules/.bin:$PATH:" >> /home/vagrant/.profile
mkdir -p /home/vagrant/.go
export GOPATH=/home/vagrant/.go
go get github.com/mikespook/gearman-go/worker
go get github.com/mikespook/gearman-go/worker
go get gopkg.in/mgo.v2
go get gopkg.in/mgo.v2/bson
mkdir -p /home/vagrant/.mongodb
@@ -115,4 +112,7 @@ Vagrant.configure(2) do |config|
mongo /vagrant/script/vagrant/create_feed_user_db.js
SHELL

# Setup Python
config.vm.provision "shell", path: "script/vagrant/setup_python.sh"

end
18 changes: 18 additions & 0 deletions doc/db/vote.md
@@ -0,0 +1,18 @@
Vote Database Specifications
============================

The vote collection holds a log of all the votes by users on specific articles.

Example Document
----------------
```js
{
"_id": ObjectId("5099803df3f4948bd2f98391"),
"username": "iandioch",
"article_url": "http://example.com/article",
"feed_url":"https://news.ycombinator.com/rss",
"positive_opinion":true
}
```

The `positive_opinion` field is `true` if a user upvoted an article, and `false` if they downvoted it.
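A minimal Python sketch of how a caller might build a document in this shape before handing it to the `db-add` worker (as `add_update_to_db` in `update_opinion/updater.py` does). The helper name `make_vote_document` is hypothetical, and `_id` is left out on the assumption that MongoDB assigns it on insert.

```python
# Hypothetical helper (not part of Feedlark): builds a vote document
# with the same fields as the example above. `_id` is omitted because
# MongoDB generates one on insert when it is absent.
def make_vote_document(username, article_url, feed_url, positive_opinion):
    # positive_opinion must be a real boolean: True = upvote, False = downvote
    if not isinstance(positive_opinion, bool):
        raise TypeError("positive_opinion must be True or False")
    return {
        "username": username,
        "article_url": article_url,
        "feed_url": feed_url,
        "positive_opinion": positive_opinion,
    }


vote = make_vote_document("iandioch", "http://example.com/article",
                          "https://news.ycombinator.com/rss", True)
```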
1 change: 1 addition & 0 deletions script/ci
@@ -6,3 +6,4 @@
./script/aggregator
./script/topics
./script/article_getter
./script/update_opinion
3 changes: 1 addition & 2 deletions script/requirements.txt
@@ -1,7 +1,6 @@
spacy
feedparser
beautifulsoup4
requests
pymongo
gearman
virtualenv
spacy
5 changes: 5 additions & 0 deletions script/update_opinion
@@ -0,0 +1,5 @@
#!/bin/bash
set -xe
cd update_opinion
python testing.py
cd ..
9 changes: 9 additions & 0 deletions script/vagrant/setup_python.sh
@@ -0,0 +1,9 @@
#!/bin/bash

export WORKING_DIR=/home/python

virtualenv $WORKING_DIR
chmod +x $WORKING_DIR/bin/*
source $WORKING_DIR/bin/activate
$WORKING_DIR/bin/pip install -r /vagrant/script/requirements.txt
$WORKING_DIR/bin/python -m spacy.en.download
63 changes: 63 additions & 0 deletions update_opinion/README.md
@@ -0,0 +1,63 @@
Opinion Updater
================

Dependencies
------------

- Python 2.7
- pymongo
- gearman
- Feedlark Adder
- Feedlark Getter

What is this?
-------------

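This is the Gearman worker that is called whenever a user upvotes or downvotes an article. It logs the vote to the `vote` collection and adjusts the user's topic weights according to the topics of the article they voted on.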
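The weighting can be sketched in plain Python as follows. `update_topic_counts` follows the function of the same name in `updater.py`; `apply_vote` is a hypothetical helper mirroring the loop in `update_user_model`, and it assumes feed items carry `link` and `topics` fields, as that code does. Each topic of the voted article moves by +1 for an upvote and -1 for a downvote; a topic the user has not seen before starts at +1 or -1.

```python
def update_topic_counts(old_topics, changes, is_positive):
    """Add +1 (upvote) or -1 (downvote) to each topic in `changes`."""
    diff = 1 if is_positive else -1
    for change in changes:
        # Unseen topics start from 0, so they end up at +1 or -1
        old_topics[change] = old_topics.get(change, 0) + diff
    return old_topics


def apply_vote(user_words, feed_items, article_url, is_positive):
    """Hypothetical helper: find the voted article among the feed's items
    and adjust the user's topic weights with that article's topics."""
    for item in feed_items:
        if item["link"] == article_url:
            update_topic_counts(user_words, item["topics"], is_positive)
    return user_words


words = {"banana": 3, "split": -17}
items = [
    {"link": "http://example.com/article", "topics": ["banana", "pie"]},
    {"link": "http://example.com/other", "topics": ["split"]},
]
# Upvote: banana goes 3 -> 4, pie is added at 1, split stays at -17
result = apply_vote(words, items, "http://example.com/article", True)
```

If the article URL matches no item in the feed, the weights are returned unchanged.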

Usage
-----

The worker is called `update-user-model` and takes the following BSON-encoded Gearman data:

```js
{
"username": "iandioch",
"feed_url": "http://news.ycombinator.com/rss",
"article_url": "http://example.com/article",
"positive_opinion": true
}
```

`username` is the name of the user who just voted on an article. `feed_url` is the URL of the feed the article was taken from. `article_url` is the link to that specific article. `positive_opinion` is `true` if the user upvoted the article and `false` if they downvoted it. All four fields are required; if any is missing, the worker returns an error.
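The worker rejects a request unless all four fields are present. A stdlib-only sketch of that check, assuming the request has already been decoded from BSON into a dict (`missing_fields` and `REQUIRED_FIELDS` are illustrative names, not part of the worker):

```python
# The four fields update-user-model expects in its decoded input
REQUIRED_FIELDS = ("username", "feed_url", "article_url", "positive_opinion")


def missing_fields(job_input):
    """Return the required fields absent from a decoded request dict."""
    return [field for field in REQUIRED_FIELDS if field not in job_input]


request = {
    "username": "iandioch",
    "feed_url": "http://news.ycombinator.com/rss",
    "article_url": "http://example.com/article",
}
print(missing_fields(request))  # ['positive_opinion']
```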


The worker responds with a BSON-encoded document, either:

```js
{
"status":"ok"
}
```

or

```js
{
"status":"error",
"description":"error description"
}
```

How to do tests
---------------

The tests are written with Python's `unittest` module.

To run them, make sure all the dependencies are installed and the dbtools are running, then run:

```bash
python testing.py
```

To add unit tests, modify `testing.py`.
See the `unittest` documentation for examples and general help.
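For example, edge cases of `update_topic_counts` can be covered without a running Gearman server. This sketch copies the function from `updater.py` so it runs standalone; inside `testing.py` you would rely on the existing `from updater import *` instead. The class and test names are hypothetical.

```python
import unittest


# Copy of update_topic_counts from updater.py, so this sketch runs
# without the gearman package or a Gearman server.
def update_topic_counts(old_topics, changes, is_positive):
    diff = 1 if is_positive else -1
    for change in changes:
        if change in old_topics:
            old_topics[change] += diff
        else:
            old_topics[change] = diff
    return old_topics


class TestTopicCountEdgeCases(unittest.TestCase):
    def test_no_changes_leaves_topics_untouched(self):
        topics = {"banana": 3}
        self.assertEqual(update_topic_counts(topics, [], True), {"banana": 3})

    def test_repeated_topic_counts_twice(self):
        # The same topic listed twice is adjusted twice
        topics = update_topic_counts({}, ["pie", "pie"], False)
        self.assertEqual(topics, {"pie": -2})
```

Added to `testing.py`, these tests run alongside the others under the existing `unittest.main()` call.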
38 changes: 38 additions & 0 deletions update_opinion/testing.py
@@ -0,0 +1,38 @@
import unittest
from updater import *

class TestOpinionUpdater(unittest.TestCase):
    def test_update_topic_counts_positive(self):
        prev_topics = {"banana":3, "split":-17}
        changes = ["banana", "pie"]
        new_topics = update_topic_counts(prev_topics, changes, True)
        self.assertTrue("pie" in new_topics and "banana" in new_topics and "split" in new_topics)
        self.assertEqual(new_topics["pie"], 1)
        self.assertEqual(new_topics["banana"], 4)
        self.assertEqual(new_topics["split"], -17)

    def test_update_topic_counts_negative(self):
        prev_topics = {"banana":3, "split":-17}
        changes = ["banana", "pie"]
        new_topics = update_topic_counts(prev_topics, changes, False)
        self.assertTrue("pie" in new_topics and "banana" in new_topics and "split" in new_topics)
        self.assertEqual(new_topics["pie"], -1)
        self.assertEqual(new_topics["banana"], 2)
        self.assertEqual(new_topics["split"], -17)

    def test_get_user_data(self):
        init_gearman_client()
        data = get_user_data('sully')
        self.assertFalse(data is None)
        data = get_user_data(1337) # should be no results for non-string usernames
        self.assertTrue(data is None)

    def test_get_feed_data(self):
        init_gearman_client()
        data = get_feed_data('https://news.ycombinator.com/rss')
        self.assertFalse(data is None)
        data = get_feed_data('ftp://example.ml/feed.yaml')
        self.assertTrue(data is None)

if __name__ == '__main__':
    unittest.main()
119 changes: 119 additions & 0 deletions update_opinion/updater.py
@@ -0,0 +1,119 @@
import gearman
import bson
from datetime import datetime

gearman_client = None

def log(level, message):
    levels = ['INFO:', 'WARNING:', 'ERROR:']
    time = datetime.now().strftime('%H:%M %d/%m/%Y')
    print(str(time) + " " + levels[level] + " " + str(message))

def update_topic_counts(old_topics, changes, is_positive):
    """modify the user topic weights to reflect the new data"""
    diff = 1 if is_positive else -1
    for change in changes:
        if change in old_topics:
            old_topics[change] += diff
        else:
            old_topics[change] = diff
    return old_topics

def add_update_to_db(data):
    """log the given user opinion to the vote db collection"""
    req_data = {"database":"feedlark", "collection":"vote", "data":data}
    bson_req = bson.BSON.encode(req_data)
    gearman_client.submit_job('db-add', str(bson_req))

def update_user_data(username, updates):
    """update the db entry of the user with the given username, with the given dict of updates"""
    req_data = {"database":"feedlark", "collection":"user", "data":{"selector":{"username":username}, "updates":updates}}
    bson_req = bson.BSON.encode(req_data)
    bson_result = bson.BSON(gearman_client.submit_job('db-update', str(bson_req)).result)
    result = bson.BSON.decode(bson_result)
    if result[u"status"] != u"ok":
        log(2, "Error updating user data: " + str(result))

def get_user_data(username):
    """Get the data of user from database"""
    req_data = {"database":"feedlark", "collection":"user", "query":{"username":username}, "projection":{}}
    bson_req = bson.BSON.encode(req_data)
    bson_result = bson.BSON(gearman_client.submit_job('db-get', str(bson_req)).result)
    result = bson.BSON.decode(bson_result)
    if result[u"status"] != u"ok":
        log(2, "Error getting database entry for user " + str(username))
        return None
    if "docs" not in result:
        log(1, "No 'docs' field in results for user " + str(username))
        return None
    if len(result["docs"]) == 0:
        log(1, "No docs returned for user " + str(username))
        return None
    return result["docs"][0]

def get_feed_data(feed_url):
    """Get the data of a given feed"""
    req_data = {"database":"feedlark", "collection":"feed", "query":{"url":feed_url}, "projection":{}}
    bson_req = bson.BSON.encode(req_data)
    bson_result = bson.BSON(gearman_client.submit_job('db-get', str(bson_req)).result)
    result = bson.BSON.decode(bson_result)
    if result[u"status"] != u"ok":
        log(2, "Error getting database entry for feed " + str(feed_url))
        return None
    if "docs" not in result or len(result["docs"]) == 0:
        log(1, "No docs returned for feed " + str(feed_url))
        return None
    return result["docs"][0]

def update_user_model(worker, job):
    bson_input = bson.BSON(job.data)
    job_input = bson_input.decode()
    log(0, 'update-user-model called with data ' + str(job_input))
    if not ("username" in job_input and "feed_url" in job_input and "article_url" in job_input and "positive_opinion" in job_input):
        log(1, 'Missing field in input: ' + str(job_input))
        response = {"status":"error", "description":"Missing field in input."}
        bson_response = bson.BSON.encode(response)
        return str(bson_response)
    add_update_to_db(job_input)

    log(0, "Getting user data from db")
    user_data = get_user_data(job_input["username"])
    if user_data is None:
        response = {"status":"error", "description":"No user data received from db for user " + str(job_input["username"])}
        bson_response = bson.BSON.encode(response)
        return str(bson_response)

    log(0, "Getting feed data from db")
    feed_data = get_feed_data(job_input["feed_url"])
    if feed_data is None:
        response = {"status":"error", "description":"No feed data received from db for feed " + str(job_input["feed_url"])}
        bson_response = bson.BSON.encode(response)
        return str(bson_response)

    log(0, "Updating topic weights")
    user_words = user_data.get('words', {})
    for item in feed_data['items']:
        if item['link'] == job_input['article_url']:
            topics = item['topics']
            user_words = update_topic_counts(user_words, topics, job_input['positive_opinion'])

    log(0, "Updating user db with new topic weights")
    user_data['words'] = user_words
    update_user_data(job_input['username'], user_data)
    log(0, "Worker finished.")
    response = {"status":"ok"}
    bson_response = bson.BSON.encode(response)
    return str(bson_response)

def init_gearman_client():
    global gearman_client
    log(0, "Creating gearman client.")
    gearman_client = gearman.GearmanClient(['localhost:4730'])

if __name__ == '__main__':
    init_gearman_client()
    log(0, "Creating gearman worker 'update-user-model'")
    gearman_worker = gearman.GearmanWorker(['localhost:4730'])
    gearman_worker.set_client_id('update-user-model')
    gearman_worker.register_task('update-user-model', update_user_model)
    gearman_worker.work()
