Showing 9 changed files with 261 additions and 9 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
Vote Database Specifications | ||
============================ | ||
|
||
The vote collection holds a log of all the votes by users on specific articles. | ||
|
||
Example Document | ||
---------------- | ||
```js | ||
{ | ||
"_id": ObjectId("5099803df3f4948bd2f98391"), | ||
"username": "iandioch", | ||
"article_url": "http://example.com/article", | ||
"feed_url":"https://news.ycombinator.com/rss", | ||
"positive_opinion":true | ||
} | ||
``` | ||
|
||
The `positive_opinion` field is `true` if a user upvoted an article, and `false` if they downvoted it. |
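
A minimal sketch of how a vote document could be checked before it is written to the collection. The field names are taken from the example document above; the helper `vote_problems` and its validation rules are hypothetical and not part of the commit. The type check uses `str`, which suits Python 3; under Python 2.7, strings decoded from BSON are `unicode`, so `basestring` would be the closer match there.

```python
# Hypothetical helper: field names come from the example document,
# but the validation rules themselves are assumptions.
REQUIRED_VOTE_FIELDS = {
    "username": str,
    "article_url": str,
    "feed_url": str,
    "positive_opinion": bool,
}


def vote_problems(doc):
    """Return a list of problems found in doc; an empty list means it looks valid."""
    problems = []
    for field, expected_type in sorted(REQUIRED_VOTE_FIELDS.items()):
        if field not in doc:
            problems.append("missing field: " + field)
        elif not isinstance(doc[field], expected_type):
            problems.append("wrong type for field: " + field)
    return problems


vote = {
    "username": "iandioch",
    "article_url": "http://example.com/article",
    "feed_url": "https://news.ycombinator.com/rss",
    "positive_opinion": True,
}
print(vote_problems(vote))  # prints [] for the example document
```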
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,3 +6,4 @@ | |
./script/aggregator | ||
./script/topics | ||
./script/article_getter | ||
./script/update_opinion |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,6 @@ | ||
spacy | ||
feedparser | ||
beautifulsoup4 | ||
requests | ||
pymongo | ||
gearman | ||
virtualenv | ||
spacy |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
#!/bin/bash | ||
set -xe | ||
cd update_opinion | ||
python testing.py | ||
cd .. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
#!/bin/bash | ||
|
||
export WORKING_DIR=/home/python | ||
|
||
virtualenv $WORKING_DIR | ||
chmod +x $WORKING_DIR/bin/* | ||
source $WORKING_DIR/bin/activate | ||
$WORKING_DIR/bin/pip install -r /vagrant/script/requirements.txt | ||
$WORKING_DIR/bin/python -m spacy.en.download |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
Opinion Updater | ||
================ | ||
|
||
Dependencies | ||
------------ | ||
|
||
- Python 2.7 | ||
- pymongo | ||
- gearman | ||
- Feedlark Adder | ||
- Feedlark Getter | ||
|
||
What is this? | ||
------------- | ||
|
||
This is the gearman worker that is called whenever a user likes or dislikes an article. | ||
|
||
Usage | ||
----- | ||
|
||
The worker is called `update-user-model`, and takes the following Gearman data: | ||
|
||
```js | ||
{ | ||
"username": "iandioch", | ||
"feed_url": "http://news.ycombinator.com/rss", | ||
"article_url": "http://example.com/article", | ||
"positive_opinion": true | ||
} | ||
``` | ||
|
||
`username` should be the name of the user who just voted on an article. `feed_url` should be the URL of the feed that the article was taken from. `article_url` should be the link to that specific article. `positive_opinion` should be `true` if the user upvoted the article, and `false` if they downvoted it. | ||
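
A minimal sketch of how a caller might assemble and submit this job. It assumes the client exposes `submit_job(task, data)`, as `gearman.GearmanClient` is used in `updater.py`, and that the payload is BSON-encoded before submission. The names `build_vote_job`, `submit_vote` and `RecordingClient` are hypothetical, and `repr` only stands in for a real encoder such as `bson.BSON.encode` so that the sketch runs without extra packages.

```python
def build_vote_job(username, feed_url, article_url, positive_opinion):
    """Assemble the four fields the worker expects."""
    return {
        "username": username,
        "feed_url": feed_url,
        "article_url": article_url,
        "positive_opinion": bool(positive_opinion),
    }


def submit_vote(client, encode, job):
    """Encode the job and hand it to the update-user-model task.

    client is assumed to expose submit_job(task, data);
    encode is assumed to be a BSON encoder.
    """
    return client.submit_job("update-user-model", encode(job))


class RecordingClient(object):
    """Stand-in client for illustration; it only records what was submitted."""

    def __init__(self):
        self.calls = []

    def submit_job(self, task, data):
        self.calls.append((task, data))
        return data


client = RecordingClient()
job = build_vote_job("iandioch", "http://news.ycombinator.com/rss",
                     "http://example.com/article", True)
submit_vote(client, repr, job)  # repr stands in for a real BSON encoder
print(client.calls[0][0])
```

Passing the client and encoder in as arguments keeps the sketch independent of a running Gearman server.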
|
||
|
||
The worker just responds with the following: | ||
|
||
```js | ||
{ | ||
"status":"ok" | ||
} | ||
``` | ||
|
||
or | ||
|
||
```js | ||
{ | ||
"status":"error", | ||
"description":"error description" | ||
} | ||
``` | ||
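
A minimal sketch of how a caller might interpret the worker's reply, assuming the response has already been decoded from BSON into a dict. The helper `interpret_response` is hypothetical, and treating both `"ok"` and `"success"` as success values is an assumption made for tolerance, not something the worker guarantees.

```python
def interpret_response(response):
    """Return (succeeded, description) for a decoded worker response."""
    status = response.get("status")
    if status in ("ok", "success"):
        return True, ""
    # Fall back to a generic message if the worker gave no description.
    return False, response.get("description", "no description given")


print(interpret_response({"status": "error", "description": "error description"}))
```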
|
||
How to do tests | ||
--------------- | ||
|
||
The tests are written with Python's `unittest` module. | ||
|
||
To run them, make sure all the dependencies are installed and the dbtools are running, then run: | ||
|
||
$ python testing.py | ||
|
||
|
||
To add unit tests, modify the `testing.py` file. | ||
Check out the unittest docs for examples and general help. |
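
A minimal sketch of what an additional test case might look like. It carries a local copy of `update_topic_counts` so that it runs standalone; inside `testing.py` the function would instead come from `from updater import *`. The class name `TestTopicCountEdgeCases` and both test cases are hypothetical examples, not part of the commit.

```python
import unittest


def update_topic_counts(old_topics, changes, is_positive):
    """Local copy of the function from updater.py so the sketch runs standalone."""
    diff = 1 if is_positive else -1
    for change in changes:
        if change in old_topics:
            old_topics[change] += diff
        else:
            old_topics[change] = diff
    return old_topics


class TestTopicCountEdgeCases(unittest.TestCase):
    def test_no_changes_leaves_topics_untouched(self):
        topics = {"banana": 3}
        self.assertEqual(update_topic_counts(topics, [], True), {"banana": 3})

    def test_repeated_topic_counts_twice(self):
        new_topics = update_topic_counts({}, ["pie", "pie"], False)
        self.assertEqual(new_topics["pie"], -2)


# Run the case explicitly instead of calling unittest.main(),
# so the interpreter does not exit after the run.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTopicCountEdgeCases)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Neither case needs Gearman or the dbtools, since `update_topic_counts` only works on plain dicts and lists.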
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
import unittest | ||
from updater import * | ||
|
||
class TestOpinionUpdater(unittest.TestCase): | ||
    def test_update_topic_counts_positive(self): | ||
        prev_topics = {"banana":3, "split":-17} | ||
        changes = ["banana", "pie"] | ||
        new_topics = update_topic_counts(prev_topics, changes, True) | ||
        self.assertTrue("pie" in new_topics and "banana" in new_topics and "split" in new_topics) | ||
        self.assertEqual(new_topics["pie"], 1) | ||
        self.assertEqual(new_topics["banana"], 4) | ||
        self.assertEqual(new_topics["split"], -17) | ||
|
||
    def test_update_topic_counts_negative(self): | ||
        prev_topics = {"banana":3, "split":-17} | ||
        changes = ["banana", "pie"] | ||
        new_topics = update_topic_counts(prev_topics, changes, False) | ||
        self.assertTrue("pie" in new_topics and "banana" in new_topics and "split" in new_topics) | ||
        self.assertEqual(new_topics["pie"], -1) | ||
        self.assertEqual(new_topics["banana"], 2) | ||
        self.assertEqual(new_topics["split"], -17) | ||
|
||
    def test_get_user_data(self): | ||
        init_gearman_client() | ||
        data = get_user_data('sully') | ||
        self.assertFalse(data is None) | ||
        data = get_user_data(1337) # should be no results for non-string usernames | ||
        self.assertTrue(data is None) | ||
|
||
    def test_get_feed_data(self): | ||
        init_gearman_client() | ||
        data = get_feed_data('https://news.ycombinator.com/rss') | ||
        self.assertFalse(data is None) | ||
        data = get_feed_data('ftp://example.ml/feed.yaml') | ||
        self.assertTrue(data is None) | ||
|
||
if __name__ == '__main__': | ||
    unittest.main() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
import gearman | ||
import bson | ||
from datetime import datetime | ||
|
||
gearman_client = None | ||
|
||
def log(level, message): | ||
    levels = ['INFO:', 'WARNING:', 'ERROR:'] | ||
    time = datetime.now().strftime('%H:%M %d/%m/%Y') | ||
    print(str(time) + " " + levels[level] + " " + str(message)) | ||
|
||
def update_topic_counts(old_topics, changes, is_positive): | ||
    """modify the user topic weights to reflect the new data""" | ||
    diff = 1 if is_positive else -1 | ||
    for change in changes: | ||
        if change in old_topics: | ||
            old_topics[change] += diff | ||
        else: | ||
            old_topics[change] = diff | ||
    return old_topics | ||
|
||
def add_update_to_db(data): | ||
    """log the given user opinion to the vote db collection""" | ||
    req_data = {"database":"feedlark", "collection":"vote", "data":data} | ||
    bson_req = bson.BSON.encode(req_data) | ||
    gearman_client.submit_job('db-add', str(bson_req)) | ||
|
||
def update_user_data(username, updates): | ||
    """update the db entry of the user with the given username, with the given dict of updates""" | ||
    req_data = {"database":"feedlark", "collection":"user", "data":{"selector":{"username":username}, "updates":updates}} | ||
    bson_req = bson.BSON.encode(req_data) | ||
    bson_result = bson.BSON(gearman_client.submit_job('db-update', str(bson_req)).result) | ||
    result = bson.BSON.decode(bson_result) | ||
    if result[u"status"] != u"ok": | ||
        log(2, "Error updating user data: " + str(result)) | ||
|
||
def get_user_data(username): | ||
    """Get the data of user from database""" | ||
    req_data = {"database":"feedlark", "collection":"user", "query":{"username":username}, "projection":{}} | ||
    bson_req = bson.BSON.encode(req_data) | ||
    bson_result = bson.BSON(gearman_client.submit_job('db-get', str(bson_req)).result) | ||
    result = bson.BSON.decode(bson_result) | ||
    if result[u"status"] != u"ok": | ||
        log(2, "Error getting database entry for user " + str(username)) | ||
        return None | ||
    if "docs" not in result: | ||
        log(1, "No 'docs' field in results for user " + str(username)) | ||
        return None | ||
    if len(result["docs"]) == 0: | ||
        log(1, "No docs returned for user " + str(username)) | ||
        return None | ||
    return result["docs"][0] | ||
|
||
def get_feed_data(feed_url): | ||
    """Get the data of a given feed""" | ||
    req_data = {"database":"feedlark", "collection":"feed", "query":{"url":feed_url}, "projection":{}} | ||
    bson_req = bson.BSON.encode(req_data) | ||
    bson_result = bson.BSON(gearman_client.submit_job('db-get', str(bson_req)).result) | ||
    result = bson.BSON.decode(bson_result) | ||
    if result[u"status"] != u"ok": | ||
        log(2, "Error getting database entry for feed " + str(feed_url)) | ||
        return None | ||
    if "docs" not in result or len(result["docs"]) == 0: | ||
        log(1, "No docs returned for feed " + str(feed_url)) | ||
        return None | ||
    return result["docs"][0] | ||
|
||
def update_user_model(worker, job): | ||
    bson_input = bson.BSON(job.data) | ||
    job_input = bson_input.decode() | ||
    add_update_to_db(job_input) | ||
    log(0, 'update-user-model called with data ' + str(job_input)) | ||
    if not ("username" in job_input and "feed_url" in job_input and "article_url" in job_input and "positive_opinion" in job_input): | ||
        log(1, 'Missing field in input: ' + str(job_input)) | ||
        response = {"status":"error", "description":"Missing field in input."} | ||
        bson_response = bson.BSON.encode(response) | ||
        return str(bson_response) | ||
|
||
    log(0, "Getting user data from db") | ||
    user_data = get_user_data(job_input["username"]) | ||
    if user_data is None: | ||
        response = {"status":"error", "description":"No user data received from db for user " + str(job_input["username"])} | ||
        bson_response = bson.BSON.encode(response) | ||
        return str(bson_response) | ||
|
||
    log(0, "Getting feed data from db") | ||
    feed_data = get_feed_data(job_input["feed_url"]) | ||
    if feed_data is None: | ||
        response = {"status":"error", "description":"No feed data received from db for feed " + str(job_input["feed_url"])} | ||
        bson_response = bson.BSON.encode(response) | ||
        return str(bson_response) | ||
|
||
    log(0, "Updating topic weights") | ||
    user_words = user_data['words'] | ||
    for item in feed_data['items']: | ||
        if item['link'] == job_input['article_url']: | ||
            topics = item['topics'] | ||
            user_words = update_topic_counts(user_words, topics, job_input['positive_opinion']) | ||
|
||
    log(0, "Updating user db with new topic weights") | ||
    user_data['words'] = user_words | ||
    update_user_data(job_input['username'], user_data) | ||
    log(0, "Worker finished.") | ||
    response = {"status":"ok"} | ||
    bson_response = bson.BSON.encode(response) | ||
    return str(bson_response) | ||
|
||
def init_gearman_client(): | ||
    global gearman_client | ||
    log(0, "Creating gearman client.") | ||
    gearman_client = gearman.GearmanClient(['localhost:4730']) | ||
|
||
if __name__ == '__main__': | ||
    init_gearman_client() | ||
    log(0, "Creating gearman worker 'update-user-model'") | ||
    gearman_worker = gearman.GearmanWorker(['localhost:4730']) | ||
    gearman_worker.set_client_id('update-user-model') | ||
    gearman_worker.register_task('update-user-model', update_user_model) | ||
    gearman_worker.work() |
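
The required-field check in `update_user_model` could be factored into a small helper, sketched below under stated assumptions. The names `REQUIRED_FIELDS`, `missing_fields` and `error_response` are hypothetical, and listing the missing fields in the description is an addition that the worker above does not make.

```python
REQUIRED_FIELDS = ("username", "feed_url", "article_url", "positive_opinion")


def missing_fields(job_input):
    """Return the required fields absent from job_input, in a stable order."""
    return [field for field in REQUIRED_FIELDS if field not in job_input]


def error_response(missing):
    """Build an error dict in the same shape update_user_model returns."""
    return {"status": "error",
            "description": "Missing field in input: " + ", ".join(missing)}


job_input = {"username": "iandioch", "positive_opinion": True}
missing = missing_fields(job_input)
if missing:
    print(error_response(missing))
```

One possible design choice would be to run such a check before `add_update_to_db`, so that malformed votes are not written to the vote collection; the worker above logs the vote first and validates afterwards.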