Rss fetch #1

Open
wants to merge 9 commits into
base: master

Conversation

@xuv xuv commented Oct 11, 2016

Implementation of RSS fetching for a Belgian news website.
It should be easy to adapt to any RSS feed.

@xuv
Author

xuv commented Oct 23, 2016

@j-e-d I'm not sure you'll want to pull this. It's actually fairly specific to the news site Le Soir. I'm now running it on the Le Monde website, and I had to "remove" the "author" feature, since Le Monde does not provide that. Though that's the only thing I had to change. RSS is pretty consistent across websites, luckily ;)

@j-e-d
Owner

j-e-d commented Oct 23, 2016

I noticed similar problems while testing it with some other RSS feeds. Maybe we will need to have the fields defined in a config file to make it flexible for different sites.
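To make the config-file idea concrete, here is a minimal sketch of what a per-site field mapping could look like. It is not the code from this pull request: the `SITE_FIELDS` table, the site keys, the `parse_items` helper, and the sample feed are all hypothetical, and it uses only the standard library's `xml.etree.ElementTree` rather than whatever parser the project relies on.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site configuration: logical field name -> RSS child tag.
# None means the feed is known not to provide that field (e.g. no author).
SITE_FIELDS = {
    "site_with_author": {
        "id": "guid", "title": "title", "abstract": "description",
        "url": "link", "author": "author",
    },
    "site_without_author": {
        "id": "guid", "title": "title", "abstract": "description",
        "url": "link", "author": None,
    },
}

# Made-up feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><guid>a1</guid><title> First </title>
<link>https://example.com/a1</link><description>Summary</description></item>
</channel></rss>"""


def parse_items(rss_xml, fields):
    """Return one dict per <item>, using the site's field mapping."""
    root = ET.fromstring(rss_xml)
    articles = []
    for item in root.iter("item"):
        article = {}
        for name, tag in fields.items():
            # findtext returns None when the tag is missing from the item.
            text = item.findtext(tag) if tag else None
            article[name] = text.strip() if text else None
        articles.append(article)
    return articles


articles = parse_items(SAMPLE_FEED, SITE_FIELDS["site_without_author"])
```

With a mapping like this, a feed that lacks a field yields `None` for it instead of requiring a code change, which is roughly the flexibility discussed above. In a real setup the table would presumably live in a separate config file.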

@xuv
Author

xuv commented Oct 24, 2016

Yes. Good idea.

@j-e-d
Owner

j-e-d commented Dec 5, 2016

I've been playing with this. What I noticed is that many sites that offer RSS don't "re-publish" an item when they make changes to the original article. Also, many don't have a proper ID and use the URL as the ID instead, so if they change the URL when they change the title or abstract, the article gets added as a new one. Getting the "abstract" on a couple of sites was a bit complicated too.

I guess I will make some changes, add comments, and merge it as a generic example, with the proper disclaimers that some additional work and testing is needed for each site you wish to try and that there is no guarantee it will work.
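The URL-as-ID problem described above can be illustrated with a small sketch. This is an assumption about how change detection might be keyed, not the logic in this pull request: `fingerprint`, `classify`, and the `seen` dictionary are hypothetical names, and the articles are made-up data.

```python
import hashlib


def fingerprint(article):
    """Hash the fields whose changes we want to notice (title and abstract)."""
    text = "\n".join([article.get("title") or "", article.get("abstract") or ""])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify(article, seen):
    """Return 'new', 'changed' or 'unchanged' and remember the latest version.

    `seen` maps an article ID to the fingerprint stored on the previous run.
    """
    key = article["id"]
    digest = fingerprint(article)
    previous = seen.get(key)
    seen[key] = digest
    if previous is None:
        return "new"
    return "changed" if previous != digest else "unchanged"


seen = {}
original = {"id": "https://example.com/old-title", "title": "Old title", "abstract": "A"}
# Same story after an edit, but the site also changed the URL used as the ID.
edited = {"id": "https://example.com/new-title", "title": "New title", "abstract": "A"}

first = classify(original, seen)
second = classify(edited, seen)
```

Because the edited story arrives under a different key, it is classified as `new` rather than `changed`, which is the duplicate-article behaviour noted in the comment.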

@xuv
Author

xuv commented Dec 5, 2016

@j-e-d For sure. It's not a ready-made solution for all news sources. But as you say, maybe it's something to adapt quickly to other situations. As for the RSS item ID, I usually find the ID they use either in the URL of the article itself or elsewhere. Even if they change the URL, they usually keep some part of it consistent. I'm also surprised that some newspapers don't update their RSS feed when the news changes. So far, with Le Soir and Le Monde, I have not encountered that problem. Anyway, thanks for having taken a look at it. Cheers.
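As a rough sketch of the idea of taking the ID from the part of the URL that stays consistent: the example below assumes the stable part is a run of at least five digits in the URL path. That pattern, the `stable_id` helper, and the example URLs are hypothetical; each site would need its own pattern, and a path containing other long digit runs (such as a compact date) would need a stricter one.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: the first run of 5 or more digits in the URL path.
ID_PATTERN = re.compile(r"\d{5,}")


def stable_id(url, pattern=ID_PATTERN):
    """Return the stable part of an article URL, or the URL itself as a fallback."""
    path = urlparse(url).path
    match = pattern.search(path)
    return match.group(0) if match else url


before = stable_id("https://example.com/news/2016/10/23/old-title_1234567.html")
after = stable_id("https://example.com/news/2016/10/23/new-title_1234567.html")
```

Both URLs map to the same ID here, so a retitled article would be matched to its earlier version instead of being stored as a new one. When no match is found, the function falls back to the full URL, which keeps the original behaviour.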

2 participants