No longer recognizing reddit RSS feeds #83

Open
duckunix opened this issue Mar 11, 2022 · 6 comments
Labels
question Further information is requested

Comments

@duckunix

This started in the last day or two for me. I am using the latest release, release-2.4.
When trying to run, I get:

error processing https://www.reddit.com/r/swaywm/.rss - error parsing https://www.reddit.com/r/swaywm/.rss contents: Failed to detect feed type
error processing https://www.reddit.com/r/OPNsenseFirewall/.rss - error parsing https://www.reddit.com/r/OPNsenseFirewall/.rss contents: Failed to detect feed type

And so on for all my reddit entries.

Any thoughts?

@skx
Owner

skx commented Mar 12, 2022

I remember that when I first put the application together, Reddit didn't like it unless I set up a custom user-agent. They're a bit strict about blocking access.

So there's an obvious suspicion that the feed-request is just getting blocked/filtered/broken at their side. Can you download the feed(s) successfully with curl?

If it's broken for everything then it's clearly their fault. If you can download via curl, but not via the app then it might be something I can fix.

For what it's worth my own feed (of "private inbox" messages) continues to work so it might not necessarily be something that is globally broken.
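
The same check can be run without curl by requesting the feed with an explicit User-Agent and looking at what comes back. What follows is a minimal sketch, not how rss2email fetches feeds; the function names and the `feed-check/0.1` agent string are illustrative assumptions.

```python
import urllib.error
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str, timeout: float = 10.0):
    """Return (HTTP status, Content-Type) for the URL; performs network I/O."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. 429 Too Many Requests) arrive as exceptions.
        return err.code, err.headers.get("Content-Type", "")
```

Calling `fetch_status("https://www.reddit.com/r/swaywm/.rss", "feed-check/0.1")` with a few different agent strings should show whether the agent alone changes the response.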

skx self-assigned this Mar 12, 2022
skx added the "question" label (Further information is requested) Mar 12, 2022
@duckunix
Author

Oddly, wget works just fine, but when I use curl, I get:



<!doctype html>
<html>
  <head>
    <title>Too Many Requests</title>
    <style>
      body {
          font: small verdana, arial, helvetica, sans-serif;
          width: 600px;
          margin: 0 auto;
      }

      h1 {
          height: 40px;
          background: transparent url(//www.redditstatic.com/reddit.com.header.png) no-repeat scroll top right;
      }
    </style>
  </head>
  <body>
    <h1>whoa there, pardner!</h1>
    


<p>we're sorry, but you appear to be a bot and we've seen too many requests
from you lately. we enforce a hard speed limit on requests that appear to come
from bots to prevent abuse.</p>

<p>if you are not a bot but are spoofing one via your browser's user agent
string: please change your user agent string to avoid seeing this message
again.</p>

<p>please wait 8 second(s) and try again.</p>

    <p>as a reminder to developers, we recommend that clients make no
    more than <a href="http://github.com/reddit/reddit/wiki/API">one
    request every two seconds</a> to avoid seeing this message.</p>
  </body>
</html>

So, is there some way for me to put a sleep before/after a call to reddit?

Thanks!
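
The page above is HTML rather than a feed, which would plausibly explain the `Failed to detect feed type` error: a parser handed this body finds no RSS or Atom root element. Below is a rough sketch of telling the two apart and of pulling the suggested wait out of the message, assuming the wording shown above. The helper names are hypothetical and this is not rss2email's detection logic.

```python
import re
import xml.etree.ElementTree as ET

FEED_ROOTS = {"rss", "feed", "RDF"}  # RSS 2.0, Atom, RSS 1.0


def looks_like_feed(body: str) -> bool:
    """True if the body parses as XML with a known feed root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # Strip any "{namespace}" prefix from the root tag.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS


def suggested_wait_seconds(body: str):
    """Return the wait from 'please wait N second(s)', or None if absent."""
    match = re.search(r"please wait (\d+) second", body)
    return int(match.group(1)) if match else None
```

A caller could check `looks_like_feed` before parsing and, when it fails, use `suggested_wait_seconds` to decide how long to pause before a retry.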

@duckunix
Author

BTW:

grep -c reddit.com ~/.rss2email/feeds.txt
19

@skx
Owner

skx commented Mar 13, 2022

> Oddly, wget works just fine,

Then I'd probably suggest they're using the User-Agent header to differentiate the two requests. You might try changing your local agent. Something like this in your feed-list:

https://reddit.com/....
  - user-agent: my-safe-bot/1.0

As for sleeping between feed-requests? I'm afraid not, though it does seem like something that could be added. I could add:

http://example.com/foo
  - sleep: 10
http://example.net/blah.rss
  - sleep: 20

That would give a ten-second sleep before fetching the first feed, and a twenty-second delay before the second.
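
To make the layout concrete, here is a rough sketch of reading a feed list in this shape: a URL on its own line, followed by indented `- key: value` option lines that apply to it. The parser is an illustration only, not rss2email's own; `parse_feed_list` is a hypothetical name, and skipping `#` comment lines is an assumption.

```python
def parse_feed_list(text: str) -> list:
    """Parse 'URL' lines followed by '- key: value' option lines.

    Returns a list of (url, options) tuples in file order.
    """
    feeds = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and (assumed) comment lines
        if line.startswith("-"):
            if not feeds:
                raise ValueError(f"option before any feed URL: {raw!r}")
            # Split on the first colon only, so values may contain colons.
            key, _, value = line[1:].partition(":")
            feeds[-1][1][key.strip()] = value.strip()
        else:
            feeds.append((line, {}))
    return feeds
```

Option values are kept as strings here; a caller would convert `sleep` to a number of seconds before using it.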

Added that in #84 - along with a simple heuristic that adds a delay automatically if the feed being fetched is from the same hostname as the previous request. So assuming your feed contains:

reddit...
reddit...
reddit..
example.com...
example.com..

you won't need to make any config-file changes; it'll delay automatically.
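
The heuristic can be sketched as comparing each feed's hostname with the previous one's and scheduling a pause when they match. This is an illustration rather than the code from #84; `plan_delays` is a hypothetical helper, and the 5-second default is an assumption taken from the verbose output shown later in this thread.

```python
from urllib.parse import urlsplit


def plan_delays(urls, same_host_delay: float = 5.0):
    """Return (url, seconds_to_wait_before_fetching) pairs.

    A delay is scheduled only when a feed shares its hostname with the
    feed fetched immediately before it.
    """
    plan = []
    previous_host = None
    for url in urls:
        host = urlsplit(url).hostname
        same_host = host is not None and host == previous_host
        plan.append((url, same_host_delay if same_host else 0.0))
        previous_host = host
    return plan
```

A caller would `time.sleep(delay)` before each fetch. The comparison is on exact hostnames, so `reddit.com` and `www.reddit.com` count as different hosts in this sketch.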

skx added a commit that referenced this issue Mar 13, 2022
Added support for a per-feed `sleep` setting, which can force a sleep
after making a feed-request.

This was inspired by #83.
@duckunix
Author

So, using version 2.5, it is still not working for me on reddit. :(
This is my test feeds.txt:

https://www.reddit.com/r/swaywm/.rss
 - template:reddit.tmpl
 - user-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1
https://www.reddit.com/r/OPNsenseFirewall/.rss
 - template:reddit.tmpl
 - user-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1

Which hopefully would be working with the delay and the user-agent string, but still no joy:

time rss2email cron -verbose <email@rededicated>
Fetching feed: https://www.reddit.com/r/swaywm/.rss

Fetching from same host as previous feed, www.reddit.com, adding 5s delay
Fetching feed: https://www.reddit.com/r/OPNsenseFirewall/.rss

Skipping the prune-step because we saw errors processing our feed(s)

error processing https://www.reddit.com/r/swaywm/.rss - error parsing https://www.reddit.com/r/swaywm/.rss contents: Failed to detect feed type
error processing https://www.reddit.com/r/OPNsenseFirewall/.rss - error parsing https://www.reddit.com/r/OPNsenseFirewall/.rss contents: Failed to detect feed type

real    0m5.173s
user    0m0.038s
sys     0m0.018s

Any thoughts, or should I go look for something to build custom RSS feeds for my reddit subscriptions?

Thanks,
d

@skx
Owner

skx commented Mar 14, 2022

I'm sorry to hear that the recent delay didn't help, nor the user-agent switch.

Using some other wrapper to fetch the feeds from reddit and republish them, so that you can then fetch them locally, should work - but I admit I'm not really sure what options are out there, or how likely they are to get blocked in the future either. (Feedburner?)

But for this project I'm not sure there are any more useful changes I can make - I could add our version number to the default user-agent, but nothing else comes to mind.
