Downloaded article from Google RSS News only return Google Images #1003

Open
andisoer opened this issue Jul 21, 2024 · 2 comments

andisoer commented Jul 21, 2024

For the last few days, the parser, when called with

from newspaper import Article

article = Article('https://news.google.com/rss/articles/CBMiTmh0dHBzOi8vd3d3Lm55dGltZXMuY29tLzIwMjQvMDcvMjEvdXMvcG9saXRpY3MvdmFuY2UtdHJ1bXAtY2FtcGFpZ24tcmFsbHkuaHRtbNIBAA?oc=5&hl=en-ID&gl=ID&ceid=ID:en')
article.download()
article.parse()
print(article.title)
print(article.top_image)

only returns the Google News RSS image, which is

https://lh3.googleusercontent.com/J6_coFbogxhRI9iM864NL_liGXvsQp2AupsKei7z0cNNfDvGUmWUy20nuUhkREQyrpY4bEeIBuc=s0-w300

and the title

Google News

instead of the original article's image and title. Is there an issue with this parser, or has Google News RSS been updated?

@andisoer changed the title from "Downloaded article only return RSS Google Images" to "Downloaded article from Google RSS News only return Google Images" on Jul 22, 2024
@sunitab55

I just tried something with LinkedIn newsletters and it doesn't capture anything :/


Ronkiro commented Sep 20, 2024

There was an update on Google's side. The ID after /articles/ used to be a base64 string representing the original website.
Since July, that has changed and the ID no longer holds the real URL (the community doesn't seem to know how to parse it, by the way).
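
For context, here is a minimal sketch of what decoding the older base64 IDs could look like. The function name `decode_legacy_google_news_id` is made up for illustration, and the approach (base64-decode the last path segment, then search the bytes for an `http(s)://` run) is an assumption about the legacy layout rather than a documented format. For newer IDs it should simply return `None`.

```python
import base64
import re
from urllib.parse import urlparse


def decode_legacy_google_news_id(rss_url):
    """Return the URL embedded in a legacy-style article ID, or None.

    Hypothetical helper: assumes the last path segment is URL-safe base64
    and that the original link sits inside it as plain ASCII.
    """
    article_id = urlparse(rss_url).path.rsplit("/", 1)[-1]
    # The IDs come without '=' padding, so restore it before decoding.
    padded = article_id + "=" * (-len(article_id) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
    except ValueError:
        return None
    # Take the first run of printable ASCII that starts like a URL.
    match = re.search(rb"https?://[\x21-\x7e]+", raw)
    return match.group(0).decode("ascii") if match else None


url = (
    "https://news.google.com/rss/articles/"
    "CBMiTmh0dHBzOi8vd3d3Lm55dGltZXMuY29tLzIwMjQvMDcvMjEvdXMvcG9saXRpY3Mv"
    "dmFuY2UtdHJ1bXAtY2FtcGFpZ24tcmFsbHkuaHRtbNIBAA?oc=5&hl=en-ID&gl=ID&ceid=ID:en"
)
print(decode_legacy_google_news_id(url))
```

The ID in the link from this issue happens to decode this way. When the helper returns `None`, the only option mentioned in this thread is to ask Google for the real URL.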

Here's a reference: https://gist.github.com/huksley/bc3cb046157a99cd9d1517b32f91a99e

A community member has implemented this in Python: https://github.com/SSujitX/google-news-url-decoder/blob/main/googlenewsdecoder/new_decoderv1.py

This asks Google for the URL, though, so it may hit some 429s (which are very annoying). But I found no other solution than to do that before sending the URL to newspaper3k.
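
One way to soften the 429s is to retry the lookup with exponential backoff. The sketch below is only an illustration: `RateLimited`, `resolve_with_backoff`, and the `resolve` callable are hypothetical names, and it assumes whatever decoder you use can be wrapped so that an HTTP 429 surfaces as a `RateLimited` exception.

```python
import time


class RateLimited(Exception):
    """Hypothetical: raised by the resolver when Google answers with HTTP 429."""


def resolve_with_backoff(resolve, url, max_tries=4, base_delay=1.0, sleep=time.sleep):
    """Call resolve(url), waiting 1x, 2x, 4x, ... base_delay between 429s.

    `resolve` is any callable that returns the real article URL or raises
    RateLimited. The last failure is re-raised once max_tries is used up.
    """
    for attempt in range(max_tries):
        try:
            return resolve(url)
        except RateLimited:
            if attempt == max_tries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The URL it returns is what would then be handed to newspaper3k in place of the news.google.com link.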
