Where to find and delete all articles? #959

Open
steeljardas opened this issue Jan 6, 2023 · 7 comments

Comments

@steeljardas

I am using Newspaper3k on around 20k articles. Where would I need to go to delete all the articles that Newspaper3k is downloading?

@johnbumgarner

johnbumgarner commented Jan 14, 2023

If memoize_articles is not set to False, then Newspaper will cache article URLs and associated data in your system's temp directory. There are some details on this cache in my Newspaper3k Overview Document.

@NiravJoshi33

I believe @steeljardas is asking how to delete the cache.

@AndyTheFactory

The cache folder is ANCHOR_DIRECTORY:

ANCHOR_DIRECTORY = os.path.join(TOP_DIRECTORY, CF_CACHE_DIRECTORY)

Normally it would be:

/tmp/.newspaper_scraper/feed_category_cache
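
For anyone who wants to empty that folder from a script, here is a minimal sketch using only the standard library. It does not import Newspaper3k at all. The directory names are copied from the path quoted above and are assumptions about your installed version, and `default_cache_dir` and `clear_directory` are hypothetical helper names, not part of the library. Check the resolved path on your own system before deleting anything.

```python
import shutil
import tempfile
from pathlib import Path

# Directory names copied from the path quoted in this thread.
# ASSUMPTION: your installed version uses the same names; verify first.
DATA_DIRECTORY = ".newspaper_scraper"
CF_CACHE_DIRECTORY = "feed_category_cache"


def default_cache_dir() -> Path:
    """Build <system temp dir>/.newspaper_scraper/feed_category_cache."""
    return Path(tempfile.gettempdir()) / DATA_DIRECTORY / CF_CACHE_DIRECTORY


def clear_directory(directory: Path) -> int:
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries removed, or 0 if the
    directory does not exist.
    """
    if not directory.is_dir():
        return 0
    removed = 0
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed
```

Calling `clear_directory(default_cache_dir())` would then empty the folder named above. Passing the parent `.newspaper_scraper` directory instead would clear everything stored under it, assuming all of the cached data lives there.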

@NiravJoshi33

Thanks @AndyTheFactory

@johnbumgarner

@AndyTheFactory Yes, I agree that @steeljardas was looking for a way to delete all the memoized articles. The document that I mentioned contains information on the cache's location.

@AndyTheFactory

@johnbumgarner I have read your very good documentation!

Your great work inspired me to keep this software alive as a new package: https://github.com/AndyTheFactory/newspaper4k

There were a lot of problems and bugs, but I have the sense it's moving in the right direction. I will release a new version pretty soon with a lot of fixes and improvements.

Have a very good new year, and many thanks for your great work!

@johnbumgarner

@AndyTheFactory Thanks. I will reference your fork in my document. You state that newspaper3k was last updated in September 2020. The correct date is September 2018, which is the date of the last code push to PyPI. And you are correct that there are a lot of bugs in the current code base. I started a new project called NewsHound, but never pushed the code, because someone wanted to use it commercially. They lost their funding and now I have to revisit the code. The issue that I have found with open-source projects is that everyone wants to use them, but few people will put in the effort to help maintain a project. Good luck with your fork...
