File downloaders rework #229

Open
EvilDrPurple opened this issue Sep 7, 2023 · 0 comments

Context

Our file downloaders could use a rework. They seem overly complex and support only a few file types, with various modules calling each other in a required order that is unclear. There are also a number of defunct scripts littered about. I believe a much more straightforward approach is possible, and it would go a long way toward helping people understand how and when to use our util modules. During work on #227, I found the following approach, which will download any file type when provided with a download URL:

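import requests

# Stream the response so large files are never held fully in memory.
r = requests.get(url, stream=True)
r.raise_for_status()  # stop here rather than saving an HTTP error page
with open(file_path, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=8192):  # the default chunk size is 1 byte
        fd.write(chunk)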

SEE: downloaders.py, get_files.py, muckrock_scraper.py
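
For illustration, the snippet above could be wrapped in a single helper that scrapers call directly. This is only a sketch: the name `download_file`, its parameters, and the optional `session` argument are assumptions made for the example, not existing code in the repo. The `session` argument is there so a `requests.Session` (or a test stub) can be passed in.

```python
from pathlib import Path


def download_file(url, file_path, session=None, chunk_size=8192, timeout=30):
    """Stream the response body at `url` to `file_path` and return the path written.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if session is None:
        # Imported lazily so a caller-supplied session (e.g. a test stub) is enough.
        import requests
        session = requests

    target = Path(file_path)
    # Create missing parent folders so callers do not have to.
    target.parent.mkdir(parents=True, exist_ok=True)

    response = session.get(url, stream=True, timeout=timeout)
    response.raise_for_status()  # fail loudly instead of saving an error page
    with open(target, "wb") as fd:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
    return target
```

A scraper would then need only one call, for example `download_file(url, "data/report.pdf")`, where the path is a placeholder. Whether a helper like this could replace the existing modules without breaking current scrapers would still need to be checked.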

Requirements

  • Should be simple, so people can easily understand how the module(s) work and how to consume them
  • Should be clear which modules to call, in what order, and when
  • Should not break any existing functionality of scrapers or other util scripts

Docs

  • Docs related to the file downloaders and util scripts should be updated where necessary
  • New docs should be written to explain how to use and consume the file downloaders

Open questions

  • It will likely be time consuming to understand what's going on with the code, what functionality should be kept, and how to untangle it
  • Perhaps think about keeping an entire pipeline of functionality in one folder for organizational purposes
@EvilDrPurple EvilDrPurple self-assigned this Jun 6, 2024