Create clean/ca/sacramento_sheriffs.py #23

Open
3 tasks
Tracked by #121
svahanvaty opened this issue Apr 17, 2024 · 0 comments · May be fixed by #166
Labels
enhancement New feature or request


svahanvaty commented Apr 17, 2024

agency slug (proposed): ca_sac_county_sheriff
module: `clean/ca/sacramento_sheriffs.py`
url: https://www.sacsheriff.com/pages/released_cases.php
Tasks

  • Scrape the Dropbox link for each released case
  • Force-download the files behind each Dropbox link
  • Recursively explore nested folders and download the files in each of them

Scraping Plan
The Sacramento County Sheriff’s Department website should be straightforward to scrape, since all of the Dropbox links are listed on a single web page.
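
As a rough illustration of that first step, here is a minimal sketch that pulls Dropbox links out of a page's HTML using only the Python standard library. The names `DropboxLinkParser` and `extract_dropbox_links` are hypothetical, and the sketch assumes the links appear as plain `<a href>` anchors pointing at a `dropbox.com` host, which has not been verified against the live page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class DropboxLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at a dropbox.com host."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL before checking the host.
        absolute = urljoin(self.base_url, href)
        host = urlparse(absolute).netloc.lower()
        is_dropbox = host == "dropbox.com" or host.endswith(".dropbox.com")
        if is_dropbox and absolute not in self.links:
            self.links.append(absolute)


def extract_dropbox_links(html, base_url):
    """Return the unique Dropbox links found in html, in page order."""
    parser = DropboxLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

The HTML itself would come from fetching the `url` listed above; the fetch is left out so the sketch stays independent of any particular HTTP client.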

Next steps: extract the metadata from the HTML into JSON, figure out how to download files from the Dropbox links, and make sure we download every file in nested Dropbox folders. In principle, we can force a download by modifying each URL (adding dl=1), but we still need to confirm that this works.
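
The URL modification could look something like the sketch below. `force_download_url` is a hypothetical helper name, and the example links are made up; the only behavior taken from the plan above is setting `dl=1`. Whether Dropbox then serves a direct download for every link type (single files versus shared folders) still needs to be tested.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def force_download_url(url):
    """Return url with its dl query parameter set to 1.

    Other query parameters are kept in their original order; any existing
    dl value is dropped and dl=1 is appended at the end.
    """
    parts = urlparse(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunparse(parts._replace(query=urlencode(query)))


# Made-up example link, for illustration only.
print(force_download_url("https://www.dropbox.com/sh/abc/xyz?dl=0"))
```

Parsing the query string rather than doing a plain string replace keeps other parameters intact and also handles links that have no `dl` parameter at all.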

@newsroomdev newsroomdev added the enhancement New feature or request label Sep 10, 2024
@naumansharifwork naumansharifwork linked a pull request Oct 31, 2024 that will close this issue