created scrape and scrape meta for Monterey County District Attorney #74

Merged 7 commits into biglocalnews:dev on Aug 19, 2024

naumansharifwork
Contributor

Description

Created the scraper for Monterey County District Attorney

Summary of Changes

  • Added a separate file for the headers in the config folder and created the scraper for Monterey County District Attorney

Related Issues

@naumansharifwork
Contributor Author

Hey, the screenshot of the cache folder and a sample meta JSON are below

[image: screenshot of the cache folder]
ca_monterey_county_district_attorney.json

Collaborator

@tarakc02 tarakc02 left a comment

the second note (about the "if-modified-since") is just a question

Collaborator

currently, this will break utils.get_all_scrapers, which treats everything in clean/ca as an agency. i think moving the headers config up to clean rather than clean/ca (and updating the import statement accordingly) would fix it though.
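For illustration, here is a minimal sketch of that failure mode. It assumes, without confirmation from this thread, that discovery works by listing every entry in a state folder. `list_agencies` is a hypothetical stand-in, not the project's actual `utils.get_all_scrapers`, and the `config` folder name is taken from the PR description.

```python
import tempfile
from pathlib import Path


def list_agencies(state_dir: Path) -> list[str]:
    """Hypothetical discovery: every non-dunder entry in the folder is an agency."""
    return sorted(
        p.stem for p in state_dir.iterdir() if not p.name.startswith("__")
    )


def demo() -> tuple[list[str], list[str]]:
    """Show what discovery sees before and after moving the config folder."""
    with tempfile.TemporaryDirectory() as tmp:
        clean = Path(tmp) / "clean"
        ca = clean / "ca"
        ca.mkdir(parents=True)
        (ca / "__init__.py").touch()
        (ca / "monterey_county_district_attorney.py").touch()
        # A shared config folder inside the state folder is mistaken for an agency.
        (ca / "config").mkdir()
        before = list_agencies(ca)
        # Moving it up one level leaves only real agencies in the state folder.
        (ca / "config").rename(clean / "config")
        after = list_agencies(ca)
    return before, after
```

Under that assumption, `demo()` lists `config` alongside the real scraper before the move and only the scraper afterwards, which is why relocating the folder and updating the import would be enough.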

Contributor Author

got it i can do that.

Contributor Author

@tarakc02 updated the config location

"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"accept-language": "en-US,en;q=0.9,nl;q=0.8,ur;q=0.7,ru;q=0.6",
"cache-control": "max-age=0",
"if-modified-since": "Tue, 06 Aug 2024 10:41:35 GMT",
Collaborator

this works, but i don't understand what it does. does it matter what date it is?

Contributor Author

@tarakc02 these are just the headers used when downloading the pages. i put them in a separate file because i saw it done this way in another scraper.
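On the date question: `if-modified-since` makes the request conditional. Under standard HTTP semantics, a server that honors the header replies `304 Not Modified` with no body when the page has not changed since that date, so a hard-coded value can matter. Whether this site honors the header is not established in this thread. Below is a minimal sketch of the rule; `conditional_status` and the sample `Last-Modified` values in the usage note are made up for illustration.

```python
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since: str, last_modified: str) -> int:
    """Status a server honoring If-Modified-Since would return for a GET."""
    since = parsedate_to_datetime(if_modified_since)
    modified = parsedate_to_datetime(last_modified)
    # Not changed after the header date: 304 Not Modified, sent without a body.
    return 304 if modified <= since else 200


# Value copied from the headers under review.
HEADER_DATE = "Tue, 06 Aug 2024 10:41:35 GMT"
```

For example, `conditional_status(HEADER_DATE, "Mon, 05 Aug 2024 09:00:00 GMT")` gives 304, while a page modified on `"Wed, 07 Aug 2024 09:00:00 GMT"` gives 200. If empty responses ever show up in the cache, dropping this header from the config would be one thing to try.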

Collaborator

@tarakc02 tarakc02 left a comment

lgtm

@newsroomdev
Member

newsroomdev commented Aug 13, 2024

Great start @naumansharifwork! I left a few comments to help get this scraper into better shape for usage in the application layer. Let me know if I can help; this is looking close to merging.

@naumansharifwork
Contributor Author

Hey @newsroomdev, please review this one as well. Thanks a lot.

@newsroomdev newsroomdev linked an issue Aug 14, 2024 that may be closed by this pull request
Member

@newsroomdev newsroomdev left a comment

Output is looking good, and the code is straightforward to read. Thank you, and please merge to dev when you have a moment

@naumansharifwork
Contributor Author

@newsroomdev can you check? i think i don't have merge access.

@newsroomdev newsroomdev merged commit 61324e3 into biglocalnews:dev Aug 19, 2024
1 check passed
@naumansharifwork naumansharifwork deleted the ca-65 branch August 19, 2024 17:37
Development

Successfully merging this pull request may close these issues.

Create clean/ca/monterey_county_district_attorney.py
3 participants