created scrape and scrape meta for Monterey County District Attorney #74
Hey, the screenshot of the cache folder and a sample meta JSON are below.
the second note (about the "if-modified-since") is just a question
currently, this will break `utils.get_all_scrapers`, which treats everything in `clean/ca` as an agency. i think moving it up to `clean` rather than `clean/ca` (and updating the import statement accordingly) would fix it though.
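To illustrate the failure mode described above, here is a minimal sketch. It assumes the discovery step simply lists the Python modules in a state folder; the real `utils.get_all_scrapers` may work differently, and `discover_agencies`, `monterey_da.py`, and `config.py` are hypothetical names used only for this example.

```python
import tempfile
from pathlib import Path


def discover_agencies(state_dir: Path) -> list[str]:
    """Hypothetical stand-in for utils.get_all_scrapers:
    every .py module in the state folder is counted as an agency."""
    return sorted(
        p.stem for p in state_dir.glob("*.py") if p.stem != "__init__"
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "clean"
    state = root / "ca"
    state.mkdir(parents=True)

    (state / "monterey_da.py").touch()  # assumed scraper module name
    (state / "config.py").touch()       # assumed helper module next to the scrapers

    # The helper module is picked up as if it were an agency.
    before = discover_agencies(state)

    # Moving the helper one level up (clean/ instead of clean/ca/) hides it
    # from discovery; imports of it would need updating to match.
    (state / "config.py").rename(root / "config.py")
    after = discover_agencies(state)
```

Under these assumptions the helper shows up in `before` and is gone from `after`, which matches the suggested fix of moving the file up to `clean`.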
got it i can do that.
@tarakc02 updated the config location
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"accept-language": "en-US,en;q=0.9,nl;q=0.8,ur;q=0.7,ru;q=0.6",
"cache-control": "max-age=0",
"if-modified-since": "Tue, 06 Aug 2024 10:41:35 GMT",
this works, but i don't understand what it does. does it matter what date it is?
@tarakc02 these are just the headers used when downloading the pages. I put them in a separate file because I saw it done that way in another scraper.
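For context on the date question: `if-modified-since` makes the request conditional. A server that honors it can reply `304 Not Modified` with an empty body when the page has not changed since that date, so the date can matter in principle. A minimal sketch, assuming the headers were copied from a browser session and that the scraper always wants the full page; `drop_conditional_headers` is a hypothetical helper, not something in this PR:

```python
# Standard HTTP conditional request headers (compared in lowercase).
CONDITIONAL_HEADERS = {
    "if-modified-since",
    "if-none-match",
    "if-unmodified-since",
    "if-match",
    "if-range",
}


def drop_conditional_headers(headers: dict) -> dict:
    """Return a copy of the headers without conditional-request fields,
    so the server has no reason to answer 304 with an empty body."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in CONDITIONAL_HEADERS
    }


# A subset of the headers shown in the diff above.
browser_headers = {
    "accept-language": "en-US,en;q=0.9,nl;q=0.8,ur;q=0.7,ru;q=0.6",
    "cache-control": "max-age=0",
    "if-modified-since": "Tue, 06 Aug 2024 10:41:35 GMT",
}

scrape_headers = drop_conditional_headers(browser_headers)
```

Whether the site in question actually honors the header is not stated in this thread; the sketch only shows how to take the date out of the equation.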
lgtm
Great start @naumansharifwork! I left a few comments to help get this scraper into better shape for usage in the application layer. Let me know if I can help; this is looking close to merging.
…nd case_id it will have title in case_id instead of none
Hey @newsroomdev, please review this one as well. Thanks a lot.
Output is looking good, and the code is straightforward to read. Thank you, and please merge to `dev` when you have a moment.
@newsroomdev can you check? I don't think I have merge access.
Description
Created the scraper for Monterey County District Attorney
Summary of Changes
Related Issues