Optimize court call scrape #52

Open
antidipyramid opened this issue Apr 5, 2024 · 1 comment

Comments

@antidipyramid
Collaborator

Since we started fetching calendar values while scraping court calls, the scrape has slowed down to the point where we're unable to scrape all available court calls in under 6 hours.

We could try a few things to make the scrapes more efficient:

  1. Avoid duplicate calendar requests: on the results page, there are usually at least two court calls listed for a single case. Caching calendar values should reduce the number of case detail requests by at least half.
  2. If (1) isn't enough, we could also limit the dates we scrape each day. We could try scraping only the court calls for the current or next day.
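
A rough sketch of both ideas, to make the discussion concrete. This is not the scraper's actual code: `CalendarCache`, `attach_calendars`, `dates_to_scrape`, and the `fetch` callable are all hypothetical names, and it assumes the calendar value can be keyed by case number alone.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterable, List, Tuple


class CalendarCache:
    """Remember calendar values by case number so each case is requested once."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # `fetch` stands in for whatever makes the case detail request (assumption).
        self._fetch = fetch
        self._values: Dict[str, str] = {}
        self.requests_made = 0

    def get(self, case_number: str) -> str:
        # Only hit the case detail page the first time we see a case number.
        if case_number not in self._values:
            self._values[case_number] = self._fetch(case_number)
            self.requests_made += 1
        return self._values[case_number]


def attach_calendars(
    court_calls: Iterable[Tuple[str, date]], cache: CalendarCache
) -> List[Tuple[str, date, str]]:
    """Pair each (case_number, call_date) row with its cached calendar value."""
    return [
        (case_number, call_date, cache.get(case_number))
        for case_number, call_date in court_calls
    ]


def dates_to_scrape(today: date, days_ahead: int = 1) -> List[date]:
    """Idea (2): restrict a daily run to today plus the next `days_ahead` days."""
    return [today + timedelta(days=n) for n in range(days_ahead + 1)]
```

With this shape, the number of case detail requests equals the number of distinct cases rather than the number of court calls, which is where the "at least half" estimate in (1) would come from. The cache here lives in memory for a single run; persisting it between runs would be a separate decision.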
@antidipyramid
Collaborator Author

What do you think @fgregg?
