MO QA needed, with optimization potential #606

Open
stucka opened this issue Jan 31, 2024 · 5 comments

Contributor

stucka commented Jan 31, 2024

There may be an undocumented endpoint in Missouri that allows all years to be scraped on a single hit:
https://jobs.mo.gov/warn/all

This would need a modicum of testing to ensure we're getting identical output to the per-year scrapes. Hitting this endpoint might also reduce the chance we get snared by the anti-abuse systems @Kirkman flagged in #597, because we'd no longer be hitting every per-year page on every run.
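A rough sketch of what that identical-output check could look like: treat each scrape as a multiset of normalized rows and report whatever appears on only one side. The column layout and the sample rows below are hypothetical, not taken from the MO pages, so the real field names would need to be substituted.

```python
from collections import Counter


def normalize(row):
    """Collapse whitespace and case so cosmetic differences don't count as mismatches."""
    return tuple(" ".join(str(value).split()).lower() for value in row)


def compare_scrapes(per_year_rows, all_rows):
    """Return (rows only in the per-year scrape, rows only in the /all scrape)."""
    per_year = Counter(normalize(r) for r in per_year_rows)
    combined = Counter(normalize(r) for r in all_rows)
    # Counter subtraction keeps only positive counts, so repeated rows are compared by count.
    return per_year - combined, combined - per_year


# Hypothetical rows: (company, notice date, layoff count)
per_year_rows = [
    ("Acme Corp", "2023-01-05", "120"),
    ("Widget  LLC", "2023-02-10", "45"),
]
all_rows = [
    ("ACME Corp", "2023-01-05", "120"),
    ("Widget LLC", "2023-02-10", "45"),
    ("Gadget Inc", "2023-03-01", "30"),
]

missing_from_all, missing_from_per_year = compare_scrapes(per_year_rows, all_rows)
```

Both results being empty would suggest the two sources agree. Using a multiset rather than a set means a row that appears twice in one source and once in the other still shows up as a difference.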

Contributor Author

stucka commented Jan 31, 2024

Endpoint shows 49,397 layoffs from 2019.

BLN Missouri file (which may include things not scraped) shows 72,761 total, per Excel.

This is a great opportunity for some extra QA!

stucka changed the title from "MO optimization potential" to "MO QA needed, with optimization potential" Jan 31, 2024
Contributor Author

stucka commented Jan 31, 2024

QA needed.

BLN version seems to show 364 entries, including combined rows for at least some of the revision entries.

/all endpoint seems to show 327 entries with separate rows for at least some of the revision entries.

Contributor Author

stucka commented Jan 31, 2024

Flagging @Kirkman instead of the other person I flagged by accident. I need sleep.

Member

cephillips commented Jan 31, 2024 via email

Contributor Author

stucka commented Jan 31, 2024

Lotsa duplicates for some reason in the BLN data. If I drop the obvious duplicates I get down to 52,379 layoffs among 256 entries, so it's close to the state's sheet but not quite there.
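A minimal sketch of that kind of de-duplication, assuming "obvious duplicates" means rows that match exactly once whitespace and case are normalized. The row layout, the sample data, and the choice to count blank layoff figures as 0 are all assumptions made for illustration.

```python
def drop_exact_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(" ".join(str(value).split()).lower() for value in row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def summarize(rows, count_index=2):
    """Return (number of entries, total layoffs); non-numeric counts are treated as 0."""
    total = 0
    for row in rows:
        digits = str(row[count_index]).replace(",", "").strip()
        total += int(digits) if digits.isdigit() else 0
    return len(rows), total


# Hypothetical rows: (company, notice date, layoff count)
rows = [
    ("Acme Corp", "2023-01-05", "1,200"),
    ("Acme Corp", "2023-01-05", "1,200"),
    ("Widget LLC", "2023-02-10", "45"),
    ("Gadget Inc", "2023-03-01", ""),
]

entries, layoffs = summarize(drop_exact_duplicates(rows))
```

Running the same summary over both the BLN file and the /all output would give entry and layoff totals that can be compared directly. Revision rows that are combined in one source and separate in the other would not be caught by exact matching and would still need a manual look.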
