MO QA needed, with optimization potential #606
Comments
Endpoint shows 49,397 layoffs from 2019. BLN Missouri file (which may include things not scraped) shows 72,761 total, per Excel. This is a great opportunity for some extra QA!
QA needed. BLN version seems to show 364 entries, including combined rows for at least some of the revision entries. /all endpoint seems to show 327 entries with separate rows for at least some of the revision entries.
Flagging @Kirkman instead of the other person I flagged by accident. I need sleep.
Couldn’t a lot of that be amendments?
On Jan 31, 2024, at 6:47 AM, Mike Stucka wrote:
Endpoint shows 49,397 layoffs from 2019.
BLN Missouri file (which may include things not scraped) shows 72,761 total, per Excel.
This is a great opportunity for some extra QA!
Lotsa duplicates for some reason in the BLN data. If I drop the obvious duplicates I get back to 52,379 layoffs among 256 entries, so it's close to the state's sheet but not quite there.
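For anyone who wants to repeat the duplicate check, here is a rough sketch of dropping exact-duplicate rows and re-totaling. The column names (`employer`, `date`, `layoffs`) and the sample rows are made up for illustration; they are not the real BLN file's headers or data, and "obvious duplicates" here just means rows that match on every field.

```python
import csv
import io


def drop_exact_duplicates(rows):
    """Return rows with exact repeats removed, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        # Compare on every field, ignoring leading/trailing whitespace.
        key = tuple(sorted((k, (v or "").strip()) for k, v in row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def total_layoffs(rows, column="layoffs"):
    """Sum a numeric column, skipping blanks and non-numeric values."""
    total = 0
    for row in rows:
        value = (row.get(column) or "").replace(",", "").strip()
        if value.isdigit():
            total += int(value)
    return total


# Hypothetical sample data standing in for the BLN Missouri file.
sample = io.StringIO(
    "employer,date,layoffs\n"
    "Acme Co,2019-03-01,120\n"
    "Acme Co,2019-03-01,120\n"
    "Widget LLC,2019-05-15,45\n"
)
rows = list(csv.DictReader(sample))
unique_rows = drop_exact_duplicates(rows)
print(len(rows), len(unique_rows), total_layoffs(unique_rows))
```

This only catches rows that are identical across all fields. Amendments or revisions that change a count or date would survive it and need a separate rule.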
There may be an undocumented endpoint in Missouri that allows all years to be scraped on a single hit:
https://jobs.mo.gov/warn/all
This would need a modicum of testing to ensure we're getting identical output to the per-year scrapes. Hitting this endpoint might reduce the chance we get snared by the anti-abuse systems @Kirkman flagged in #597, because we wouldn't be hitting all the pages all the time.
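One way that testing could look, as a sketch only: treat each scrape as a list of row dicts and diff them as multisets, so both missing rows and extra copies show up. The `row_key` and `compare_scrapes` helpers, the field names and the sample rows are all hypothetical. Nothing here fetches from jobs.mo.gov or reflects how the existing scraper parses the page.

```python
from collections import Counter


def row_key(row):
    """Build a hashable key from a row, collapsing runs of whitespace."""
    return tuple(sorted((k, " ".join(str(v).split())) for k, v in row.items()))


def compare_scrapes(all_rows, per_year_rows):
    """Return (only_in_all, only_in_per_year) as Counters of row keys.

    Counter subtraction keeps only positive counts, so a row that appears
    twice in one scrape and once in the other is reported once.
    """
    all_counts = Counter(row_key(r) for r in all_rows)
    year_counts = Counter(row_key(r) for r in per_year_rows)
    return all_counts - year_counts, year_counts - all_counts


# Hypothetical rows standing in for the two scrape outputs.
from_all = [
    {"employer": "Acme Co", "date": "2019-03-01", "layoffs": "120"},
    {"employer": "Widget LLC", "date": "2019-05-15", "layoffs": "45"},
]
from_per_year = [
    {"employer": "Acme  Co", "date": "2019-03-01", "layoffs": "120"},
    {"employer": "Gadget Inc", "date": "2020-01-10", "layoffs": "30"},
]

only_all, only_year = compare_scrapes(from_all, from_per_year)
print(len(only_all), len(only_year))
```

If both Counters come back empty, the two sources agree row for row. Anything left over is the list to QA by hand.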