
Users should be able to initiate a COPA Scraping job #56

Open · 9 of 13 tasks
adesca opened this issue Aug 9, 2019 · 3 comments

adesca commented Aug 9, 2019

Overarching goal: A user should be able to trigger a process on the server that pulls data from the COPA website and imports new Allegations into the database.

Things to keep in mind:

Goals:

  • From the UI, a user should be able to initiate a COPA job and be redirected to the live status page after it starts
    • Stages of a COPA job:
      • Initial data - download data from the COPA site and store it in Google Cloud Storage under initial_data/
      • Phantom rows:
        • Clean - Split the data based on assignment (see the first sketch after this list).
          • All rows with assignment 'copa' should be saved under cleaned/copa.csv
          • All rows without assignment 'copa' should be saved under cleaned/other-assignment.csv
        • Transform - Create database rows from the raw data (see the second sketch after this list).
          • All rows with the assignment 'copa' should be transformed into data_allegation rows and saved under transformed/copa.csv
          • A separate csv should be made that ties log_no from copa to all information in the original dataset that is not part of the data_allegation row, and saved under transformed/misc-data.csv
          • Note: not all of the columns will be filled
          • Note: If any rows cannot be transformed, they should be saved under errors/transform_error.csv and shown in the UI. NEEDS WORK TO DETECT A FAILED TRANSFORM AND PRODUCE THE ERROR FILE. THE ONLY WAY THE TRANSFORM CAN FAIL IS IF API ENDPOINTS ARE REMOVED OR CHANGED; NO ROWS WILL BE RETURNED IN THOSE CASES, SO OUR ERROR FILE SHOULD DESCRIBE THE API ERROR GIVEN
        • Augment - Replace columns with foreign key references
          • All transformed copa rows should replace the current category column with a reference to the data_allegationcategory table for that particular category
          • Note: A row failing to augment should not end the pipeline
          • Note: if any rows cannot be augmented, they should be saved under errors/augment_failures.csv and shown in the UI
        • Load - Load augmented rows into the database (see the third sketch after this list).
          • Check if there already exists a row with that log_no.
            • If there is, verify that all the data matches, including:
              • finding_code should match against data_officerallegation.final_finding. REQUIRES A NEW ENTITY TYPE (data_officerallegation) PLUS LOGIC TO MATCH the scraped copa column final_finding with data_officerallegation.final_finding
              • All data fields that are in data_allegation and also in the copa response (log_no, current_category, beat)
            • If any data does not match then save it as the file "changed-allegation.csv" under errors/.
              • This should appear in the UI - able to use loader.changed_allegations. NEEDS UI PAGE TO DISPLAY DATA SAVED ON THE loader OBJECT
            • If all data matches, disregard this row
          • If no row with that log_no exists, add the row
          • Note: if any rows cannot be turned into an entity object, they should be saved under errors/entity_failures.csv and shown in the UI
      • Data validation:
        • Check for missing records
          • Check if any allegations present in copa are missing from the original database
          • Display a list of these in the UI - able to use loader.db_rows_added. NEEDS UI PAGE TO DISPLAY DATA SAVED ON THE loader OBJECT
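
A minimal sketch of the Clean stage, assuming the scraped CSV has an assignment column and using plain pandas with local paths as a stand-in for the project's real storage helpers (the commit log mentions a store_string function, which may differ):

```python
# Hypothetical Clean stage: split the initial COPA download by assignment.
# The column name "assignment" and the local file paths are assumptions; the
# real pipeline writes to Google Cloud Storage.
import pandas as pd

def clean(initial_csv_path: str, out_dir: str = "cleaned") -> None:
    df = pd.read_csv(initial_csv_path)
    is_copa = df["assignment"].str.lower() == "copa"
    df[is_copa].to_csv(f"{out_dir}/copa.csv", index=False)
    df[~is_copa].to_csv(f"{out_dir}/other-assignment.csv", index=False)
```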
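
A sketch of the Transform stage under the same assumptions; the data_allegation column list is a guess based on the fields named in this issue (log_no, current_category, beat), and the API-failure check follows the note above that a failed transform shows up as an empty response:

```python
# Hypothetical Transform stage: shape cleaned COPA rows into data_allegation
# rows, split the leftover columns into misc-data.csv keyed by log_no, and
# record an empty API response as a transform error.
import pandas as pd

DATA_ALLEGATION_COLS = ["log_no", "current_category", "beat"]  # assumed subset

def transform(cleaned_csv_path: str, out_dir: str = "transformed",
              err_dir: str = "errors") -> None:
    df = pd.read_csv(cleaned_csv_path)
    if df.empty:
        # Per the issue, the realistic failure mode is a removed or changed
        # API endpoint, which returns no rows at all.
        error = pd.DataFrame([{"error": "COPA API returned no rows"}])
        error.to_csv(f"{err_dir}/transform_error.csv", index=False)
        return
    df[DATA_ALLEGATION_COLS].to_csv(f"{out_dir}/copa.csv", index=False)
    misc_cols = ["log_no"] + [c for c in df.columns if c not in DATA_ALLEGATION_COLS]
    df[misc_cols].to_csv(f"{out_dir}/misc-data.csv", index=False)
```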
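
A sketch of the Load stage and its data-validation hooks, again with assumed names: the dict-based existing_rows stands in for real database queries, and the loader attributes mirror the loader.changed_allegations and loader.db_rows_added fields referenced above:

```python
# Hypothetical Load stage: collect new allegations, disregard exact matches,
# and write mismatches to errors/changed-allegation.csv for the UI.
import csv

COMPARED_FIELDS = ["log_no", "current_category", "beat"]  # fields in both sources

class Loader:
    def __init__(self, existing_rows):
        # existing_rows: dicts already in data_allegation, keyed by log_no here
        self.existing = {r["log_no"]: r for r in existing_rows}
        self.changed_allegations = []  # mismatched rows, shown in the UI
        self.db_rows_added = []        # new rows, used by the validation screen

    def load(self, augmented_rows):
        for row in augmented_rows:
            current = self.existing.get(row["log_no"])
            if current is None:
                self.db_rows_added.append(row)  # real code would INSERT here
            elif any(current.get(f) != row.get(f) for f in COMPARED_FIELDS):
                self.changed_allegations.append(row)
            # existing rows whose fields all match are disregarded
        if self.changed_allegations:
            with open("errors/changed-allegation.csv", "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=COMPARED_FIELDS)
                writer.writeheader()
                for r in self.changed_allegations:
                    writer.writerow({f: r.get(f) for f in COMPARED_FIELDS})
```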

The business need:

From Rajiv:
The primary purpose of this COPA Data Portal data capture step is to create incomplete/phantom complaint records in our database (for new complaints since our last successful FOIA response) so that we can have some matching data for the new documents that are being picked up by our crawlers/scrapers (https://cpdp.co/crawlers and https://cpdp.co/documents).

The second purpose is to compare against the data that we have received via FOIA responses to see whether we are missing any records (i.e., were any responsive complaint records omitted from our original dataset and if so which ones).

The third purpose is to compare different versions/snapshots of the portal data over time and see what’s changing (is it just new records being added on to the end, or are older records being added, or removed, or altered).

From Basecamp:
The Civilian Office of Police Accountability (COPA) has just posted a new live data feed to the City's Open Data Portal that goes back 10 years. Here are a few early questions to investigate.

  • Are there CRs that appear here during the comparable time period (i.e., before October 2016) that don't appear in our FOIA'd datasets (which were produced in October 2016)? If so, how many and are there any revealing common characteristics amongst them to suggest why they may have been excluded from the dataset we received in response to our FOIA requests but not excluded from this public release on the City's public data portal. More likely is the inverse, i.e., complaints that we know of through our FOIA request, but that were excluded from the City's public data portal even during the overlapping time period of November 2007 – November 2016.
  • For all the CRs that exist both in the City Data Portal and in our FOIA'd datasets, how many rows have conflicting values for the dynamic data fields, such as CURRENT_STATUS (which we expect to change over time for open cases), and for data fields that we might not expect to change, such as COMPLAINT_DATE? What can we learn from any patterns amongst these kinds of unexpected discrepancies, particularly when they occur in cases that are already closed?
  • Are there any reasons not to import all these data and overwrite the conflicting fields in our existing dataset with more "up-to-date" information from the City's data portal (of course, any new CRs would be missing all officer-identifying data and other fields that are not being published to the data portal, until our next FOIA request)? The City Data Portal has a relatively robust API and supports numerous open standards for public APIs. Can we do all this importing and merging programmatically and run it on the Civis Platform on a routine basis? Is there any equivalent to cron built into the Platform? Apart from sanity checks, what kinds of issues will we run into that require human intervention/judgment (no officer-identifying data also means no officer profile matching challenges)?
colin-parsons commented
Status indicators

  • Stage names should be unordered list items
  • Stage names should be green when the stage successfully completed for all rows
  • Stage names should be red when the stage experienced an error for one or more rows
    • Stage names should show the number of errored rows next to the stage name in the format "([errors]/[total rows])"
    • When a stage errors, a button should appear beside the name that says "show errors"; when clicked, the button should show an error summary right beneath the stage name
  • Stage names should flash when the stage is in progress for one or more rows
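
A tiny sketch of the labeling rules above, written in Python for consistency with the pipeline code (the real implementation would live in the React front end, and the function name is hypothetical):

```python
# Hypothetical helper implementing the status-indicator rules: flashing while
# any row is still in progress, red with an "(errors/total)" count when any
# row errored, green when every row succeeded.

def stage_label(name: str, errors: int, done: int, total: int) -> str:
    if done + errors < total:
        return f"{name} [flashing]"          # stage still in progress
    if errors > 0:
        return f"{name} ({errors}/{total}) [red]"
    return f"{name} [green]"

print(stage_label("Transform", errors=2, done=98, total=100))  # Transform (2/100) [red]
```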

colin-parsons commented Aug 9, 2019

Tasks done in e2e:

  • A user can start the copa process
  • A user can show errors on errored transformations
  • Check that the files are stored with the proper data
  • Check the data validation screen

j0vanka pushed a commit that referenced this issue Aug 26, 2019
tw-jeff-burroughs added a commit that referenced this issue Aug 26, 2019
* [56] WIP: transform complete and tested, ready for augmentation step

* [56] WIP: return None if there is no file to read

* [56] WIP: use try except for trying to open file for reading, return None otherwise

* [56] WIP: mock.call_count too high in build, attempt to unpatch functions between tests to ensure correct count

* [56] WIP: Update transform test to ensure correct data is being passed to store_string calls

* Update .travis.yml

* Update .travis.yml

* Update .travis.yml

* [#56] completed transformation

* [56] fixed flake8 issues

* Update .travis.yml, fix parsing issue on travis-ci.com
tw-jeff-burroughs commented
AUGMENTATION:

Table data_allegationcategory

| Id | category_name |
|----|---------------|
| 1  | category1     |
| 2  | category2     |
| 3  | category3     |

Table data_allegation (pre-augment)

| cr_id  | … | current_category |
|--------|---|------------------|
| 123123 | … | category2        |
| 123124 | … | category3        |
| 123125 | … | category1        |

augment()
For each row in data_allegation, look up the id of the category listed under current_category, then replace the value of current_category with the looked-up id.

Table data_allegation (post-augment)

| cr_id  | … | current_category |
|--------|---|------------------|
| 123123 | … | 2                |
| 123124 | … | 3                |
| 123125 | … | 1                |
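
A minimal runnable sketch of augment() as described above, with dict-based inputs standing in for the real database tables; the failures list echoes the errors/augment_failures.csv requirement from the task list:

```python
# Hypothetical augment(): swap each row's current_category name for the
# matching data_allegationcategory id; collect failed lookups without
# stopping the pipeline (per the issue notes).

def augment(allegation_rows, category_rows, failures=None):
    ids_by_name = {c["category_name"]: c["id"] for c in category_rows}
    augmented = []
    for row in allegation_rows:
        category_id = ids_by_name.get(row["current_category"])
        if category_id is None:
            # Real code would write these rows to errors/augment_failures.csv.
            if failures is not None:
                failures.append(row)
            continue
        augmented.append({**row, "current_category": category_id})
    return augmented

categories = [{"id": 1, "category_name": "category1"},
              {"id": 2, "category_name": "category2"},
              {"id": 3, "category_name": "category3"}]
allegations = [{"cr_id": 123123, "current_category": "category2"},
               {"cr_id": 123124, "current_category": "category3"},
               {"cr_id": 123125, "current_category": "category1"}]
print(augment(allegations, categories))
# [{'cr_id': 123123, 'current_category': 2}, {'cr_id': 123124, 'current_category': 3}, ...]
```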

tw-jeff-burroughs added a commit that referenced this issue Sep 23, 2019
j0vanka pushed a commit that referenced this issue Sep 23, 2019
j0vanka pushed a commit that referenced this issue Sep 23, 2019
j0vanka pushed a commit that referenced this issue Sep 24, 2019
tw-jeff-burroughs added a commit that referenced this issue Oct 1, 2019
…object, copa scrape transformer add api error handling and update tests
colin-parsons added a commit that referenced this issue Oct 4, 2019
…ts in relation to the added functions in copa_scrape_transformer
j0vanka pushed a commit that referenced this issue Oct 7, 2019
KyleDolezal added a commit that referenced this issue Oct 8, 2019
…not-copa data; and storing errors. Removed commented code. Added tests for the above.
KyleDolezal added a commit that referenced this issue Oct 10, 2019
… (1) copa scrape yields error; (2) not-copa scrape yields error; (3) both scrapes yield errors; and (4) no scrape contains errors
KyleDolezal added a commit that referenced this issue Oct 10, 2019
colin-parsons reopened this Oct 15, 2019
colin-parsons added a commit that referenced this issue Oct 15, 2019
…sts fails while the other succeeds"

This reverts commit e1b000d.
j0vanka added a commit that referenced this issue Nov 25, 2019
* [#56] WIP: Add test for adding augmented copa record to db

* Reformatted test within test_augment.py

* [Daisy] Debugging commit

* [#56][Thalia/Everyone] Fix mypy commit error

* [#56A] [Clari and Daisy] added react components header, and tab

* [#56A][Clari, Thalia and Daisy] Added CSS style sheet for tabs and header.

* [Daisy Octavia and Jovanka][#56A] Cleaned up CSS and finished applying proper front end design to header

* [Octavia, Jovanka, and Daisy][#56A] Added components and css styling for button

* [#56A][JK]WIP: add bg image and footer

* [#56A][JK] WIP: bg image fix; styling

* [#56A][Jole] WIP: add FOIA tab/placeholder; route; header included in status page and FOIA placeholder page for navigation

* [#56][Jole] fix: failing test due to unwrapped link in Tab component