Add initial scripts to cleanup data #1
Conversation
Fine with whatever formatter & standard you're comfortable with. I just follow what the linter/formatter says.
Overall, looks really good. I left a few comments, and will wait to do a full review and testing until you feel like it's ready. Thanks for all the work on this!
Great suggestion on using click! I hadn't used it before and it was a little easier than argparse imo.
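For anyone unfamiliar with `click`, a minimal sketch of the style being discussed looks roughly like this (the command and option names here are hypothetical, not the actual interface of the scripts in this PR):

```python
# Hypothetical click CLI -- option names are illustrative, not this PR's interface.
import click


@click.command()
@click.option("--input-file", type=click.Path(exists=True), required=True, help="Raw export to clean.")
@click.option("--output-file", type=click.Path(), required=True, help="Where to write the cleaned file.")
def clean(input_file, output_file):
    """Clean a raw data export and write the result."""
    click.echo(f"Cleaning {input_file} -> {output_file}")


if __name__ == "__main__":
    clean()
```

Compared with argparse, the decorators replace the explicit parser setup and argument wiring.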
@nihonjinrxs I think this is sufficient for the initial version. If you can follow the readme to get it set up and running locally, I believe they will be able to as well.
Sorry it's taken me so long to get to this. I'll look at this first thing in the morning Thursday 4/30. |
No apology necessary! Any help is greatly appreciated any time 😄 Also I'm thinking we'll probably do some "real" QA with "real" data soon enough and more PRs will follow. |
Hey, so far, this seems like it's doing the correct things, based on my testing with dummy data. I'd like to test it on real data and have VIA LINK folks validate that it produces what they expect.
I had a bit of trouble with the setup/installation since I'm using Anaconda, and apparently the `uszipcode` library is not resolvable via `conda`'s installer. Not sure how to go about fixing that, but it may be worth adding a note to the README regarding that. The workaround there looks like:
- remove the `uszipcode==0.2.4` line from the `requirements.txt` file (temporarily)
- create the conda environment: `conda create -f=./requirements.txt -n c4nola-211-data-wrangling`
- activate the environment: `conda activate c4nola-211-data-wrangling`
- install `uszipcode` via `pip`: `pip install uszipcode==0.2.4`
- restore `requirements.txt`, either manually or via `git checkout -- requirements.txt`
The only thing that makes me a bit nervous is the lack of tests. Do you think you'd have time to add some tests? I can also add some tomorrow and this weekend if that helps.
For the Excel file, I've found that mocking the file with an in-memory Excel file works best for testing these kinds of things. An example I found that shows pretty clearly how to do that is here: https://coderbook.com/@marcus/how-to-mock-and-unit-test-with-pandas/. See the "Mocking Pandas in Unit Tests" > "Mock Files with In-Memory File Objects" section.
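For illustration, a rough sketch of that approach might look like the following (the `clean_frame` function and column names are made-up stand-ins for the real cleaning code, and it assumes a reasonably recent pandas with an Excel engine such as `openpyxl` installed):

```python
# Sketch of testing Excel-reading code against an in-memory file instead of a file on disk.
# clean_frame is hypothetical; substitute the real cleaning function from this PR.
import io

import pandas as pd


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for the real cleaning logic: drop rows with no zip code.
    return df.dropna(subset=["ZipCode"])


def test_clean_frame_drops_rows_without_zip():
    # Build a tiny workbook entirely in memory (needs an Excel engine, e.g. openpyxl).
    raw = pd.DataFrame({"Caller": ["A", "B"], "ZipCode": ["70112", None]})
    buffer = io.BytesIO()
    raw.to_excel(buffer, index=False)
    buffer.seek(0)

    # The code under test can read the buffer exactly as it would read a file path.
    loaded = pd.read_excel(buffer)
    cleaned = clean_frame(loaded)

    assert list(cleaned["Caller"]) == ["A"]
```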
That said, I think we're close to ready to merge. And I'm also OK with merging this to master now, and adding tests in a separate PR if you would prefer that.
I wasn't going to add tests for this initial PR, but I think it's a good idea to add them, so I created #3.
With respect to Conda, I'm thinking it's best to use Python's built-in virtual environment capabilities instead of introducing another tool. Then end users only have to install one thing. I definitely see the value in using Conda for other projects, but I don't think it would benefit this project...especially since it seems you still have to use pip to download the `uszipcode` library anyway.
I've added two scripts that will clean data for two of the VIA LINK dashboards. It's rough, but it's a start. The main goal is to create something runnable that gets from the input files they are starting with to the format they are looking for.
Also, I used black to format the code for now so I don't have to decide on style, but I'm not committed to that style just yet, so I didn't check it in as a dependency.
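As a rough sketch of the overall shape such a cleaning script might take (column handling and file layout here are illustrative only, not the actual code in this PR; it assumes pandas plus an Excel engine like `openpyxl`):

```python
# Illustrative only: read a raw Excel export, apply basic cleanup, write a CSV.
import pandas as pd


def clean_export(input_path: str, output_path: str) -> None:
    """Read a raw Excel export, apply basic cleanup, and write a dashboard-ready CSV."""
    raw = pd.read_excel(input_path)
    cleaned = raw.copy()
    cleaned.columns = [str(c).strip() for c in cleaned.columns]  # normalize header whitespace
    cleaned = cleaned.dropna(how="all")  # drop completely empty rows
    cleaned.to_csv(output_path, index=False)
```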