This is a tool created as a project for the Spring 2016 GSLIS Data Cleaning course.
The output of this tool sits somewhere between auto-documentation and data profiling.
This tool was presented at PyData Chicago 2016. Talk recording: https://www.youtube.com/watch?v=Hb7nvHbwNAw&t=4s
Point the tool at a folder of files and it will create a markdown file with basic statistics about each column, along with template areas where you can write a narrative about each one. You can then render that into HTML or simply include it in your data package as documentation.
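To give a concrete idea of what "basic statistics about each column" could involve, here is a minimal sketch of profiling a single CSV. This is not the tool's actual code; the `profile_csv` name, the `missing_code` parameter, and the choice of statistics are illustrative assumptions.

```python
# Minimal sketch (not the tool's actual implementation) of the kind of
# per-column summary a profiler gathers from one CSV file: row count,
# missing values, and distinct values.
import csv
from collections import Counter, defaultdict

def profile_csv(path, missing_code=""):
    stats = defaultdict(lambda: {"count": 0, "missing": 0, "values": Counter()})
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for column, value in row.items():
                col = stats[column]
                col["count"] += 1
                if value == missing_code:
                    col["missing"] += 1
                else:
                    col["values"][value] += 1
    # Reduce the raw tallies to simple per-column statistics.
    return {
        column: {
            "count": col["count"],
            "missing": col["missing"],
            "unique": len(col["values"]),
            "top_values": col["values"].most_common(5),
        }
        for column, col in stats.items()
    }
```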
data_profilepy3.py is the version updated for Python 3 and contains the most up-to-date code. It is used the same way as the original. Consider the Python 2 version deprecated.
This is still mostly a proof of concept. There were path issues on Windows, which have hopefully been fixed, and there may be other unknown bugs.
The original script was written using Python 2.7; the Python 3 version above is the one to use now. Either way, run it on the command line.
General use:
python data_profilepy3.py source output missing_code
Working example that will run within this directory:
python data_profilepy3.py vagrants/ vagrant-profiles/ [missing]
This works out to:
python data_profilepy3.py
- runs the script
vagrants/
- Provide single file path or folder with many files
- Currently only built to work with CSV data
vagrant-profiles/
- This is the destination folder for the profile files
- Will either create the folder or overwrite the named contents
- Will create:
    - one JSON file with all profile data
    - one md file per source file with profile data
[missing]
- this is the missing code; use '' for empty (optional; an empty string is presumed if not provided)
- cannot currently specify multiple missing values for a single file
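As a rough illustration of the outputs described above (one JSON file with all profile data plus one markdown stub per source file), here is a sketch that builds on the hypothetical `profile_csv` helper from the earlier example. The file naming, the `write_profiles` name, and the markdown layout are assumptions, not the tool's actual format.

```python
# Rough sketch of the output step: one JSON file with all profile data
# plus one markdown stub (with a template area for narrative) per source
# CSV. Relies on the illustrative profile_csv() defined earlier.
import json
import os
from glob import glob

def write_profiles(source_dir, output_dir, missing_code=""):
    os.makedirs(output_dir, exist_ok=True)
    all_profiles = {}
    for path in glob(os.path.join(source_dir, "*.csv")):
        name = os.path.splitext(os.path.basename(path))[0]
        profile = profile_csv(path, missing_code)
        all_profiles[name] = profile
        # One markdown file per source file, with a place to describe each column.
        lines = ["# Profile: %s" % name, ""]
        for column, stats in profile.items():
            lines += ["## %s" % column,
                      "- rows: %d" % stats["count"],
                      "- missing: %d" % stats["missing"],
                      "- unique values: %d" % stats["unique"],
                      "",
                      "*Describe this column here.*",
                      ""]
        with open(os.path.join(output_dir, name + ".md"), "w") as f:
            f.write("\n".join(lines))
    # One JSON file with all profile data.
    with open(os.path.join(output_dir, "profiles.json"), "w") as f:
        json.dump(all_profiles, f, indent=2)
```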
CC-BY
Fork, whack, republish, whatever. Just cite.
Feel free to work on functions or add-ons that would work with your kind of data or another format.
This is GitHub, after all. Feel free to put in requests or issues and I'll take them into consideration. Let me know if you'd like to collaborate on the project as well. This is my first formal tool, so there are obvious limitations, etc.
Keep in mind, however, that this tool is meant for an average researcher who just wants to download something and run it. They wouldn't necessarily want to use pip or conda to install anything. This tool is in proof-of-concept mode, so criticisms are expected to be substantive and move the conversation forward.
The vagrant data used as an example is from:
Crymble, Adam, et al. (2015). Vagrant Lives: 14,789 Vagrants Processed by Middlesex County, 1777-1786 (version 1.1). Zenodo. doi:10.5281/zenodo.31026.