Meet Great Expectations (WIP) #252

Draft. Wants to merge 4 commits into base: main


chriszs (Contributor) commented Mar 17, 2024

This PR is a work-in-progress draft of a potential command to validate raw data using Great Expectations. It creates an expectation suite that checks if each raw CSV has three or more rows and then opens an HTML report listing the results.

This is very much a first effort, and we would probably want to factor it a little differently if we decided to use it.
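
As a rough illustration of what the row-count check amounts to, here is a minimal standard-library sketch. It is not the code in this PR and does not use Great Expectations; the function names (`count_data_rows`, `validate_raw_dir`) are hypothetical, and it assumes each CSV has a header line that should not count toward the three rows. In Great Expectations itself, a check like this would presumably be expressed with the built-in `expect_table_row_count_to_be_between` expectation and `min_value=3`.

```python
import csv
from pathlib import Path
from typing import Dict


def count_data_rows(csv_path: Path) -> int:
    """Count the rows that follow the header line of a CSV file."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # assumption: the first line is a header
        return sum(1 for _ in reader)


def validate_raw_dir(raw_dir: Path, min_rows: int = 3) -> Dict[str, bool]:
    """Map each CSV's stem (e.g. "ak") to whether it has at least min_rows data rows."""
    return {
        path.stem: count_data_rows(path) >= min_rows
        for path in sorted(raw_dir.glob("*.csv"))
    }
```

Calling `validate_raw_dir(Path("some/raw/dir"))` returns one pass/fail flag per source. That is roughly the information the HTML report shows per file, without the report itself.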

Usage

The following should validate the CSVs in the default raw directory used by warn-transformer, verifying that each has three or more rows. It creates a data quality report in a temporary directory and opens it in a browser. (In production, we'd obviously want to persist the report somewhere and/or alert off of it.)

pipenv install
python -m warn_transformer.cli validate -l DEBUG

Screenshots

In this example, I hand-edited ak.csv to fail the check:

[Screenshot: Great Expectations validation results. The ak raw data source has failed while al has succeeded.]

Detail on the failure:

[Screenshot: Detail on the ak failure, showing which check it failed.]

Related to #236

chriszs mentioned this pull request Mar 17, 2024

chriszs (Contributor, Author) commented Mar 17, 2024

Great Expectations is apparently only compatible with Python 3.8 and up, so I removed 3.7 from the CI matrix for demonstration purposes.

Also, I believe updating Pipfile.lock when I added GE also upgraded some non-pinned deps. Flake8 is now at 7.0, which has at least one incompatibility with the current version in pre-commit, so we should probably either upgrade the version in pre-commit or pin the Pipfile version.

There's a 1.0 version of GE now in pre-release, which seems like it will move stuff around (but isn't well-documented yet), so I locked it to the current point release.
