`modified` column is inverted? #2

Open

wetchler opened this issue May 3, 2018 · 0 comments

wetchler commented May 3, 2018

In the cleaned_hm.csv file, I believe the modified column is the opposite of what it should be. You can see this by example with:

> df.loc[[50, 995], :]
     original_hm      cleaned_hm        modified
50   I went shopping  I went shopping   True
995  I ate chikfila   I ate chik-fil-a  False

I confirmed it by recomputing the column. Not a single row's modified value agrees with whether the text actually changed:

> (df.modified == (df.cleaned_hm != df.original_hm)).sum()
0
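
A minimal sketch of that check, plus one possible repair, run on a tiny made-up DataFrame instead of the real cleaned_hm.csv. The two rows mirror the example above; the `expected` variable and the `modified_fixed` column name are illustrative assumptions, not part of the dataset:

```python
import pandas as pd

# Toy stand-in for cleaned_hm.csv; only these two example rows are used.
df = pd.DataFrame(
    {
        "original_hm": ["I went shopping", "I ate chikfila"],
        "cleaned_hm": ["I went shopping", "I ate chik-fil-a"],
        "modified": [True, False],  # inverted, as observed in the file
    },
    index=[50, 995],
)

# What the flag should be: True only where cleaning changed the text.
expected = df["cleaned_hm"] != df["original_hm"]

# Count of rows where the stored flag agrees with the expected flag.
# Zero agreements means the column is inverted on every row.
agreements = int((df["modified"] == expected).sum())

# One possible repair (hypothetical column name): negate the stored flag.
df["modified_fixed"] = ~df["modified"]
```

If the real column contains missing values or is not a plain boolean dtype, the negation step would need adjusting; this sketch assumes clean booleans.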

An inversion also seems plausible, since modified is currently True about 98% of the time:

> df.modified.value_counts()
True     98329
False     2206
Name: modified, dtype: int64
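
For reference, these counts work out to roughly 97.8% True. A quick sketch of the arithmetic, rebuilding a Series from the counts above rather than loading the file (the variable names are just for illustration):

```python
import pandas as pd

# Rebuild a boolean Series with the same counts as reported above.
modified = pd.Series([True] * 98329 + [False] * 2206, name="modified")

# The mean of a boolean Series is the fraction of True values.
share_true = modified.mean()

# Equivalent view: relative frequencies of each value.
shares = modified.value_counts(normalize=True)

print(f"{share_true:.1%}")  # prints 97.8%
```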

Or am I misunderstanding the data?
