modified column is inverted? #2

Open
wetchler opened this issue May 3, 2018 · 0 comments

wetchler commented May 3, 2018

In the cleaned_hm.csv file, I believe the modified column is inverted: it is True where cleaned_hm is identical to original_hm and False where the two differ. For example:

> df.loc[[50, 995],:]
         original_hm        cleaned_hm  modified
50   I went shopping   I went shopping      True
995   I ate chikfila  I ate chik-fil-a     False

I confirmed it by recomputing the column and counting the rows where the stored value agrees with the recomputed one:

> (df.modified == (df.cleaned_hm != df.original_hm)).sum()
0
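
For anyone who wants to reproduce the check without the full file, here is a minimal sketch on a toy frame built from the two rows quoted above. It assumes modified is meant to be True exactly when cleaning changed the text; the column name modified_fixed is hypothetical and not part of the dataset.

```python
import pandas as pd

# Toy frame holding only the two rows quoted in this issue.
# The real data would come from cleaned_hm.csv instead.
df = pd.DataFrame(
    {
        "original_hm": ["I went shopping", "I ate chikfila"],
        "cleaned_hm": ["I went shopping", "I ate chik-fil-a"],
        "modified": [True, False],
    },
    index=[50, 995],
)

# Assumed meaning of the flag: True when cleaning changed the text.
expected = df["cleaned_hm"] != df["original_hm"]

# Rows where the stored flag agrees with the assumed meaning.
n_agree = int((df["modified"] == expected).sum())

# Rows where the stored flag is the exact negation of the assumed meaning.
n_inverted = int((df["modified"] == ~expected).sum())

# Hypothetical corrected column, leaving the original untouched.
df["modified_fixed"] = expected
```

If the flag is inverted throughout, n_agree is 0 and n_inverted equals the number of rows, which is what the check above reports for the full file.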

An inversion also seems plausible, since modified is currently True about 98% of the time:

> df.modified.value_counts()
True     98329
False     2206
Name: modified, dtype: int64
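
As a quick arithmetic check on these counts, the share of True values can be computed directly. This sketch hard-codes the two numbers printed above rather than reading the file.

```python
# Counts copied from the value_counts() output above.
counts = {True: 98329, False: 2206}

total = sum(counts.values())
share_true = counts[True] / total

print(f"{total} rows, {share_true:.1%} flagged as modified")
```

This works out to roughly 97.8% of 100535 rows. On a DataFrame, `df.modified.value_counts(normalize=True)` gives the same proportions.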

Or am I misunderstanding the data?
