Continuous or Categorical #23

Open
DataFighter opened this issue Dec 9, 2015 · 3 comments
Comments

@DataFighter

DataFighter commented Dec 9, 2015

Trinket should have some feature to determine if data is continuous or categorical.

The system should make a best guess on behalf of the user, but ultimately the user should have control.
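A minimal sketch of what "guess, but let the user override" could look like. This is not Trinket code: the function names `guess_kind` and `column_kinds`, the `overrides` argument, and the `max_unique` threshold of 10 are all assumptions made for illustration.

```python
def guess_kind(values, max_unique=10):
    """Guess 'continuous' or 'categorical' for one column of raw values.

    Heuristic (an assumption, not a Trinket rule): a column is continuous
    only if every non-empty value parses as a number and there are more
    than `max_unique` distinct values.
    """
    cleaned = [v for v in values if v is not None and str(v).strip() != ""]
    if not cleaned:
        return "categorical"
    try:
        numbers = [float(v) for v in cleaned]
    except (TypeError, ValueError):
        # At least one value is not numeric, so treat the column as labels.
        return "categorical"
    return "continuous" if len(set(numbers)) > max_unique else "categorical"


def column_kinds(columns, overrides=None):
    """Guess a kind per column; entries in `overrides` always win."""
    overrides = overrides or {}
    return {
        name: overrides.get(name, guess_kind(values))
        for name, values in columns.items()
    }


columns = {
    "height": [str(150 + i) for i in range(30)],  # 30 distinct numbers
    "color": ["red", "blue", "red", "green"],     # non-numeric labels
    "rating": [1, 2, 3] * 10,                     # numeric, only 3 levels
}
guessed = column_kinds(columns)
corrected = column_kinds(columns, overrides={"rating": "continuous"})
```

Here `rating` is guessed as categorical because it has only three distinct values, and the override shows how the user keeps the final say.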

@rebeccabilbro
Member

@doctorf72 is working on this in a fork during the sprints

@doctorf72

This is taken from the Messy Tables documentation.
The type_guess function in types tries to guess each column's type by counting successful conversions. Unfortunately, Messy Tables defines no Categorical data type. The most suitable candidate is the String type:

types.type_guess(rows, types=[<class 'messytables.types.StringType'>, <class 'messytables.types.DecimalType'>, <class 'messytables.types.IntegerType'>, <class 'messytables.types.DateType'>, <class 'messytables.types.BoolType'>], strict=False)

The type guesser aggregates the number of successful conversions of each column to each type, weights them by a fixed type priority, and selects the most probable type for each column based on that figure. It returns a list of CellType. Empty cells are ignored.

Strict means that a type will not be guessed if parsing fails for a single cell in the column.
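To make the description above concrete, here is a rough standalone sketch of the same idea in plain Python. It does not call Messy Tables and is not its implementation: the converters, the weights in `CANDIDATES`, the accepted date format, and the name `guess_types` are all hypothetical.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def _to_bool(text):
    # Hypothetical converter: accepts only a few spellings.
    if text.lower() in ("true", "false", "yes", "no"):
        return text.lower() in ("true", "yes")
    raise ValueError(text)


def _to_date(text):
    # Hypothetical converter: ISO dates only.
    return datetime.strptime(text, "%Y-%m-%d")


# (type name, converter, priority weight); the weights are made up.
CANDIDATES = [
    ("string", str, 1),
    ("decimal", Decimal, 3),
    ("integer", int, 4),
    ("date", _to_date, 5),
    ("bool", _to_bool, 6),
]


def guess_types(rows, strict=False):
    """Return one guessed type name per column; empty cells are ignored."""
    n_cols = max((len(row) for row in rows), default=0)
    guesses = []
    for col in range(n_cols):
        cells = [str(row[col]).strip() for row in rows
                 if col < len(row) and row[col] is not None]
        cells = [cell for cell in cells if cell]
        scores = {}
        for name, convert, weight in CANDIDATES:
            ok = 0
            for cell in cells:
                try:
                    convert(cell)
                    ok += 1
                except (ValueError, InvalidOperation):
                    pass
            if strict and ok < len(cells):
                continue  # strict: one failed cell rules the type out
            scores[name] = ok * weight
        # Ties go to the earliest candidate, so empty columns become "string".
        guesses.append(max(scores, key=scores.get))
    return guesses


rows = [
    ["1", "1.5", "2015-12-09", "red", ""],
    ["2", "2.25", "2015-12-10", "blue", ""],
    ["3", "x", "2015-12-11", "green", ""],
]
loose = guess_types(rows)
strict = guess_types(rows, strict=True)
```

With these made-up weights, the second column is guessed as decimal in the non-strict run despite the stray "x", and falls back to string when `strict=True`. A categorical check would still have to be layered on top, since every text column ends up as string.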

Continue to Pandas.

