Data analysis with no compromises. Efficiency, Accuracy, Assurance - pick any three.
- pandasboost.dataframe
- pandasboost.formatter
- pandasboost.install_boosters
- pandasboost.pandas_flavor
- pandasboost.registration
- pandasboost.series
def check_keep( frame, query, desc )
Filter a dataframe with a query and report the number of rows affected.
query
: str
: Query for filtering the dataframe. Will be passed to pandas.DataFrame.query.
desc
: str
: Description of the filter.
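As a rough illustration of the behavior described above, the following sketch uses plain pandas only. The helper name `check_keep_sketch`, the wording of the report, and the choice to return the filtered frame are assumptions for the example, not the library's actual implementation.

```python
import pandas as pd


def check_keep_sketch(frame: pd.DataFrame, query: str, desc: str) -> pd.DataFrame:
    """Hypothetical stand-in: filter with DataFrame.query and report row counts."""
    kept = frame.query(query)
    dropped = len(frame) - len(kept)
    # The report format below is an assumption made for this sketch.
    print(f"{desc}: kept {len(kept)} of {len(frame)} rows, dropped {dropped}")
    return kept


df = pd.DataFrame({"age": [15, 22, 37, 41], "country": ["US", "DE", "US", "FR"]})
adults = check_keep_sketch(df, "age >= 18", "Adults only")
```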
def levels( dataframe, show_values=True )
Report the number of unique values (levels) for each variable. Useful to inspect categorical variables.
show_values
: bool
: Whether to report a short sample of level values.
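The kind of information this report covers can be approximated with plain pandas. This is a sketch under assumptions: the sample size of three values and the variable names are chosen for illustration, and the real output layout may differ.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "color": ["red", "blue", "red", "green"],
        "size": ["S", "M", "M", "M"],
    }
)

# Number of unique values (levels) per column.
n_levels = df.nunique()

# A short sample of level values per column, in order of first appearance.
# The sample size of 3 is an assumption for this sketch.
samples = {col: df[col].dropna().unique()[:3].tolist() for col in df.columns}
```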
def nmissing( dataframe, show_all=False )
Count the missing values in each column of the dataframe.
show_all
: bool
: Whether to report all columns. If False, show only columns with one or more missing values.
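The effect of `show_all` can be pictured with plain pandas as below. This is only a sketch of the underlying idea. The variable names are illustrative, and the actual function may format its report differently.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1.0, None, 3.0],
        "b": ["x", "y", "z"],
        "c": [None, None, 1.0],
    }
)

# Missing values per column: roughly what show_all=True would cover.
missing = df.isna().sum()

# Only columns with at least one missing value: the show_all=False view.
only_missing = missing[missing > 0]
```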
def bignum( n, precision=0 )
Transform a big number into a business style representation.
>>> bignum(123456)
Output: 123K
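One way such a formatter could work is sketched below in pure Python. The suffix set (K, M, B) and the thresholds are assumptions. The source confirms only the `123K` example above.

```python
def bignum_sketch(n: float, precision: int = 0) -> str:
    """Hypothetical stand-in: abbreviate a number with K/M/B suffixes."""
    # Thresholds and suffixes are assumed for illustration.
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(n) >= threshold:
            return f"{n / threshold:.{precision}f}{suffix}"
    return f"{n:.{precision}f}"


small = bignum_sketch(123456)        # matches the documented example
large = bignum_sketch(2_500_000, 1)  # one decimal place
```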
def format_percentage( n, precision='auto' )
Display a decimal number as a percentage.
n
: float
: The number to format.
precision
: int, str, default 'auto'
: The precision of the result. With the default 'auto', the smallest precision at which the result is not zero is chosen automatically.
format_percentage(0.001) ==> '0.1%'
format_percentage(-0.0000010009) ==> '-0.0001%'
format_percentage(0.001, 4) ==> '0.1000%'
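The 'auto' rule can be read as "increase the number of decimals until the formatted value stops being zero". The sketch below implements that reading in pure Python. The function name and the `max_precision` cap are assumptions for illustration, not part of the library.

```python
def format_percentage_sketch(n: float, precision="auto", max_precision: int = 10) -> str:
    """Hypothetical stand-in reproducing the documented examples."""
    pct = n * 100
    if precision == "auto":
        precision = 0
        # Add decimals until the rounded text is non-zero. An exact zero
        # keeps precision 0, and max_precision (assumed) bounds the search.
        while pct != 0 and precision < max_precision and float(f"{pct:.{precision}f}") == 0:
            precision += 1
    return f"{pct:.{precision}f}%"


a = format_percentage_sketch(0.001)
b = format_percentage_sketch(-0.0000010009)
c = format_percentage_sketch(0.001, 4)
```

Under these assumptions the three calls give the same strings as the documented examples.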
def cut_groups( srs, rules, right=True, missing='missing' )
def frequency( srs, business=True, ascending=None, by_index=False )
Report frequency of values.
ascending
: boolean, default None
: Whether to sort in ascending order. If None, sorts ascending when sorted by index and descending when sorted by frequency.
by_index
: boolean, default False
: Whether to sort the result by index.
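The two sort modes described above correspond roughly to the following plain pandas calls. This is a sketch of the ordering rules only. It does not reproduce the function's report format or the `business` option.

```python
import pandas as pd

srs = pd.Series(["b", "a", "b", "c", "b", "a"])

# Sorted by frequency: descending counts, the default when ascending is None.
by_freq = srs.value_counts()

# Sorted by index: ascending values, the default when by_index is True.
by_index = srs.value_counts().sort_index()
```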