04 Aug 20:02

lshpaner

fcff969

EDA Toolkit 0.0.5

Ensure Consistent Font Size and Text Wrapping Across Plot Elements

Description

This PR addresses inconsistencies in font sizes and text wrapping across various plot elements in the stacked_crosstab_plot function. The following updates have been implemented to ensure uniformity and improve the readability of plots:

Title Font Size and Text Wrapping:
- Added a text_wrap parameter to control the wrapping of plot titles.
- Ensured that title font sizes are consistent with axis label font sizes by explicitly setting the font size using ax.set_title() after plot generation.
Legend Font Size Consistency:
- Incorporated label_fontsize into the legend font size by directly setting the font size of the legend text using plt.setp(legend.get_texts(), fontsize=label_fontsize).
- This ensures that the legend labels are consistent with the title and axis labels.
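
The two adjustments above can be sketched outside the library as follows. This is a minimal illustration, not the body of stacked_crosstab_plot: the sample data, the value of label_fontsize, and the reading of text_wrap as a maximum number of characters per title line are all assumptions made for the example.

```python
import textwrap

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

label_fontsize = 12  # assumed value shared by title, axis labels, and legend
text_wrap = 30       # assumed meaning: maximum characters per title line

fig, ax = plt.subplots()
# Two stacked series standing in for a crosstab (made-up numbers).
ax.bar(["A", "B"], [3, 5], label="Group 1")
ax.bar(["A", "B"], [2, 1], bottom=[3, 5], label="Group 2")

# Wrap the title text, then set its font size explicitly.
long_title = "Distribution of Outcome by Category Across All Observed Groups"
ax.set_title(textwrap.fill(long_title, width=text_wrap), fontsize=label_fontsize)
ax.set_xlabel("Category", fontsize=label_fontsize)
ax.set_ylabel("Count", fontsize=label_fontsize)

# Apply the same size to every legend entry.
legend = ax.legend()
plt.setp(legend.get_texts(), fontsize=label_fontsize)
```

Setting the sizes after the artists exist keeps the title, axis labels, and legend tied to one value, regardless of the defaults the plotting call applied.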

Testing

Verified that titles now wrap correctly and match the specified label_fontsize.
Confirmed that legend text scales according to label_fontsize, ensuring consistent font sizes across all plot elements.

Outcome

These changes improve the visual consistency of plots generated by the stacked_crosstab_plot function, making the plots more professional and easier to read. This PR should be reviewed and merged to standardize font sizing and text presentation across the codebase.

02 Aug 21:30

lshpaner

0.0.4

d1f8957

EDA Toolkit 0.0.4

Changelog

[0.0.4] - 2024-08-02

Stable release:
- No new updates to codebase
- Updated the project description variable in setup.py to re-emphasize key elements of the library
- Minor readme cleanup:
  - Added icons for sections that did not have them

[0.0.3] - 2024-08-02

Stable release:
- Updated logo size, fixed citation title, and some minor readme cleanup:
  - added additional section for documentation, cleaned up verbiage, moved acknowledgements section before licensing and support

[0.0.2] - 2024-08-01

First stable release:
- No new updates to codebase; just minimal documentation updates to readme and setup.py files
- Added logo, badges, and Zenodo-certified citation to readme

[0.0.1rc0] - 2024-08-01

No new updates to codebase; just minimal documentation updates to readme and setup.py files

[0.0.1b0] - 2024-08-01

New `scatter_fit_plot()` and additional updates

Added new scatter_fit_plot(), removed unused data_types(), added comment section headers

Added `xlim` and `ylim` inputs in `kde_distributions()`

Added xlim and ylim inputs to allow users to customize axis limits in kde_distributions()

Added `xlim` and `ylim` params to `stacked_crosstab_plot()`

Added xlim and ylim input parameters to stacked_crosstab_plot() to give users more flexibility on controlling axes limits

Added `x` and `y` limits to `box_violin_plot()`

Changed function name from metrics_box_violin() to box_violin_plot()
Added xlim and ylim inputs to control x- and y-axis limits of box_violin_plot() (formerly known as metrics_box_violin)

Added ability to remove stacks from plots, plot all or one at a time

Key Changes

plot_type Parameter:
This parameter allows the user to choose between "regular", "normalized", or "both" plot types.
remove_stacks Parameter:
This parameter, when set to True, generates a regular bar plot using only the col parameter instead of a stacked bar plot. It only works when plot_type is set to "regular". If remove_stacks is set to True while plot_type is anything other than "regular", the function will raise an exception.

Explanation of Changes:

plot_type Parameter:
- This parameter provides flexibility to the user, allowing them to specify the type of plot to generate. The options are:
  - "regular": Generates a standard bar plot.
  - "normalized": Generates a normalized bar plot.
  - "both": Generates both regular and normalized bar plots.
remove_stacks Parameter:
- This parameter, when set to True, will generate a regular bar plot using only the col parameter. It effectively removes the stacking of the bars. This parameter is only applicable when plot_type is set to "regular". If used with any other plot_type, an exception will be raised to ensure proper usage.

These changes enhance the flexibility and functionality of the stacked_crosstab_plot function, allowing for more customizable and specific plot generation based on user requirements.
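
The interaction between the two parameters can be sketched as a small validation step. The helper name check_plot_options is hypothetical, and ValueError is an assumption: the notes only say that an exception is raised.

```python
VALID_PLOT_TYPES = ("regular", "normalized", "both")


def check_plot_options(plot_type, remove_stacks):
    """Hypothetical helper: validate the options and list the plots to draw."""
    if plot_type not in VALID_PLOT_TYPES:
        raise ValueError(
            f"plot_type must be one of {VALID_PLOT_TYPES}, got {plot_type!r}"
        )
    # remove_stacks is only meaningful for the regular (non-normalized) plot.
    if remove_stacks and plot_type != "regular":
        raise ValueError("remove_stacks=True requires plot_type='regular'")
    # "both" expands to the two individual plot types.
    return ["regular", "normalized"] if plot_type == "both" else [plot_type]


print(check_plot_options("both", False))
print(check_plot_options("regular", True))
```

Validating before any drawing happens means an invalid combination fails fast instead of producing a partially rendered figure.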

[0.0.1b0] - 2024-07-31

Refined `kde_distributions()`

Key Changes

Alpha Transparency for Histogram Fill:
- Added a fill_alpha parameter to control the transparency of the histogram bars' fill color.
- The default value is 0.6. An exception is raised if fill=False and fill_alpha is specified.
Custom Font Sizes:
- Introduced label_fontsize and tick_fontsize parameters to allow control over the font size of axis labels and tick marks independently.
Scientific Notation Toggle:
- Added a disable_sci_notation parameter to enable or disable scientific notation on axes.
Improved Error Handling:
- Added validation for the stat parameter to ensure that only valid options are accepted.
- Added checks to ensure proper usage of fill_alpha and hist_edgecolor when fill is set to False.
General Enhancements:
- Updated the function's docstring to reflect the new parameters and provide comprehensive guidance on its usage.
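
The stat and fill_alpha checks described above might look roughly like this. The helper name is hypothetical, the list of statistics is taken from the stat options named elsewhere in this changelog, and detecting a "specified" fill_alpha by comparing against the 0.6 default is an assumption about how the check works.

```python
VALID_STATS = {"count", "density", "frequency", "probability", "proportion", "percent"}
DEFAULT_FILL_ALPHA = 0.6  # default stated in the notes


def check_hist_options(stat, fill=True, fill_alpha=DEFAULT_FILL_ALPHA):
    """Hypothetical helper: reject unsupported stat values and fill_alpha misuse."""
    stat = stat.lower()
    if stat not in VALID_STATS:
        raise ValueError(f"stat must be one of {sorted(VALID_STATS)}, got {stat!r}")
    # Transparency has no effect when the bars are not filled, so treat a
    # non-default fill_alpha together with fill=False as a usage error.
    if not fill and fill_alpha != DEFAULT_FILL_ALPHA:
        raise ValueError("fill_alpha cannot be set when fill=False")
    return stat


print(check_hist_options("Density"))
```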

[0.0.1b0] - 2024-07-30

Enhance `kde_distributions` Function

Added Parameters

grid_figsize and single_figsize:
- Control the size of the overall grid figure and individual figures separately.
hist_color and kde_color:
- Allow customization of histogram and KDE plot colors.
hist_edgecolor:
- Allows customization of the histogram bar edges.
hue:
- Allows grouping data by a column.
fill:
- Controls whether to fill the histogram bars with color.
y_axis_label:
- Customizable y-axis label.
log_scale_vars:
- Specifies which variables to apply log scale.
bins and binwidth:
- Control the number and width of bins.
stat:
- Allows different statistics for the histogram (count, density, frequency, probability, proportion, percent).

Improvements

Validation and Error Handling:
- Checks for invalid log_scale_vars and throws a ValueError if any are found.
- Throws a ValueError if edgecolor is changed while fill is set to False.
- Issues a PerformanceWarning if both bins and binwidth are specified, warning of potential performance impacts.
Customizable y-axis label:
- Allows users to specify custom y-axis labels.
Warning for KDE with Count:
- Issues a warning if KDE is used with stat='count', as it may produce misleading plots.

Updated `add_ids` to ensure unique ids and idx check

This pull request updates the add_ids() function to enhance its functionality by:

- Ensuring that each generated ID starts with a non-zero digit.
- Adding a check to verify that the DataFrame index is unique.
- Printing a warning message if duplicate index entries are found.

These changes improve the robustness of the function, ensuring that the IDs generated are always unique and valid, and provide necessary feedback when the DataFrame index is not unique.

Check for Unique Indices:

Before generating IDs, the function now checks if the DataFrame index is unique.
If duplicates are found, a warning is printed along with the list of duplicate index entries.

Generate Non-Zero Starting IDs:

The ID generation process is updated to ensure that the first digit of each ID is always non-zero.

Ensure Unique IDs:

A set is used to store the generated IDs, ensuring all IDs are unique before adding them to the DataFrame.
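
A minimal sketch of the ID generation described above, using only the standard library. The function name, the default of nine digits, and the seed argument are assumptions for illustration; the actual add_ids() implementation may differ.

```python
import random


def generate_unique_ids(n, num_digits=9, seed=None):
    """Hypothetical helper: return n distinct ID strings with a non-zero first digit."""
    if num_digits < 1:
        raise ValueError("num_digits must be at least 1")
    capacity = 9 * 10 ** (num_digits - 1)  # how many valid IDs exist
    if n > capacity:
        raise ValueError(f"cannot draw {n} unique IDs with {num_digits} digits")

    rng = random.Random(seed)
    ids = set()  # the set silently discards collisions
    while len(ids) < n:
        first = rng.choice("123456789")  # never "0"
        rest = "".join(rng.choices("0123456789", k=num_digits - 1))
        ids.add(first + rest)
    return list(ids)


sample_ids = generate_unique_ids(5, num_digits=6, seed=42)
print(sample_ids)
```

Looping until the set reaches the requested size is what guarantees uniqueness: a repeated draw leaves the set unchanged and the loop simply tries again.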

Fix int conversion for numeric cols, reset decimal_places=0

This PR fixes the integer conversion issue for numeric columns when decimal_places=0 in the save_dataframes_to_excel function. Additionally, it resets decimal_places to 0 as the default value.

Changes include:

Convert only numeric columns to integers when decimal_places=0.
Reset decimal_places default value to 0.

This ensures correct formatting and avoids errors during conversion.

Contingency Table Updates

Error Handling for Columns:
- Added a check to ensure at least one column is specified.
- Updated the function to accept a single column as a string or multiple columns as a list.
- Updated the function to raise a ValueError if no columns are provided or if cols is not correctly specified.
Function Parameters:
- Changed the parameters from col1 and col2 to a single parameter cols which can be either a string or a list.
Error Handling for SortBy:
- Renamed SortBy to sort_by to standardize nomenclature.
- Added a check to ensure sort_by is either 0 or 1.
- Updated the function to raise a ValueError if sort_by is not 0 or 1.
Sorting Logic:
- Updated the sorting logic to handle the new cols parameter structure.
Handling Categorical Data:
- Modified the code to convert categorical columns to strings to avoid issues with fillna("").
Handling Missing Values:
- Added df = df.fillna('') to fill NA values within the function to account for missing data.
Improved Function Documentation:
- Updated the function documentation to reflect the new parameters and error handling.
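
The argument handling described above (a string or a list for cols, and 0 or 1 for sort_by) can be sketched as a standalone check. The helper name and the default sort_by=0 are assumptions; only the accepted values and the ValueError come from the notes.

```python
def normalize_table_args(cols=None, sort_by=0):
    """Hypothetical helper: coerce cols to a list and validate sort_by."""
    if not cols:
        raise ValueError("At least one column must be specified in cols")
    if isinstance(cols, str):
        cols = [cols]  # a single column name becomes a one-element list
    if not isinstance(cols, list):
        raise ValueError("cols must be a column name or a list of column names")
    if sort_by not in (0, 1):
        raise ValueError("sort_by must be 0 or 1")
    return cols, sort_by


print(normalize_table_args("age"))
print(normalize_table_args(["age", "sex"], sort_by=1))
```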

[0.0.1b0] - 2024-07-29

Contingency Table Updates

Added fillna('') to the output so that null values come through, removed the 'All' column name from the output, added sort options 0 and 1, and updated the docstring. Tested successfully on Python 3.7.3.

Updated `datetime` Imports to accommodate different Python versions

Compatibility Enhancement:

Added a version check for Python 3.7 and above.
- Conditional import of datetime to handle different Python versions.

if sys.version_info >= (3, 7):
    from datetime import ...

02 Aug 20:27

lshpaner

0.0.3

2b0f4b9

EDA Toolkit 0.0.3

Changelog

[0.0.3] - 2024-08-02

Stable release:
- Updated logo size, fixed citation title, and some minor readme cleanup:
  - added additional section for documentation, cleaned up verbiage, moved acknowledgements section before licensing and support

[0.0.2] - 2024-08-01

First stable release:
- No new updates to codebase; just minimal documentation updates to readme and setup.py files
- Added logo, badges, and Zenodo-certified citation to readme

[0.0.1rc0] - 2024-08-01

No new updates to codebase; just minimal documentation updates to readme and setup.py files

[0.0.1b0] - 2024-08-01

New `scatter_fit_plot()` and additional updates

Added new scatter_fit_plot(), removed unused data_types(), added comment section headers

Added `xlim` and `ylim` inputs in `kde_distributions()`

Added xlim and ylim inputs to allow users to customize axis limits in kde_distributions()

Added `xlim` and `ylim` params to `stacked_crosstab_plot()`

Added xlim and ylim input parameters to stacked_crosstab_plot() to give users more flexibility on controlling axes limits

Added `x` and `y` limits to `box_violin_plot()`

Changed function name from metrics_box_violin() to box_violin_plot()
Added xlim and ylim inputs to control x- and y-axis limits of box_violin_plot() (formerly known as metrics_box_violin)

Added ability to remove stacks from plots, plot all or one at a time

Key Changes

plot_type Parameter:
This parameter allows the user to choose between "regular", "normalized", or "both" plot types.
remove_stacks Parameter:
This parameter, when set to True, generates a regular bar plot using only the col parameter instead of a stacked bar plot. It only works when plot_type is set to "regular". If remove_stacks is set to True while plot_type is anything other than "regular", the function will raise an exception.

Explanation of Changes:

plot_type Parameter:
- This parameter provides flexibility to the user, allowing them to specify the type of plot to generate. The options are:
  - "regular": Generates a standard bar plot.
  - "normalized": Generates a normalized bar plot.
  - "both": Generates both regular and normalized bar plots.
remove_stacks Parameter:
- This parameter, when set to True, will generate a regular bar plot using only the col parameter. It effectively removes the stacking of the bars. This parameter is only applicable when plot_type is set to "regular". If used with any other plot_type, an exception will be raised to ensure proper usage.

These changes enhance the flexibility and functionality of the stacked_crosstab_plot function, allowing for more customizable and specific plot generation based on user requirements.

[0.0.1b0] - 2024-07-31

Refined `kde_distributions()`

Key Changes

Alpha Transparency for Histogram Fill:
- Added a fill_alpha parameter to control the transparency of the histogram bars' fill color.
- The default value is 0.6. An exception is raised if fill=False and fill_alpha is specified.
Custom Font Sizes:
- Introduced label_fontsize and tick_fontsize parameters to allow control over the font size of axis labels and tick marks independently.
Scientific Notation Toggle:
- Added a disable_sci_notation parameter to enable or disable scientific notation on axes.
Improved Error Handling:
- Added validation for the stat parameter to ensure that only valid options are accepted.
- Added checks to ensure proper usage of fill_alpha and hist_edgecolor when fill is set to False.
General Enhancements:
- Updated the function's docstring to reflect the new parameters and provide comprehensive guidance on its usage.

[0.0.1b0] - 2024-07-30

Enhance `kde_distributions` Function

Added Parameters

grid_figsize and single_figsize:
- Control the size of the overall grid figure and individual figures separately.
hist_color and kde_color:
- Allow customization of histogram and KDE plot colors.
hist_edgecolor:
- Allows customization of the histogram bar edges.
hue:
- Allows grouping data by a column.
fill:
- Controls whether to fill the histogram bars with color.
y_axis_label:
- Customizable y-axis label.
log_scale_vars:
- Specifies which variables to apply log scale.
bins and binwidth:
- Control the number and width of bins.
stat:
- Allows different statistics for the histogram (count, density, frequency, probability, proportion, percent).

Improvements

Validation and Error Handling:
- Checks for invalid log_scale_vars and throws a ValueError if any are found.
- Throws a ValueError if edgecolor is changed while fill is set to False.
- Issues a PerformanceWarning if both bins and binwidth are specified, warning of potential performance impacts.
Customizable y-axis label:
- Allows users to specify custom y-axis labels.
Warning for KDE with Count:
- Issues a warning if KDE is used with stat='count', as it may produce misleading plots.
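
The checks listed under Improvements could be sketched as below. The helper name and parameter defaults are assumptions, and a plain UserWarning stands in for the PerformanceWarning mentioned above so that the example needs only the standard library.

```python
import warnings


def check_kde_inputs(columns, log_scale_vars=None, bins=None, binwidth=None,
                     stat="density", kde=True):
    """Hypothetical helper: validate inputs before any histogram is drawn."""
    # Every variable requested for log scaling must be a known column.
    invalid = [v for v in (log_scale_vars or []) if v not in columns]
    if invalid:
        raise ValueError(f"Invalid log_scale_vars: {invalid}")
    # Both binning controls at once: warn rather than fail.
    if bins is not None and binwidth is not None:
        warnings.warn(
            "Both bins and binwidth were given; this may affect performance.",
            UserWarning,
        )
    # A density curve on top of raw counts can be misleading.
    if kde and stat == "count":
        warnings.warn("KDE with stat='count' may produce a misleading plot.",
                      UserWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_kde_inputs(["age", "income"], log_scale_vars=["income"],
                     bins=20, binwidth=5, stat="count")
print([str(w.message) for w in caught])
```

Errors are reserved for inputs that cannot be honored, while combinations that still produce a plot only trigger warnings.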

Updated `add_ids` to ensure unique ids and idx check

This pull request updates the add_ids() function to enhance its functionality by:

- Ensuring that each generated ID starts with a non-zero digit.
- Adding a check to verify that the DataFrame index is unique.
- Printing a warning message if duplicate index entries are found.

These changes improve the robustness of the function, ensuring that the IDs generated are always unique and valid, and provide necessary feedback when the DataFrame index is not unique.

Check for Unique Indices:

Before generating IDs, the function now checks if the DataFrame index is unique.
If duplicates are found, a warning is printed along with the list of duplicate index entries.

Generate Non-Zero Starting IDs:

The ID generation process is updated to ensure that the first digit of each ID is always non-zero.

Ensure Unique IDs:

A set is used to store the generated IDs, ensuring all IDs are unique before adding them to the DataFrame.
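
The index check can be illustrated without pandas by treating the index as a plain list of labels, which is an assumption made to keep the sketch dependency-free. With a real DataFrame, df.index.is_unique and df.index.duplicated() provide the same information.

```python
from collections import Counter


def find_duplicate_labels(index_labels):
    """Hypothetical helper: return each label that occurs more than once."""
    counts = Counter(index_labels)
    return [label for label, n in counts.items() if n > 1]


labels = [0, 1, 2, 2, 3, 3, 3]  # made-up index with repeated entries
duplicates = find_duplicate_labels(labels)
if duplicates:
    print(f"Warning: index is not unique; duplicate entries: {duplicates}")
```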

Fix int conversion for numeric cols, reset decimal_places=0

This PR fixes the integer conversion issue for numeric columns when decimal_places=0 in the save_dataframes_to_excel function. Additionally, it resets decimal_places to 0 as the default value.

Changes include:

Convert only numeric columns to integers when decimal_places=0.
Reset decimal_places default value to 0.

This ensures correct formatting and avoids errors during conversion.
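
A minimal pandas sketch of the idea, assuming the fix amounts to selecting numeric columns before casting. The helper name and sample data are hypothetical, and this is not the code of save_dataframes_to_excel itself.

```python
import pandas as pd


def round_numeric_columns(df, decimal_places=0):
    """Hypothetical helper: round numeric columns and leave the others untouched."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    for col in numeric_cols:
        if decimal_places == 0:
            # Casting to int fails on NaN, so this sketch assumes no missing values.
            out[col] = out[col].round(0).astype(int)
        else:
            out[col] = out[col].round(decimal_places)
    return out


frame = pd.DataFrame({"name": ["a", "b"], "price": [1.6, 3.2], "qty": [2, 7]})
result = round_numeric_columns(frame, decimal_places=0)
print(result.dtypes)
```

Restricting the cast to numeric columns is what avoids the conversion error: text columns never reach astype(int).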

Contingency Table Updates

Error Handling for Columns:
- Added a check to ensure at least one column is specified.
- Updated the function to accept a single column as a string or multiple columns as a list.
- Updated the function to raise a ValueError if no columns are provided or if cols is not correctly specified.
Function Parameters:
- Changed the parameters from col1 and col2 to a single parameter cols which can be either a string or a list.
Error Handling for SortBy:
- Renamed SortBy to sort_by to standardize nomenclature.
- Added a check to ensure sort_by is either 0 or 1.
- Updated the function to raise a ValueError if sort_by is not 0 or 1.
Sorting Logic:
- Updated the sorting logic to handle the new cols parameter structure.
Handling Categorical Data:
- Modified the code to convert categorical columns to strings to avoid issues with fillna("").
Handling Missing Values:
- Added df = df.fillna('') to fill NA values within the function to account for missing data.
Improved Function Documentation:
- Updated the function documentation to reflect the new parameters and error handling.

[0.0.1b0] - 2024-07-29

Contingency Table Updates

Added fillna('') to the output so that null values come through, removed the 'All' column name from the output, added sort options 0 and 1, and updated the docstring. Tested successfully on Python 3.7.3.

Updated `datetime` Imports to accommodate different Python versions

Compatibility Enhancement:

Added a version check for Python 3.7 and above.
- Conditional import of datetime to handle different Python versions.

if sys.version_info >= (3, 7):
    from datetime import datetime
else:
    import datetime

In dataframe_columns():

start_time = (datetime.now() if sys.version_info >= (3, 7)
              else datetime.datetime.now())

stop_time = (datetime.now() if sys.version_info >= (3, 7)
      ...
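
One way to avoid repeating the version check at every call site is to bind a single now callable at import time. This is a sketch of that variation, not the code shipped in the release; the name now is an assumption. Note that from datetime import datetime is also available on older Python versions, so the branch here mainly mirrors the structure described in the notes.

```python
import sys

if sys.version_info >= (3, 7):
    from datetime import datetime

    now = datetime.now  # class imported directly
else:
    import datetime

    now = datetime.datetime.now  # module imported, class accessed through it

# Call sites no longer need to know which import style was used.
start_time = now()
stop_time = now()
elapsed = stop_time - start_time
print(f"Elapsed: {elapsed.total_seconds():.6f} s")
```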

02 Aug 05:31

lshpaner

0.0.2

36830e6

eda_toolkit 0.0.2

First Stable Release - `EDA Toolkit 0.0.2`

Added Zenodo citation to PyPI
Added logo to PyPI
Added acknowledgements and references to PyPI

02 Aug 04:47

lshpaner

0.0.1c

1d9487a

eda_toolkit 0.0.1c Pre-release

Release Notes

Changes from `main_old.py` to `main.py`

1. Imports and Dependencies:

Removed:
- import datetime.
Added:
- import matplotlib.ticker as mticker for formatting.
- import sys for conditional imports.
- Conditional import of datetime depending on Python version (sys.version_info).

2. Function Changes:

Function add_ids:
- Parameter Modifications:
  - column_name renamed to id_colname.
  - New parameters num_digits to specify ID digit length, and set_as_index to optionally set the new ID column as the index.
- Functionality Update:
  - Allows setting the new ID column as an index and customizes the number of digits in generated IDs.
- Documentation Update: Expanded to describe new parameters and functionality.
Function scatter_plots_grid:
- New Functionality:
  - Added options for customizing axis limits (xlim, ylim).
  - Enhanced options for saving plots as PNG or SVG.
  - Added support for showing or hiding plots based on user preference (show_plot parameter).
  - Improved legend management with the option to show or remove the legend.
  - Introduced parameters to control tick labels and axis label font sizes.
Plot Saving and Displaying Enhancements:
- The code for saving and displaying plots has been refactored:
  - Saving individual plots: Can now be conditionally saved based on user input (save_individual).
  - Saving and displaying grids: Improved support for saving grids of plots as a single image and displaying them conditionally.
Additional Enhancements:
- Introduced new function for fitting and plotting best-fit lines in scatter plots.

3. New Functions:

add_best_fit:
- Description: Adds a best-fit line to a scatter plot with customizable line style and color.
- Parameters: ax, x_data, y_data, line_style, line_color.
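
The fitting step behind a best-fit line can be sketched with ordinary least squares in plain Python. The release notes do not say how add_best_fit computes the line, so both the method and the helper name fit_line are assumptions.

```python
def fit_line(x_data, y_data):
    """Hypothetical helper: least-squares slope and intercept for y = m*x + b."""
    n = len(x_data)
    if n != len(y_data) or n < 2:
        raise ValueError("x_data and y_data must have the same length (at least 2)")
    mean_x = sum(x_data) / n
    mean_y = sum(y_data) / n
    sxx = sum((x - mean_x) ** 2 for x in x_data)
    if sxx == 0:
        raise ValueError("x_data must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_data, y_data))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

With a Matplotlib axes at hand, the fitted line could then be drawn with ax.plot(xs, ys, linestyle=line_style, color=line_color), where ys holds slope * x + intercept for each x in xs.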

4. Code Structural Changes:

Comment and Documentation Updates:
- Improved and expanded docstrings and inline comments.
- Added detailed explanations for new parameters and functionalities.
Refactoring:
- Plotting-related code was reorganized for better readability and maintainability.

5. Bug Fixes and Enhancements:

Plotting Enhancements:
- Improved handling of axis visibility and edge cases.
- Enhanced plot saving mechanisms to ensure correct file naming and prevent overwriting.
Error Handling:
- Improved error handling and warnings related to file saving and plotting.

25 Jul 04:05

lshpaner

0.0.1b

0492e66

0.0.1b Pre-release

Initial release via GitHub after manual release of 0.0.1a

Releases: lshpaner/eda_toolkit
