
The DualDataDistributionAnalysis repo offers a Python script for comparing two datasets via statistical plots: boxplots, violin plots, histograms, KDEs, CDFs, and swarm plots. It uses pandas, Matplotlib, and Seaborn to visualize and analyze data distributions for exploratory data analysis.


Msoltaninezhad/DualDataDistributionAnalysis


DualDataDistributionAnalysis

The DualDataDistributionAnalysis repository contains a Python script that analyzes and visualizes the distributions of two datasets, labeled 'Data 1' and 'Data 2'. The script uses pandas for data handling and Matplotlib and Seaborn for plotting, and it produces a series of plots for comparing the statistical properties of the two datasets.
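
As a rough illustration of the kind of input such a script works with, the sketch below builds two labeled samples in a long-form pandas DataFrame and prints summary statistics for each. The synthetic values, the column names `value` and `dataset`, and the random seed are assumptions made for this example; they are not taken from the repository's script.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; the repository's actual data is not shown here.
rng = np.random.default_rng(seed=0)
data_1 = rng.normal(loc=50.0, scale=5.0, size=200)
data_2 = rng.normal(loc=55.0, scale=8.0, size=200)

# Long-form layout: one row per observation plus a label column.
# The column names "value" and "dataset" are assumptions for this sketch.
df = pd.DataFrame({
    "value": np.concatenate([data_1, data_2]),
    "dataset": ["Data 1"] * len(data_1) + ["Data 2"] * len(data_2),
})

# Count, mean, standard deviation, and quartiles for each dataset.
summary = df.groupby("dataset")["value"].describe()
print(summary)
```

A long-form layout like this is convenient because Seaborn's plotting functions can split the data by the label column directly.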

The repository includes code to generate:

  • Boxplots: These show the median, quartiles, and outliers for each dataset, providing a quick visual summary of the distributions.
  • Violin Plots: These combine a box plot with a kernel density estimate, so the width of each violin shows how densely the data are concentrated at each value.
  • Histograms: These illustrate the frequency distributions of the datasets, allowing for the observation of data groupings and patterns.
  • Kernel Density Estimation (KDE) Plots: These estimate each dataset's probability density as a smooth, continuous curve, giving a view of the distribution that does not depend on the choice of histogram bins.
  • Cumulative Distribution Function (CDF) Plots: These show, for each value, the proportion of data points at or below that value.
  • Swarm Plots: These plot each individual data point and are useful for showing data clustering and spotting outliers without any data binning.

Each plot is saved as an SVG file, a vector format that scales without loss of quality, and all plots are displayed inline for immediate review.
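
Saving a figure as SVG can be done with Matplotlib's `savefig`. The sketch below is an assumption about how this step might look rather than a copy of the script: the file name `histogram.svg`, the temporary output directory, and the synthetic data are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook to see plots inline
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for 'Data 1' and 'Data 2'.
rng = np.random.default_rng(seed=0)
data_1 = rng.normal(50, 5, 200)
data_2 = rng.normal(55, 8, 200)

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(data_1, bins=20, alpha=0.6, label="Data 1")
ax.hist(data_2, bins=20, alpha=0.6, label="Data 2")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.legend()

# Write the figure as a vector graphic; the path is a placeholder for this sketch.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "histogram.svg")
fig.savefig(out_path, format="svg", bbox_inches="tight")
plt.close(fig)
```

Calling `savefig` before `plt.show()` is the safer order, since some backends discard the figure once it has been shown.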

Together, these visualizations support exploratory data analysis (EDA), quality control, and comparison of data from two different conditions or sources. The script is aimed at data analysts, scientists, and statisticians who need to understand and present their data distributions.
