This repository contains Jupyter notebooks with code examples for data analysis and machine learning in Python. The code blocks in each notebook can be reused as a reference for common tasks.
- 1_Data_operations.ipynb: Covers the main data operations needed before any analysis or model building
- 2_Pandas_apply_optimization.ipynb: Compares several ways of applying functions to a pandas DataFrame, as a guide to optimizing pandas code
- 3_Clustering_kmeans.ipynb: Walks through a k-means clustering exercise using customer sales data
- pyspark/1_Clustering_kmeans.ipynb: Walks through the same kind of k-means clustering exercise on customer sales data, using PySpark

Planned notebooks:

- Linear regression
- Logistic regression
- Decision trees and random forests
- More notebooks for PySpark
I also plan to keep updating the existing notebooks along the way.
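
As a rough illustration of the kind of comparison 2_Pandas_apply_optimization.ipynb is about, here is a minimal sketch of three ways to apply the same function to a pandas DataFrame. It is not taken from the notebook: the sample data, the column names (`price`, `quantity`), and the `revenue` helper are all hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(1, 100, size=1_000),
    "quantity": rng.integers(1, 10, size=1_000),
})


def revenue(price, quantity):
    """Hypothetical helper: works on scalars and on whole columns."""
    return price * quantity


# 1. Explicit Python loop over the rows.
loop_result = pd.Series(
    [revenue(row.price, row.quantity) for row in df.itertuples(index=False)],
    index=df.index,
)

# 2. Row-wise DataFrame.apply: the function is called once per row.
apply_result = df.apply(
    lambda row: revenue(row["price"], row["quantity"]), axis=1
)

# 3. Vectorized: the function is called once on entire columns.
vectorized_result = revenue(df["price"], df["quantity"])

# All three approaches should produce the same values.
print(np.allclose(loop_result, vectorized_result))
print(np.allclose(apply_result, vectorized_result))
```

The vectorized form is typically the fastest of the three because the arithmetic runs on whole columns rather than one row at a time. In a notebook, each approach can be timed with the `%timeit` magic to see the difference on your own data.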