This project predicts customer churn from bank data. Eight classification models are compared using StratifiedKFold cross-validation, which preserves the class proportions of the target in every fold and is therefore well suited to imbalanced classification problems. The main challenge is the class imbalance in the data, which is handled by applying SMOTE before training the models. After cross-validation, the three top-performing models are selected and tuned with grid search to identify the hyperparameters that give the best performance.
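The workflow described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the data is synthetic, the three candidate models, the F1 metric, and the parameter grid are assumptions chosen for brevity, and the SMOTE step is left out so the sketch depends only on scikit-learn. With the imbalanced-learn package, SMOTE would typically be placed in an `imblearn.pipeline.Pipeline` so that resampling touches only the training folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank data (assumption): roughly 20% churners.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Stratified folds keep the churn rate of each fold close to the overall rate.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_rates = [y[test_idx].mean() for _, test_idx in cv.split(X, y)]

# Hypothetical candidate set; the project itself compares eight models.
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean cross-validated F1 score per model, then rank best to worst.
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
    for name, model in models.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)

# Grid search over a small, illustrative grid for one shortlisted model.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=cv, scoring="f1"
)
search.fit(X, y)

print("ranking:", ranked)
print("best params:", search.best_params_)
```

Reusing the same `cv` object for both the model comparison and the grid search keeps the fold assignment identical across steps, so the scores are directly comparable.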
-
churn modeling.csv
The dataset used for this project. Data Source: Kaggle.
-
main.ipynb
Jupyter notebook containing the implementation of the project.
-
Research.pdf
Detailed description of the preprocessing, implementation, and research work behind the project.