Churn Prediction on Kaggle telecom dataset, Full EDA on Local Telecom Dataset

Dipankar1997161/Predictive-Modeling-for-Churn-prediction

Predictive Modeling for Churn prediction

  • Churn Prediction on Kaggle telecom dataset
  • Full EDA on Local Telecom Dataset
  • Churn Prediction App using Streamlit and Docker (Docker deployment in progress)

Exploratory Data Analysis

Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. The main purpose of EDA is to look at the data before making any assumptions about it. It can help identify obvious errors, better understand patterns within the data, detect outliers or anomalous events, and find interesting relations among the variables.

Why is EDA so important?

  • Identifying data quality issues: EDA helps us to identify missing or incorrect values, outliers, and inconsistencies in the data, which can affect the accuracy of our models.

  • Feature selection: EDA helps us to identify the most important features in our data that are relevant to our problem. This can save us time and resources by focusing on the most important features instead of analyzing all features.

  • Understanding the data: EDA helps us to understand the distribution and characteristics of the data. This is important for selecting appropriate statistical models and machine learning algorithms.

  • Hypothesis testing: EDA helps us to form and informally check hypotheses about the relationships between variables in our data. The patterns it reveals are correlations, so they point to candidate causes and trends that still need to be confirmed with formal tests or models.
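The checks listed above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration, not the code from the notebook: the records, the field names (`plan`, `total_charge`, `churn`) and the helper functions are all hypothetical, and the outlier rule used here is Tukey's 1.5 × IQR rule, chosen only as a common default.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical toy records; field names are illustrative, not from the repo's datasets.
records = [
    {"plan": "basic", "total_charge": 58.0, "churn": False},
    {"plan": "basic", "total_charge": 59.2, "churn": True},
    {"plan": "premium", "total_charge": 60.5, "churn": False},
    {"plan": "basic", "total_charge": 61.0, "churn": False},
    {"plan": "premium", "total_charge": None, "churn": False},
    {"plan": "basic", "total_charge": 62.1, "churn": True},
    {"plan": "premium", "total_charge": 63.3, "churn": False},
    {"plan": "basic", "total_charge": 64.8, "churn": False},
    {"plan": "premium", "total_charge": 190.0, "churn": True},
]


def count_missing(rows):
    """Data quality: count None values per field."""
    missing = defaultdict(int)
    for row in rows:
        for key, value in row.items():
            if value is None:
                missing[key] += 1
    return dict(missing)


def iqr_outliers(values):
    """Outliers: values outside 1.5 * IQR beyond the quartiles (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def churn_rate_by(rows, field):
    """Relationships: share of churned customers within each value of `field`."""
    totals, churned = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row[field]] += 1
        churned[row[field]] += int(row["churn"])
    return {key: churned[key] / totals[key] for key in totals}


# Skip missing values before looking for outliers.
charges = [r["total_charge"] for r in records if r["total_charge"] is not None]

print("missing per field:", count_missing(records))
print("charge outliers:", iqr_outliers(charges))
print("churn rate by plan:", churn_rate_by(records, "plan"))
```

A difference in churn rate between groups, like the one this toy data produces, is only a hint that a feature may be useful; it does not show that the feature causes churn.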

Overview:

Customer churn (or customer attrition) refers to the loss of customers or subscribers for any reason at all. Businesses measure and track churn as the percentage of customers lost over a given time period, relative to the number of customers at the start of that period. For example, if Company ADG had 500 customers at the beginning of the month and only 450 of them remained at the end of the month, its customer churn rate for that month would be 10%.
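The arithmetic behind the Company ADG example can be written as a small helper. This is a sketch under one stated assumption: no new customers joined during the month, so the 50 customers lost is simply 500 minus 450. The function name `churn_rate` is hypothetical and does not come from the repo.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn as a percentage of the customers present at the start of the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return 100 * customers_lost / customers_at_start


# Company ADG: 500 customers at the start of the month, 450 remaining at the end.
# Assumes no new sign-ups, so customers lost = 500 - 450.
adg_rate = churn_rate(500, 500 - 450)
print(f"Monthly churn rate: {adg_rate}%")
```

If new customers do join during the period, lost customers have to be counted directly rather than derived from the start and end totals.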

This repo has 2 sections. My main aim was to perform EDA.

  • A churn prediction on a telecom dataset from Kaggle [Web-App is in development]
  • A full EDA on a local telecom dataset [You can implement the model if you want]

Results:

Since the Kaggle dataset is publicly available, I present only my results on the local dataset here [all of these graphs are available in the notebook].

These are just a few of the graphs; more are in the notebook.

  • Figure: final churn heatmap
  • Figure: total charge churn
  • Figure: heatmap churn 1
  • Figure: churn level
  • Figure: customer service
  • Figure: vmail churn
  • Figure: customer by state

Conclusion:

I believe that for a churn prediction task, EDA plays the most important role. Without EDA and the necessary feature engineering, it can be hard to build an accurate model.

Sometimes, this kind of data analysis can also help the business recognize its own mistakes and work on them.
