A machine learning model to accurately predict the creditworthiness of customers
Banks and credit card companies face the challenge of determining the creditworthiness of individuals to minimize the risk of financial losses. Data mining is employed to analyze patterns and trends in customer data, aiding in better decision-making and reducing the likelihood of lending to high-risk individuals.
The dataset comprises 1000 entries with 21 columns, featuring both numerical and categorical attributes relevant to consumer credit risk assessment. The 'status' column serves as the target variable, indicating creditworthiness (Good: 1, Bad: 2).
- Isolation of Target Variable: The 'status' column was separated from the remaining 20 feature columns.
- Dummy Variable Creation: Categorical attributes were transformed into binary columns for modeling.
- Binary Labeling: The original 'status' codes were recoded so that 'Good' (1) became 0 and 'Bad' (2) became 1.
- Data Partitioning: The dataset was split 90/10 into training and test sets (900 and 100 rows).
- Scaling: StandardScaler standardized each feature to zero mean and unit variance so that no feature dominates because of its scale.
- Feature Selection: A RandomForestClassifier ranked the features by importance score, and 35 features were selected.
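The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy DataFrame and its column names (`duration`, `amount`, `purpose`, `housing`) are hypothetical stand-ins for the real dataset, fitting the scaler on the training split only is an assumption, and `SelectFromModel` is just one plausible way to keep up to 35 features by Random Forest importance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_toy_credit_data(n_rows=1000, seed=0):
    """Build a synthetic stand-in for the credit dataset (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "duration": rng.integers(4, 72, n_rows),
            "amount": rng.integers(250, 18000, n_rows),
            "purpose": rng.choice(["car", "furniture", "education"], n_rows),
            "housing": rng.choice(["own", "rent", "free"], n_rows),
            # Original coding from the summary: 1 = Good, 2 = Bad.
            "status": rng.choice([1, 2], n_rows, p=[0.7, 0.3]),
        }
    )


def prepare(df, n_features=35, seed=0):
    """Isolate the target, encode, split 90/10, scale, and select features."""
    # Isolate the target and recode it: Good (1) -> 0, Bad (2) -> 1.
    y = df["status"].map({1: 0, 2: 1})
    # Turn categorical attributes into binary dummy columns.
    X = pd.get_dummies(df.drop(columns="status"), dtype=float)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.10, random_state=seed
    )

    # Assumption: the scaler is fitted on the training split only.
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)

    # Keep at most n_features columns, ranked by Random Forest importance.
    k = min(n_features, X.shape[1])
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=seed),
        max_features=k,
        threshold=-np.inf,  # select purely by rank, not by a score cutoff
    ).fit(X_train_s, y_train)

    kept = list(X.columns[selector.get_support()])
    return (
        selector.transform(X_train_s),
        selector.transform(X_test_s),
        y_train,
        y_test,
        kept,
    )


df = make_toy_credit_data()
X_train, X_test, y_train, y_test, kept = prepare(df)
print(X_train.shape, X_test.shape, kept)
```

The toy data has far fewer than 35 encoded columns, so the cap is not reached here; on the real dataset the dummy encoding would produce more columns than are kept.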
Two models were employed: Logistic Regression and Random Forest Classifier.
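- Logistic Regression Accuracy: 78%
- Logistic Regression Precision, Recall, F1-Score (Bad Credits): 0.62 each, so precision and recall are balanced for the bad-credit class.
- Random Forest Accuracy: 72%
- Random Forest Precision, Recall (Bad Credits): Lower than Logistic Regression.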
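A comparison like the one above could be produced along these lines. This is a sketch under stated assumptions: the data comes from `make_classification` rather than the real credit dataset, the model settings (`max_iter`, `n_estimators`) are illustrative guesses, and `compare_models` is a hypothetical helper name. The scores it prints will therefore not match the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def compare_models(X_train, X_test, y_train, y_test, seed=0):
    """Fit both classifiers and score them on the 'Bad' class (label 1)."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # average="binary" with pos_label=1 reports metrics for 'Bad' only.
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, pos_label=1, average="binary", zero_division=0
        )
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision_bad": precision,
            "recall_bad": recall,
            "f1_bad": f1,
        }
    return results


# Synthetic stand-in: 1000 rows, 35 features, roughly 30% 'Bad' (label 1).
X, y = make_classification(
    n_samples=1000, n_features=35, n_informative=10,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0
)

results = compare_models(X_train, X_test, y_train, y_test)
for name, scores in results.items():
    print(name, {k: round(float(v), 2) for k, v in scores.items()})
```

Reporting per-class precision and recall alongside accuracy matters here because the classes are imbalanced: a model can reach reasonable accuracy while still missing many bad credits.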
- GridSearchCV tuned Logistic Regression hyperparameters with no change in accuracy.
- Confusion Matrix: Showed 11 false positives and 11 false negatives, i.e. 22 errors on the 100-row test set, which is consistent with the 78% accuracy reported above.
- ROC Curve: Showed a balanced trade-off between true positive rate and false positive rate across classification thresholds.
- Precision-Recall Curve: Balanced trade-off between precision and recall.
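The tuning and diagnostic steps above might look something like the following. It is only a sketch: the data is synthetic, the parameter grid over `C` is a hypothetical choice (the summary does not say which hyperparameters were searched), and the curves are computed as arrays rather than plotted.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the prepared credit data (label 1 = 'Bad').
X, y = make_classification(
    n_samples=1000, n_features=35, n_informative=10,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0
)

# Hypothetical grid: regularization strength only.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, scoring="accuracy", cv=5
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Confusion matrix on the held-out split; rows are true labels, columns predictions.
y_pred = best_model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

# Curves are built from predicted probabilities for the 'Bad' class.
bad_scores = best_model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, bad_scores)
precision, recall, _ = precision_recall_curve(y_test, bad_scores)
auc = roc_auc_score(y_test, bad_scores)

print("best params:", search.best_params_)
print("FP:", fp, "FN:", fn, "accuracy:", (tp + tn) / len(y_test))
print("ROC AUC:", round(float(auc), 3))
```

The arrays `fpr`/`tpr` and `precision`/`recall` can be passed to any plotting library to draw the ROC and precision-recall curves.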
Logistic Regression demonstrated more balanced performance than Random Forest. Hyperparameter tuning with GridSearchCV left its accuracy unchanged. The confusion matrix and the ROC and precision-recall curves highlighted areas for improvement.
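- False Positives and Negatives: Both present (11 each). With 'Bad' as the positive class, a false negative approves a risky loan and a false positive rejects a creditworthy customer.
- Precision and Recall: Balanced for the bad-credit class, which matters because both error types carry a business cost.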
- Robustness: Further evaluation needed for diverse customer profiles and economic changes.
- Automated Decision-Making: Efficient credit scoring process.
- Risk Mitigation: Identifying potential bad loans, reducing bad debt.
- Model Interpretability: Stakeholder understanding and trust are crucial.
- Data Privacy and Ethics: Secure and ethical handling of customer data.
- Changing Economic Environments: Continuous monitoring and tuning required.
- Importance of Data Preparation: Vital for improving model performance.
- Model Evaluation: Holistic evaluation beyond accuracy is essential.
- Continuous Improvement: Ongoing monitoring, evaluation, and improvement are necessary for adapting to evolving conditions and enhancing decision-making.