credit-approval

A machine learning model to accurately predict the creditworthiness of customers

Academic Project - Credit Scoring

Author: Lakshman Palli

Business Understanding

Banks and credit card companies face the challenge of determining the creditworthiness of individuals to minimize the risk of financial losses. Data mining is employed to analyze patterns and trends in customer data, aiding in better decision-making and reducing the likelihood of lending to high-risk individuals.

Data Understanding

The dataset comprises 1000 entries with 21 columns, featuring both numerical and categorical attributes relevant to consumer credit risk assessment. The 'status' column serves as the target variable, indicating creditworthiness (Good: 1, Bad: 2).

Data Preparation

Isolation of Target Variable: The 'status' variable was isolated.
Dummy Variable Creation: Categorical attributes were transformed into binary columns for modeling.
Binary Labeling: 'Good' (0) and 'Bad' (1) labels were assigned to the 'status' variable.
Data Partitioning: The dataset was split into a 90-10 training-testing set.
Scaling: StandardScaler was used to standardize the data for equal contribution in algorithms.
Feature Selection: RandomForestClassifier aided in selecting 35 features based on importance scores.

Modeling

Two models were employed: Logistic Regression and Random Forest Classifier.

Logistic Regression Results:

Accuracy: 78%
Precision, Recall, F1-Score (Bad Credits): 0.62, indicating balanced performance.

Random Forest Classifier Results:

Accuracy: 72%
Precision, Recall (Bad Credits): Lower than Logistic Regression.

Hyperparameter Tuning:

GridSearchCV tuned Logistic Regression hyperparameters with no change in accuracy.

Visualizations:

Confusion Matrix: Identified model mistakes (11 false positives, 11 false negatives).
ROC Curve: Balanced true positive and false positive rates.
Precision-Recall Curve: Balanced trade-off between precision and recall.

Conclusion on Modeling

Logistic Regression demonstrated more balanced performance. Hyperparameter tuning refined the model. Visualizations highlighted areas for improvement.

Evaluations

False Positives and Negatives: Both present, requiring careful consideration.
Precision and Recall: Balanced, crucial for business success.
Robustness: Further evaluation needed for diverse customer profiles and economic changes.

Deployment

Benefits:

Automated Decision-Making: Efficient credit scoring process.
Risk Mitigation: Identifying potential bad loans, reducing bad debt.

Concerns and Issues:

Model Interpretability: Stakeholder understanding and trust are crucial.
Data Privacy and Ethics: Secure and ethical handling of customer data.
Changing Economic Environments: Continuous monitoring and tuning required.

Wisdom from the Project

Importance of Data Preparation: Vital for improving model performance.
Model Evaluation: Holistic evaluation beyond accuracy is essential.
Continuous Improvement: Ongoing monitoring, evaluation, and improvement are necessary for adapting to evolving conditions and enhancing decision-making.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Data_Mining_Academic_Project.ipynb		Data_Mining_Academic_Project.ipynb
LICENSE		LICENSE
Project Report.pdf		Project Report.pdf
README.md		README.md
german-credit-card.csv		german-credit-card.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

credit-approval

Academic Project - Credit Scoring

Author: Lakshman Palli

Business Understanding

Data Understanding

Data Preparation

Modeling

Logistic Regression Results:

Random Forest Classifier Results:

Hyperparameter Tuning:

Visualizations:

Conclusion on Modeling

Evaluations

Deployment

Benefits:

Concerns and Issues:

Wisdom from the Project

About

Releases

Packages

Languages

License

LakshmanPalli/credit-approval

Folders and files

Latest commit

History

Repository files navigation

credit-approval

Academic Project - Credit Scoring

Author: Lakshman Palli

Business Understanding

Data Understanding

Data Preparation

Modeling

Logistic Regression Results:

Random Forest Classifier Results:

Hyperparameter Tuning:

Visualizations:

Conclusion on Modeling

Evaluations

Deployment

Benefits:

Concerns and Issues:

Wisdom from the Project

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages