Credit Card Fraud Detection problem for the XIII Modelling Week, held at the Faculty of Mathematics of the Universidad Complutense de Madrid (UCM) during 10-14 June 2019. The Modelling Week is open to students of the Master in Mathematical Engineering at UCM, as well as to participants from other mathematically oriented master's programs worldwide. The purpose is to teach and guide the students in solving a realistic industry problem.
- Problem on Kaggle: Credit Card Fraud
- Link to the data
The problem can be approached in three ways: supervised, unsupervised and mixed. We will start with a supervised approach, since it is simpler. If time permits, we will explore unsupervised methods (a really interesting field).
Packages used: `jupyter`, `pandas`, `matplotlib`, `seaborn`, `sklearn`, `tensorflow`, `keras`, `imblearn`, `xgboost`.
- Basic programming with Python and `jupyter`.
- Exploratory data analysis, cleaning and preprocessing. Feature engineering.
- Overfitting. Validation scheme. Difference between train, validation and test sets.
- Metrics: precision, recall, ROC curve, AUC (ROC), F1, confusion matrix. Focus on unbalanced datasets (a sketch follows this list).
- Classification algorithms in `sklearn`. Comments on hyperparameter tuning. `xgboost` in Python using the `xgboost.sklearn` API (see the `xgboost` sketch below).
- Combination of models. Calibration. Ensembling and stacking (see the stacking sketch below).
- Neural Networks in `keras` (see the `keras` sketches below):
  - Feed Forward Neural Network for classification.
  - Autoencoder as an anomaly detector (semi-supervised and unsupervised).
  - Autoencoder as a feature builder (unsupervised).
- Combination of unsupervised and supervised methods.
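As a starting point for the validation scheme and metrics listed above, here is a minimal sketch. It assumes the Kaggle data has been downloaded as `creditcard.csv` with the binary target column `Class` (1 = fraud), as in the original dataset; logistic regression is only a placeholder baseline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score)

# `creditcard.csv` and the `Class` column follow the Kaggle dataset layout.
df = pd.read_csv("creditcard.csv")
X = df.drop(columns=["Class"])
y = df["Class"]

# Stratify so the rare fraud class keeps its proportion in both splits.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_valid)[:, 1]
pred = (proba >= 0.5).astype(int)

print(confusion_matrix(y_valid, pred))
print(classification_report(y_valid, pred, digits=3))  # precision, recall, F1
print("ROC AUC:", roc_auc_score(y_valid, proba))
```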
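A sketch of `xgboost` through its `sklearn`-compatible wrapper, with a tiny grid search as an illustration of hyperparameter tuning. It reuses `X_train`/`y_train` from the previous sketch; the grid values and the `scale_pos_weight` choice are illustrative assumptions, not tuned settings.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Weight the positive (fraud) class by the class ratio to counter the imbalance.
ratio = (y_train == 0).sum() / (y_train == 1).sum()
xgb = XGBClassifier(n_estimators=200, scale_pos_weight=ratio, random_state=0)

grid = GridSearchCV(
    xgb,
    param_grid={"max_depth": [3, 5], "learning_rate": [0.05, 0.1]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```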
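For the combination of models, calibration and stacking, one possible sketch uses `sklearn`'s `StackingClassifier` and `CalibratedClassifierCV`; the choice of base learners is an assumption.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Stack two base learners and combine them with a logistic meta-model.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Calibrate the stacked model's probabilities (sigmoid = Platt scaling).
calibrated = CalibratedClassifierCV(stack, method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)
print(calibrated.predict_proba(X_valid)[:5, 1])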
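A minimal feed-forward classifier in `keras`, again reusing the earlier split; layer sizes, epochs and batch size are illustrative assumptions.

```python
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Neural networks are sensitive to feature scale, so standardise first.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_valid_s = scaler.transform(X_valid)

model = keras.Sequential([
    keras.layers.Input(shape=(X_train_s.shape[1],)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # fraud probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])
model.fit(X_train_s, y_train, validation_data=(X_valid_s, y_valid),
          epochs=10, batch_size=2048, verbose=2)
```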
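A sketch of the autoencoder idea: train it on legitimate transactions only, flag high reconstruction error as a potential anomaly, and note that the bottleneck activations can also serve as unsupervised features for a downstream supervised model. The architecture and the cut-off threshold are assumptions; it reuses the scaled matrices from the previous sketch.

```python
import numpy as np
from tensorflow import keras

# Train the autoencoder on legitimate transactions only (semi-supervised use).
X_legit = X_train_s[np.asarray(y_train) == 0]
n_features = X_legit.shape[1]

autoencoder = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),   # bottleneck: reusable as features
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_legit, X_legit, epochs=10, batch_size=2048, verbose=2)

# Reconstruction error as anomaly score; the cut-off quantile is an assumption.
recon = autoencoder.predict(X_valid_s)
error = np.mean((X_valid_s - recon) ** 2, axis=1)
flagged = error > np.quantile(error, 0.99)
print("flagged as potential fraud:", flagged.sum())
```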
- Jupyter (Datacamp)
- Numpy (Datacamp)
- Pandas (Datacamp)
- Scikit-learn (Datacamp)
- Matplotlib (Datacamp)
- Seaborn (Datacamp)
- Using `xgboost` in Python (Datacamp)
- Combine `xgboost` and `sklearn` (GitHub)
- Videos and slides ISLR
- Machine Learning cheatsheets
- 8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset (machinelearningmastery)
- Comparison of the different over-sampling algorithms (imblearn)
- Comparison of the different under-sampling algorithms (imblearn)
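To go with the imblearn comparisons above, a minimal resampling sketch (SMOTE over-sampling and random under-sampling). It assumes the `X_train`/`y_train` split from the earlier sketches and applies resampling only to the training data.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Resample only the training split; validation/test data stays untouched.
X_over, y_over = SMOTE(random_state=0).fit_resample(X_train, y_train)
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X_train, y_train)

print("original:", Counter(y_train))
print("SMOTE   :", Counter(y_over))
print("under   :", Counter(y_under))
```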