Credit Card Fraud Detection problem for the XIII Modelling Week, held at the Faculty of Mathematics of the Universidad Complutense de Madrid (UCM) during 10-14 June 2019. The Modelling Week is open to students of the Master in Mathematical Engineering at UCM, as well as to participants from other mathematically oriented master's programs worldwide. The purpose is to teach and guide the students in solving a realistic industry problem.
- Problem on Kaggle: Credit Card Fraud
- Link to the data
The problem can be approached in three ways: supervised, unsupervised and mixed. We will start with a supervised approach, since it is simpler. If time permits, we will explore unsupervised methods (a really interesting field).
Packages used: `jupyter`, `pandas`, `matplotlib`, `seaborn`, `sklearn`, `tensorflow`, `keras`, `imblearn`, `xgboost`.
- Basic programming with Python and `jupyter`.
- Exploratory data analysis, cleaning and preprocessing. Feature engineering.
- Overfitting. Validation scheme. Difference between train, validation and test sets.
- Metrics: precision, recall, ROC curve, AUC (ROC), F1, confusion matrix. Focus on unbalanced datasets (a sketch follows this list).
- Classification algorithms in `sklearn`. Comments on hyperparameter tuning. `xgboost` in Python using the `xgboost.sklearn` API (see the `xgboost` sketch below).
- Combination of models. Calibration. Ensembling and stacking (see the stacking sketch below).
- Neural Networks in `keras` (see the `keras` sketches below):
  - Feed Forward Neural Network for classification.
  - Autoencoder as an anomaly detector (semi-supervised and unsupervised).
  - Autoencoder as a feature builder (unsupervised).
- Combination of unsupervised and supervised methods.
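As a starting point for the validation scheme and metrics listed above, here is a minimal sketch. It assumes the Kaggle data has been downloaded as `creditcard.csv` with the binary target column `Class` (1 = fraud), as in the original dataset; logistic regression is only a placeholder baseline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score)

# `creditcard.csv` and the `Class` column follow the Kaggle dataset layout.
df = pd.read_csv("creditcard.csv")
X = df.drop(columns=["Class"])
y = df["Class"]

# Stratify so the rare fraud class keeps its proportion in both splits.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_valid)[:, 1]
pred = (proba >= 0.5).astype(int)

print(confusion_matrix(y_valid, pred))
print(classification_report(y_valid, pred, digits=3))  # precision, recall, F1
print("ROC AUC:", roc_auc_score(y_valid, proba))
```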
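A sketch of `xgboost` through its `sklearn`-compatible wrapper, with a tiny grid search as an illustration of hyperparameter tuning. It reuses `X_train`/`y_train` from the previous sketch; the grid values and the `scale_pos_weight` choice are illustrative assumptions, not tuned settings.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Weight the positive (fraud) class by the class ratio to counter the imbalance.
ratio = (y_train == 0).sum() / (y_train == 1).sum()
xgb = XGBClassifier(n_estimators=200, scale_pos_weight=ratio, random_state=0)

grid = GridSearchCV(
    xgb,
    param_grid={"max_depth": [3, 5], "learning_rate": [0.05, 0.1]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```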
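For the combination of models, calibration and stacking, one possible sketch uses `sklearn`'s `StackingClassifier` and `CalibratedClassifierCV`; the choice of base learners is an assumption.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Stack two base learners and combine them with a logistic meta-model.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Calibrate the stacked model's probabilities (sigmoid = Platt scaling).
calibrated = CalibratedClassifierCV(stack, method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)
print(calibrated.predict_proba(X_valid)[:5, 1])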
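A minimal feed-forward classifier in `keras`, again reusing the earlier split; layer sizes, epochs and batch size are illustrative assumptions.

```python
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Neural networks are sensitive to feature scale, so standardise first.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_valid_s = scaler.transform(X_valid)

model = keras.Sequential([
    keras.layers.Input(shape=(X_train_s.shape[1],)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # fraud probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])
model.fit(X_train_s, y_train, validation_data=(X_valid_s, y_valid),
          epochs=10, batch_size=2048, verbose=2)
```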
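A sketch of the autoencoder idea: train it on legitimate transactions only, flag high reconstruction error as a potential anomaly, and note that the bottleneck activations can also serve as unsupervised features for a downstream supervised model. The architecture and the cut-off threshold are assumptions; it reuses the scaled matrices from the previous sketch.

```python
import numpy as np
from tensorflow import keras

# Train the autoencoder on legitimate transactions only (semi-supervised use).
X_legit = X_train_s[np.asarray(y_train) == 0]
n_features = X_legit.shape[1]

autoencoder = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),   # bottleneck: reusable as features
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_legit, X_legit, epochs=10, batch_size=2048, verbose=2)

# Reconstruction error as anomaly score; the cut-off quantile is an assumption.
recon = autoencoder.predict(X_valid_s)
error = np.mean((X_valid_s - recon) ** 2, axis=1)
flagged = error > np.quantile(error, 0.99)
print("flagged as potential fraud:", flagged.sum())
```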
- Jupyter (Datacamp)
- Numpy (Datacamp)
- Pandas (Datacamp)
- Scikit-learn (Datacamp)
- Matplotlib (Datacamp)
- Seaborn (Datacamp)
- Using `xgboost` in Python (Datacamp)
- Combine `xgboost` and `sklearn` (GitHub)
- Videos and slides ISLR
- Machine Learning cheatsheets
- 8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset (machinelearningmastery)
- Comparison of the different over-sampling algorithms (imblearn)
- Comparison of the different under-sampling algorithms (imblearn)
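To go with the imblearn comparisons above, a minimal resampling sketch (SMOTE over-sampling and random under-sampling). It assumes the `X_train`/`y_train` split from the earlier sketches and applies resampling only to the training data.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Resample only the training split; validation/test data stays untouched.
X_over, y_over = SMOTE(random_state=0).fit_resample(X_train, y_train)
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X_train, y_train)

print("original:", Counter(y_train))
print("SMOTE   :", Counter(y_over))
print("under   :", Counter(y_under))
```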