Fraud-Detection

A machine learning model to detect fraudulent transactions.

Overview

This repository contains a detailed analysis of a dataset of credit card transactions. The target variable has two classes, Normal and Fraud. The dataset is challenging because it is highly imbalanced: more than 99% of the data points belong to the Normal class.
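To make the imbalance concrete, here is a minimal sketch using made-up labels rather than the actual dataset. The 995/5 split and the 0 = Normal, 1 = Fraud encoding are assumptions chosen only for illustration. It shows that a classifier which always predicts Normal scores very high accuracy while catching no fraud at all.

```python
from collections import Counter

# Hypothetical labels (not the real data): 0 = Normal, 1 = Fraud
labels = [0] * 995 + [1] * 5

counts = Counter(labels)
normal_share = counts[0] / len(labels)

# A trivial "model" that always predicts the majority class (Normal)
predictions = [0] * len(labels)

# Accuracy: share of predictions that match the true label
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the Fraud class: share of true frauds that were flagged
fraud_recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / counts[1]

print(f"Normal share: {normal_share:.3f}")
print(f"Accuracy:     {accuracy:.3f}")
print(f"Fraud recall: {fraud_recall:.3f}")
```

The high accuracy here comes entirely from the class distribution, not from any ability to detect fraud.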

Download the data

Download the dataset (CSV format) from here.

Installation

Anaconda is highly recommended for running data science projects. It comes with many pre-installed packages for data analysis and machine learning. Two packages need to be installed manually in addition to Anaconda:

  • Seaborn (pip install seaborn or conda install seaborn)
  • Imbalanced-learn (pip install -U imbalanced-learn)

Summary

This notebook can be divided into the following sections:

  • Data exploration
  • Feature engineering
  • Evaluation metrics
  • Modeling
  • Parameter tuning

Initial exploration shows that the dataset is highly imbalanced. Standard machine learning algorithms tend to be biased towards the majority class, so a resampling technique is used to handle this problem. New features are generated based on the distribution of the variables within each class. Accuracy is not a useful metric for imbalanced classes, so F1 (the harmonic mean of precision and recall) and AUC (the area under the ROC curve) are used to evaluate model performance. The usual classification threshold (probability = 0.5) is not used; instead, the threshold is tuned using cross-validation.
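The threshold tuning idea can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's actual code: the helper names `f1_score_at` and `tune_threshold`, the toy folds, and the candidate thresholds are all hypothetical. Selecting by mean F1 across folds is one reasonable reading of "tuned using cross-validation".

```python
from statistics import mean


def f1_score_at(y_true, y_prob, threshold):
    """F1 for the Fraud class (label 1) at a given probability threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(folds, candidates):
    """Return the candidate threshold with the highest mean F1 across folds."""
    return max(
        candidates,
        key=lambda t: mean(f1_score_at(y, p, t) for y, p in folds),
    )


# Hypothetical validation folds: (true labels, predicted fraud probabilities)
folds = [
    ([0, 0, 0, 0, 1, 1], [0.05, 0.10, 0.20, 0.35, 0.30, 0.80]),
    ([0, 0, 0, 1, 0, 1], [0.02, 0.15, 0.40, 0.45, 0.10, 0.25]),
]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]

best_threshold = tune_threshold(folds, candidates)
print("Selected threshold:", best_threshold)
```

With these toy numbers the selected threshold is lower than 0.5, because fraud cases often receive modest predicted probabilities and a 0.5 cutoff misses them. In practice the per-fold probabilities would come from a model fitted on the remaining folds.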