Fraud-Detection

A machine learning model to detect fraudulent transactions.

Overview

This repository contains a detailed analysis of a dataset of credit card transactions. The target variable has two classes (Normal and Fraud). The dataset is challenging because it is highly imbalanced: more than 99% of the data points belong to the Normal class.
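To illustrate why this level of imbalance is a problem, here is a minimal sketch in plain Python. The counts (5 fraud cases in 1,000 transactions) are made up for illustration and are not taken from the dataset. A model that always predicts Normal scores very high accuracy while detecting no fraud at all.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true, y_pred):
    """F1 for the positive class (1 = Fraud); returns 0.0 when undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 5 fraud cases (1) among 1,000 transactions.
y_true = [1] * 5 + [0] * 995

# A "model" that always predicts the majority class (0 = Normal).
y_pred = [0] * 1000

baseline_accuracy = accuracy(y_true, y_pred)
baseline_f1 = f1(y_true, y_pred)

print(f"accuracy = {baseline_accuracy:.3f}")  # accuracy = 0.995
print(f"f1       = {baseline_f1:.3f}")        # f1       = 0.000
```

The high accuracy here reflects only the class ratio, which is why a metric focused on the Fraud class is more informative.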

Download the data

Download the dataset (csv format) from here.

Installation

Anaconda is highly recommended for running data science projects. It comes with many pre-installed packages for data analysis and machine learning. Two packages need to be installed manually in addition to Anaconda:

Seaborn (pip install seaborn or conda install seaborn)
Imbalanced-learn (pip install -U imbalanced-learn)

Summary

This notebook can be divided into the following sections:

Data exploration
Feature engineering
Evaluation metrics
Modeling
Parameter tuning

Initial exploration shows that the dataset is highly imbalanced. Standard machine learning algorithms tend to be biased towards the majority class, so a resampling technique is used to handle this problem. New features are generated based on the within-class distribution of the variables. Accuracy is not a useful metric for imbalanced classes, so F1 (the harmonic mean of precision and recall) and AUC (the area under the ROC curve) are used to evaluate model performance. The usual classification threshold (probability = 0.5) is not used; instead, the threshold is tuned using cross-validation.
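The following sketch shows one way these pieces can fit together. It is not the notebook's actual code. It assumes scikit-learn is available and makes several illustrative choices: synthetic data from `make_classification` stands in for the credit card dataset, the resampling step is plain random oversampling written with NumPy (the notebook may use a different technique from imbalanced-learn), and logistic regression is the model. The helper `oversample_minority` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the real data: roughly 1% positives (Fraud).
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.99, 0.01], random_state=0,
)


def oversample_minority(X_train, y_train, rng):
    """Randomly duplicate Fraud rows until both classes have equal counts."""
    minority = np.flatnonzero(y_train == 1)
    majority = np.flatnonzero(y_train == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X_train[idx], y_train[idx]


rng = np.random.default_rng(0)
oof_proba = np.zeros(len(y))  # out-of-fold Fraud probabilities

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, valid_idx in cv.split(X, y):
    # Resample the training fold only, so validation keeps the real class ratio.
    X_res, y_res = oversample_minority(X[train_idx], y[train_idx], rng)
    model = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    oof_proba[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]

# Scan candidate thresholds and keep the one with the best out-of-fold F1.
thresholds = np.linspace(0.05, 0.95, 19)
f1_by_threshold = [
    f1_score(y, (oof_proba >= t).astype(int), zero_division=0)
    for t in thresholds
]
best_threshold = float(thresholds[int(np.argmax(f1_by_threshold))])
best_f1 = max(f1_by_threshold)

# AUC does not depend on the threshold; it scores the ranking of probabilities.
auc = roc_auc_score(y, oof_proba)

print(f"best threshold = {best_threshold:.2f}")
print(f"F1 at best threshold = {best_f1:.3f}")
print(f"AUC = {auc:.3f}")
```

Resampling inside each training fold, rather than before splitting, keeps duplicated Fraud rows out of the validation data, so the cross-validated F1 used to pick the threshold is not inflated.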
