Credit Risk MLOPs

Introduction

Credit risk can lead to large profits or large losses, depending on how much risk is taken on. To help mitigate the risk of investors losing money, we built a machine learning model that predicts whether a person will repay a loan on time. The model is not 100% accurate, and its use raises ethical issues: a borrower could repay strictly on time even though the algorithm predicted that he or she would not. For that reason, the model is intended for didactic and scientific purposes only.

The data consists of approved loans issued from 2007 to 2011 by Lending Club, a personal loan company that matches borrowers with people who want to lend money for a financial return. The dataset contains 42537 rows and 52 columns and is available both in the GitHub repository and on the Kaggle website.
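
As a quick sanity check on the figures above, the sketch below counts rows and columns using only the standard library. It assumes a plain CSV with a single header row; the function name `csv_shape` and the file name in the usage comment are hypothetical, not taken from the repository.

```python
import csv


def csv_shape(path):
    """Return (n_rows, n_columns) for a CSV file, not counting the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # first row is assumed to hold the column names
        # Blank lines come back as empty lists, so skip them when counting.
        n_rows = sum(1 for row in reader if row)
    return n_rows, len(header)


# Usage (hypothetical file name):
# rows, columns = csv_shape("lending_club_loans.csv")
# print(rows, columns)
```

If the downloaded file has extra description lines before the header, the counts will differ from the ones quoted above, so treat a mismatch as a prompt to inspect the file rather than as an error.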

Model Card

The model was deployed to the web with the FastAPI package, and API tests were written for it. These tests will be embedded in a CI/CD workflow using GitHub Actions. After building and testing the API locally, we deployed it to Heroku and tested it again live. Weights & Biases was used to manage and track all artifacts.

In general, the notebooks are divided into 7 parts:

1. Fetching the data
2. Exploratory analysis
3. Preprocessing
4. Tests
5. Splitting the data into training and test sets
6. Training
7. Test
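
To illustrate the splitting step, here is a minimal sketch of a random train/test split using only the standard library. The 30% test fraction, the fixed seed, and the function name are assumptions for illustration; the notebooks may well use a different ratio or a library routine instead.

```python
import random


def train_test_split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                   # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for a reproducible split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Usage with toy data (not the real dataset):
# train, test = train_test_split_rows(range(10))
```

Fixing the seed keeps the split identical across runs, which matters when the resulting artifacts are tracked and compared.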

You can read more about the notebook walkthrough in our Medium article.

Anaconda Environment

Create a conda environment with environment.yml:

conda env create --file environment.yml

To remove an environment, run the following in your terminal:

conda remove --name myenv --all

To list all available environments, run:

conda env list

To activate the environment, use:

conda activate myenv

Fast API

The API is implemented in source/api/main.py, and its tests are in source/api/test_main.py.

During development, the API was constantly tested by running:

uvicorn source.api.main:app --reload

and using these addresses:

http://127.0.0.1:8000/
http://127.0.0.1:8000/docs
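
A small script can check both addresses instead of opening them by hand. This is a sketch under assumptions: it uses only the standard library, the helper names `endpoint_urls` and `is_up` are hypothetical, and it assumes the running app answers both paths with HTTP 200.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000/"  # local uvicorn address listed above


def endpoint_urls(base_url=BASE_URL, paths=("", "docs")):
    """Build the full URLs to check from a base URL and relative paths."""
    return [urljoin(base_url, path) for path in paths]


def is_up(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False


# Usage, with the uvicorn server running in another terminal:
# for url in endpoint_urls():
#     print(url, "up" if is_up(url) else "down")
```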

The screenshot below shows a view of the API docs.

To test the API, run:

pytest source/api -vv -s

Heroku

Sign up for a Heroku account.
Create a new app. It is very important to connect the app to our GitHub repository and enable automatic deploys.
Install the Heroku CLI following the instructions.
Sign in to Heroku from the terminal:

heroku login

In the root folder of the project, list the Heroku apps already created:

heroku apps

Check that the buildpack is correct:

heroku buildpacks --app risk-credit-mlops

Update the buildpack if necessary:

heroku buildpacks:set heroku/python --app risk-credit-mlops

When you run a script in an automated environment, you can control Wandb with environment variables set before the script runs or within the script. To set up access to Wandb on Heroku using the CLI:

heroku config:set WANDB_API_KEY=xxx --app risk-credit-mlops
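
On the application side, it can help to fail fast when the key is missing rather than at the first Wandb call. The sketch below is an assumption about how one might do that, not code from the repository; `require_env` is a hypothetical helper that uses only the standard library.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable, or raise if it is unset or blank."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; configure it with 'heroku config:set' "
            "or export it in your shell before starting the app"
        )
    return value


# Usage at application start-up:
# require_env("WANDB_API_KEY")
```

Passing a plain dictionary as `environ` makes the check easy to exercise in tests without touching the real environment.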

The instructions for launching an app are contained in a Procfile that resides in the top level of your project directory. Create the Procfile with the following content:

web: uvicorn source.api.main:app --host=0.0.0.0 --port=${PORT:-5000}
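
The `${PORT:-5000}` expression means "use the PORT variable that Heroku provides, or 5000 if it is unset or empty". For anyone who prefers to start the server from Python, the same fallback can be sketched as below; `resolve_port` is a hypothetical helper and not part of the repository.

```python
import os


def resolve_port(environ=None, default=5000):
    """Mirror the shell expansion ${PORT:-5000}: fall back when PORT is unset or empty."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PORT") or str(default)
    return int(raw)


# Usage (assumes uvicorn is installed, as it is for the Procfile command):
# import uvicorn
# uvicorn.run("source.api.main:app", host="0.0.0.0", port=resolve_port())
```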

Configure the remote repository for Heroku:

heroku git:remote --app risk-credit-mlops

Push all files to the Heroku remote repository. The command below also installs all packages listed in requirements.txt on the Heroku VM.

git push heroku main

To check the remote files, run:

heroku run bash --app risk-credit-mlops

If all previous steps were completed successfully, you will see the message below after opening https://risk-credit-mlops.herokuapp.com/.
For debugging purposes, you can fetch your app's most recent logs with the heroku logs command:

heroku logs

About Us

I (Morsinaldo Medeiros) and Alessandro Neto are students of the Postgraduate Program in Electrical and Computer Engineering (PPgEEC) at the Federal University of Rio Grande do Norte (UFRN). As the first project of the EEC1509 — Machine Learning course taught by Ivanovitch Silva, we took a classic machine learning model and adapted it to a pipeline, which contains good standardization practices in order to put the created model into production.

References

Main reference - Ivanovitch's git repo

Kaggle

Dataquest
