Credit Risk MLOps

Introduction

Credit risk can generate large profits or large losses depending on the size of the risk. To help reduce the risk of investors losing money, we built a machine learning model that predicts whether a person will repay a loan on time. The model is not 100% accurate, and the problem raises ethical issues: a person could repay the loan strictly on time even though the algorithm predicted that he or she would not. The model is therefore intended only for didactic and scientific purposes.

The data consists of approved loans from 2007 to 2011 from Lending Club, a personal loan company that matches borrowers with people who want to lend money for a financial return. The dataset contains 42537 rows and 52 columns and is available both in the GitHub repository and on Kaggle.

Model Card

The model was deployed to the web using the FastAPI package, and API tests were created. The API tests will be embedded in a CI/CD workflow using GitHub Actions. After building and testing the API locally, we deployed it to Heroku and tested it again live. Weights & Biases was used to manage and track all artifacts.

In general, the notebooks are divided into 7 parts:

  1. The search for data
  2. Exploratory analysis
  3. Pre-Processing
  4. Tests
  5. Splitting the data between training and testing
  6. Training
  7. Test
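
Step 5 can be illustrated with a shuffled hold-out split. This is only a sketch: the README does not state the split ratio, the random seed, or the library the notebooks use, so the function name `train_test_split_rows`, the 30% test fraction, and the seed are all assumptions made for illustration. It relies on the Python standard library alone.

```python
import random


def train_test_split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle rows reproducibly and split them into (train, test).

    test_fraction and seed are illustrative defaults, not values
    taken from the project notebooks.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility

    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Calling the function twice with the same seed returns the same partition, which keeps the training and test steps comparable between runs.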

You can read more about the notebook walkthrough in our Medium article.

Anaconda Environment

Create a conda environment with environment.yml:

conda env create --file environment.yml

To remove an environment, run the following in your terminal (replace myenv with the environment name):

conda remove --name myenv --all

To list all available environments, run:

conda env list

To activate the environment, run:

conda activate myenv

FastAPI

The API is implemented in source/api/main.py, and the tests are in source/api/test_main.py.

During development, the API was constantly tested by running:

uvicorn source.api.main:app --reload

and opening these addresses:

http://127.0.0.1:8000/
http://127.0.0.1:8000/docs
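
While uvicorn is running, the two addresses above can also be checked from a script. The sketch below is not part of the repository: the function name `check_endpoints` is hypothetical, and it uses only the Python standard library. It simply reports the HTTP status code returned for each path, or None when the server cannot be reached.

```python
import urllib.error
import urllib.request


def check_endpoints(base_url="http://127.0.0.1:8000", paths=("/", "/docs"), timeout=5.0):
    """Return a dict mapping each path to its HTTP status code (or None)."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                results[path] = response.status
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status (for example 404).
            results[path] = err.code
        except OSError:
            # Connection refused, timeout, or another network failure.
            results[path] = None
    return results
```

A value of None for both paths usually means the uvicorn process is not running on that host and port.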

The screenshot below shows a view of the API docs.

To test the API, run:

pytest source/api -vv -s

Heroku

  1. Sign up for free and try Heroku.
  2. Create a new app. It is very important to connect the app to our GitHub repository and enable automatic deploys.
  3. Install the Heroku CLI following the instructions.
  4. Sign in to Heroku from the terminal:
heroku login
  5. In the root folder of the project, list the Heroku apps already created:
heroku apps
  6. Check that the buildpack is correct:
heroku buildpacks --app risk-credit-mlops
  7. Update the buildpack if necessary:
heroku buildpacks:set heroku/python --app risk-credit-mlops
  8. When you run a script in an automated environment, you can control Wandb with environment variables set before the script runs or within the script. To set up access to Wandb on Heroku using the CLI:
heroku config:set WANDB_API_KEY=xxx --app risk-credit-mlops
  9. The instructions for launching an app are contained in a Procfile that resides at the top level of your project directory. Create the Procfile with the following content:
web: uvicorn source.api.main:app --host=0.0.0.0 --port=${PORT:-5000}
  10. Configure the remote repository for Heroku:
heroku git:remote --app risk-credit-mlops
  11. Push all files to the remote repository on Heroku. The command below installs all packages listed in requirements.txt on the Heroku VM.
git push heroku main
  12. To check the remote files, run:
heroku run bash --app risk-credit-mlops
  13. If all previous steps were successful, you will see the message below after opening https://risk-credit-mlops.herokuapp.com/.
  14. For debugging, you can fetch your app's most recent logs with the heroku logs command:
heroku logs
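
Two of the steps above depend on environment variables: the Procfile reads PORT with a fallback of 5000, and Wandb reads WANDB_API_KEY. The sketch below shows one way a Python script could check both before starting. The function names `resolve_port` and `has_wandb_key` are hypothetical and do not come from the repository; the `or` fallback is chosen to mirror the shell expansion `${PORT:-5000}`, which uses the default when PORT is unset or empty.

```python
import os


def resolve_port(env=None, default=5000):
    """Return the port to bind to, mirroring ${PORT:-5000} from the Procfile."""
    env = os.environ if env is None else env
    # "or" covers both an unset and an empty PORT, like the ":-" expansion.
    raw = env.get("PORT") or str(default)
    return int(raw)


def has_wandb_key(env=None):
    """Return True when WANDB_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY"))
```

For example, `resolve_port({"PORT": "8080"})` returns 8080, while `resolve_port({})` falls back to 5000. Passing a plain dict instead of `os.environ` makes both helpers easy to exercise in tests.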

About Us

I (Morsinaldo Medeiros) and Alessandro Neto are students in the Postgraduate Program in Electrical and Computer Engineering (PPgEEC) at the Federal University of Rio Grande do Norte (UFRN). As the first project of the EEC1509 - Machine Learning course taught by Ivanovitch Silva, we took a classic machine learning model and adapted it into a pipeline that follows good standardization practices, in order to put the resulting model into production.

References

Main reference - Ivanovitch's git repo

Kaggle

Dataquest