This Node/Python library builds a model that predicts whether a particular Pull Request (PR) will be accepted at the time it is created, by learning from information about a GitHub project. The aim of this library is to help project integrators manage PRs for a particular project. You can find more information about the model and how it works in this article.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
To install and run the software, you will need the following:
- Python 3.6 or newer
- Node 8 or newer
- MongoDB 3.2 or newer
- Git
- A GitHub access token for using the GitHub API. This post explains how to get one.
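The prerequisites above can be checked before going further. The following is a minimal sketch, not part of this repository: the helper names are hypothetical, and it assumes the tools are available on your PATH under the command names used later in this document (`git`, `node`, `npm`, `mongo`). It only checks that the commands exist and that Python is 3.6 or newer; it does not verify the Node or MongoDB versions.

```python
import shutil
import sys

# Commands this README invokes; assumed to be on PATH under these names.
REQUIRED_COMMANDS = ("git", "node", "npm", "mongo")
MIN_PYTHON = (3, 6)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def find_missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python 3.6 or newer is required")
    for name in find_missing_commands():
        print("Missing command:", name)
```

If the script prints nothing, the basic tools were found. Newer MongoDB releases ship the shell as `mongosh` rather than `mongo`, so a missing `mongo` command does not necessarily mean MongoDB is absent.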
- Choose a project to predict. This document uses https://github.com/Netflix/pygenie because it is small, but you can use any repository, such as the Node project.
- Clone this repository onto your machine:
  git clone https://github.com/sophilabs/pullreq-ml.git
- (Optional) Install your local copy into a virtual environment. For example, using the venv module, you can run the following two commands:
  python -m venv venv
  source venv/bin/activate
- Install dependencies:
  cd pullreq-ml # or pullreq-ml-master
  npm install
  pip install -r requirements.txt
- (Optional) Create a user for your MongoDB instance:
  echo "db.createUser({ user: 'github', 'pwd': 'github', roles: ['readWrite'] })" | mongo github
- Replace the contents of config.js with the actual repo and database authentication. For example:
  module.exports = {
    // Local MongoDB
    MONGO_DB_URL: 'mongodb://github:github@localhost:27017/github',
    // Token
    GITHUB_ACCESS_TOKEN: '<your token here>',
    // Repo information; for example, for https://github.com/Netflix/pygenie you should put
    REPO_OWNER: 'Netflix',
    REPO_NAME: 'pygenie'
  }
- Clone the target repo inside the targetrepo folder:
  git clone https://github.com/Netflix/pygenie.git targetrepo
- Start fetching the repo information:
  node fetch.js
- Train and evaluate Pull Request acceptance for your repository:
  python evaluate.py
  You should see output like the following:
  Report on Test data
                precision    recall  f1-score   support

    not merged       0.76      0.22      0.34       264
        merged       0.78      0.98      0.87       753

   avg / total       0.78      0.78      0.73      1017

  Dumped classifier data to classifier.pkl
  This command generates a classifier.pkl binary file, which can be used to predict the outcome of any PR on the target project.
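To help read the report, the f1-score column is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded figures in the sample output above. It assumes the report follows the layout of scikit-learn's classification report and that the "avg / total" row is a support-weighted average; both are assumptions about the output format, not something this document states.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) per class, copied from the sample report.
report = {
    "not merged": (0.76, 0.22, 264),
    "merged": (0.78, 0.98, 753),
}

per_class_f1 = {label: f1_score(p, r) for label, (p, r, _) in report.items()}

# Support-weighted average; assumed to be how the "avg / total" row
# is computed.
total_support = sum(support for _, _, support in report.values())
weighted_f1 = sum(
    per_class_f1[label] * support for label, (_, _, support) in report.items()
) / total_support

for label, value in per_class_f1.items():
    print(f"{label}: f1 = {value:.2f}")
print(f"avg / total: f1 = {weighted_f1:.2f}")
```

In the sample run, the recall of 0.22 for "not merged" means the model identified only about a fifth of the PRs that were actually rejected, while it recovered almost all merged ones (recall 0.98). The overall figures are therefore dominated by the larger "merged" class.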
- Build a script to predict a particular PR against the trained model, using a command like:
  > python classify.py https://github.com/nodejs/node/pull/11107
  Will not be merged!
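A script of this kind would first need to split the PR URL into the repository owner, the repository name, and the PR number. The following is a minimal sketch of that step only, using a hypothetical `parse_pr_url` helper that is not part of this repository; loading classifier.pkl and building the features for the PR are not covered here.

```python
import re

# Matches URLs of the form https://github.com/<owner>/<repo>/pull/<number>
PR_URL_PATTERN = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/pull/(?P<number>\d+)/?$"
)


def parse_pr_url(url):
    """Split a GitHub PR URL into (owner, repo, number). Hypothetical helper."""
    match = PR_URL_PATTERN.match(url)
    if match is None:
        raise ValueError(f"Not a GitHub pull request URL: {url}")
    return match.group("owner"), match.group("repo"), int(match.group("number"))


print(parse_pr_url("https://github.com/nodejs/node/pull/11107"))
```

The owner and repository name could then be compared with REPO_OWNER and REPO_NAME in config.js, since the classifier is trained on a single target project.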
- scikit-learn - Its algorithms are used to predict whether a PR will be merged.
- MongoDB - Used to store project data downloaded from GitHub.
- Git - Used to compute diffs and analyze PR commit deltas.
Feel free to make a Pull Request if you find a bug or want to implement a feature. We welcome any help.
- Ignacio Avas - Initial work - igui
- Pablo Grill, for his insight into and knowledge of machine learning
pullreq-ml is Copyright (c) 2018 sophilabs, inc. It is free software, and may be redistributed under the terms specified in the license file.
pullreq-ml is maintained and funded by sophilabs, inc. The names and logos for sophilabs are trademarks of sophilabs, inc.