Our solution to Kaggle's Amazon Access Challenge.
- The screenshot of our highest score is under report/submission_screnshot.png
- Our code is in the .ipynb and .py files in the top level directory. Instructions on how to run are below.
- Our submission CSV is under output/xgb_logreg_rf.csv
- Our report is under report/report.pdf
You'll need Python 3.5 with the PyData stack installed to run this code. The easiest way to achieve this is to install Anaconda for Python 3.5.
Note: You must have XGBoost installed in order to re-run the models. If you just wish to run the final ensemble, you do not need the library installed.
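Before re-training, it can help to confirm which packages are importable in the active environment. The snippet below is a minimal sketch, not part of this repository; the package list (numpy, pandas, sklearn, xgboost) is an assumption about what the code imports, so adjust it as needed.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


# Assumed package list; xgboost is only needed when re-running the models.
for name in ("numpy", "pandas", "sklearn", "xgboost"):
    print(name, "ok" if has_module(name) else "MISSING")
```

If only xgboost is reported as missing, the final ensemble step should still run, since it uses the saved model outputs.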
Since we have saved the output of all of our individual models, it is easy to run the ensemble by itself. From the top level directory, simply run:
$ python rankedavg.py submission
The submission command line argument is the name of the output file, without the .csv extension. The result will be saved to output/submission.csv.
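For readers unfamiliar with ranked averaging, here is a minimal sketch of the general idea. It is not the code in rankedavg.py; the function names are hypothetical and the inputs are plain lists of scores rather than the saved CSV files. Each model's scores are converted to ranks, the ranks are averaged across models, and the result is rescaled to the range 0 to 1. This is a common choice when the metric is AUC, which depends only on the ordering of predictions.

```python
def to_ranks(scores):
    """Return 1-based ranks for scores; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def rank_average(model_scores):
    """Average per-model ranks row by row and rescale to [0, 1]."""
    n = len(model_scores[0])
    if any(len(s) != n for s in model_scores):
        raise ValueError("all models must score the same rows")
    all_ranks = [to_ranks(s) for s in model_scores]
    mean_ranks = [sum(col) / len(all_ranks) for col in zip(*all_ranks)]
    denom = max(n - 1, 1)  # avoid dividing by zero for a single row
    return [(r - 1) / denom for r in mean_ranks]


# Two hypothetical models scoring the same three rows.
blended = rank_average([[0.1, 0.9, 0.5], [0.2, 0.6, 0.8]])
print(blended)
```

Because only ranks are used, models whose raw scores are on different scales (for example, probabilities from one model and uncalibrated margins from another) contribute equally to the blend.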
If you wish to re-train the models, use the following steps:
1. Run the starter code logistic regressions
First move into the reference-code directory.
$ cd reference-code/
Then, run the starter code.
$ python starter.py
When you are prompted to enter a name for the submission file, enter: starter_submission
2. Run our models
Go back to the top level directory.
$ cd ../
Then, run the file.
$ python models.py
3. Now you can run rankedavg.py again to get the final ensembled submission.