This repository will contain the material for the Strata tutorial
Machine learning in Python with scikit-learn
by Andreas Mueller.
9:00am–12:30pm Tuesday, 12/01/2015
Data Science and Advanced Analytics
Location: 331
It is recommended that you update the materials before the course, as they might change in the days leading up to the conference.
Please bring a laptop with a working installation of Python (2.7, 3.4 or 3.5). The following packages are required:
- scikit-learn >= 0.16
- matplotlib >= 1.3
- numpy >= 1.5
- IPython >= 4.0
- Jupyter Notebook >= 4.0
The easiest way to install all requirements is to install the free Anaconda Python distribution: https://www.continuum.io/downloads (OS X, Windows, Linux)
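To confirm that an existing installation meets the minimum versions listed above, a short script along the following lines may help. It is a minimal sketch, not part of the tutorial material: the helper names (`parse_version`, `at_least`, `check_requirement`) are made up for illustration, and the import names `sklearn` and `notebook` are assumptions about how scikit-learn and Jupyter Notebook are imported. It also assumes each package exposes a `__version__` attribute.

```python
import importlib
import re

# Minimum versions copied from the requirements list above.
# The keys are assumed import names, not the package names themselves.
REQUIREMENTS = {
    "sklearn": "0.16",
    "matplotlib": "1.3",
    "numpy": "1.5",
    "IPython": "4.0",
    "notebook": "4.0",
}


def parse_version(text):
    """Turn '1.10.2rc1' into (1, 10, 2) by keeping the leading numeric parts."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_requirement(module_name, minimum):
    """Return 'ok', 'too old', or 'missing' for one package."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "missing"
    # Assumption: the package exposes __version__; fall back to "0" if not.
    installed = getattr(module, "__version__", "0")
    return "ok" if at_least(installed, minimum) else "too old"


if __name__ == "__main__":
    for name in sorted(REQUIREMENTS):
        minimum = REQUIREMENTS[name]
        print("%-12s >= %-5s %s" % (name, minimum, check_requirement(name, minimum)))
```

Any package reported as `missing` or `too old` should be installed or upgraded before the tutorial. The script does not check the Python version itself.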
Please download the material before arriving at the tutorial, and make sure you can run the notebooks. To run a notebook, start Jupyter Notebook and browse to the folder where you downloaded the material.
- Part 01 - Introduction to Scikit-learn
- Part 02 - Unsupervised Transformers
- Part 03 - Cross-validation
- Part 04 - Grid Searches for Hyper Parameters
- Part 05 - Preprocessing and Pipelines
- Part 06 - Working With Text Data
- Part 07 - Feature Union
- Part 08 - Out Of Core Learning
- Part 09 - Out Of Core Learning for Text