CSX460

This is the repository for Practical Machine Learning with R (CSX460) at the University of California, Berkeley. The most recent class is/was Spring 2017.

Course Description

This course provides an introduction to machine learning using R, the open source, statistical programming language. Once a niche set of tools for statisticians, programmers and quants, machine learning (sometimes also called statistical learning or data mining) has spread in popularity to become an indispensable tool to a wide variety of applications and disciplines. This course teaches the fundamentals of machine learning without delving into too much mathematics or code. The course will teach practical aspects of machine learning. Upon completion of the course students will be able to apply lessons to solve problems using machine learning in their own work.

Course Learning Objectives

Students of this class will learn:

Fundamental concepts in ML
The differnece between supervised, unsupervised, semi-supervised, adaptive/reinforcement learning
The three prerequisites of ML algorithms/models
- Loss function
- Restricted class of functions
- Search methodology for training
How to evaluate and compare ML model performance
How to pre-process data and build features
How to train ML models for prediction, categorization and recommendations
How to apply ML models on new data
How to use resampling techniques to calculate model performance
What the bootstrap is and how it works
What Bagging is and how and why it improves model performance
What Boosting is and how and why it improves model performance
How to implement/deploy ML models for use by a wider audience
How to frame questions to be answered using ML techniques
Collaborate in a group using tools for collaborative/social programming
Generate high quality, graphical and textual results

Intended Audience

Anyone who wishes to learn the fundamentals of machine learning
Anyone who wants to learn about using R to build, evaluate or deploy machine learning models.
Scientists, engineers, business analysts, research who explore and analyze data and wish to present their findings in well-formatted textual and graphical forms.
Anyone wishing to get hands-on experience building machine learning models.

Prerequisites

Experience programming in at least one high-level programming language such as BASIC, PASCAL, C, Java, Python, Perl, or Ruby.
Familiarity with R such as that gained through the Programming with R course.
Basic knowledge of statistics as covered in a first-semester undergraduate statistics course. There will be some coverage of basic statistical techniques as part of covering core elements of the Machine Learning.
Personal laptop for class assignments.

Text/Required Reading

Text for the Course:

***Applied Predictive Modeling***
ISBN-13: 978-1461468486
ISBN-10: 1461468485
Kuhn, Max and Johnson, Kjell
Springer Science+Business
2013

Supplementary:

***Machine Learning with R, 2nd Edition***
ISBN: 978-1-78439-390-8
Lantz, Brett
Packt Publishing 
2015

Assignments

All assignments are due midnight the day before the lecture.

Google Group

There is an google group for this class: CSX460. Contact the professor for entrance to the group.

Class Syllabus

Current Term: Spring 2017

This provides a session by session overview of CSX460 (Practical Machine Learning).

1. Introduction to R, setting up the ML developers environment

Topics:

Welcome and Introductions
Class Book, Materials, etc.
Course Overview
Setting up your environment
- Installing R/R Studio
- Installing git and using Github
Installing packages from CRAN and Github
Introduction to Maching Learning

Reading:

APM, Chapters 1-2

Supplementary Reading (Optional):

Machine Learning with R (MLR), Chapters 1-2

Exercise(s):

R Warmups

2. Machine Learning Fundamentals

Topics:

Supervised, unsupervised, and semi-supervised
Regression and classification
Measuring model error(s)
Machine learning prerequisites
Algorithm types
First Models

Reading:

Supplementary Reading (Optional):

MLR Chapter 3
Introducting magrittr

Exercise(s):

NYC FLights (Data Munging)

R Packages Introduced:

magrittr
Data Manipulation
- dplyr
data.table
Reading data:
- readr
- data.table::fread

3. Linear Regression

Topics:

Modeling Process Revisited
Data Pre-processing
Linear Regression
Model Formula
- Spline Fitting
Modeling Process
Naive Model

Reading:

APM Chapter 6.1 - 6.3, 6.5
Introduction to R Graphics with ggplot2
?formula
?MASS::stepAIC

Supplementary Reading (Optional):

MLwR, Chapter 6 pp.171-200
[ProjectTemplate](https://cran.r-project.org/package=ProjectTemplate
devtools

Exercise(s):

Model Arrival Delay from previous data

R Packages Introduced:

4. Introduction to classification / binary classification / logistic regression

Reading:

APM Chapter 12.1 - 12.2
Introduction to Statistical Learning, Classification 4.1-4.3 "Logistic Regression"

Supplementary Reading (Optional):

MLwR Chapter 10
AIC:Akaike Information Criterion

Exercise(s):

NYC Flights Logicstic Regression

R Packages Introduced:

stats::glm
MASS::stepAIC

5. Classification Metrics and Resampling

Topics:

Classification Metrics
Resampling

Reading:

APM Chapter 4.1 - 4.4 "Over Fitting and Model Tuning"
APM Chapter 5 "Measuring Performance in Regression Models" (~6 pages)
APM Chapter 11 "Measuring Performance in Classification Models" (~20 pages)

Supplementary Reading (Optional):

MLwR Chapter 3-4

Exercise(s):

classification-metrics
resampling

R Packages Introduced:

caret
gmodels
ROCR, pROC

6. Recursive Partitioning / Bias-Variance Trade-off

Topics:

Decision Trees/Recursive Partitioning
Bias-Variance Trade-off
Introduction to Caret

Reading:

APM Chapter 8.1-8.3 "Regression Trees and Rule-Based Models"
APM Chapter 14.1-14.2 "Classification and Rule-Based Models"
APM Chapter 5.2 "The Variance Bias Trade-Off" and 5.3 "Computing" (5 pages)
Caret Website There is a lot in the caret website, it is most important to familiarize yourself with the use of the models.

Supplementary Reading (Optional):

MLwR
- Chapter 5 "Divide and Conquer - Classification Using Decision Trees and Rules"
- Chapter 11 "Improving Model Performance" (First Part)
- Tuning Stock Models (pp. 347-358)

Exercise(s):

Caret and Recursive Partitioning

R Packges introduced:

caret
rpart
ctree
C50

7. Improving Model Performance

Topics:

Bagging
- Bagged Trees / Random Forests
Boosting

Reading:

APM Chapter 8.4-8.6 "Bagged Trees", "Random Forests", "Boosting"
APM Chapter 14.3-14.5 "Bagged Trees", "Random Forests", "Boosting"

Supplementary Reading (Optional):

MLwR Chapter 11 Improving Model Performance (Second Part)
- Tuning Stock Models (pp. 359-376)

Exercises:

exercise-caret-models.Rmd

R Packages introduced:

caret (cont)

8. Advanced Models

Topics:

Support Vector Machines
Neural Networks
- Deep Learning

Reading:

Chapter 7.1, 13.2 "Neural Networks"
Chapter 7.3, 13.4 "Support Vector Machines"

Exercise(s):

9. Time Series Modeling

Topics:

Survival Models
Time Series Models
- Regression
- [S][AR][I][MA] and variants
Forecast Window Models

Reading:

Forecasting Principals and Practice
- Chapters 1 "Getting Started"
- Chapter 2 "The Forecaster's Toolbox"
- Chapter 6 "Time Series Decomposition"

Exercise(s):

10. Deployment

Topics:

Delivery and Production
Patterns

Reading:

TBD

Supplementary Reading (Optional):

TBD

Exercise(s):

TBD

R Packages Introduced:

devtools

10. Special Topics

Topics:

Reading:

Supplementary Reading (Optional):

Name		Name	Last commit message	Last commit date
Latest commit History 120 Commits
01-introduction		01-introduction
02-building-blocks		02-building-blocks
03-linear-regression		03-linear-regression
04-logistic-regression		04-logistic-regression
05-classification-metrics		05-classification-metrics
05-resampling		05-resampling
06-caret-and-recursive-partitioning		06-caret-and-recursive-partitioning
07-recursive-partition-bias-variance-trade-off		07-recursive-partition-bias-variance-trade-off
08-model-ensembles		08-model-ensembles
.gitignore		.gitignore
CSX460.Rproj		CSX460.Rproj
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CSX460

Course Description

Course Learning Objectives

Intended Audience

Prerequisites

Text/Required Reading

Assignments

Google Group

Class Syllabus

1. Introduction to R, setting up the ML developers environment

2. Machine Learning Fundamentals

3. Linear Regression

4. Introduction to classification / binary classification / logistic regression

5. Classification Metrics and Resampling

6. Recursive Partitioning / Bias-Variance Trade-off

7. Improving Model Performance

8. Advanced Models

9. Time Series Modeling

10. Deployment

10. Special Topics

About

Releases

Packages

Languages

saqib-ali/CSX460

Folders and files

Latest commit

History

Repository files navigation

CSX460

Course Description

Course Learning Objectives

Intended Audience

Prerequisites

Text/Required Reading

Assignments

Google Group

Class Syllabus

1. Introduction to R, setting up the ML developers environment

2. Machine Learning Fundamentals

3. Linear Regression

4. Introduction to classification / binary classification / logistic regression

5. Classification Metrics and Resampling

6. Recursive Partitioning / Bias-Variance Trade-off

7. Improving Model Performance

8. Advanced Models

9. Time Series Modeling

10. Deployment

10. Special Topics

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages