Tools for Data Analysis - Syllabus

A Data Science Course for Aalto University in Mikkeli

Instructor: Dr. Dustin White

Course Description

This course is intended to introduce the student to programming languages as tools for conducting data analysis, focusing on Python in particular. The course will cover basic principles of programming languages, as well as libraries useful in collecting, cleaning and analyzing data in order to answer research questions. Students will learn to use Python to apply forecasting tools and predictive models to business settings. The course will be divided between lecture and lab time, and labs will be focused on teaching students how to implement the programming techniques and statistical models discussed in lectures.

Required Texts

  1. Numsense! Data Science for the Layman: No Math Added (by Annalyn Ng and Kenneth Soo)
  2. Python Data Science Handbook (by Jake VanderPlas)

Learning outcomes

  1. Understand principles of programming using the Python programming language,
  2. Use Python to collect data from various sources for analysis,
  3. Employ Python for data cleaning,
  4. Implement statistical and predictive models in Python using business data,
  5. Understand how to choose the correct statistical or predictive model based on the available data and business context, and
  6. Understand how the information resulting from data analysis leads to improved business decision-making.

Assignment Values

Assignment                    Point Value
Homework (Combined)           60
Project                       25
Project Proposal              5
Discussion & Participation    10

Grade Scale

Score Range    Grade
90 - 100       5
80 - 89        4
70 - 79        3
60 - 69        2
50 - 59        1
0 - 49         0

Class Schedule

Day 1

Introduction to using Python. We will cover opening notebooks and basic functions in Python.

Day 2

Loops and Conditions. We will focus on creating logical conditions for our programs to meet, as well as looping through code to streamline repeated processes.
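As a rough illustration of how a loop and a condition work together, the sketch below labels a short list of sales figures. The numbers and the thresholds (200 and 100) are invented for this example and are not taken from the course materials.

```python
# Hypothetical daily sales figures, invented for illustration.
daily_sales = [120, 85, 240, 60, 310]

high_days = 0
for amount in daily_sales:
    # The condition decides which branch runs for each value.
    if amount >= 200:
        label = "high"
        high_days += 1
    elif amount >= 100:
        label = "medium"
    else:
        label = "low"
    print(amount, label)

print("High-sales days:", high_days)
```

The loop replaces five nearly identical blocks of code with one, which is the kind of streamlining this session focuses on.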

Day 3

Functions. Creating functions in a programming language allows us to reuse code in many contexts and to solve new problems. We will explore how to do this in Python so that we better understand the code we will be using moving forward.
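A minimal sketch of the idea, assuming a simple business calculation: the function name `percent_change` and the sample values are hypothetical and chosen only to show how one definition can be reused in different contexts.

```python
def percent_change(old, new):
    """Return the percentage change from old to new."""
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old * 100


# The same function answers two different questions.
revenue_growth = percent_change(200, 250)
price_drop = percent_change(80, 60)
print(revenue_growth, price_drop)
```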

Day 4

Data Frames and Pandas. We will practice importing and utilizing data in Python. This is the basis for being able to conduct analysis in Python.

Day 5

Regular Expressions and text analysis. Sometimes it is advantageous to be able to process text into quantifiable information. Regular expressions (regex) give us the capability to transform text and quickly extract patterns from raw data.
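To make pattern extraction concrete, here is a small sketch using Python's built-in `re` module. The sample sentence, the order-ID format, and the variable names are assumptions made up for this example.

```python
import re

# Hypothetical raw text, invented for illustration.
text = "Order A-1042 shipped 2021-03-15; order B-2210 shipped 2021-04-02."

# Four digits, a dash, two digits, a dash, two digits: an ISO-style date.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# One capital letter, a dash, and exactly four digits: an order ID.
order_ids = re.findall(r"\b[A-Z]-\d{4}\b", text)

print(dates)
print(order_ids)
```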

Day 6

Plotting in Python. We will create visuals using Python to be able to supplement the stories that we tell with data through visual media.

Day 7

Introducing Linear Regression and its implementation in Python. Linear regression provides a jumping-off point for statistical analysis, and gives us a chance to prepare our data for analysis.
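The sketch below fits a one-variable linear regression by ordinary least squares using only plain Python, so the arithmetic behind the slope and intercept is visible. The data points are invented, and in practice a statistics library would normally be used instead; this is only meant to show what such a library computes.

```python
# Hypothetical data: advertising spend (x) and sales (y), invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Slope = covariance of x and y divided by the variance of x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx

# The fitted line passes through the point of means.
intercept = mean_y - slope * mean_x

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
```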

Day 8

Classification and Regression Trees. Decision trees will give us a chance to discuss machine learning and why it differs from regression analysis.

Day 9

Random Forests and ensemble methods. Ensemble methods provide improved accuracy and robustness relative to single machine learning models. We will explore these properties through random forest models.

Day 10

Clustering models. We will explore unsupervised learning through the k-means clustering algorithm and learn to identify groups of observations within data, both as a tool for prediction and as a way to better understand the available data.
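The following is a bare-bones sketch of the k-means idea in plain Python: assign each point to its nearest center, then move each center to the mean of its points, and repeat. The function name `kmeans`, the sample points, and the choice to start from the first `k` points are all assumptions for illustration; library implementations typically use random or smarter starting centers.

```python
def kmeans(points, k, iterations=10):
    # Assumption: start from the first k points as the initial centers.
    centers = points[:k]
    for _ in range(iterations):
        # Step 1: assign each point to the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Step 2: move each center to the mean of its assigned points.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical 2-D observations forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centers, clusters = kmeans(points, k=2)
print(centers)
```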

Day 11

Cross-Validation. We want our models to work in the real world. Using cross-validation, we can use our data to mimic the real world and ensure that, to the best of our ability, our data practices represent the events that we expect to encounter as we implement our models.
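One common form of cross-validation is k-fold: the data is split into k parts, and each part takes a turn as the held-out test set while the rest is used for training. The sketch below only produces the index splits; the function name `k_fold_indices` is hypothetical, and the example assumes no shuffling, which a real workflow would usually add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every observation appears in exactly one test fold, so each one is used once to check a model that never saw it during training.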

Day 12

Web scraping allows an analyst to collect data from nearly any resource that can be accessed online. This powerful tool allows for the examination of complex problems and the creative collection of resources to address many different needs.
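As a small taste of what scraping involves, the sketch below pulls link targets out of an HTML snippet using Python's built-in `html.parser` module. The HTML string, the file paths in it, and the class name `LinkCollector` are invented for illustration; a real scraper would first download the page, and the course may use different libraries for this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical page source, invented for illustration.
page_source = (
    '<html><body><a href="/data/2020.csv">2020</a>'
    '<p>text</p><a href="/data/2021.csv">2021</a></body></html>'
)

collector = LinkCollector()
collector.feed(page_source)
print(collector.links)
```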

Day 13

Where possible, the use of Web APIs to streamline data collection is a valuable tool. Data collected by API is typically clean and standardized, unlike the data that is collected through web scraping. We will explore the Twitter API as an introduction to using APIs.

Day 14

Project Workday. We will use the time that we have today to finalize our projects and presentations for the last day of class.

Day 15

Project presentations. Each student will present a brief summary of a research question they have answered during the term, and policy implications from the results that they have uncovered.

Assignments

Assignments will be completed in Mimir, with one assignment corresponding to each of the topics covered in class. This makes for 13 assignments in total.

Project

Each student will be asked to find a research question based on data from the European Data Portal. Using the tools covered in class, each student will address their research question in a brief written report, and prepare a short presentation to be given on the final day of class. This project is intended to provide students the opportunity to showcase their learning through this course in a way that can be discussed in job interviews and other contexts where data analysis is a valuable skill.

Academic Integrity

If I find that you have plagiarized, been dishonest in completing your assignments, or cheated on an exam or assignment, then I reserve the right to award you no points on the entire exam, project, or assignment and to report the behavior to the university. I also reserve the right to award a failing grade, independent of your score on other assignments. Academic integrity is essential to education, and I take it very seriously.

Students with Disabilities

Reasonable accommodations are provided for students who are registered with Disability Services and make their requests sufficiently in advance. Please contact me so we can make arrangements that suit your needs.
