Instructor: Dr. Dustin White
This course is intended to introduce the student to programming languages as tools for conducting data analysis, focusing on Python in particular. The course will cover basic principles of programming languages, as well as libraries useful in collecting, cleaning and analyzing data in order to answer research questions. Students will learn to use Python to apply forecasting tools and predictive models to business settings. The course will be divided between lecture and lab time, and labs will be focused on teaching students how to implement the programming techniques and statistical models discussed in lectures.
- Numsense! Data Science for the Layman: No Math Added (by Annalyn Ng and Kenneth Soo)
- Link: Amazon
- Python Data Science Handbook (by Jake VanderPlas)
- Link: GitHub
- Understand principles of programming using the Python programming language,
- Use Python to collect data from various sources for analysis,
- Employ Python for data cleaning,
- Implement statistical and predictive models in Python using business data,
- Understand how to choose the correct statistical or predictive model based on the available data and business context, and
- Understand how the information resulting from data analysis leads to improved business decision-making.
Assignment | Point Value |
---|---|
Homework (Combined) | 60 |
Project | 25 |
Project Proposal | 5 |
Discussion & Participation | 10 |
Threshold | Grade |
---|---|
90 - 100 | 5 |
80 - 89 | 4 |
70 - 79 | 3 |
60 - 69 | 2 |
50 - 59 | 1 |
0 - 49 | 0 |
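
As a rough illustration of how the two tables above fit together, the sketch below maps a total point score to a grade using the point ranges listed in the grade table. The function name `grade_for` and the dictionary `point_values` are hypothetical, and the sketch assumes that fractional scores are not rounded up, which the tables do not specify.

```python
# Point values copied from the assignment table above.
point_values = {
    "Homework (Combined)": 60,
    "Project": 25,
    "Project Proposal": 5,
    "Discussion & Participation": 10,
}
total_available = sum(point_values.values())


def grade_for(points):
    """Return the grade (0-5) for a total score between 0 and 100.

    Assumption: a score must reach the lower bound of a range to earn
    that grade, so fractional scores are not rounded up.
    """
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    # (lower bound of the range, grade), checked from highest to lowest.
    thresholds = [(90, 5), (80, 4), (70, 3), (60, 2), (50, 1)]
    for minimum, grade in thresholds:
        if points >= minimum:
            return grade
    return 0
```

For example, `grade_for(85)` falls in the 80 - 89 range, and any score below 50 maps to the lowest grade.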
Introduction to using Python. We will cover opening notebooks and basic functions in Python.
Loops and Conditions. We will focus on creating logical conditions for our programs to meet, as well as looping through code to streamline repeated processes.
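
As a small preview of this topic, the following sketch combines a loop with a logical condition. The numbers and variable names are made up for illustration and are not taken from the course materials.

```python
# Label each number from 1 to 5 as "even" or "odd".
labels = []
for n in range(1, 6):
    if n % 2 == 0:  # the condition: is n divisible by 2?
        labels.append("even")
    else:
        labels.append("odd")
```

The loop repeats the same check for every number, so we write the condition once instead of five times.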
Functions. Creating functions in a programming language allows us to reuse code in many contexts and to solve new problems. We will explore how to do this in Python so that we better understand the code we will be using moving forward.
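
A minimal sketch of a reusable function might look like the following. The function `mean` here is a hypothetical example written for illustration, not a function from the course assignments.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# The same function can be reused on different inputs.
average_of_integers = mean([2, 4, 6])
average_of_decimals = mean([1.5, 2.5])
```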
Data Frames and Pandas. We will practice importing and utilizing data in Python. This is the basis for being able to conduct analysis in Python.
Regular expressions and text analysis. Sometimes it is advantageous to be able to process text into quantifiable information. Regular expressions (regex) give us the ability to transform text and quickly extract patterns from raw data.
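
To give a sense of what pattern extraction looks like, the sketch below uses Python's built-in `re` module to pull dates and order numbers out of a sentence. The sample text is invented for illustration.

```python
import re

# Invented sample text; not from any course dataset.
text = "Order 123 shipped on 2021-03-04; order 456 shipped on 2021-03-09."

# Four digits, a hyphen, two digits, a hyphen, two digits.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# Capture the digits that follow "Order" or "order", then convert to integers.
order_ids = [int(match) for match in re.findall(r"[Oo]rder (\d+)", text)]
```

Here the raw text becomes two lists that can be counted, sorted, or loaded into a table.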
Plotting in Python. We will create visuals using Python to be able to supplement the stories that we tell with data through visual media.
Introducing Linear Regression and its implementation in Python. Linear regression provides a jumping-off point for statistical analysis, and gives us a chance to prepare our data for analysis.
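
As a sketch of the idea behind simple linear regression, the code below fits a line to one predictor using the ordinary least squares formulas, written from scratch in plain Python. In practice a library would typically handle this; the function name `fit_line` and the data are hypothetical and chosen only for illustration.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lie exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
slope, intercept = fit_line(x, y)
```

Because the example data lie exactly on a line, the fitted slope and intercept recover that line; real business data will include noise around the fitted line.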
Classification and Regression Trees. Decision trees will give us a chance to discuss machine learning and why it differs from regression analysis.
Random Forests and ensemble methods. Ensemble methods provide improved accuracy and robustness relative to single machine learning models. We will explore these properties through random forest models.
Clustering models. We will explore unsupervised learning through the k-means clustering algorithm, and learn how to identify groups of observations within data, both as a tool for prediction and as a way to better understand the available data.
Cross-Validation. We want our models to work in the real world. Using cross-validation, we can use our data to mimic the real world and ensure that, to the best of our ability, our data practices represent the events that we expect to encounter as we implement our models.
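
One common form of cross-validation is k-fold splitting, sketched below in plain Python. The function name `k_fold_indices` is hypothetical, and the sketch assumes the observations are already in random order, since it does not shuffle them.

```python
def k_fold_indices(n_samples, k):
    """Split row indices 0..n_samples-1 into k (train, test) pairs.

    Assumption: rows are already in random order; no shuffling is done.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


splits = k_fold_indices(10, 3)
```

Each observation is held out for testing exactly once, so every model is evaluated on data it did not see during fitting.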
Web scraping allows an analyst to collect data from nearly any resource that can be accessed online. This powerful tool allows for the examination of complex problems and the creative collection of resources to address many different needs.
Where possible, using Web APIs is a valuable way to streamline data collection. Data collected by API is typically clean and standardized, unlike the data that is collected through web scraping. We will explore the Twitter API as an introduction to using APIs.
Project Workday. We will use the time that we have today to finalize our projects and presentations for the last day of class.
Project presentations. Each student will present a brief summary of a research question they have answered during the term, and policy implications from the results that they have uncovered.
Assignments will be completed in Mimir, with one assignment corresponding to each of the topics covered in class. This makes for 13 assignments in total.
Each student will be asked to find a research question based on data from the European Data Portal. Using the tools covered in class, each student will address their research question in a brief written report, and prepare a short presentation to be given on the final day of class. This project is intended to provide students the opportunity to showcase their learning through this course in a way that can be discussed in job interviews and other contexts where data analysis is a valuable skill.
If I find that you have plagiarized, been dishonest in completing your assignments, or cheated on an exam or assignment, then I reserve the right to award you no points on the entire exam, project, or assignment and to report the behavior to the university. I also reserve the right to award a failing grade, independent of your score on other assignments. Academic integrity is essential to education, and I take it very seriously.
Reasonable accommodations are provided for students who are registered with Disability Services and make their requests sufficiently in advance. Please contact me so we can make arrangements that suit your needs.