papersize | documentclass | classoption | colorlinks |
---|---|---|---|
a4 | scrartcl | DIV=14 | true |
Welcome!
Lectures: There is a two-hour lecture each week during the term on Wednesdays from 13:00 to 15:00 in CLM.2.02.
Seminars: There is a one-hour "lab-style" seminar each week during the term. See the LSE Timetable for the schedule and locations for the seminars.
There are no lectures or seminars during week 6, which is LSE's reading week.
Office hour slots with all instructors should be booked via LSE's StudentHub.
- Ryan Hübert, Department of Methodology. Course convenor.
- Dan de Kadt, Department of Methodology.
- Charlotte Kuberka, Department of Government.
No. | Type | Due date |
---|---|---|
1 | Formative in-class exercises | during seminars |
2 | Formative practice problem set | Friday, 1 November 2024, 5pm |
3 | Summative mid-term problem set | Friday, 22 November 2024, 5pm |
4 | Summative final take-home assessment | Wednesday, 15 January 2025, 5pm |
Important note: There may be some small changes to and/or reorganisation of the course topics during the first weeks of the course.
Week | Topic | Lecturer |
---|---|---|
1 | Introduction | Ryan Hübert |
2 | Tabular data | Ryan Hübert |
3 | Data visualisation | Ryan Hübert |
4 | Textual data | Ryan Hübert |
5 | HTML, CSS, and scraping static pages | Ryan Hübert |
6 | Reading week | |
7 | XML, RSS, and scraping non-static pages | Ryan Hübert |
8 | Working with APIs | Ryan Hübert |
9 | Other data types | Ryan Hübert |
10 | Creating and managing databases | Ryan Hübert |
11 | Interacting with online databases | Ryan Hübert |
Important note: Links to slides and code scripts will be updated/added in advance of each week's teaching. There may also be minor adjustments/updates to the weekly readings posted below, so please monitor regularly.
In the first week, we will introduce some basic concepts of how data is recorded and stored, and we will also review R fundamentals. Because the course relies fundamentally on GitHub, a collaborative code and data sharing platform, we will also discuss the use of git and GitHub.
- Lecture
- Code: A plain R script, a first R markdown example, and a recap on vectors, lists, data frames
- Seminar
- Review of Git/GitHub basics discussed in lecture
- Branches, merges, and pull requests
- Wickham, Hadley. n.d. Advanced R, 2nd ed. Ch. 3, Names and values; Ch. 4, Vectors; and Ch. 5, Subsetting (Ch. 2-3 of the print edition).
- GitHub Docs, especially: "About GitHub and Git", "Hello World", and "GitHub flow".
- GitHub. "Basic formatting syntax" (a markdown cheatsheet).
- Markdown Guide. "Markdown Cheat Sheet."
- Lake, P. and Crowther, P. 2013. Concise guide to databases: A Practical Introduction. London: Springer-Verlag. Chapter 1, Data, an Organizational Asset
- Nelson, Meghan. 2015. "An Intro to Git and GitHub for Beginners (Tutorial)."
- McGlone, Jim. "Creating and Hosting a Personal Site on GitHub: A step-by-step beginner's guide to creating a personal website and blog using Jekyll and hosting it for free using GitHub Pages." (See also https://docs.github.com/en/pages/quickstart.)
This week discusses processing tabular data in R with functions from the tidyverse, after some further review of R fundamentals.
- Slides
- Code: Conditionals, loops, and functions, data processing in R, industrial production dataset, and industrial production and unemployment dataset
- Code: dplyr exercises, solution
- Wickham, Hadley and Garrett Grolemund. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Sebastopol, CA: O'Reilly. Part II Wrangle, Tibbles, Data Import, Tidy Data (Ch. 7-9 of the print edition).
- The Tidyverse collection of packages for R.
Note: there is a newer version of the Wickham and Grolemund text from 2023, which is available at https://r4ds.hadley.nz/.
The lecture this week will offer an overview of the principles of exploratory data analysis through (good) data visualization. In the coding session and seminars, we will practice producing our own graphs using ggplot2.
- Slides
- Lecture code: Anscombe, ggplot2 walkthrough
- Data: Congressional Facebook posts, unemployment data
- Further reference code: ggplot2 basics, ggplot2 scales, axes, and legends
- Code: Exercises in visualisation, solution
- Graphic to replicate: Unemployment rates
- Wickham, Hadley and Garrett Grolemund. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Sebastopol, CA: O'Reilly. Data visualization, Graphics for communication (Ch. 1 and 22 of the print edition).
- Hughes, A. (2015) "Visualizing inequality: How graphical emphasis shapes public opinion" Research and Politics.
- Tufte, E. (2002) "The visual display of quantitative information".
This is a formative assessment, and is due 1 November 2024 by 5pm. You must submit your response as a knitted .html file via the Moodle page.
Feedback on the Practice Problem Set will be returned by 15th November (if submitted by the deadline).
More details to be made available later in the term.
We will learn how to work with unstructured data in the form of text and discuss character encoding, search and replace with regular expressions, and elementary quantitative textual analysis.
- Slides
- Code: Regular expressions in R, text analysis, parsing pdfs
- Data: Sample texts, Keynes' "General Theory" cover
- Code: Exercises in text analysis, solution
- Data: UoL institutions
- Benoit, Kenneth. July 16, 2019. "Text as Data: An Overview." Forthcoming in Curini, Luigi and Robert Franzese, eds. Handbook of Research Methods in Political Science and International Relations. Thousand Oaks: Sage.
- Wickham, Hadley and Garrett Grolemund. 2017, Chapter 14.
- Regular expressions cheat sheet
- Regular expressions in R vignette
This week we cover the basics of web scraping for tables and unstructured data from static pages. We will also discuss the client-server model.
- Lazer, David, and Jason Radford. 2017. “Data Ex Machina: Introduction to Big Data.” Annual Review of Sociology 43(1): 19–39.
- Howe, Shay. 2015. Learn to Code HTML and CSS: Develop and Style Websites. New Riders. Chs 1-8.
- Kingl, Arvid. 2018. Web Scraping in R: rvest Tutorial.
- Munzert, Simon, Christian Rubba, Peter Meissner, and Dominic Nyhuis. 2014. Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Hoboken, NJ/Chichester, UK: Wiley & Sons. Ch. 2-4, 9.
- Severance, Charles Russell. 2015. Introduction to Networking: How the Internet Works. Charles Severance, 2015.
- Duckett, Jon. 2011. HTML and CSS: Design and Build Websites. New York: Wiley.
This is a summative assessment worth 50% of your final mark. It is due 22 November 2024 by 5pm. You must submit your response as a knitted .html file via the Moodle page.
Feedback on the Mid-term problem set will be returned as per the ASDS/SRM handbook.
More details to be made available later in the term.
Continuing from the material covered in Week 5, we will learn more advanced topics in web scraping. These include scraping documents in XML (such as RSS feeds) and scraping websites with non-static components using Selenium.
- Mozilla Developer Web Docs. What is JavaScript.
- Web Scraping with R and PhantomJS.
- Mozilla Developer Web Docs. A First Splash into JavaScript.
This week discusses how to work with Application Programming Interfaces (APIs) that offer developers and researchers access to data in a structured format.
- Barberá & Steinert-Threlkeld. 2018. "How to use social media data for political science research". In The Sage handbook of research methods in political science and international relations, pages 404-423.
- Ruths and Pfeffer. 2014. Social media for large studies of behavior. Science.
We will learn how to work with other data types, such as spatial data. Time permitting, we will also briefly discuss compute constraints and parallelization.
- Code: to be posted
- To be posted
- To be posted
This session will offer an introduction to relational databases: their structure, logic, and main types. We will learn how to write SQL, a language designed to query this type of database and currently used by many companies, and how to run SQL queries from R using the DBI package.
- Code: SQL exercises, solution
- Beaulieu. 2009. Learning SQL. O'Reilly. (Chapters 1, 3, 4, 5, 8)
- Stephens et al. 2009. Teach yourself SQL in one hour a day. Sams Publishing.
This week covers how to set up and use relational databases in the cloud, as well as the fundamentals of a document-based NoSQL database.
- Code: Exercises BigQuery, SQL joins, SQL subqueries, solution BigQuery, solution joins, solution subqueries
- Beaulieu. 2009. Learning SQL. O'Reilly. (Chapter 2)
- Hows, Membrey, and Plugge. 2014. MongoDB Basics. Apress. (Chapter 1)
- Tigani and Naidu. 2017. Google BigQuery Analytics. Wiley. (Chapters 1-3)
This is a summative assessment worth 50% of your final mark. It is due Wednesday, 15 January 2025 by 5pm.
More details to be made available later in the term.