This is an introduction to the R statistical programming language, focusing on the essential skills needed to perform data analysis, from data entry and preparation through analysis and presentation. During the course, you will learn not only basic R functionality, but also how to leverage the extensive community-driven package ecosystem and how to write your own functions in R.
Course content is broken up into 7 seminars, each covering one content module except for the final review seminar. The length of each seminar may vary from module to module, but should generally be less than 3 hours. The first hour or so will be used to introduce new information, while the remainder of the time will be spent doing hands-on practice. The content for each module will be posted the day before the seminar, so that you can familiarize yourself with the material ahead of time if you like (though this isn’t required).
This module introduces the R programming language and the RStudio software. R programming topics will include coverage of basic operations and data object types, especially vectors, matrices, and data frames.
Seminar Exercises: Setup Instructions DataCamp Exercises
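As a rough preview of the object types named in this module, the following minimal sketch builds a vector, a matrix, and a data frame in base R. The variable names and values are made up for illustration and are not taken from the course materials.

```r
# A numeric vector: all elements share one type, and arithmetic is vectorized.
scores <- c(82, 91, 77)
scaled <- scores / 100

# A matrix: values arranged in rows and columns (filled column-wise by default).
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: columns of equal length that may differ in type.
students <- data.frame(
  name  = c("Ana", "Ben", "Cleo"),   # hypothetical names
  score = scores,
  stringsAsFactors = FALSE
)

mean(scores)                        # average of the vector
dim(m)                              # number of rows and columns of the matrix
high <- students[students$score > 80, ]  # rows where score exceeds 80
high
```

Indexing with a logical condition, as in the last step, is the base R counterpart of the subsetting tools covered in the following module.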
This module introduces a series of tools for data manipulation/preparation collectively known as the “Tidyverse.” Specifically, this module covers how to subset data, arrange it, transform it, and aggregate it. Students will also learn convenient tools to import and export data.
Seminar Exercises: Exercise nlsy97.zip Solutions (PDF) Solutions (R Script)
Seminar Exercises 2018: Exercise Solutions (PDF) Solutions (R Script)
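The four operations described in this module (subset, arrange, transform, aggregate) map onto dplyr verbs roughly as sketched below. This assumes the dplyr package is installed and uses R's built-in `mtcars` data rather than the course data; the column `kpl` and the cutoff of 100 horsepower are illustrative choices only.

```r
library(dplyr)

summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%               # subset: keep cars above 100 horsepower
  arrange(desc(mpg)) %>%             # arrange: sort by fuel economy, best first
  mutate(kpl = mpg * 0.425) %>%      # transform: approximate km per litre
  group_by(cyl) %>%                  # aggregate: one group per cylinder count
  summarise(mean_kpl = mean(kpl), n = n())

summary_by_cyl
```

Each verb takes a data frame and returns a data frame, which is what allows the steps to be chained with the pipe.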
This module introduces more advanced programming techniques for adapting R functionality to your own specific problems. Contents include how to perform loops, use conditional statements, and write basic functions. In addition, this module covers how to join data sets using Tidyverse functions, manipulate strings, and scrape tables from the web.
Seminar Exercises: Exercise Solutions (PDF) Solutions (R Script)
Seminar Exercises 2018: Exercise Solutions (PDF) Solutions (R Script)
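To give a sense of the programming constructs in this module, here is a minimal base R sketch that combines a user-written function, a conditional statement, and a loop. The function `parity` and the input values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: label a single number as "even" or "odd".
parity <- function(x) {
  if (x %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

# Loop over a vector, storing each result in a preallocated character vector.
values <- c(3, 8, 15)
labels <- character(length(values))
for (i in seq_along(values)) {
  labels[i] <- parity(values[i])
}

labels
```

Preallocating `labels` and iterating with `seq_along()` avoids growing the vector inside the loop and behaves sensibly when the input is empty.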
This module provides a few major enhancements to the data analysis workflow in R. First, knitr and R Markdown are introduced as a means of creating dynamic reports from R in a variety of formats, such as HTML pages, PDF documents, and Beamer presentations. Then, RStudio Projects are introduced as a means of organizing folders for empirical projects. Finally, Git and GitHub are introduced for version control.
Seminar Exercises: Exercise Solutions (PDF) Solutions (Beamer) Solutions (html) Solutions (Rmd)
This module introduces standard linear regression in R, along with common diagnostics and post-estimation procedures. Further methods of regression analysis are also covered, with special emphasis on methods for panel data and instrumental variables. Finally, the ggplot2 package is introduced as a means of creating compelling graphs in R.
Seminar Exercises: Exercise Exercise - Part B nlsy97.rds Solutions (PDF) Solutions (Rmd)
Seminar Exercises 2018: Exercise wdi_data.rds Solutions (PDF) Solutions (Rmd)
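The basic regression workflow in this module can be sketched with base R alone. The example below uses the built-in `mtcars` data instead of the course data sets, and the choice of variables is purely illustrative.

```r
# Fit a linear regression of fuel economy (mpg) on car weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)

summary(fit)            # coefficient table, standard errors, R-squared
coef(fit)               # estimated intercept and slope
res <- residuals(fit)   # residuals, a common starting point for diagnostics
head(res)
```

Panel and instrumental variables methods, as well as ggplot2 graphics, rely on additional packages and are not shown here.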
This module introduces the basic intuition of Bayesian statistical methods and how to perform Bayesian analysis in R, primarily using the rstanarm package.
In this module, an extended empirical exercise is used to review the skills developed over the preceding seminars. The review serves as preparation for the capstone project, in which students individually replicate results from a recent economics paper.
For the capstone project, you will replicate results from “Intergenerational Mobility and Preferences for Redistribution” (AER 2018) by Alberto Alesina, Stefanie Stantcheva, and Edoardo Teso. The capstone project is due on March 28th.
Course Teacher: Andrew Proctor
Office: A 711 (Arrange by email or stop in if I’m there.)