Preface

These lab exercises supplement the third edition of OpenIntro Statistics textbook. Each lab steps through the process of using the R programming language for collecting, analyzing, and using statistical data to make inferences and conclusions about real world phenomena.

This version of the labs have been modified by Arend Kuyper to include new datasets and examples for introductory statistics courses at Northwestern University. Visit Labs for R at OpenIntro for the original materials. The chapters in this book were adapted from labs originally written by Mark Hansen and adapted for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel.

Using these labs

All of the labs on this website are made available under a Creative Commons Attribution-ShareAlike license. You are free to copy, redistribute, and modify the material so long as you provide attribution. Any derivative versions of the content must be distributed under the same license.

Creative Commons Attribution ShareAlike

Creative Commons Attribution ShareAlike

0.1 Working Efficiently with RStudio

0.1.1 Why RStudio?

RStudio is an extremely powerful tool that is intended to optimize how we interact with the statistical software known as R. We could use R’s basic interface, but RStudio is designed to streamline and organize statistical and analytic work with R. Like any tool we must learn how to use it properly, which is the focus of this lab.

While it might seem clunky or cumbersome at first, it is important to discipline yourself and adhere to sound workflow practices. Doing this from the very beginning will payoff immensely in later labs and beyond — whether or not you intend to work with RStudio in the future. Exercising and expanding your mind to preform analytic coding will make you a better critical thinker and problem solver.


0.1.2 Setting Up an R Project

It is important to recognize that quality analytic work requires that your work be easy to follow, replicate, and reference by others or by the future you. Therefore it is imperative that you strive to be as organized as possible. RStudio helps you organize all your work on a given data analysis/project through the creation of R projects.

There are several ways one could go about creating an R project, but we would suggest following the steps outlined below. These steps outline how to get setup for the first lab, but should, with obvious alterations, be followed for each lab.

Step 1

Create a folder somewhere on your computer, say on your desktop, and give it a descriptive name (e.g. STAT 202). This folder is where you will keep all of your work for each lab. You could save all your electronic notes here too.

Step 2

Next you will want to create a subfolder for an individual task or sub-project (e.g. Lab 01). The graphic below displays an example folder structure.

Example Folder Structure

Example Folder Structure

Step 3

Open RStudio.

Step 4

  1. Create a project by navigating to the upper right-hand corner of the program and clicking Project (None) » New Project.
Create a New Project in RStudio

Create a New Project in RStudio

Step 5

Select Existing Directory. Recall that in step 1, you created a file location for lab 1 — our data analysis project.

Select Existing Directory

Select Existing Directory

Step 6

Click Browse and navigate to where you created the Lab 01 folder. Select this folder and then click Create Project.

Create Project

Create Project

Your R project has now been created. Note that in the upper right-hand corner, the program indicates that you are working on the project named Lab 01.

Brand new R Project

Brand new R Project

Step 7

Creating an R script is the next MAJOR step in this process. Using a script file is key to organizing your code for quick reference. You should think of an R script file as a thorough record of how to conduct an analysis or solve the problem at hand. You should strive to make your R script as organized as possible so that someone else could work through your code and reproduce the same output/answers/results.

To create an R script, go to the upper left-hand corner and click the white box icon then select R Script.

Create an R Script in R Studio

Create an R Script in R Studio

Step 8

Now you can proceed to write your code in the R Script. You can also save your progress by clicking on the save icon located in the icon bar. Notice that it is saved within your R project folder — Lab 01 in this case. This is why we created an R project, so that all of our work for an analysis/project is kept in one central location.

Now, let’s practice writing some R code. Good analytic code requires good comments. Comments are meant for human consumption and to explain what the executable code is doing. Therefore we need to let the program know that it is a comment and that it should not attempt to run it. In R, # is used to signal a comment. In RStudio, this will turn the line green, which indicates that it will be read as a comment — see the figure below. Also notice how we use white space (empty lines) to make it easier on our eyes to navigate and read the script file. Practice by typying the code that is pictured.

We can use R as a calculator, as shown below. To run the line of code 2+2, you place your cursor anywhere on the that line of code and click Run, which is located in the top right corner of the script pane. Alternatively, you could have used the keyboard short cut of Command + return (Mac) or ctrl + enter (PC).

You also have the option of running multiple lines of code by highlighting the lines of code and clicking Run or using the keyboard short cut.

After running the command, the result will show up in your Console pane, which is located beneath the Script pane. In the screenshot below, you can see that our 2+2 command has generated an answer of 4.

Basic R Script

Basic R Script

Continue to practice writting an R script by reproducing the code depicted below. As the comments in the pictured code indicate, we are loading/reading-in the arbuthnot dataset. Then we are taking a look at some of the observations from the dataset by using the functions head() and tail(). Make sure to run the lines of code in order, otherwise the the software will return error messages. We suggest running one line at a time to ensure your code is typed correctly and to see what each line is doing.

Look at Data

Look at Data

Notice the copious usage of comments in our analytic code. In general, analytic code should make liberal use of comments. The length and specificity of comments depends on a person’s experience with a coding language. With experience comes the understanding and ability to write concise comments that cut directly to what information is absolutely necessary to communicate. It never hurts to have more comments than actual executable code, especially for those new to coding. Keep in mind that in the future you might want to share analytic code with co-workers or peers, or go back to reference code months or years after you’ve written it.


0.1.3 Summary

The essential workflow that should be followed for each lab:

  1. Create and work in an R project to ensure all work is kept in a single location.
  2. Organize your work within an R script.
    • Use comments to clearly communicate what the code is doing.
    • Use white space (empty lines and spaces) to make the document easier to read and navigate.

There are many other features of RStudio that you’ll find useful, but are not covered in this document. RStudio strives to provide help in many different ways:

  1. The help tab within the lower right-hand pane.
  2. Help automatically appears when typing a function (auto-complete).
  3. You may begin by typing the name of function and it will suggest several options.