Activity 1 Part 1: Loading Data in RStudio

The Goal

The first step in most statistics competitions is loading the data into R. Typically, the data is in the form of a CSV file. Today, we are going to talk about how to load data from such a file into R.

The Data

The data we will be using for many of our activities was created by Climate Change AI (CCAI) and Lawrence Berkeley National Laboratory (Berkeley Lab). This is a large data set and was used in March 2022 for the 5th Annual Women in Data Science (WiDS) Datathon. We encourage you to explore the resources offered by WiDS!

Data Citation

Climate Change AI (CCAI) and Lawrence Berkeley National Laboratory (Berkeley Lab). (2022, January). WiDS Datathon 2022, Version 1. Retrieved January 10, 2022, from https://www.kaggle.com/competitions/widsdatathon2022/overview.

Getting the Data

The first thing we need to do is get the data. Because the data set is very large, it is contained in a Google Drive file linked here. It is very common during DataFest for data to be shared in this way, so it is good to get some practice downloading data from Google Drive.

If the Drive link does not work, you can copy and paste the following into your browser: https://drive.google.com/file/d/1hM08da1DH21P5I126FyEoKJ0OwIElFUo

Because the data set is so large, it is housed in a ZIP file. Download the ZIP file, and then extract the files that are inside of it. The one you are looking for is called train.csv.

Create a folder on your computer called DataFestPrep, and store this CSV file inside the folder.
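If you would rather set up the folder from the R console than by hand, the sketch below shows one way to do it. It is only an illustration: it works in a temporary directory and writes a tiny stand-in train.csv (the toy columns are made up, not the real WiDS data) so that it runs on any computer.

```r
# Work inside a temporary directory so this sketch does not touch your real files.
base_dir <- tempfile("datafest_demo_")
dir.create(base_dir)

# Stand-in for the extracted train.csv (toy columns, NOT the real WiDS data).
downloaded_csv <- file.path(base_dir, "train.csv")
write.csv(data.frame(id = 1:3, value = c(10.5, 20.1, 30.7)),
          downloaded_csv, row.names = FALSE)

# Create the DataFestPrep folder and move the CSV into it.
prep_dir <- file.path(base_dir, "DataFestPrep")
dir.create(prep_dir)
moved <- file.rename(downloaded_csv, file.path(prep_dir, "train.csv"))

# Confirm the file is where we expect it.
csv_in_place <- file.exists(file.path(prep_dir, "train.csv"))
print(list.files(prep_dir))
```

With the real download, base R's unzip() function (for example, unzip(zip_path, exdir = "DataFestPrep"), where zip_path is wherever you saved the ZIP file) can do the extraction step as well.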

Moving the data into RStudio

There are two ways to access RStudio: on your own computer or on RStudio Cloud. Instructions for each option are below.

Option 1: For Folks Using RStudio on their Laptops

  • Step 1 Open RStudio.
  • Step 2 Look at the upper right hand panel of your screen.
  • Step 3 Find “Import Dataset” or “Import” and click on it.
  • Step 4 Choose “Text (base)” or “From CSV” (it will depend on your computer).
  • Step 5 Find your data (train.csv) in the list that comes up. Choose it!

The above 5 steps will allow you to load the data into RStudio. However, if you are using Markdown, you need a few more steps.

  • Step 6 Now, look at the bottom right hand panel of your screen. You should see a line of code with something like train <- read.csv("train.csv") or train <- read_csv("train.csv").
  • Step 7 Copy that ENTIRE line of code.
  • Step 8 Open a Markdown file.
  • Step 9 Insert a code chunk (look to the upper right corner of the file and find the little green C icon. Find “R” on the drop-down menu, and click it!)
  • Step 10 You will notice a gray box appears on your Markdown file. This is called a chunk, and is basically a spot where we can type code.
  • Step 11 Paste the line from Step 7 into this gray code chunk, and press the green arrow (the play button).

And you are ready to go!! You now have the data loaded and you can work with it!
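To see what the pasted line does, and how to check that it worked, here is a minimal sketch. It assumes nothing about the real file: it first writes a tiny stand-in CSV with hypothetical column names (building_id and floor_area are made up for illustration), then loads it the same way the generated line would.

```r
# Toy stand-in for train.csv so the sketch is runnable anywhere.
# The column names here are hypothetical, NOT the real WiDS columns.
csv_path <- file.path(tempdir(), "train.csv")
write.csv(data.frame(building_id = 1:4,
                     floor_area = c(1200, 850, 4300, 960)),
          csv_path, row.names = FALSE)

# This mirrors the line RStudio generates; with the real file the path
# would point at train.csv inside your DataFestPrep folder.
train <- read.csv(csv_path)

# Quick checks that the data loaded as expected.
print(dim(train))    # number of rows, then number of columns
print(head(train))   # first few rows
str(train)           # column names and types
```

Running the same three checks on the real train object is a quick way to confirm that the import picked up the header row and that the columns have sensible types.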

Option 2: For Folks Using RStudio Cloud

  • Step 1 Open RStudio Cloud.
  • Step 2 Look at the lower right hand quadrant of your RStudio screen.
  • Step 3 At the top of that square, look for “Upload”.
  • Step 4 Click Upload, and then browse for where you have stored the train.csv data on your computer.
  • Step 5 Once you have found the file, hit “Import”. This moves the data into RStudio Cloud, but you can’t work with it yet.
  • Step 6 Now, look at the upper right hand panel of your screen.
  • Step 7 Find “Import Dataset” and click on it.
  • Step 8 Choose “Text (base)” or “From CSV” (it will depend on your computer).
  • Step 9 Find the data you want (train.csv) in the list that comes up. Choose it!

The above 9 steps will allow you to load the data into RStudio. However, if you are using Markdown, you need a few more steps.

  • Step 10 Now, look at the bottom right hand panel of your screen. You should see a line of code with something like train <- read.csv("train.csv") or train <- read_csv("train.csv").
  • Step 11 Copy that ENTIRE line of code.
  • Step 12 Open a Markdown file.
  • Step 13 Insert a code chunk (look to the upper right corner of the file and find the little green C icon. Find “R” on the drop-down menu, and click it!)
  • Step 14 You will notice a gray box appears on your Markdown file. This is called a chunk, and is basically a spot where we can type code.
  • Step 15 Paste the line from Step 11 into this gray code chunk, and press the green arrow (the play button).
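
The steps above mention two possible lines of code, one using read.csv and one using read_csv. The sketch below is a hedged illustration of the difference: read.csv comes with base R, while read_csv comes from the readr package, which may or may not be installed on your machine. The file and its columns are a toy stand-in, and the fallback branch exists only so the example runs without readr.

```r
# Toy stand-in file (hypothetical columns, NOT the real WiDS data).
csv_path <- file.path(tempdir(), "train_demo.csv")
write.csv(data.frame(id = 1:3, label = c("a", "b", "c")),
          csv_path, row.names = FALSE)

# Base R: always available, returns a plain data.frame.
train_base <- read.csv(csv_path)

# readr: used only if the package is installed; returns a tibble,
# which is a special kind of data.frame.
if (requireNamespace("readr", quietly = TRUE)) {
  train_readr <- suppressMessages(readr::read_csv(csv_path))
} else {
  train_readr <- train_base  # fallback so the sketch still runs without readr
}

print(class(train_base))
print(class(train_readr))
```

Either line gives you a data set you can work with, so it is fine to use whichever one RStudio generates for you.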