2 Review

🚩 Pre-Class Learnings

To prepare for this lesson, do the following:

🔥 Data Story Critique

Go to https://rhythm-of-food.net/#explore-foods, then answer the following questions:

  • What is the data story?
  • What is effective?
  • What could be improved?
Important: ICA Instructions

Before starting, review the ICA Instructions ⭐ for details on pair programming and activity procedures.

🧩 Learning Goals

By the end of this lesson, you should be able to:

  • Navigate the file system of your portfolio and homework repositories
  • Explain the difference between absolute and relative file paths and why relative file paths are preferred when referencing files
  • Construct relative file paths to read in data
  • Review data wrangling from COMP/STAT 112
  • Use the Git verbs clone, add, commit, push, and pull (via GitHub.com and GitHub Desktop) to interact with your repositories
  • Practice using keyboard shortcuts
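The difference between absolute and relative paths shows up in a few lines of base R. This is a minimal sketch: the absolute path and the `README.md` file are hypothetical examples, and `file.exists()` returns `TRUE` only if your working directory is the `ica` folder and the data file is already in place.

```r
# The working directory is where every relative path starts from
getwd()

# Absolute path: the full location on one computer (hypothetical user name),
# so it breaks as soon as the repository is cloned somewhere else
abs_path <- "/Users/yourname/portfolio/ica/data/raw/weather.csv"

# Relative path: the location starting from the working directory;
# file.path() joins the pieces with "/" on every operating system
rel_path <- file.path("data", "raw", "weather.csv")
rel_path
#> [1] "data/raw/weather.csv"

# ".." moves up one folder, e.g. from ica/ to the repository root
# (README.md is a hypothetical file used only for illustration)
up_path <- file.path("..", "README.md")
up_path
#> [1] "../README.md"

# TRUE only if the relative path resolves from the current working directory
file.exists(rel_path)
```

Because the relative path never mentions a user name or drive, it keeps working for your partner and for anyone else who clones the repository.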

Exercise

Download the code and data files linked in the Code Links area at the bottom of the page's table of contents. Move the code file to the ica folder in your portfolio repository and the data file to the ica/data/raw folder. Open the code file and follow the instructions.
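If the data folders do not exist yet, they can be created from R. A minimal sketch, assuming the folder names from the instructions and building them inside a temporary stand-in for the portfolio repository (`repo` is a hypothetical name; in practice you would start from your repository root):

```r
# Temporary stand-in for the portfolio repository root (hypothetical)
repo <- file.path(tempdir(), "portfolio")

# Create ica/data/raw and ica/data/processed;
# recursive = TRUE also creates any missing parent folders
for (sub in c("raw", "processed")) {
  dir.create(file.path(repo, "ica", "data", sub),
             recursive = TRUE, showWarnings = FALSE)
}

# Confirm the layout that the solution code relies on
dir.exists(file.path(repo, "ica", "data", c("raw", "processed")))
#> [1] TRUE TRUE
```

With the code file saved in `ica/`, the relative path `data/raw/weather.csv` used in the solution points at `ica/data/raw/weather.csv`, assuming the working directory is the `ica` folder (the usual default when a Quarto or R Markdown file stored there is rendered).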

Solutions

  1. Load packages and read in data.
Solution
library(tidyverse)
# Relative path, assuming the working directory is the ica folder
weather <- read_csv("data/raw/weather.csv")
  2. Clean PrecipYr by replacing the missing-value code 99999 with NA.
Solution
weather_clean <- weather |> 
    mutate(PrecipYr = na_if(PrecipYr, 99999))
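`na_if()` replaces every occurrence of one value with `NA`, which suits a sentinel code such as 99999. A small sketch on made-up values (not taken from weather.csv) shows the effect and why it matters for summaries:

```r
library(dplyr)

# Made-up years, with 99999 standing in for "no year recorded"
precip_yr <- c(1950, 99999, 1983, 99999)

cleaned <- na_if(precip_yr, 99999)
cleaned
#> [1] 1950   NA 1983   NA

# The sentinel distorts summaries; after cleaning it can be skipped with na.rm = TRUE
mean(precip_yr)
#> [1] 50982.75
mean(cleaned, na.rm = TRUE)
#> [1] 1966.5
```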
  3. Add a dateInYear variable (the day of the year).
Solution
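# Option 1: sort chronologically, then number the rows
# (row_number() equals 1:365 when there is exactly one row per day)
weather_clean <- weather_clean |> 
    arrange(Month, Day) |> 
    mutate(dateInYear = row_number())
# Option 2: parse the date string, then take the day of the year
# (yday() and mdy() come from lubridate, attached by tidyverse 2.0+)
weather_clean <- weather_clean |> 
    mutate(dateInYear = yday(mdy(date)))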
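Option 1 numbers rows 1, 2, 3, and so on, so it matches `yday()` only when the data contain exactly one row per calendar day; `yday()` reads the calendar directly. A small sketch with hypothetical date strings in month/day/year order (the exact format of the `date` column is an assumption here) shows what `mdy()` and `yday()` return, including the leap-year shift:

```r
library(lubridate)

# Hypothetical date strings in month/day/year order
dates <- c("1/1/2011", "3/1/2011", "12/31/2011")
yday(mdy(dates))
#> [1]   1  60 365

# In a leap year, every date from March 1 onward moves up by one day
yday(mdy("3/1/2012"))
#> [1] 61
```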
  4. Add in 3-letter month abbreviations.
Solution
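# Option 1: via joins
months <- tibble(
    Month = 1:12,
    month_name = month.abb
)
weather_clean <- weather_clean |> 
    left_join(months, by = "Month")

# Option 2: via vector subsetting (month.abb[Month] picks the abbreviation by position);
# this only previews the result with head() and does not modify weather_clean
weather |> 
    mutate(month_name = month.abb[Month]) |> head()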
# A tibble: 6 × 19
  Month   Day   Low  High NormalLow NormalHigh RecordLow LowYr RecordHigh HiYear
  <dbl> <dbl> <dbl> <dbl>     <dbl>      <dbl>     <dbl> <dbl>      <dbl>  <dbl>
1    11    20    48    55        48         62        35  1964         69   2005
2     6    16    52    68        53         70        46  1952         90   1961
3     5     9    47    63        50         66        41  1950         88   1993
4    10    26    47    69        52         69        39  1954         89   2003
5     9    27    55    82        55         73        47  1955         96   2010
6     7     6    52    70        54         71        47  1953         86   1957
# ℹ 9 more variables: Precip <dbl>, RecordPrecip <dbl>, PrecipYr <dbl>,
#   date <chr>, Record <lgl>, RecordText <chr>, RecordP <lgl>, CulmPrec <dbl>,
#   month_name <chr>
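Both options attach the same abbreviations. A toy check (the three months are made up, not from weather.csv) confirms that the join and the positional lookup into R's built-in `month.abb` agree; `left_join()` keeps the row order of the left table, so the two vectors line up:

```r
library(dplyr)

toy <- tibble(Month = c(11, 6, 5))
months <- tibble(Month = 1:12, month_name = month.abb)

via_join <- toy |> left_join(months, by = "Month")
via_index <- toy |> mutate(month_name = month.abb[Month])

via_join$month_name
#> [1] "Nov" "Jun" "May"
identical(via_join$month_name, via_index$month_name)
#> [1] TRUE
```

The join generalizes to lookup tables that no built-in vector covers, while subsetting is shorter when such a vector already exists.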
  5. Write the clean data to a CSV file.
Solution
write_csv(weather_clean, file = "data/processed/weather_clean.csv")
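`write_csv()` does not create missing folders, so writing to `data/processed/` typically fails unless that folder already exists. A minimal sketch, using a temporary folder and a made-up two-row table instead of the real data, creates the folder first and then reads the file back to check the round trip (`NA` is written as the text `NA` and read back as missing):

```r
library(readr)

# Temporary stand-in for ica/data/processed (hypothetical location)
out_dir <- file.path(tempdir(), "data", "processed")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Made-up table with one missing value
toy <- data.frame(Month = c(1, 2), PrecipYr = c(1950, NA))
out_file <- file.path(out_dir, "toy_clean.csv")
write_csv(toy, out_file)

# Read it back; show_col_types = FALSE silences the column-type message
back <- read_csv(out_file, show_col_types = FALSE)
back$PrecipYr
#> [1] 1950   NA
```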