2 Review

🚩 Pre-Class Learnings

To prepare for this lesson, do the following:

Read File Organization Appendix
Read Git & GitHub Appendix
Read Keyboard Shortcut Appendix
Open your portfolio GitHub repository and note how the files are organized

🔥 Data Story Critique

Go to https://rhythm-of-food.net/#explore-foods, then answer the following questions:

What is the data story?
What is effective?
What could be improved?

ICA Instructions

Before starting, review the ICA Instructions ⭐ for details on pair programming and activity procedures.

🧩 Learning Goals

By the end of this lesson, you should be able to:

Navigate the file system of your portfolio and homework repositories
Explain the difference between absolute and relative file paths and why relative file paths are preferred when referencing files
Construct relative file paths to read in data
Review data wrangling from COMP/STAT 112
Use the git verbs (via GitHub.com and GitHub Desktop): clone, add, commit, push, and pull to interact with your repositories
Practice using keyboard shortcuts

Exercise

Download the code and data files linked in the Code Links area at the end of the table of contents on this page. Move the code file to the ica folder in your portfolio repository and the data file to the ica/data/raw folder. Open the code file and follow the instructions.
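To make the path logic concrete, here is a minimal sketch of how a relative path relates to an absolute one. It assumes the working directory is the ica folder, which is typically the case when you render a code file stored there; the absolute path it prints will differ on every machine, which is why relative paths are preferred.

```r
# Relative path: interpreted from the current working directory.
rel_path <- file.path("data", "raw", "weather.csv")
rel_path

# Absolute path: starts at the root of the file system, so it is
# specific to one computer and breaks when the project moves.
abs_path <- file.path(getwd(), rel_path)
abs_path

# Check that the file is where the relative path says it should be.
file.exists(rel_path)
```

If file.exists() returns FALSE, the usual causes are a data file saved in the wrong folder or a working directory that is not the ica folder.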

Solutions

Load packages and read in data.

Solution

library(tidyverse)
weather <- read_csv("data/raw/weather.csv")

Clean the PrecipYr variable by replacing the placeholder value 99999 with NA.

Solution

weather_clean <- weather |> 
    mutate(PrecipYr = na_if(PrecipYr, 99999))
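To see what na_if() does in isolation, here is a small sketch on a toy vector. The values are hypothetical and are not taken from weather.csv; only the 99999 placeholder comes from the exercise.

```r
library(dplyr)

# Hypothetical toy values; 99999 stands in for a missing year.
precip_yr <- c(1990, 99999, 2005)

# na_if() replaces every value equal to 99999 with NA
# and leaves all other values untouched.
cleaned <- na_if(precip_yr, 99999)
cleaned

# Count how many values were converted to NA.
sum(is.na(cleaned))
```

Running the same count on weather_clean$PrecipYr is a quick way to confirm that the cleaning step changed the rows you expected.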

Add a dateInYear variable.

Solution

# Option 1: sort chronologically, then number the rows
# (assumes exactly one row per day of a 365-day year)
weather_clean <- weather_clean |> 
    arrange(Month, Day) |> 
    mutate(dateInYear = 1:365)

# Option 2: parse the date string, then extract the day of the year
weather_clean <- weather_clean |> 
    mutate(dateInYear = yday(mdy(date)))
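The second option can be checked on a few values by hand. The sketch below uses hypothetical date strings in month/day/year order; the actual format and year in weather.csv may differ, so treat the strings as an assumption.

```r
library(lubridate)

# Hypothetical date strings in month/day/year order (2014 is not a leap year).
dates <- c("1/1/2014", "3/1/2014", "12/31/2014")

# mdy() parses the strings into Date objects;
# yday() returns the day of the year for each date.
day_of_year <- yday(mdy(dates))
day_of_year
```

Option 1 produces the same numbering only when the data contain exactly one row per day of a 365-day year, so Option 2 is the more robust choice if rows might be missing or duplicated.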

Add in 3-letter month abbreviations.

Solution

# Option 1: via joins
months <- tibble(
    Month = 1:12,
    month_name = month.abb
)
weather_clean <- weather_clean |> 
    left_join(months, by = "Month")  # name the key explicitly

# Option 2: via vector subsetting
weather |> 
    mutate(month_name = month.abb[Month]) |> head()

# A tibble: 6 × 19
  Month   Day   Low  High NormalLow NormalHigh RecordLow LowYr RecordHigh HiYear
  <dbl> <dbl> <dbl> <dbl>     <dbl>      <dbl>     <dbl> <dbl>      <dbl>  <dbl>
1    11    20    48    55        48         62        35  1964         69   2005
2     6    16    52    68        53         70        46  1952         90   1961
3     5     9    47    63        50         66        41  1950         88   1993
4    10    26    47    69        52         69        39  1954         89   2003
5     9    27    55    82        55         73        47  1955         96   2010
6     7     6    52    70        54         71        47  1953         86   1957
# ℹ 9 more variables: Precip <dbl>, RecordPrecip <dbl>, PrecipYr <dbl>,
#   date <chr>, Record <lgl>, RecordText <chr>, RecordP <lgl>, CulmPrec <dbl>,
#   month_name <chr>
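Option 2 works because indexing a vector by position returns one element per index. A minimal sketch, using the month numbers from the first three rows of the output above:

```r
# Month numbers from the first three rows of the printed tibble.
month_numbers <- c(11, 6, 5)

# month.abb is a built-in character vector of length 12;
# position 11 holds "Nov", position 6 holds "Jun", and so on.
month.abb[month_numbers]
```

Because the lookup is purely positional, it relies on Month holding whole numbers from 1 to 12; the join in Option 1 makes that mapping explicit in a table instead.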

Write out clean data to a CSV file.

Solution

write_csv(weather_clean, file = "data/processed/weather_clean.csv")
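One common stumbling block is that the data/processed folder must already exist before writing to it. The sketch below shows one way to create the folder first and then confirm the file was written. It uses a temporary directory and a toy data frame as stand-ins (both are assumptions for illustration) so that it runs anywhere.

```r
library(readr)

# A temporary folder stands in for the project root so the sketch
# runs anywhere; in the activity you would use "data/processed".
out_dir <- file.path(tempdir(), "data", "processed")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical stand-in for weather_clean.
toy <- data.frame(Month = c(1, 2), month_name = c("Jan", "Feb"))

out_file <- file.path(out_dir, "weather_clean.csv")
write_csv(toy, file = out_file)

# Read the file back to confirm the round trip.
read_csv(out_file, show_col_types = FALSE)
```

Keeping raw and processed data in separate folders means the original download is never overwritten by a cleaning script.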
