1 Introduction

🚩 Pre-Class Learnings

To prepare for this lesson, do the following:

🧩 Learning Goals

By the end of this lesson, you should be able to:

  • Use and evaluate different sources of information to find potential answers
  • Distinguish between high- and low-level understanding
  • Determine which source of information is suitable for each level of understanding
  • Determine which level of understanding is suitable for this class
  • Review data wrangling verbs introduced in COMP/STAT 112

Sources of Information

Let’s practice using different sources of information to find potential answers. Use the sources below to find (a) how a data frame in R is defined and (b) how the sources differ in their definitions.

A data frame in R is a named list whose elements (the columns) all have the same length.
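To check this definition yourself, here is a minimal base R sketch. The data frame `df` and its columns `x` and `y` are made up for illustration; the calls only inspect how R stores a data frame.

```r
# A small illustrative data frame with two columns of length 3
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

typeof(df)   # "list": underneath, a data frame is stored as a list
is.list(df)  # TRUE
names(df)    # "x" "y": every element (column) has a name
lengths(df)  # x = 3, y = 3: every column has the same length

# Columns whose lengths cannot be reconciled (3 vs 2) are rejected,
# so data.frame() throws an error; we capture its message here
result <- tryCatch(
  data.frame(x = 1:3, y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
result
```

Note that `data.frame()` will recycle a shorter column when its length divides the number of rows evenly, which is why the failing example uses lengths 3 and 2.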

Levels of Understanding

There are two levels of understanding:

  • high-level (intuitive) understanding necessary to implement common tasks
  • low-level (foundational) understanding necessary to tackle new problems and come up with new solutions

In this class, we are going to strive for low-level (foundational) understanding. You should start with a high-level (intuitive) understanding, e.g., using an LLM (large language model), and then dig deeper to get the details, e.g., using documentation and more advanced textbooks.

Data Wrangling Verbs

Below are some of the wrangling verbs introduced in COMP/STAT 112. Please spend a few minutes reviewing them.

  • mutate(): creates/changes columns/elements in a data frame/tibble
  • select(): keeps a subset of columns/elements in a data frame/tibble
  • filter(): keeps a subset of rows in a data frame/tibble
  • arrange(): sorts rows in a data frame/tibble
  • group_by(): internally groups rows in a data frame/tibble by values in one or more columns/elements
  • summarize(): collapses/combines information across rows using functions such as n(), sum(), mean(), min(), max(), median(), sd()
  • count(): shortcut for group_by() |> summarize(n = n())
  • left_join(): mutating join of two data frames/tibbles keeping all rows in the left data frame
  • full_join(): mutating join of two data frames/tibbles keeping all rows in both data frames
  • inner_join(): mutating join of two data frames/tibbles keeping only the rows in the left data frame that find a match in the right
  • semi_join(): filtering join of two data frames/tibbles keeping rows in the left data frame that find a match in the right
  • anti_join(): filtering join of two data frames/tibbles keeping rows in the left data frame that do not find a match in the right
  • pivot_wider(): rearranges values from two columns into many (one column becomes the names of the new variables, the other becomes the values of the new variables)
  • pivot_longer(): rearranges values from many columns into two (the column names go to one new variable, the column values go to a second new variable)
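To refresh the single-table verbs from this list, here is a minimal sketch, assuming the dplyr package is installed and R 4.1 or later for the native `|>` pipe. The `scores` tibble and its columns are hypothetical toy data.

```r
library(dplyr)

# Hypothetical toy data: one row per student
scores <- tibble(
  student = c("Ana", "Ben", "Cai", "Dee", "Eli"),
  section = c("A", "A", "B", "B", "B"),
  quiz    = c(8, 6, 9, 7, 10),
  exam    = c(85, 70, 92, 78, 88)
)

# mutate(): add a new column computed from existing ones
scores <- scores |> mutate(total = quiz * 10 + exam)

# select(): keep only some columns
scores |> select(student, total)

# filter(): keep rows that meet a condition
scores |> filter(exam >= 80)

# arrange(): sort rows, here by descending total
scores |> arrange(desc(total))

# group_by() + summarize(): one summary row per section
section_summary <- scores |>
  group_by(section) |>
  summarize(n = n(), mean_exam = mean(exam), max_quiz = max(quiz))
section_summary

# count(): same counts as group_by(section) |> summarize(n = n())
section_counts <- scores |> count(section)
section_counts
```

Here section A has 2 students with a mean exam score of 77.5, and section B has 3 students with a mean of 86.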
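The difference between mutating and filtering joins is easiest to see on two tiny tables that only partly overlap. This is a minimal sketch assuming the dplyr package; the `students` and `grades` tibbles are hypothetical.

```r
library(dplyr)

# Hypothetical toy tables: ids 2 and 3 appear in both, 1 only in students, 4 only in grades
students <- tibble(id = c(1, 2, 3), name = c("Ana", "Ben", "Cai"))
grades   <- tibble(id = c(2, 3, 4), grade = c("B", "A", "C"))

# Mutating joins add the `grade` column from `grades`
lj <- left_join(students, grades, by = "id")   # ids 1, 2, 3; grade is NA for id 1
fj <- full_join(students, grades, by = "id")   # ids 1, 2, 3, 4; NA where there is no match
ij <- inner_join(students, grades, by = "id")  # ids 2, 3 only

# Filtering joins keep or drop rows of `students` without adding columns
sj <- semi_join(students, grades, by = "id")   # ids 2, 3
aj <- anti_join(students, grades, by = "id")   # id 1

lj
fj
ij
sj
aj
```

A quick way to tell the two families apart: the result of `semi_join()` or `anti_join()` always has exactly the columns of the left table.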
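The two pivot verbs undo each other. Here is a minimal sketch, assuming the tidyr and tibble packages; the `wide` tibble, with one hypothetical column per year, is invented for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical wide data: one column per year
wide <- tibble(
  country = c("X", "Y"),
  `2020`  = c(10, 30),
  `2021`  = c(20, 40)
)

# pivot_longer(): column names go to `year`, cell values go to `value`
long <- wide |>
  pivot_longer(cols = -country, names_to = "year", values_to = "value")
long  # 4 rows, one per country-year pair

# pivot_wider(): the reverse; `year` supplies new column names, `value` fills them
back <- long |>
  pivot_wider(names_from = year, values_from = value)
back
```

After `pivot_longer()`, the new `year` column holds character strings such as "2020", because it was built from column names.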