Practical Biological Data Analysis: Class 3: Data wrangling with the tidyverse

The Rmarkdown for this class is on GitHub.

Introduction to the tidyverse

The tidyverse is a collection of packages that share a similar design philosophy, syntax, and data structures. The packages are largely developed by the same team that builds RStudio.

Some key packages that we will touch on in this course:

readr: functions for data import and export
ggplot2: plotting based on the “grammar of graphics”
dplyr: functions to manipulate tabular data
tidyr: functions to help reshape data into a tidy format
stringr: functions for working with strings
tibble: a redesigned data.frame

Loading R packages

To use an R package in an analysis, we need to load it with the library() function. This needs to be done once in each R session, and it is a good idea to do this at the beginning of your Rmarkdown. For teaching purposes, however, I will sometimes load a package when I first introduce one of its functions.

library(readr)
library(dplyr)
library(tibble)

tibble versus data.frame

A tibble is a re-imagining of the base R data.frame and differs from it in a few ways. The biggest differences are that a tibble doesn't have row names and that it has an enhanced print method. If you are interested in learning more, see the tibble vignette.

Compare data_df to data_tbl.

data_df <- data.frame(a = 1:3, 
                      b = letters[1:3], 
                      c = c(TRUE, FALSE, TRUE), 
                      row.names = c("ob_1", "ob_2", "ob_3"))
data_df

data_tbl <- as_tibble(data_df)
data_tbl

When you work with tidyverse functions, it is good practice to convert data.frames to tibbles. In practice, many functions work interchangeably with either base data.frames or tibbles, provided that they don't use row names.
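Two further differences tend to matter in practice. The sketch below uses a small hypothetical table (demo_df, demo_tbl, and the value column are made up for illustration): selecting one column with [ keeps a tibble as a tibble while a data.frame drops to a vector, and $ on a tibble does not partially match column names.

```r
library(tibble)

# hypothetical example data with a column named "value"
demo_df  <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
demo_tbl <- as_tibble(demo_df)

# selecting a single column with [
demo_df[, "value"]    # data.frame: drops to a numeric vector
demo_tbl[, "value"]   # tibble: stays a tibble with one column

# $ with an incomplete column name
demo_df$val                     # data.frame: partially matches "value"
suppressWarnings(demo_tbl$val)  # tibble: no partial matching, returns NULL
```

Without suppressWarnings(), the tibble version also warns that the column is unknown, which helps catch typos in column names.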

Converting a base R data.frame to a tibble

If a data.frame has row names, you can preserve them by moving them into a column with the rownames_to_column() function from tibble before converting to a tibble.

head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars_tbl <- rownames_to_column(mtcars, "vehicle")
mtcars_tbl <- as_tibble(mtcars_tbl)
mtcars_tbl

# A tibble: 32 × 12
   vehicle       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Mazda RX4    21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2 Mazda RX4 …  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3 Datsun 710   22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4 Hornet 4 D…  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5 Hornet Spo…  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6 Valiant      18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7 Duster 360   14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8 Merc 240D    24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9 Merc 230     22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10 Merc 280     19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# ℹ 22 more rows

If you don't need the row names, you can use the as_tibble() function directly; the row names are then dropped.

mtcars_tbl <- as_tibble(mtcars)

Data import

So far we have only worked with built-in or hand-generated datasets; now we will discuss how to read data files into R.

The readr package provides a series of functions for importing or writing data in common text formats.

read_csv(): comma-separated values (CSV) files
read_tsv(): tab-separated values (TSV) files
read_delim(): delimited files (CSV and TSV are important special cases)
read_fwf(): fixed-width files
read_table(): whitespace-separated files

These functions are faster and have better defaults than their base R equivalents (e.g. read.table() or read.csv()). They also return tibbles rather than base R data.frames.
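A quick way to see the difference in output type is to read the same small file with both functions. This is a sketch using a made-up two-row file written to a temporary path; show_col_types = FALSE only silences readr's column-specification message.

```r
library(readr)

# write a tiny made-up CSV file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("gene,count", "ACTB,120", "GAPDH,98"), path)

base_df  <- read.csv(path)                          # base R: a data.frame
readr_tb <- read_csv(path, show_col_types = FALSE)  # readr: a tibble

class(base_df)
class(readr_tb)
```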

The readr cheatsheet provides a concise overview of the functionality in the package.

To illustrate how to use readr we will load a .csv file containing information about airline flights from 2014.

First we will download the data file. You can download it manually from GitHub; however, we will use R to download the dataset with the base R download.file() function.

# test if file exists, if it doesn't then download the file.
if(!file.exists("flights14.csv")) {
  file_url <- "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv" 
  download.file(file_url, "flights14.csv")
}

You should now have a file called "flights14.csv" in your working directory (the same directory as the Rmarkdown). To read this data into R, we can use the read_csv() function, whose defaults work well for many datasets.

flights <- read_csv("flights14.csv")
flights

# A tibble: 253,316 × 11
    year month   day dep_delay arr_delay carrier origin dest  air_time distance
   <dbl> <dbl> <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>    <dbl>    <dbl>
 1  2014     1     1        14        13 AA      JFK    LAX        359     2475
 2  2014     1     1        -3        13 AA      JFK    LAX        363     2475
 3  2014     1     1         2         9 AA      JFK    LAX        351     2475
 4  2014     1     1        -8       -26 AA      LGA    PBI        157     1035
 5  2014     1     1         2         1 AA      JFK    LAX        350     2475
 6  2014     1     1         4         0 AA      EWR    LAX        339     2454
 7  2014     1     1        -2       -18 AA      JFK    LAX        338     2475
 8  2014     1     1        -3       -14 AA      JFK    LAX        356     2475
 9  2014     1     1        -1       -17 AA      JFK    MIA        161     1089
10  2014     1     1        -2       -14 AA      JFK    SEA        349     2422
# ℹ 253,306 more rows
# ℹ 1 more variable: hour <dbl>

There are a few commonly used arguments:

col_names: if the file has no header row, set col_names = FALSE to have readr generate names (X1, X2, ...), or supply a character vector of column names.

col_types: set this if readr infers the wrong data type for a column.

comment: a string that marks comments, such as "#". Any text after it is ignored, which lets you skip comment lines (for example, header lines prefixed with #).

skip: number of lines to skip before reading in the data.

n_max: maximum number of lines to read, useful for testing reading in large datasets.
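The arguments above can be combined on a small made-up file that has a comment line and no header row. The sample, plate, and od column names and all values are hypothetical; skip = 1 would drop the first line here just as well as comment = "#".

```r
library(readr)

# a hypothetical file: one comment line, no header row
path <- tempfile(fileext = ".csv")
writeLines(c("# exported from a plate reader",
             "s1,001,4.2",
             "s2,002,3.9",
             "s3,003,5.1"), path)

dat <- read_csv(path,
                comment   = "#",                            # ignore the comment line
                col_names = c("sample", "plate", "od"),     # file has no header row
                col_types = cols(plate = col_character()),  # keep leading zeros in plate
                n_max     = 2)                              # read only the first 2 records
dat
```

Without the col_types override, readr would guess that plate is numeric and the leading zeros would be lost.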

The readr functions will also automatically uncompress gzipped or zipped datasets, and additionally can read data directly from a URL.

read_csv("https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv")

There are equivalent functions for writing data.frames from R to files: write_csv(), write_tsv(), and write_delim().
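A minimal round trip, assuming a small hypothetical tibble and a temporary file: write_csv() writes the data (it never writes row names), and read_csv() reads it back.

```r
library(readr)
library(tibble)

# a small hypothetical result table
res <- tibble(sample = c("s1", "s2"), count = c(10, 20))

out <- tempfile(fileext = ".csv")
write_csv(res, out)                             # write to disk
back <- read_csv(out, show_col_types = FALSE)   # read it back in

readLines(out)  # the raw lines of the written file
```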

Data import/export for Excel files

The readxl package can read data from Excel files. It is installed with the tidyverse but is not loaded by library(tidyverse), so load it with library(readxl). The read_excel() function is the main function for reading data.

The openxlsx package, which is not part of the tidyverse but is available on CRAN, can write Excel files. Its write.xlsx() function is the main function for writing data to Excel spreadsheets.

Data import/export of R objects

Often it is useful to store R objects as files on disk so that the R objects can be reloaded into R. These could be large processed datasets, intermediate results, or complex data structures that are not easily stored in rectangular text formats such as csv files.

R provides the saveRDS() and readRDS() functions for storing and retrieving data in binary formats.

saveRDS(flights, "flights.rds") # save single object into a file
df <- readRDS("flights.rds") # read object back into R
df

# A tibble: 253,316 × 11
    year month   day dep_delay arr_delay carrier origin dest  air_time distance
   <dbl> <dbl> <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>    <dbl>    <dbl>
 1  2014     1     1        14        13 AA      JFK    LAX        359     2475
 2  2014     1     1        -3        13 AA      JFK    LAX        363     2475
 3  2014     1     1         2         9 AA      JFK    LAX        351     2475
 4  2014     1     1        -8       -26 AA      LGA    PBI        157     1035
 5  2014     1     1         2         1 AA      JFK    LAX        350     2475
 6  2014     1     1         4         0 AA      EWR    LAX        339     2454
 7  2014     1     1        -2       -18 AA      JFK    LAX        338     2475
 8  2014     1     1        -3       -14 AA      JFK    LAX        356     2475
 9  2014     1     1        -1       -17 AA      JFK    MIA        161     1089
10  2014     1     1        -2       -14 AA      JFK    SEA        349     2422
# ℹ 253,306 more rows
# ℹ 1 more variable: hour <dbl>

If you want to save/load multiple objects you can use save() and load().

save(flights, df, file = "robjs.rda")  # save flights and df

load() will load the data into the environment using the same object names that were used when saving the objects.

rm(flights, df)
load("robjs.rda")
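The practical difference between the two approaches is how names are handled. In this sketch (x, y, and z are hypothetical objects, saved to temporary files), readRDS() returns the object so you choose its name, while load() recreates the objects under their saved names and invisibly returns those names.

```r
# hypothetical objects
x <- 1:5
y <- letters[1:3]

rds_path <- tempfile(fileext = ".rds")
saveRDS(x, rds_path)
z <- readRDS(rds_path)      # assigned to a name of your choosing

rda_path <- tempfile(fileext = ".rda")
save(x, y, file = rda_path)
rm(x, y)
restored <- load(rda_path)  # recreates x and y, returns their names invisibly
restored
```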

Exploring data

View() can be used to open a spreadsheet-like view of a data.frame. This is a good way to quickly look at the data. glimpse() (from dplyr) and str() print a compact summary listing each column, its type, and its first few values.

View(flights)
str(flights)
glimpse(flights)

Additional R functions to help with exploring data.frames (and tibbles):

dim(flights) # number of rows and columns
nrow(flights)
ncol(flights)

head(flights) # first 6 rows
tail(flights) # last 6 rows

colnames(flights) # column names
rownames(flights) # tibbles have no row names; this returns the row numbers as strings

Useful base R functions for exploring values

summary(flights$distance) # get summary stats for a column

unique(flights$carrier) # find unique values in the carrier column

table(flights$carrier) # get frequency of each value in the carrier column
table(flights$origin, flights$dest) # get frequency of each combination of values

dplyr, a grammar for data manipulation

Base R versus dplyr

In the first two lectures we introduced how to subset vectors, data.frames, and matrices using base R functions. These approaches are flexible, succinct, and stable, meaning that they will continue to be supported and work in future versions of R.

Some criticisms of using base R are that the syntax is hard to read, it tends to be verbose, and it is difficult to learn. dplyr, and other tidyverse packages, offer alternative approaches which many find easier to use.

Some key differences between base R and the approaches in dplyr (and the tidyverse):

dplyr uses the tibble version of the data.frame
dplyr functions operate on data.frames/tibbles rather than individual vectors
dplyr allows you to specify column names without quotes
dplyr uses different functions (verbs) to accomplish the various tasks performed by the bracket [ in base R syntax
dplyr and related functions recognize "grouped" operations on data.frames, enabling operations on different groups of rows in a data.frame
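To make the contrast concrete, here is the same subset written both ways on the built-in mtcars data (converted to a tibble): four-cylinder cars with mpg above 30, keeping two columns. The dplyr version is written as nested calls, without a pipe.

```r
library(dplyr)
library(tibble)

cars <- as_tibble(mtcars)

# base R: logical row index built with $, quoted column names inside [ ]
base_res <- cars[cars$cyl == 4 & cars$mpg > 30, c("mpg", "hp")]

# dplyr: one verb per task, unquoted column names
dplyr_res <- select(filter(cars, cyl == 4, mpg > 30), mpg, hp)
```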

dplyr function overview

dplyr provides a suite of functions for manipulating data in tibbles.

Operations on Rows:
- filter() chooses rows based on column values
- arrange() changes the order of the rows
- distinct() selects distinct/unique rows
- slice() chooses rows based on location

Operations on Columns:
- select() changes whether or not a column is included
- rename() changes the name of columns
- mutate() changes the values of columns and creates new columns

Operations on groups of rows:
- summarise() collapses a group into a single row
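distinct(), slice(), and rename() are not demonstrated elsewhere in this class, so here is a brief sketch on the built-in mtcars data. Note that distinct() keeps only the listed columns unless you set .keep_all = TRUE.

```r
library(dplyr)
library(tibble)

cars <- as_tibble(mtcars)

rename(cars, cylinders = cyl)  # new_name = old_name
slice(cars, 1:3)               # rows 1 to 3, by position
distinct(cars, cyl)            # one row per unique value of cyl
distinct(cars, cyl, gear)      # one row per unique combination of cyl and gear
```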

Filter rows

Returning to our flights data, let's use filter() to select certain rows.

filter(tibble, <expression that produces a logical vector>, ...)

filter(flights, dest == "LAX") # select rows where the `dest` column is equal to "LAX"

# A tibble: 14,434 × 11
    year month   day dep_delay arr_delay carrier origin dest  air_time distance
   <dbl> <dbl> <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>    <dbl>    <dbl>
 1  2014     1     1        14        13 AA      JFK    LAX        359     2475
 2  2014     1     1        -3        13 AA      JFK    LAX        363     2475
 3  2014     1     1         2         9 AA      JFK    LAX        351     2475
 4  2014     1     1         2         1 AA      JFK    LAX        350     2475
 5  2014     1     1         4         0 AA      EWR    LAX        339     2454
 6  2014     1     1        -2       -18 AA      JFK    LAX        338     2475
 7  2014     1     1        -3       -14 AA      JFK    LAX        356     2475
 8  2014     1     1       142       133 AA      JFK    LAX        345     2475
 9  2014     1     1        -4        11 B6      JFK    LAX        349     2475
10  2014     1     1         3       -10 B6      JFK    LAX        349     2475
# ℹ 14,424 more rows
# ℹ 1 more variable: hour <dbl>

filter(flights, arr_delay > 200) # flights with arr_delay > 200
filter(flights, distance < 100) # flights less than 100 miles
filter(flights, year != 2014) # if no rows satisfy the condition, an empty tibble is returned

Multiple conditions can be used to select rows. For example, we can select rows where the dest column is equal to LAX and the origin is equal to EWR. You can either use the & operator or supply multiple arguments.

filter(flights, dest == "LAX", origin == "EWR")
filter(flights, dest == "LAX" & origin == "EWR")

We can select rows where the dest column is equal to LAX or the origin is equal to EWR using the | operator.

filter(flights, dest == "LAX" | origin == "EWR")

The %in% operator is useful for identifying rows with entries matching those in a vector of possibilities.

filter(flights, dest %in% c("LAX", "SLC", "SFO"))
filter(flights, !dest %in% c("LAX", "SLC", "SFO")) # ! will negate
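One behavior worth knowing: filter() keeps only rows where the condition is TRUE, so rows where it evaluates to NA are dropped. A sketch on a small hypothetical tibble (toy and its values are made up):

```r
library(dplyr)
library(tibble)

# hypothetical flights with one missing delay
toy <- tibble(dest  = c("LAX", "SFO", "DEN", "LAX"),
              delay = c(10, NA, 25, -5))

filter(toy, delay > 0)                             # the NA row is dropped
filter(toy, delay > 0 | is.na(delay))              # keep NA rows explicitly
filter(toy, dest %in% c("LAX", "DEN"), delay > 0)  # combine conditions
```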

Try it out:

Use filter to find flights to DEN with a delayed departure (dep_delay).

...

arrange rows

arrange() can be used to sort the data based on the values in a single column or in multiple columns.

arrange(tibble, <columns_to_sort_by>)

For example, let’s find the flight with the shortest amount of air time by arranging the table based on the air_time (flight time in minutes).

arrange(flights, air_time, distance) # sort first on air_time, then on distance

# to sort in decreasing order, wrap the column name in `desc()`.
arrange(flights, desc(air_time), distance)
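arrange() places missing values last, even when sorting in decreasing order with desc(). A sketch on a small hypothetical tibble:

```r
library(dplyr)
library(tibble)

# hypothetical flights, one with a missing air_time
toy <- tibble(flight   = c("a", "b", "c", "d"),
              air_time = c(120, NA, 45, 300))

arrange(toy, air_time)        # c, a, d, then the NA row (b)
arrange(toy, desc(air_time))  # d, a, c, and the NA row (b) is still last
```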

Try it out:

Use arrange() to determine which flight has the shortest distance.

Column operations

select columns

select() is a simple function that subsets the tibble to keep certain columns.

select(tibble, <columns_to_keep>)

select(flights, origin, dest)

# A tibble: 253,316 × 2
   origin dest 
   <chr>  <chr>
 1 JFK    LAX  
 2 JFK    LAX  
 3 JFK    LAX  
 4 LGA    PBI  
 5 JFK    LAX  
 6 EWR    LAX  
 7 JFK    LAX  
 8 JFK    LAX  
 9 JFK    MIA  
10 JFK    SEA  
# ℹ 253,306 more rows

The : operator can select a range of columns, such as the columns from air_time to hour. The ! operator selects all columns except those listed.

select(flights, air_time:hour)
select(flights, !(air_time:hour))

The tidyverse provides a suite of helpers for selecting columns whose names match a pattern (matches(), starts_with(), ends_with(), contains()) or appear in a character vector (any_of(), all_of()). everything() is also useful as a placeholder for all columns not explicitly listed. See the help page via ?select.

# keep columns that have "delay" in the name
select(flights, contains("delay"))

# select all columns except carrier
select(flights, -carrier)

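# keep distance (starts with "di") and the columns ending in "ay" (day, dep_delay, arr_delay)
select(flights, starts_with("di"), ends_with("ay"))

# reorder columns so that distance and hour are the first columns
select(flights, distance, hour, everything())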

When to quote or not quote?

In general, when working with the tidyverse, you don't need to quote the names of columns. In the examples above, we needed quotes because "delay", "di", and "ay" are text patterns to match, not column names in the flights tibble.
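When column names are stored as strings, for example in a variable, the tidyselect helpers all_of() and any_of() are the usual way to use them in select(). A sketch on mtcars; the wanted vector, including the deliberately missing "not_a_column", is hypothetical.

```r
library(dplyr)
library(tibble)

cars <- as_tibble(mtcars)

# hypothetical column names held in a character vector
wanted <- c("mpg", "hp", "not_a_column")

select(cars, any_of(wanted))          # silently skips names that don't exist
select(cars, all_of(c("mpg", "hp")))  # strict: every name must exist
# select(cars, all_of(wanted))        # would error, "not_a_column" doesn't exist
```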

Adding new columns with mutate

mutate() allows you to add new columns to the tibble.

mutate(tibble, new_column_name = expression, ...)

mutate(flights, total_delay = dep_delay + arr_delay)

# A tibble: 253,316 × 12
    year month   day dep_delay arr_delay carrier origin dest  air_time distance
   <dbl> <dbl> <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>    <dbl>    <dbl>
 1  2014     1     1        14        13 AA      JFK    LAX        359     2475
 2  2014     1     1        -3        13 AA      JFK    LAX        363     2475
 3  2014     1     1         2         9 AA      JFK    LAX        351     2475
 4  2014     1     1        -8       -26 AA      LGA    PBI        157     1035
 5  2014     1     1         2         1 AA      JFK    LAX        350     2475
 6  2014     1     1         4         0 AA      EWR    LAX        339     2454
 7  2014     1     1        -2       -18 AA      JFK    LAX        338     2475
 8  2014     1     1        -3       -14 AA      JFK    LAX        356     2475
 9  2014     1     1        -1       -17 AA      JFK    MIA        161     1089
10  2014     1     1        -2       -14 AA      JFK    SEA        349     2422
# ℹ 253,306 more rows
# ℹ 2 more variables: hour <dbl>, total_delay <dbl>

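The new column is not visible in the printed output, so we add a select() step to examine the columns of interest. The native pipe |> (available since R 4.1) passes the result on its left as the first argument of the function on its right.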

mutate(flights, total_delay = dep_delay + arr_delay) |> 
  select(dep_delay, arr_delay, total_delay)
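Piping is optional: a pipeline is equivalent to nesting the same calls. This sketch on mtcars shows that both forms give an identical result. Older code often uses the magrittr pipe %>%, which dplyr re-exports and which behaves the same way in simple cases like this one.

```r
library(dplyr)
library(tibble)

cars <- as_tibble(mtcars)

# nested calls: read from the inside out
nested <- select(filter(cars, cyl == 6), mpg, cyl)

# native pipe: x |> f(y) is the same as f(x, y)
piped <- cars |>
  filter(cyl == 6) |>
  select(mpg, cyl)
```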

# A tibble: 253,316 × 3
   dep_delay arr_delay total_delay
       <dbl>     <dbl>       <dbl>
 1        14        13          27
 2        -3        13          10
 3         2         9          11
 4        -8       -26         -34
 5         2         1           3
 6         4         0           4
 7        -2       -18         -20
 8        -3       -14         -17
 9        -1       -17         -18
10        -2       -14         -16
# ℹ 253,306 more rows

Multiple new columns can be made, and you can refer to columns made in preceding statements.

mutate(flights, 
       delay = dep_delay + arr_delay,
       delay_in_hours = delay / 60) |> 
  select(delay, delay_in_hours)

Try it out:

Calculate the flight time (air_time) in hours rather than minutes and add it as a new column.

mutate(flights, flight_time = air_time / 60)

# A tibble: 253,316 × 12
    year month   day dep_delay arr_delay carrier origin dest  air_time distance
   <dbl> <dbl> <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>    <dbl>    <dbl>
 1  2014     1     1        14        13 AA      JFK    LAX        359     2475
 2  2014     1     1        -3        13 AA      JFK    LAX        363     2475
 3  2014     1     1         2         9 AA      JFK    LAX        351     2475
 4  2014     1     1        -8       -26 AA      LGA    PBI        157     1035
 5  2014     1     1         2         1 AA      JFK    LAX        350     2475
 6  2014     1     1         4         0 AA      EWR    LAX        339     2454
 7  2014     1     1        -2       -18 AA      JFK    LAX        338     2475
 8  2014     1     1        -3       -14 AA      JFK    LAX        356     2475
 9  2014     1     1        -1       -17 AA      JFK    MIA        161     1089
10  2014     1     1        -2       -14 AA      JFK    SEA        349     2422
# ℹ 253,306 more rows
# ℹ 2 more variables: hour <dbl>, flight_time <dbl>

Summarizing columns

summarize() is a function that will collapse the data from a column into a summary value based on a function that takes a vector and returns a single value (e.g. mean(), sum(), median()). It is not very useful yet, but will be very powerful when we discuss grouped operations.

summarize(flights, 
          avg_arr_delay = mean(arr_delay),
          med_air_time = median(air_time))

# A tibble: 1 × 2
  avg_arr_delay med_air_time
          <dbl>        <dbl>
1          8.15          134
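The flights example returns a number, which suggests these two columns have no missing values in this file. When a column does contain NA, mean() returns NA unless you set na.rm = TRUE. A sketch on a hypothetical tibble (summarise() and summarize() are the same function in dplyr):

```r
library(dplyr)
library(tibble)

# hypothetical delays with one missing value
toy <- tibble(delay = c(10, NA, 25, -5))

summarize(toy, avg = mean(delay))  # NA: a single missing value makes the mean NA

res <- summarize(toy,
                 avg = mean(delay, na.rm = TRUE),  # mean of the non-missing values
                 n_missing = sum(is.na(delay)))    # how many values were ignored
res
```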

Grouped operations

All of the functionality described above can be easily expressed in base R syntax (see examples here). However, where dplyr really shines is the ability to apply the functions above to groups of data within each data frame.

We can establish groups within the data using group_by(). The functions mutate() and summarize() (and arrange(), when called with .by_group = TRUE) will then operate on each group independently rather than on all of the rows.

Common approaches:
- group_by() -> summarize(): calculate summaries per group
- group_by() -> mutate(): calculate summaries per group and add them as a new column to the original tibble

group_by(tibble, <columns_to_establish_groups>)

group_by(flights, carrier) # notice the new "Groups:" metadata. 

# calculate average dep_delay per carrier
group_by(flights, carrier) |> 
  summarize(avg_dep_delay = mean(dep_delay)) 

# calculate average dep_delay per carrier at each origin airport
group_by(flights, carrier, origin) |> 
  summarize(avg_dep_delay = mean(dep_delay)) 

# calculate the number of flights between each origin and destination airport, per carrier, and the average air time.
# n() is a special function that returns the number of rows per group
group_by(flights, carrier, origin, dest) |>
  summarize(n_flights = n(),
            mean_air_time = mean(air_time))
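The group_by -> mutate approach listed above is not shown in these examples, so here is a minimal sketch on a hypothetical tibble: each row gets its carrier's average delay and its difference from that average. ungroup() removes the grouping afterwards.

```r
library(dplyr)
library(tibble)

# hypothetical delays for two carriers
toy <- tibble(carrier = c("AA", "AA", "B6", "B6"),
              delay   = c(10, 30, 5, 15))

res <- toy |>
  group_by(carrier) |>
  mutate(carrier_avg   = mean(delay),          # one value per carrier, repeated on its rows
         diff_from_avg = delay - carrier_avg) |>
  ungroup()
res
```

Unlike summarize(), mutate() keeps every original row.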

Here are some questions that we can answer using grouped operations in a few lines of dplyr code.

What is the average flight air_time between each origin airport and destination airport?

group_by(flights, origin, dest) |> 
  summarize(avg_air_time = mean(air_time))

# A tibble: 221 × 3
# Groups:   origin [3]
   origin dest  avg_air_time
   <chr>  <chr>        <dbl>
 1 EWR    ALB           31.4
 2 EWR    ANC          424. 
 3 EWR    ATL          111. 
 4 EWR    AUS          210. 
 5 EWR    AVL           89.7
 6 EWR    AVP           25  
 7 EWR    BDL           25.4
 8 EWR    BNA          115. 
 9 EWR    BOS           40.1
10 EWR    BQN          197. 
# ℹ 211 more rows
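The "Groups: origin [3]" line in the output above appears because summarize() by default drops only the last grouping variable, so the result stays grouped by origin (dplyr also prints a message saying so). A sketch on a hypothetical tibble showing the default and the .groups = "drop" alternative:

```r
library(dplyr)
library(tibble)

# hypothetical flights between a few airports
toy <- tibble(origin   = c("JFK", "JFK", "EWR", "EWR"),
              dest     = c("LAX", "LAX", "LAX", "SFO"),
              air_time = c(350, 360, 340, 370))

# default: the last grouping variable (dest) is dropped, origin remains
by_route <- toy |>
  group_by(origin, dest) |>
  summarize(avg_air_time = mean(air_time))
group_vars(by_route)

# explicitly ask for an ungrouped result
by_route_flat <- toy |>
  group_by(origin, dest) |>
  summarize(avg_air_time = mean(air_time), .groups = "drop")
group_vars(by_route_flat)
```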

Which cities take the longest (air_time) to fly between on average? Which take the shortest?

group_by(flights, origin, dest) |> 
  summarize(avg_air_time = mean(air_time)) |> 
  arrange(desc(avg_air_time)) |> 
  head(1)

# A tibble: 1 × 3
# Groups:   origin [1]
  origin dest  avg_air_time
  <chr>  <chr>        <dbl>
1 JFK    HNL           625.

group_by(flights, origin, dest) |> 
  summarize(avg_air_time = mean(air_time)) |> 
  arrange(avg_air_time) |> 
  head(1)

# A tibble: 1 × 3
# Groups:   origin [1]
  origin dest  avg_air_time
  <chr>  <chr>        <dbl>
1 EWR    AVP             25
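Because the summarized result is still grouped by origin, group-aware verbs such as slice_max() and slice_min() work within each origin unless you call ungroup() first. A sketch with hypothetical route averages:

```r
library(dplyr)
library(tibble)

# hypothetical per-route averages, grouped by origin as summarize() leaves them
routes <- tibble(origin       = c("EWR", "EWR", "JFK", "JFK"),
                 dest         = c("LAX", "SFO", "HNL", "BOS"),
                 avg_air_time = c(340, 370, 600, 45)) |>
  group_by(origin)

slice_max(routes, avg_air_time, n = 1)                 # longest route within each origin
routes |> ungroup() |> slice_max(avg_air_time, n = 1)  # longest route overall
routes |> ungroup() |> slice_min(avg_air_time, n = 1)  # shortest route overall
```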

Try it out:

Which carrier has the fastest flight (air_time) on average from JFK to LAX?

Which month has the longest departure delays on average when flying from JFK to HNL?

String manipulation

stringr is a package for working with strings (i.e. character vectors). It provides a consistent syntax for string manipulation and can perform many routine tasks:

str_c: concatenate strings (similar to paste() in base R)
str_detect: test whether strings contain a pattern
str_count: count occurrences of a pattern in a string
str_subset: keep strings that contain a pattern
str_replace: replace the first match of a pattern with another string (str_replace_all() replaces every match)
str_split: split a string into multiple pieces based on a pattern

library(stringr)
some_words <- c("a sentence", "with a ", "needle in a", "haystack")
str_detect(some_words, "needle") # use with dplyr::filter
str_subset(some_words, "needle")

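str_replace(some_words, "needle", "pumpkin") # replace the first match in each string
str_replace_all(some_words, "a", "A") # replace every match

str_c(some_words, collapse = " ") # collapse into a single string

str_c(some_words, " words words words", " anisfhlsdihg") # element-wise; the single strings are recycled

str_count(some_words, "a") # number of "a"s in each string
str_split(some_words, " ") # a list with one character vector per string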

stringr uses regular expressions to pattern match strings. This means that you can perform complex matching to the strings of interest. Additionally this means that there are special characters with behaviors that may be surprising if you are unaware of regular expressions.

A useful resource when using regular expressions is https://regex101.com

complex_strings <- c("10101-howdy", "34-world", "howdy-1010", "world-.")
# keep strings with a run of digits followed by a dash; + means one or more occurrences
str_subset(complex_strings, "[0-9]+-")

# keep strings with a dash followed by a run of digits
str_subset(complex_strings, "-[0-9]+")

str_subset(complex_strings, "^howdy") # keep strings starting with howdy
str_subset(complex_strings, "howdy$") # keep strings ending with howdy
str_subset(complex_strings, ".") # . matches any character
str_subset(complex_strings, "\\.") # escape with \\ to match a literal special character
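Two related helpers, sketched on the same complex_strings vector: fixed() tells stringr to match a pattern literally rather than as a regular expression, and str_extract() returns the first match in each string (NA where there is none).

```r
library(stringr)

complex_strings <- c("10101-howdy", "34-world", "howdy-1010", "world-.")

# fixed() matches literally, so "." is just a dot here
str_subset(complex_strings, fixed("."))

# first run of digits in each string, NA if there is none
str_extract(complex_strings, "[0-9]+")
```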

Let’s use dplyr and stringr together.

Which destinations contain an "LL" in their 3-letter code?

library(stringr)
filter(flights, str_detect(dest, "LL")) |> 
  select(dest) |> 
  unique()

# A tibble: 1 × 1
  dest 
  <chr>
1 FLL

Which 3-letter destination codes start with H?

filter(flights, str_detect(dest, "^H")) |> 
  select(dest) |> 
  unique()

# A tibble: 4 × 1
  dest 
  <chr>
1 HOU  
2 HNL  
3 HDN  
4 HYA

Let’s make a new column that combines the origin and dest columns.

mutate(flights, new_col = str_c(origin, ":", dest)) |> 
  select(new_col, everything())

# A tibble: 253,316 × 12
   new_col  year month   day dep_delay arr_delay carrier origin dest  air_time
   <chr>   <dbl> <dbl> <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>    <dbl>
 1 JFK:LAX  2014     1     1        14        13 AA      JFK    LAX        359
 2 JFK:LAX  2014     1     1        -3        13 AA      JFK    LAX        363
 3 JFK:LAX  2014     1     1         2         9 AA      JFK    LAX        351
 4 LGA:PBI  2014     1     1        -8       -26 AA      LGA    PBI        157
 5 JFK:LAX  2014     1     1         2         1 AA      JFK    LAX        350
 6 EWR:LAX  2014     1     1         4         0 AA      EWR    LAX        339
 7 JFK:LAX  2014     1     1        -2       -18 AA      JFK    LAX        338
 8 JFK:LAX  2014     1     1        -3       -14 AA      JFK    LAX        356
 9 JFK:MIA  2014     1     1        -1       -17 AA      JFK    MIA        161
10 JFK:SEA  2014     1     1        -2       -14 AA      JFK    SEA        349
# ℹ 253,306 more rows
# ℹ 2 more variables: distance <dbl>, hour <dbl>

Show session info

sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Denver
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.5.1 tibble_3.2.1  dplyr_1.1.3   readr_2.1.4  

loaded via a namespace (and not attached):
 [1] bit_4.0.5         jsonlite_1.8.7    compiler_4.3.1    crayon_1.5.2     
 [5] tidyselect_1.2.0  parallel_4.3.1    jquerylib_0.1.4   yaml_2.3.7       
 [9] fastmap_1.1.1     R6_2.5.1          generics_0.1.3    knitr_1.45       
[13] distill_1.6       bslib_0.5.1       pillar_1.9.0      tzdb_0.4.0       
[17] rlang_1.1.2       utf8_1.2.4        cachem_1.0.8      stringi_1.8.1    
[21] xfun_0.41         sass_0.4.7        bit64_4.0.5       memoise_2.0.1    
[25] cli_3.6.1         withr_2.5.2       magrittr_2.0.3    digest_0.6.33    
[29] vroom_1.6.4       rstudioapi_0.15.0 hms_1.1.3         lifecycle_1.0.4  
[33] vctrs_0.6.4       downlit_0.4.3     evaluate_0.23     glue_1.6.2       
[37] fansi_1.0.5       rmarkdown_2.25    tools_4.3.1       pkgconfig_2.0.3  
[41] htmltools_0.5.7

Acknowledgements and additional references

The content of this class borrows heavily from previous tutorials:

R code style guide: http://adv-r.had.co.nz/Style.html

Tutorial organization: https://github.com/sjaganna/molb7910-2019

Other R tutorials:
https://github.com/matloff/fasteR
https://r4ds.had.co.nz/index.html
https://bookdown.org/rdpeng/rprogdatascience/