Exercises 8

Author

Jay Hesselberth

Published

October 21, 2024

Putting it all together

For the next two classes we’ll combine everything we’ve learned to process and visualize data from some biological experiments. These exercises will illustrate a complete analysis pipeline – from data tidying to manipulation and visualization – using tools from the tidyverse.

Libraries

Load the libraries you need for analysis below.

library(tidyverse)
library(here)
library(cowplot)

A quantitative PCR experiment

Here is the experimental setup:

Two cell lines (wt and mut) were treated with a drug that induces interferon expression
After specific time points, cells were harvested and actin and interferon mRNA were analyzed by quantitative PCR (with 3 technical replicates), with a control containing no reverse transcriptase.

Load the data

These data are in two TSV files:

data/bootcamp/qpcr_names.tsv.gz
data/bootcamp/qpcr_data.tsv.gz

Load these data sets and inspect.

qpcr_names <- read_tsv(here("data/bootcamp/qpcr_names.tsv.gz"))

Rows: 8 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (13): row, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

qpcr_data <- read_tsv(here("data/bootcamp/qpcr_data.tsv.gz"))

Rows: 8 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr  (1): row
dbl (12): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Note the shape of the data and the names of the rows and columns. Do they remind you of anything?
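A few base R and tidyverse helpers make it easier to check the shape and column names of a tibble. The sketch below uses a small made-up table (`toy_plate` and its values are hypothetical, not the course data) laid out the same way, with a `row` column and numbered columns.

```r
library(tidyverse)

# Toy stand-in for a plate-shaped table; the values are made up
toy_plate <- tibble(
  row = c("A", "B"),
  `1` = c(1.2, 3.4),
  `2` = c(5.6, 7.8),
  `3` = c(9.1, 2.3)
)

dim(toy_plate)       # number of rows and columns
colnames(toy_plate)  # column names
glimpse(toy_plate)   # column types and the first few values
```

The same three calls can be run on `qpcr_names` and `qpcr_data`.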

Tidy the data

Given the experimental setup and the shape of the tibbles, you should be able to answer: Are these data tidy?

What are the variables in the data?
Are the variables the column names?

qpcr_data_long <-
  pivot_longer(qpcr_data, -row, names_to = "col")

qpcr_names_long <- 
  pivot_longer(qpcr_names, -row, names_to = "col") |>
  separate(
    value,
    into = c("gt", "time", "gene", "rt", "rep"),
    sep = "_"
  )
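In recent versions of tidyr, `separate()` is marked as superseded in favor of `separate_wider_delim()`. As a sketch of the same tidying step with the newer function, here is a toy example. The tibble `toy_names` and its labels are made up, and only assume the underscore-delimited pattern split above.

```r
library(tidyverse)

# Toy sample-name plate; labels are made up but follow a
# gt_time_gene_rt_rep pattern (an assumption for illustration)
toy_names <- tibble(
  row = c("A", "B"),
  `1` = c("wt_0_IFN_+_1", "mut_0_IFN_+_1"),
  `2` = c("wt_0_IFN_+_2", "mut_0_IFN_+_2")
)

toy_names_long <-
  # one row per well, with the old column names stored in `col`
  pivot_longer(toy_names, -row, names_to = "col") |>
  # split each label into one column per variable
  separate_wider_delim(
    value,
    delim = "_",
    names = c("gt", "time", "gene", "rt", "rep")
  )

toy_names_long
```

Note that all of the new columns, including `time`, are character vectors after the split. Whether to convert `time` to a number depends on how you want it treated in later plots.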

Merge the data

Note the structure of the tidied data. What columns (variables) are shared by both tibbles?

How can we join the data from these two tibbles, linking the sample identifiers with their gene expression values?

qpcr_tidy <-
  left_join(qpcr_names_long, qpcr_data_long) |>
  # we don't need row & col anymore.
  # the -RT samples are all 0, so we can drop those, too
  filter(rt == "+") |>
  select(-(row:col), -rt)

Joining with `by = join_by(row, col)`
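The message above shows that `left_join()` picked the shared columns `row` and `col` as join keys automatically. One option is to state the keys explicitly with `by = join_by(...)`, which silences the message and documents the intent. A minimal sketch on made-up tibbles (`toy_names_long` and `toy_data_long` are hypothetical stand-ins):

```r
library(tidyverse)

# Toy long-format tibbles sharing the keys `row` and `col`; values are made up
toy_names_long <- tibble(
  row = c("A", "A", "B"),
  col = c("1", "2", "1"),
  gt  = c("wt", "wt", "mut")
)

toy_data_long <- tibble(
  row   = c("A", "A", "B"),
  col   = c("1", "2", "1"),
  value = c(10.5, 11.0, 3.2)
)

# Name the join keys explicitly instead of relying on the default
toy_joined <-
  left_join(toy_names_long, toy_data_long, by = join_by(row, col))

toy_joined
```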

Summarize the data

Calculate the mean and standard deviation across replicates.

Do this two ways:

Calculate the statistics for each gene separately.
Calculate a ratio of interferon to actin levels for each sample before calculating the mean and standard deviation of the ratios.

qpcr_summary <-
  group_by(
    qpcr_tidy,
    gt, time, gene) |>
  summarize(
    exp_mean = mean(value),
    exp_sd = sd(value)
  ) |>
  arrange(gt, time, gene)

`summarise()` has grouped output by 'gt', 'time'. You can override using the
`.groups` argument.
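The chunk above covers the first approach. For the second approach, one possible sketch is to spread the two genes into their own columns, compute a per-replicate ratio, and then summarize. The example uses made-up values in `toy_tidy`. The gene labels `"ACTIN"` and `"IFN"` and the pairing of measurements by replicate number are assumptions, so check them against the real data before reusing this.

```r
library(tidyverse)

# Made-up tidy values; gene labels "ACTIN" and "IFN" are assumed
toy_tidy <- tibble(
  gt    = "wt",
  time  = rep(c("0", "4"), each = 6),
  gene  = rep(rep(c("ACTIN", "IFN"), each = 3), times = 2),
  rep   = rep(c("1", "2", "3"), times = 4),
  value = c(10, 10, 10, 1, 2, 3, 10, 10, 10, 5, 10, 15)
)

toy_ratio_summary <-
  # one column per gene, one row per genotype, time point, and replicate
  pivot_wider(toy_tidy, names_from = gene, values_from = value) |>
  # ratio of interferon to actin within each replicate
  mutate(ratio = IFN / ACTIN) |>
  group_by(gt, time) |>
  summarize(
    ratio_mean = mean(ratio),
    ratio_sd = sd(ratio),
    # return an ungrouped tibble and avoid the grouping message
    .groups = "drop"
  )

toy_ratio_summary
```

Setting `.groups = "drop"` is also one way to quiet the `summarise()` message shown above.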

Plot the data

Now we can plot the summary statistics. We’ll use ggplot2::geom_pointrange() to represent the mean and standard deviation.

You’ll need to fill in the blanks (___) below.

ggplot(
  qpcr_summary,
  aes(
    x = ___,
    y = ___,
    color = ___
  ),
) +
  geom_pointrange(
    aes(
      ymin = ___,
      ymax = ___ 
    ),
    # position = ___
  )

Inspect the above plot. How might you improve it?

Copy the above chunk and add functions that modify the plot’s look and feel.

Facet the plot to see differences between the genotypes.
Update the theme using cowplot.
Update the x, y, and title labels (ggplot2::labs()).
Update the colors with a nicer palette (ggplot2::scale_*).
Fix the position of the geoms by updating their position argument.
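To illustrate how these pieces fit together without filling in the blanks, here is a sketch on a made-up summary table. All names in `toy_summary` (`panel`, `x`, `group`, `avg`, `sdev`) are hypothetical, and the palette and dodge width are arbitrary choices.

```r
library(tidyverse)
library(cowplot)

# Made-up summary statistics: two panels, two x positions, two groups
toy_summary <- tibble(
  panel = rep(c("a", "b"), each = 4),
  x     = rep(rep(c("t1", "t2"), each = 2), times = 2),
  group = rep(c("g1", "g2"), times = 4),
  avg   = c(1, 2, 3, 4, 2, 3, 4, 5),
  sdev  = 0.5
)

toy_plot <-
  ggplot(toy_summary, aes(x = x, y = avg, color = group)) +
  geom_pointrange(
    # the range spans one standard deviation on either side of the mean
    aes(ymin = avg - sdev, ymax = avg + sdev),
    # shift the groups sideways so they do not overlap at the same x
    position = position_dodge(width = 0.5)
  ) +
  facet_wrap(~panel) +
  scale_color_brewer(palette = "Set1") +
  labs(x = "Condition", y = "Mean ± SD", title = "Toy point-range plot") +
  theme_cowplot()

toy_plot
```

Note that `position` is passed as an argument to the geom, outside of `aes()`.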

Interpret the plot

What can you say about the expression of ACTIN and IFN?
What can you say about the mutant and wild-type cells?