RNA Block - Problem Set 25

Published

October 21, 2024

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: 'cowplot'

The following object is masked from 'package:lubridate':

    stamp
here() starts at /home/runner/work/molb-7950/molb-7950
library(ggrepel)
library(pheatmap)
library(RColorBrewer)

Problem Set

Total points: 20. Q1 - 10 pts, Q2 - 10 points each.

Exercises: Take rMATS SE output, apply filters, and make a volcano plot. Also, make a heatmap of the significant PSI for each sample using all replicates. NOTE: this analysis will be performed for the mutually exclusive exons.

Exercises

We worked with an rMATS output file from an experiment in which mouse embryonic stem cells had been treated with either an shRNA against RBFOX2 or a control, nontargeting shRNA. We applied filters to remove any event from consideration in which the number of informative reads for that event (IJC + SJC) was less than 20 in any sample.

Q1 make volcano plot

Read in the rMATS output for mutually exclusive exons. Apply a new read coverage filter such that any event with less than 10 informative junction reads in any sample is removed. Then make a volcano plot where each dot is an event, the x-axis is the difference in PSI between the two conditions (aka IncLevelDifference) and the y axis is -log10 of the FDR. Color dots that pass a significance threshold (FDR < 0.05) in red. Label the 4 most significant events (i.e. 4 lowest FDR values) with the name of the gene they are in.

# Read in table
psis <- read.table(here("data/block-rna/rMATS/MXE.MATS.JC.txt.gz"), header = T) %>%
  # Get rid of columns we aren't really going to use.
  dplyr::select(., c("ID", "geneSymbol", "IJC_SAMPLE_1", "SJC_SAMPLE_1", "IJC_SAMPLE_2", "SJC_SAMPLE_2", "FDR", "IncLevel1", "IncLevel2", "IncLevelDifference"))

# Split the replicate read counts that are separated by commas into different columns
psis <- psis %>%
  separate(., col = IJC_SAMPLE_1, into = c("IJC_S1R1", "IJC_S1R2", "IJC_S1R3", "IJC_S1R4"), sep = ",", remove = T, convert = T) %>%
  separate(., col = SJC_SAMPLE_1, into = c("SJC_S1R1", "SJC_S1R2", "SJC_S1R3", "SJC_S1R4"), sep = ",", remove = T, convert = T) %>%
  separate(., col = IJC_SAMPLE_2, into = c("IJC_S2R1", "IJC_S2R2", "IJC_S2R3", "IJC_S2R4"), sep = ",", remove = T, convert = T) %>%
  separate(., col = SJC_SAMPLE_2, into = c("SJC_S2R1", "SJC_S2R2", "SJC_S2R3", "SJC_S2R4"), sep = ",", remove = T, convert = T)

# filter events (reads >= 10)
thresh <- 10

psis_filt <- psis %>%
  mutate(., S1R1counts = IJC_S1R1 + SJC_S1R1) %>%
  mutate(., S1R2counts = IJC_S1R2 + SJC_S1R2) %>%
  mutate(., S1R3counts = IJC_S1R3 + SJC_S1R3) %>%
  mutate(., S1R4counts = IJC_S1R4 + SJC_S1R4) %>%
  mutate(., S2R1counts = IJC_S2R1 + SJC_S2R1) %>%
  mutate(., S2R2counts = IJC_S2R2 + SJC_S2R2) %>%
  mutate(., S2R3counts = IJC_S2R3 + SJC_S2R3) %>%
  mutate(., S2R4counts = IJC_S2R4 + SJC_S2R4) %>%
  filter(., S1R1counts >= thresh & S1R2counts >= thresh & S1R3counts >= thresh & S1R4counts >= thresh &
    S2R1counts >= thresh & S2R2counts >= thresh & S2R3counts >= thresh & S2R4counts >= thresh)

# Separate the inclusion levels for each sample and replicte (you will need this later)

psis_filt_psi <- psis_filt %>%
  separate(., col = IncLevel1, into = c("PSI_S1R1", "PSI_S1R2", "PSI_S1R3", "PSI_S1R4"), sep = ",", remove = T, convert = T) %>%
  separate(., col = IncLevel2, into = c("PSI_S2R1", "PSI_S2R2", "PSI_S2R3", "PSI_S2R4"), sep = ",", remove = T, convert = T)


# Add another column to table that says whether or not this event is significant (FDR < 0.05)

psis_filt_psi <- psis_filt_psi %>%
  mutate(., sig = ifelse(FDR <= 0.05, "sig", "ns"))


# Volcano Plot with sig events in red
ggplot(
  data = psis_filt_psi,
  aes(
    x = IncLevelDifference,
    y = -log10(FDR),
    color = sig
  )
) +
  scale_color_manual(values = c("black", "red")) +
  geom_point() +
  theme_cowplot()

explosion

Exercise 2

Take your skipped exon data and make a heatmap where the rows are events, columns are samples (each replicate separately, 4 replicates per condition), and the color of the cell is a scaled PSI value. Only plot significant (FDR < 0.05) events.

# Filter for those that are significant (FDR < 0.05) and only keep columns with inclusion differences

rownames(psis_filt_psi) <- paste(psis_filt_psi$geneSymbol, psis_filt_psi$ID, sep = "_")

psi_hm <- psis_filt_psi %>%
  filter(FDR < 0.05) %>%
  select(-FDR) %>%
  select(., c(contains("PSI")))


# Plot with pheatmap(), using scale = 'row' to plot scaled PSI values
pheatmap(psi_hm,
  filename = here("img/block-rna/psi_heatmap.png"),
  clustering_method = "ward.D2",
  scale = "row"
)

you don’t need to have the gene symbols