Chromatin accessibility II

Jay Hesselberth

RNA Bioscience Initiative | CU Anschutz

2024-10-21

Genomewide chromatin analysis with meta-plots and heatmaps

Last class we saw what the different methods to profile chromatin accessibility can tell us about general chromatin structure and possible regulation at specific regions in a small portion of a chromosome.

We also want to make sure these conclusions are valid throughout the genome. Since we want to keep the file sizes small, we will ask if they are valid across an entire chromosome.

Load libraries

First we will plot the profiles of all our data sets relative to the transcription start site (TSS), where all the action seems to be happening.

Load data

First, we need to load relevant files:

  • yeast_tss_chrII.bed.gz contains transcription start sites (TSS) for genes on yeast chromosome 2.
  • sacCer3.chrom.sizes contains the sizes of all yeast chromosomes, needed for some of the calculations we’ll do. We’ll grab this from the UCSC download site.

Load signals

Next we’ll load bigWigs for the ATAC and MNase experiments, containing either short or long fragments.

Recall that the information encoded in short and long fragments should be reflected in our interpretations.

Meta-plots

Setting up regions for a meta-plot

Next, we need to set up some windows for analyzing signal relative to each TSS. This is an important step that will ultimately impact our interpretations.

In genomic meta-plots, you first decide on a window size relevant to the features you are measuring, and then make “windows” around a reference point, spanning some distance both up- and downstream. If the features involve gene features, we also need to take strand into account.

Setting up regions for a meta-plot

Reference points could be:

  • transcription start or end sites
  • boundaries of exons and introns
  • enhancers
  • centromeres and telomeres

Setting up regions for a meta-plot

The window size should be relevant the reference points, such that small- or large-scale features are emphasized in the plot. Moreover, the window typically spans some distance both up- and downstream of the reference points.

Setting up regions for a meta-plot

Once the window size has been decided, the next step is to make “sub-windows” around a reference point. If gene features are involved, we also need to take strand into account.

(The state of genome annotation directly influences the quality of the meta-plot or heatmap.)

Setting up regions for a meta-plot

For small features like transcription factor binding sites (8-20 bp), you might set up smaller windows (maybe 1 bp) at a distance ~20 bp up- and downstream of a reference point.

For larger features like nucleosome positions or chromatin domains, you might set up larger windows (~200 bp) at distances up to ~10 kbp up- and downstream of a set of reference points.

Metaplot workflow

Metaplot workflow overview

Chromatin accessibility around transcription start sites (TSSs)

Chromatin accessibility around transcription start sites (TSSs)

Next, we’ll use two valr functions to expand the window of the reference point (bed_slop()) and then break those windows into evenly spaced intervals (bed_makewindows()).

# A tibble: 50,887 × 7
   chrom start   end name    score strand .win_id
   <chr> <int> <int> <chr>   <chr> <chr>    <int>
 1 chrII  9800  9810 YBL107C .     -            1
 2 chrII  9810  9820 YBL107C .     -            2
 3 chrII  9820  9830 YBL107C .     -            3
 4 chrII  9830  9840 YBL107C .     -            4
 5 chrII  9840  9850 YBL107C .     -            5
 6 chrII  9850  9860 YBL107C .     -            6
 7 chrII  9860  9870 YBL107C .     -            7
 8 chrII  9870  9880 YBL107C .     -            8
 9 chrII  9880  9890 YBL107C .     -            9
10 chrII  9890  9900 YBL107C .     -           10
# ℹ 50,877 more rows

Chromatin accessibility around transcription start sites (TSSs)

At this point, we also address the fact that the TSS data are stranded. Can someone describe what the issue is here, based on the figure above?

Chromatin accessibility around transcription start sites (TSSs)

This next step uses valr bed_map(), to calculate the total signal for each window by intersecting signals from the bigWig files.

Chromatin accessibility around transcription start sites (TSSs)

Once we have the values from bed_map(), we can group by win_coord and calculate a summary statistic for each window.

Remember that win_coord is the same relative position for each TSS, so these numbers represent a composite signal a the same position across all TSS.

Meta-plot of signals around TSSs

Finally, let’s plot the data relative to TSS for each of the windows.

Interpreting the meta-plots

  • What is the direction of transcription in these meta-plots?

  • What are the features of chromatin near TSS measured by these different experimental conditions?

  • How do you interpret the increased signal of the +1 nucleosome in the “MNase_Long” condition, relative to e.g. -1, +2, +3, etc.?

  • What are the differences in ATAC and MNase treatments that lead to these distinctive patterns?

Heatmaps

Heatmap of signals around TSSs

To generate a heatmap, we need to reformat our data slightly.

Take a look at acc_tbl and think about how you might reorganize the following way:

  • rows contain the data for individual loci (i.e., each TSS)
  • columns are ordered positions relative to the TSS (i.e., most upstream to most downstream)

Heatmap of signals around TSSs

We’re going to plot a heatmap of the “MNase_Long” data. There are two ways to get these data.

Or, using dplyr / tidyr:

Heatmap of signals around TSSs

Either way, now we need to reformat the data.

Heatmap of signals around TSSs

Once we have the data reformatted, we just convert to a matrix and feed it to ComplexHeatmap::Heatmap().

Interpreting meta-plots and heatmaps

It’s worth considering what meta-plots and heatmaps can and can’t tell you.

  1. What are the similarities and differences between heatmaps and meta-plots?

  2. What types of conclusions can you draw from each type of plot?

  3. What are some features of MNase-seq and ATAC-seq that become more clear when looking across many loci at the same time?

  4. What are some hypotheses you can generate based on these plots?