Chromatin accessibility I

Chromatin-centric measurement of genomic features

Published

October 20, 2025

Chromatin accessibility patterns and genome function

In this class we’ll examine chromatin accessibility patterns and begin to get a sense of what they mean, both at fine scale (single base pair) and across the genome.

Load the libraries

These are libraries we’ve used before.


These are new libraries specifically for genome analysis. You learned about valr and Gviz for your homework.

  • TxDb.Scerevisiae.UCSC.sacCer3.sgdGene provides gene annotations for the S. cerevisiae genome.

Load the data

In this and the next class we will analyze ATAC-seq and MNase-seq data sets from budding yeast. Here are the references for the two data sets:

ATAC-seq

Schep AN, Buenrostro JD, Denny SK, Schwartz K, Sherlock G, Greenleaf WJ. Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res. 2015 PMID: 26314830; PMCID: PMC4617971. [Link] [Data]

MNase-seq

Zentner GE, Henikoff S. Mot1 redistributes TBP from TATA-containing to TATA-less promoters. Mol Cell Biol. 2013 PMID: 24144978; PMCID: PMC3889552. [Link] [Data]

Experimental consideration

In a standard MNase-seq experiment, DNA fragments of ~150 bp are size-selected to look closely at nucleosome occupancy and positioning. However, the above study did not perform size selection. This is important because it lets us look at both transcription factor binding sites and nucleosome positions in the same data.

Fragment size distributions are informative

First, we will determine the fragment size distributions obtained from the two experiments. These sizes are the fingerprints of particles that were protecting nuclear DNA from digestion.

I have aligned the paired-end reads and converted them into a BED file, where each line denotes a single fragment from start to end.

First, load the ATAC-seq reads:

Next, load the MNase reads.

Working with a small genome is a huge advantage – we can study the whole chromosome in this class.

Expectations for chromatin fragment lengths

Let’s remind ourselves of the expectations for chromatin fragment lengths from MNase-seq and ATAC-seq experiments.

MNase-seq

ATAC-seq

Length distributions of chromatin-derived DNA fragments

For the MNase-seq BED file, you see that there are only three columns: chrom, start, and end.

Calculating fragment length is simple:
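As a minimal sketch of that calculation, assuming a tibble with the three BED columns described above: the toy coordinates and the column name `frag_len` are made up for illustration and are not taken from the class data. BED intervals are 0-based and half-open, so the length is just `end - start`.

```r
library(dplyr)
library(tibble)

# Toy BED-like tibble; these coordinates are invented for illustration.
mnase_toy <- tribble(
  ~chrom,  ~start, ~end,
  "chrII",    100,  247,
  "chrII",    300,  338,
  "chrII",    500,  665
)

# BED intervals are 0-based, half-open: length is end minus start.
mnase_toy <- mutate(mnase_toy, frag_len = end - start)

mnase_toy$frag_len
```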

Let’s use this approach to examine the fragment length distribution.

First, we will combine the two data sets into a single tibble, adding a new column to indicate the type of experiment.
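One way to do this, sketched with toy data: `dplyr::bind_rows()` can stack a named list of tibbles and record each list name in a new column. The object names (`atac_toy`, `mnase_toy`, `frags`) and the column name `type` are assumptions for illustration; the class objects may be named differently.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the two fragment tables (invented coordinates).
atac_toy  <- tibble(chrom = "chrII", start = c(10, 200), end = c(60, 390))
mnase_toy <- tibble(chrom = "chrII", start = c(100, 300, 500), end = c(247, 338, 665))

# Stack the tables; `.id` stores each list name in a new `type` column.
frags <- bind_rows(list(atac = atac_toy, mnase = mnase_toy), .id = "type") |>
  mutate(frag_len = end - start)

# Number of fragments per experiment type.
count(frags, type)
```

With the experiment type in its own column, a single plotting call can then color or facet by `type`.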


Interpretations

  1. How would you describe the two fragment length distributions? Are they similar?

  2. Can you make any biological conclusions based on the length distributions?

Periodicity in the fragment lengths

The ATAC-seq fragment length distribution appears to be periodic. How can we test that hypothesis? We can calculate the autocorrelation of the length distribution. Can someone explain what autocorrelation means?

We’ll use the base hist function to calculate the densities of the above histogram. Let’s write a function we can use to analyze fragment lengths.
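Here is one possible shape for such a function, as a sketch rather than the version used in class. The helper name `frag_len_density()`, the `max_len` cutoff of 500 bp, and the simulated lengths are all assumptions. Placing the breaks at half-integers gives one bin per base pair, so the densities sum to 1.

```r
# Hypothetical helper: per-base-pair density of fragment lengths.
frag_len_density <- function(frag_len, max_len = 500) {
  # Keep lengths inside the range covered by the breaks.
  frag_len <- frag_len[frag_len >= 1 & frag_len <= max_len]
  # Breaks at half-integers give one bin per integer length 1..max_len.
  h <- hist(frag_len, breaks = seq(0.5, max_len + 0.5, by = 1), plot = FALSE)
  data.frame(frag_len = h$mids, density = h$density)
}

# Simulated fragment lengths, only to show the function running.
set.seed(1)
sim_len <- sample(30:300, 5000, replace = TRUE)
dens <- frag_len_density(sim_len)

head(dens)
```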



Autocorrelation

The density element of the hist() result contains a vector of densities at base-pair resolution. We will use acf() to calculate the autocorrelation of these values, and will store the tidied result.
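A minimal sketch of this step on simulated values: the density vector below is a made-up decaying curve with a 10-bp ripple, not the ATAC-seq data, and the tidy table is built by hand from the `lag` and `acf` components that `acf()` returns.

```r
# Simulated density: decaying trend with a 10-bp ripple (not real data).
len  <- 1:300
dens <- exp(-len / 100) * (1 + 0.5 * cos(2 * pi * len / 10))

# Autocorrelation up to lag 40, without drawing the default plot.
acf_out <- acf(dens, lag.max = 40, plot = FALSE)

# acf() returns arrays; flatten them into a tidy two-column table.
acf_tbl <- data.frame(
  lag = as.vector(acf_out$lag),
  acf = as.vector(acf_out$acf)
)

# Compare the autocorrelation half a period and one period apart.
acf_tbl[acf_tbl$lag %in% c(5, 10), ]
```

In this simulation the value at lag 10 is higher than at lag 5, which is the kind of bump a periodic signal produces.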

Now let’s plot the autocorrelation. First, we define a function.

And then we make the plot.

So, it looks like there are significant bumps in the autocorrelation at lags of 10 and 21 bp, indicating that the ATAC-seq length distribution is periodic.

How do we confirm these bumps are interesting? Let’s calculate the acf of a negative control – the length distribution of the MNase data.
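To see what a control comparison like this looks like, here is a sketch on simulated curves only. The smooth curve stands in for a distribution with no ripple and the second curve adds a 10-bp ripple; neither is the real MNase-seq or ATAC-seq data, and the helper `acf_vec()` is a name invented for this example.

```r
len <- 1:300

# Smooth, ripple-free length distribution (simulated control).
smooth  <- dnorm(len, mean = 150, sd = 30)
# Same shape with a 10-bp ripple superimposed (simulated periodic case).
rippled <- smooth * (1 + 0.5 * cos(2 * pi * len / 10))

# Hypothetical helper: autocorrelation values as a plain vector (lag 0 first).
acf_vec <- function(x, lag_max = 25) {
  as.vector(acf(x, lag.max = lag_max, plot = FALSE)$acf)
}

control_acf <- acf_vec(smooth)
rippled_acf <- acf_vec(rippled)

# The control falls at every lag; the rippled curve rises again somewhere.
all(diff(control_acf) < 0)
any(diff(rippled_acf) > 0)
```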

Interpretation

We can see a monotonic decrease in the autocorrelation of the MNase-seq length distribution, which confirms that the bumps we see are distinctive features of the ATAC-seq data.

What are these features? Consider that DNase, MNase, and Tn5 do not cut DNA uniformly. These enzymes have specificity for the minor groove of DNA, and there is an optimal substrate geometry for cleavage. You can see this in previous studies, where DNase-seq revealed high-resolution views of DNA:protein structures.

So what, exactly, is the ~10-11 bp periodicity? And why is it not present in the MNase data?

Molecular Picture of DNA accessibility

Visualize read density in genomic region

We will use Gviz to visualize read densities relative to a reference.

Load tracks

First, we load the gene annotations from the Saccharomyces Genome Database (SGD).

Import bigWig

Next, import the bigWig file containing yeast nucleosome-sized fragments (via MNase-seq) using valr::read_bigwig().

Inspect the object.

(What is “GRanges”?)

Load track

Next, load the GRanges object as a track for Gviz to plot:

Visualize a genomic region

Now, we can make a plot for this particular region of chrII:

Load the remaining data

That looks great! Let’s load all the other data sets.

  1. Load each bigWig as a GRanges object with valr::read_bigwig()
  2. Convert each to a Gviz::DataTrack() for plotting

We can do this one of two ways. We could do it one-by-one:


Or we can create a tibble with file and track names, and use purrr to load and convert each one.

First, we define a tibble of files and metadata.
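The pattern can be sketched as follows. The file names, track names, and the `load_track()` helper are hypothetical; the helper is a stand-in that only records its arguments so the example runs without any bigWig files. In class, the body of such a function would be the two steps listed above, valr::read_bigwig() followed by Gviz::DataTrack().

```r
library(tibble)
library(purrr)

# Hypothetical file and track names, for illustration only.
track_info <- tribble(
  ~file,           ~name,
  "mnase_nuc.bw",  "MNase nucleosome",
  "atac_short.bw", "ATAC short"
)

# Stand-in loader (an assumption): a real version would read the bigWig
# and wrap the result in a Gviz data track.
load_track <- function(file, name) {
  list(file = file, name = name)
}

# pmap() calls load_track() once per row, matching columns to arguments.
tracks <- pmap(track_info, load_track)

length(tracks)
```

Keeping the file list in a tibble means adding a data set is a one-line change, rather than another copy of the loading code.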


Now, we just have to make a list of tracks to plot and Gviz takes care of the rest.

Interpretations

Recall this plot:

Some questions to think about as you look at the tracks:

  1. What is each data set reporting on?
  2. What are the major differences between MNase-seq and ATAC-seq based on these tracks?
  3. What can you infer about gene regulation based on these tracks?