Chromatin accessbility patterns and genome function
This class we’ll examine chromatin accessibility patterns and begin to get a sense of what they mean, both at the fine-scale (single base-pair) and across the genome.
Load the libraries
These are libraries we’ve used before. patchwork is a library for composing groups of plots.
These are new libraries specifically for genome analysis. You learned about valr and Gviz for your homework.
rtracklayer provides a few utility functions we’ll use today, and TxDb.Scerevisiae.UCSC.sacCer3.sgdGene provides gene annotations for the S. cerevisiae genome.
Load the data
In this and the next class we will analyze ATAC-seq and MNase-seq data sets from budding yeast. Here are the references for the two data set:
ATAC-seq
Schep AN, Buenrostro JD, Denny SK, Schwartz K, Sherlock G, Greenleaf WJ. Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res. 2015 PMID: 26314830; PMCID: PMC4617971. [Link][Data]
MNase-seq
Zentner GE, Henikoff S. Mot1 redistributes TBP from TATA-containing to TATA-less promoters. Mol Cell Biol. 2013 PMID: 24144978; PMCID: PMC3889552. [Link][Data]
Experimental consideration
In a standard MNase-seq experiment, DNA around ~150 bp is extracted to look closely at nucleosome occupancy & positioning. However, the above study did not perform size selection. This is important as now we can look at both transcription factor binding sites and nucleosome positions.
Fragment size distributions are informative
First, we will determine the fragment size distributions obtained from the two experiments. These sizes are the fingerprints of particles that were protecting nuclear DNA from digestion.
I have performed the alignment of paired-end reads and converted all reads into a bed file where each line of the bed file denotes a single fragment from start to end.
Let’s use this approach to examine the fragment length distribution.
acc_tbl <-bind_rows(mutate(mnase_tbl, type ="mnase"),mutate(atac_tbl, type ="atac") )ggplot( acc_tbl,aes(x = end - start)) +geom_histogram(# single base-pair resolutionbinwidth =1 ) +facet_grid(rows =vars(type),scales ="free_y" ) +xlim(30, 500) +labs(x ="fragment length (bp)",title ="Histogram of fragment lengths from\naccessibility measurements" ) +theme_cowplot()
Interpretations
How would you describe the two fragment length distributions? Are they similar?
Can you make any biological conclusions based on the length distributions?
Periodicity in the fragment lengths
The ATAC data seems to be periodic. How can we test that hypothesis? We can calculate the autocorrelation of the length distribution. Can someone explain what autocorrelation means?
We’ll use the base hist function to calculate the densities of the above histogram. Let’s write a function we can use to analyze fragment lengths.
The density slot contains a vector of densities at base-pair resolution. We will use acf() to calculate the autocorrelation of these values, and will store the tidied result.
We can see a monotonic decrease in the MNase-seq data, which confirms that the bumps we see are distinctive features of ATAC-seq data.
What are these features? Consider that the specificity of binding of DNase, MNase, and Tn5 is not completely generic. These enzymes have specificity for the minor groove of DNA, and there is an optimal substrate geometry for cleavage. You can see this in previous studies, where DNase-seq revealed high-resolution views of DNA:protein structures.
So what then, exactly is the ~10-11 bp periodicity? And why is this not present in MNase data?
Molecular Picture of DNA accessibility
Visualize read density in genomic region
We will use Gviz to visualize read densities relative to a reference.
Next, load the GRanges object as a track for Gviz to plot:
DataTrack 'MNase_nuc'
| genome: NA
| active chromosome: chrII
| positions: 81160
| samples:1
| strand: *
Vizualize a genomic region
Now, we can make a plot for this particular region of chrII:
# special track for the x-axisx_axis <-GenomeAxisTrack()plotTracks(c( sgd_genes, mnase_nuc_trk, x_axis ),from =530811,to =540885,chromosome ="chrII",transcriptAnnotation ="gene",shape ="arrow",type ="histogram")
Load the remaining data
That looks great! Let’s load all the other data sets.
Load each bigWig as a GRanges object with rtracklayer::import.bw()
Convert each to a Gviz::DataTrack() for plotting
Load the remaining data
We can do this one of two ways. We can do it one-by-one:
# complete the following steps for each of the four tracksfile_name <-"yeast_mnase_lt50.bw"track_name <-"MNase_Short"big_wig <-import.bw(here("data/block-dna", file_name))data_track <-DataTrack(big_wig, track_name)
Load the remaining data
Or we can create a tibble with file and track names, and use purrr to load and convert each one: