Code
<- read_bed(___) mnase_tbl
Total points: 20.
For this problem set, you will explore the relationship between
Start by loading libraries you need analysis in the code chunk below.
Load the data from the MNase-seq experiment.
<- read_bed(___) mnase_tbl
In class we learned that MNase digestion yields nucleosomal “footprints” of ~150 bp in size. I’ve added blue vertical lines to emphasize positions of the major peak (intact nucleosomes) as well as smaller “sub-nucleosomal” peak.
We can calculate the counts for the histogram above and more precisely determine the maximum signal using which.max()
to identify the index of the maximum value in a vector (not the value itself!):
<-
frag_hist |>
mnase_tbl mutate(frag_len = ___) |>
count(frag_len)
# `which.max` takes a vector and gives us the index of the maximum value
<- which.max(___)
max_idx
# now we can use that index to find the most abundant fragment size
<- frag_hist$frag_len[max_idx] ncp_max
So this tells us that that the most abundant fragment size in the library is ** ___ *** bp.
Let’s take a closer look at the some of the smaller fragments in the MNase experiment. In particular, let’s zoom in on the populations of fragments that are smaller than 1 nucleosome in size, the peak between 85 and 95 bp (the left-most blue vertical line).
<- filter(
frag_hist_sub
frag_hist,
___
)
<- which.max(___)
n_max_sub # use the strategy above to identify the position of maximum signal
<- ___ subnuc_max
Answer:
The maximum size of sub-nucleosomal fragments is at ___ bp.
<- filter(
frag_hist_di
frag_hist,
___
)
<- which.max(___)
n_max_di <- ___ dinuc_max
ggplot(
mnase_tbl,aes(x = ___)
+
) geom_histogram(
# single base-pair resolution
binwidth = ___
+
) labs(
x = "___",
title = "___"
+
) theme_cowplot() +
geom_vline(
xintercept = ___,
color = "blue"
)
Next we’re going to look at where these sub-sucleosomes are with respect to intact nucleosomes.
Our strategy will be to compare the density of sub-nucleosomes relative to the mid-points of previously mapped nucleosomes. Specifically, our reference point will be the midpoints of the +1 nucleosomes.
So you’ll make a metaplot, but instead of using transcription start sites as the reference point, we’ll use the midpoints of the +1 nucleosome, and instead of MNase-seq signal density, you’ll count up the number of individual reads that intersect with windows around those midpoints.
First, load the relevant data. We’ll re-use the yeast_mnase_chrII.bed.gz
data you loaded above, plus you’ll need to load two other files:
sacCer3.chrom.sizes
yeast_p1_chrII.bed.gz
which contains the mid-points of the +1 nucleosomes on chromosome 2. Recall that the +1 nucleosome is the nucleosome downstream of the transcription start site.<- read_genome(___)
genome
<- read_bed(___) p1_tbl
<- function(tbl, min_len, max_len) {
calc_mids |>
tbl mutate(
frag_len = ___
|>
) filter(
___|>
) mutate(
midpoint = ___
|>
) select(chrom, midpoint) |>
rename(start = midpoint) |>
mutate(end = start + 1)
}
bed_slop()
, since we’ll be looking for intersections with these intervals.<-
ncp_mids_tbl # specify the min and the max sizes, expanding out from the max size 3 bp in each direction to capture the peak
calc_mids(mnase_tbl, ___, ___) |>
# now expand out by 20 bp on either side
bed_slop(___, ___)
<-
subnuc_mids_tbl calc_mids(mnase_tbl, ___, ___) |>
bed_slop(___, ___)
<-
p1_win_tbl bed_slop(
p1_tbl,
___,
___|>
) bed_makewindows(
___ )
Almost there! Now you just need to identify the number of short and long nuclesome fragments (based on their midpoints) that intersect with the +1 nucleosomes you defined above.
Use bed_intersect()
to identify fragments that overlap, and then just count the number of fragments per .win_id
(don’t forget the ..x
suffix). Note you will do this separately for the short and long fragments.
ncp_mids_summary_tbl <-
bed_intersect(
p1_win_tbl,
ncp_mids_tbl
) |>
count(.win_id.x) |>
mutate(type = "Intact nucleosomes (~149 bp)")
subnuc_mids_summary_tbl <-
bed_intersect(
p1_win_tbl,
subnuc_mids_tbl
) |>
count(.win_id.x) |>
mutate(type = "Sub-nucleosomes (~90 bp)")
win_ids <- seq(-100, 100, 1)
all_tbl <- bind_rows(
ncp_mids_summary_tbl,
subnuc_mids_summary_tbl
) |>
mutate(win_ids = win_ids, .by = "type")
ggplot(
all_tbl,
aes(x = win_ids, y = n)
) +
# add a line representing nucleosome density
geom_line(
linewidth = 1,
color = "red"
) +
# facet by type of fragment
facet_wrap(
~type,
scales = "free_y"
) +
theme_minimal_grid() +
labs(
x = "Position relative to +1 nucleosome midpoints",
y = "Number of intersecting fragments",
title = "Fragment density around +1 nucleosome midpoints"
)
Your plot should look like this:
How do you interpret these plots?
Rationalize the pattern for intact nucleosomes. What pattern did you expect to see?
Answer:
Rationalize the pattern for sub-nucleosome. How would you describe the positions of sub-nucleosomal fragments, relative to the +1 nucleosome midpoints? What might this mean with respect to gene transcription?
Answer:
What do the differences between signal magnitudes (reflected by the y-axis) mean?
Answer:
Be sure to click the “Render” button to render the HTML output. Then paste the URL of this Posit Cloud project into the problem set on Canvas.