RNA Bioscience Initiative | CU Anschutz
2024-10-21
Throughout the DNA and RNA blocks, we will refer to several data sets that might be good starting points for your final project (worth 20% of your grade).
We will ask for a sketch of the rough plan for a final project by the end of week 7 (Fri Oct 11). In addition, if you plan to work in a group, we’d like to know who you will be working with.
Final projects will be due Friday Nov 3. We will schedule short (5 minute) talks by each group on Oct 28 and Oct 29.
Before genome-wide DNA accessibility measurements, we knew about chromatin transactions at only a handful of loci.
This was a classic “keys under the lamppost” situation, leading to general models of chromatin-based gene regulation.
| | DNase-seq | ATAC-seq | MNase-seq |
|---|---|---|---|
| Genome representation | Most active regions | Most active regions | Whole genome |
| Ease of experiment | Very difficult | Easy peasy | One day’s work |
| What is profiled? | Accessible DNA, “footprints” at low cut frequency | Accessible DNA. Not really “footprints”, single turnover enzyme, so fragments are not informative | Protections of TFs and nucleosomes |
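BED format stores genome intervals. The first three fields are `chrom`, `start`, and `end`; the next three are `name`, `score`, and `strand`. Strand can be `+`, `-`, or `.` (no strand).
chr7 127473530 127474697 Pos3 0 +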
chr7 127474697 127475864 Pos4 0 +
chr7 127475864 127477031 Neg1 0 -
chr7 127477031 127478198 Neg2 0 -
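BED coordinates are 0-based and half-open, so the width of an interval is simply `end - start`. As a minimal sketch in base R (the object name `bed` and the added `width` column are illustrative, not part of the format), the four records above can be read into a data frame like this:

```r
# The four BED records shown above, as whitespace-delimited text
bed_text <- "
chr7 127473530 127474697 Pos3 0 +
chr7 127474697 127475864 Pos4 0 +
chr7 127475864 127477031 Neg1 0 -
chr7 127477031 127478198 Neg2 0 -
"

# Name the six BED fields and set their types explicitly
bed <- read.table(
  text = bed_text,
  col.names = c("chrom", "start", "end", "name", "score", "strand"),
  colClasses = c("character", "integer", "integer",
                 "character", "integer", "character")
)

# Half-open coordinates: width is end minus start, with no +1
bed$width <- bed$end - bed$start
bed
```

Each interval starts where the previous one ends, which is what adjacent (non-overlapping) intervals look like in half-open coordinates.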
WIG and bedGraph store interval signals.
Many studies provide genome-scale data in these formats.
chr19 49302000 49302300 -1.0
chr19 49302300 49302600 -0.75
chr19 49302600 49302900 -0.50
chr19 49302900 49303200 -0.25
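A bedGraph record is an interval plus one numeric value. The sketch below, again in base R, reads the four records above and computes a width-weighted mean signal; the column name `value` and the choice of summary are assumptions made for illustration.

```r
# The four bedGraph records shown above
bedgraph_text <- "
chr19 49302000 49302300 -1.0
chr19 49302300 49302600 -0.75
chr19 49302600 49302900 -0.50
chr19 49302900 49303200 -0.25
"

signal <- read.table(
  text = bedgraph_text,
  col.names = c("chrom", "start", "end", "value"),
  colClasses = c("character", "integer", "integer", "numeric")
)

# Interval widths (half-open coordinates, as in BED)
signal$width <- signal$end - signal$start

# Mean signal across the region, weighting each interval by its width
mean_signal <- weighted.mean(signal$value, signal$width)
mean_signal
```

Weighting by width matters when intervals differ in length; here all four are 300 bp, so the result equals the plain mean.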
The primary tool for genome interval analysis is BEDtools, the Swiss-army knife of interval analysis.
We wrote an R package called valr that provides the same tools for reading and manipulating genome intervals, so you don’t need to leave RStudio.
`bed_intersect()` is a fundamental operation. It identifies intervals from two tibbles that intersect and reports their overlaps.
Let’s take a look at what it does.
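As a small sketch of the idea, the toy tibbles below (made-up coordinates, not course data) contain exactly one pair of intersecting intervals. The example assumes the valr and tibble packages are installed.

```r
library(valr)
library(tibble)

# Two toy interval sets; only the chr1 intervals 100-500 and 150-400 intersect
x <- tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    500,
  "chr2", 200,    400
)

y <- tribble(
  ~chrom, ~start, ~end,
  "chr1", 150,    400,
  "chr1", 600,    700,
  "chr3", 100,    200
)

# Columns from x and y get .x and .y suffixes;
# .overlap is the number of shared bases
hits <- bed_intersect(x, y)
hits
```

The single reported row pairs the chr1 interval from `x` with the first chr1 interval from `y`, with `.overlap` equal to 400 - 150 = 250.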
`bed_intersect()` example
You can use `read_bed()` and related functions to load genome annotations and signals.
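A sketch of the loading step, assuming the example files that ship with valr (accessed through `valr_example()`) are the ones used here. It also assumes a recent valr release that detects the number of BED fields automatically; older releases may need `n_fields = 6`.

```r
library(valr)

# Example BED files bundled with the valr package
snps <- read_bed(valr_example("hg19.snps147.chr22.bed.gz"))
genes <- read_bed(valr_example("genes.hg19.chr22.bed.gz"))

# Inspect the first few rows of each tibble
head(snps)
head(genes)
```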
What is in `snps` and `genes`?
# A tibble: 6 × 6
chrom start end name score strand
<chr> <int> <int> <chr> <chr> <chr>
1 chr22 16053247 16053248 rs587721086 0 +
2 chr22 16053443 16053444 rs80167676 0 +
3 chr22 16055964 16055965 rs587706951 0 +
4 chr22 16069373 16069374 rs2154787 0 +
5 chr22 16069782 16069783 rs1963212 0 +
6 chr22 16100513 16100514 rs8140563 0 +
# A tibble: 6 × 6
chrom start end name score strand
<chr> <int> <int> <chr> <chr> <chr>
1 chr22 16150259 16193004 AK022914 8 -
2 chr22 16162065 16172265 LINC00516 3 +
3 chr22 16179617 16181004 BC017398 1 -
4 chr22 16239287 16239327 DQ590589 1 +
5 chr22 16240245 16240277 DQ573684 1 -
6 chr22 16240300 16240340 DQ595048 1 -
Let’s find and characterize intergenic SNPs. We’ll use the tools `bed_subtract()` and `bed_closest()`. Take a look at their examples in the documentation to see what they do.
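One way this could look, assuming `snps` and `genes` come from valr's bundled example files (reloaded here so the block runs on its own):

```r
library(valr)

snps <- read_bed(valr_example("hg19.snps147.chr22.bed.gz"))
genes <- read_bed(valr_example("genes.hg19.chr22.bed.gz"))

# Remove SNPs that overlap a gene, keeping only intergenic SNPs
intergenic <- bed_subtract(snps, genes)

# For each intergenic SNP, find the closest gene;
# .dist holds the signed distance to that gene
nearby <- bed_closest(intergenic, genes)
```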
Take a look at the `intergenic` and `nearby` objects in the console.
Now that you have overlaps and distances between SNPs and genes, you can go back to dplyr tools to generate reports.
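For example, a report of intergenic SNPs within a fixed distance of a gene might be sketched as follows. The 5,000 bp cutoff is an assumption chosen for illustration, and the inputs are rebuilt from valr's bundled example files so the block is self-contained.

```r
library(valr)
library(dplyr)

snps <- read_bed(valr_example("hg19.snps147.chr22.bed.gz"))
genes <- read_bed(valr_example("genes.hg19.chr22.bed.gz"))

intergenic <- bed_subtract(snps, genes)
nearby <- bed_closest(intergenic, genes)

# Keep the SNP and gene names plus overlap and distance,
# then retain SNPs within 5 kb of their closest gene (assumed cutoff)
report <- nearby |>
  select(starts_with("name"), .overlap, .dist) |>
  filter(abs(.dist) < 5000)

report
```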
# A tibble: 1,047 × 4
name.x name.y .overlap .dist
<chr> <chr> <int> <int>
1 rs530458610 P704P 0 2579
2 rs2261631 P704P 0 -268
3 rs570770556 POTEH 0 -913
4 rs538163832 POTEH 0 -953
5 rs190224195 POTEH 0 -1399
6 rs2379966 DQ571479 0 4750
7 rs142687051 DQ571479 0 3558
8 rs528403095 DQ571479 0 3309
9 rs555126291 DQ571479 0 2745
10 rs5747567 DQ571479 0 -1778
# ℹ 1,037 more rows
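`bed_map()` example
`bed_map()` does two things in order:
1. It identifies intersecting intervals between `x` and `y`.
2. It calculates summary statistics from the intersecting intervals.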
A typical use is to count up signals (e.g., coverage from an MNase-seq experiment) over specific regions (e.g., promoter regions).
`bed_map()` example
Copy / paste these into your console.
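The inputs in this sketch are reconstructed from the printed results on the following slides, so treat the exact values as an assumption. The summary names `.sum` and `.count` mirror the column names in that output.

```r
library(valr)
library(tibble)

# Regions to summarize over
x <- tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    250,
  "chr2", 250,    500
)

# Signal intervals, each carrying a value
y <- tribble(
  ~chrom, ~start, ~end, ~value,
  "chr1", 100,    250,  10,
  "chr1", 150,    250,  20,
  "chr2", 250,    500,  500
)

# Step 1: the intersecting intervals that bed_map() works from
overlaps <- bed_intersect(x, y)
overlaps

# Step 2: summarize the y values falling within each x interval
mapped <- bed_map(x, y, .sum = sum(value), .count = length(value))
mapped
```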
`bed_map()` example continued
First examine the intersecting intervals.
# A tibble: 3 × 7
chrom start.x end.x start.y end.y value.y .overlap
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 chr1 100 250 100 250 10 150
2 chr1 100 250 150 250 20 100
3 chr2 250 500 250 500 500 250
# A tibble: 2 × 5
chrom start end .sum .count
<chr> <dbl> <dbl> <dbl> <int>
1 chr1 100 250 30 2
2 chr2 250 500 500 1
Course website: https://rnabioco.github.io/molb-7950