MOLB 7950 – Long-read sequencing platforms & applications

Short & long read DNA sequencing

We have entered a new era (Swifties unite) wherein routine application of long-read sequencing is revealing new principles of gene regulation.

Comparing data from short- and long-read experiments

Things that are invisible to short-read sequencing

Genomic features greater than 500 bp in size (i.e., larger than a typical Illumina fragment):

Patterns of pre-mRNA splicing (which exons / introns in a molecule)
Structural genomic variants
Chromatin structure across a single locus

The approaches we have examined in class rely on patterns that emerge from many small DNA or RNA fragments.

Long-read experiments examine single molecules and can capture patterns contained within a single DNA fragment.
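
A minimal sketch of why single-molecule data can matter, using made-up toy data rather than real reads. The exon names "A" and "B" and the two samples are hypothetical, and the two exons are assumed to lie too far apart for one short read to cover both. Both samples show the same per-exon inclusion, but the per-molecule view reveals that the exons are coupled in one sample and mutually exclusive in the other.

```python
from collections import Counter

# Hypothetical toy data: each tuple lists the cassette exons found in one molecule.
# Assumption: exons "A" and "B" are too far apart for a single short read to link them.
coupled = [("A", "B"), ("A", "B"), (), ()]
exclusive = [("A",), ("A",), ("B",), ("B",)]

def per_exon_inclusion(molecules):
    """Aggregate view: fraction of molecules containing each exon, ignoring linkage."""
    counts = Counter(exon for mol in molecules for exon in mol)
    return {exon: counts[exon] / len(molecules) for exon in ("A", "B")}

def per_molecule_patterns(molecules):
    """Single-molecule view: how often each exon combination occurs."""
    return Counter(molecules)

for name, sample in (("coupled", coupled), ("exclusive", exclusive)):
    print(name, per_exon_inclusion(sample), dict(per_molecule_patterns(sample)))
```

The aggregate function stands in for what many short fragments report; the per-molecule function stands in for what a read spanning both exons reports.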

Major platforms – Pacific Biosciences

>25 kb reads
Low error rates achieved by “circular consensus”
Only DNA (RNA used to be available 😢)
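
The idea behind a consensus over repeated passes can be sketched with a per-position majority vote. This is a simplified illustration, not PacBio's actual algorithm: it assumes the passes over the same molecule are already aligned to equal length and contain only substitution errors. The function name `consensus` and the example sequences are made up for illustration.

```python
from collections import Counter

def consensus(passes):
    """Majority vote at each position over equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and aligned to the same length")
    # zip(*passes) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

# Five hypothetical passes over one molecule; three of them carry a single error.
passes = ["ACGTTGCA", "ACGATGCA", "ACGTTGGA", "TCGTTGCA", "ACGTTGCA"]
print(consensus(passes))
```

Because each error lands at a different position, every position still has a clear majority, so the errors from individual passes drop out of the consensus.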

Major platforms – Oxford Nanopore

>150 kb reads
Higher error rates (0.1-1%)
Detection of modified bases (5mC, RNA modifications)
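
To put the error rates above in context, a small sketch converts a per-base error rate to a Phred quality score, Q = -10 * log10(p), and estimates the expected number of errors in one read. The helper names are hypothetical, and the 150 kb read length is taken from the read-length bullet above. An error rate of 0.1% corresponds to Q30 and 1% to Q20.

```python
import math

def phred_quality(error_rate):
    """Phred score for a per-base error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(error_rate)

def expected_errors(read_length_bp, error_rate):
    """Expected number of erroneous bases, assuming independent per-base errors."""
    return read_length_bp * error_rate

for rate in (0.001, 0.01):
    print(f"error rate {rate:.1%}: Q{phred_quality(rate):.0f}, "
          f"~{expected_errors(150_000, rate):.0f} errors in a 150 kb read")
```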

DNA case study – T2T

The T2T (telomere-to-telomere) consortium used long-read sequencing to assemble the first complete sequence of a human genome, including telomeric and centromeric repeats.

DNA case study – Fiber-seq

Stergachis et al. (2020) Science

RNA case study – splicing patterns

RNA case study – co-transcriptional communication

Applications beyond long-read nucleic acid sequencing

We might be able to sequence proteins soon. Why is this cool?