RNAseq QC

Matthew Taliaferro

RNA Bioscience Initiative | CU Anschutz

2024-10-21

Learning Objectives

By the end of the class, you should be able to:

Understand how salmon can be used to quantify gene expression given an RNAseq dataset
Collapse transcript-level quantifications to gene level quantifications using tximport
Deduce the relationships of samples to each other using hierarchical clustering and PCA

Rigor & Reproducibility

As with all computational experiments (yes, they are experiments, don’t let your pipette-toting friends tell you otherwise), keeping track of what you did is key. In the old days, I kept a written notebook of commands that I ran. Sounds silly, but there were many times that I went back to that notebook to see exactly what the parameters were for a given run using a piece of software.

Today, there are better options. You are using one of the better ones right now. Notebooks, including RMarkdown (mainly for R) and Jupyter (mainly for Python), are a great way to keep track of what you did as well as give justification or explanation for your analyses using plain ’ol English.

Trust me, one day you will be glad you used them. The Methods section of your paper is never fun to write without them.

Overview

In this class, we will examine RNAseq data collected over a time-course of differentiation from mouse embryonic stem cells to cortical glutamatergic neurons (Hubbard et al, F1000 Research (2013)). In this publication, the authors differentiated mESCs to neurons using a series of in vitro culture steps over a period of 37 days. During this timecourse, samples were extracted at selected intervals for transcriptome analysis. Importantly, for each timepoint, either 3 or 4 samples were taken for RNA extraction, library preparation and sequencing. This allows us to efficiently use the statistical frameworks provided by the DESeq2 package to identify genes whose RNA expression changes across the timecourse.

Experiment design

Cells were grown in generic differentiation-promoting media (LIF-) for 8 days until aggreates were dissociated and replated in neuronal differentiation media. This day of replating was designated as in vitro day 0 (DIV0). The timepoints taken before this replating therefore happened at “negative” times (DIV-8 and DIV-4). Because naming files with dashes or minus signs can cause problems, these samples are referred to as DIVminus8 and DIVminus4. Following the replating, samples were taken at days 1, 7, 16, 21, and 28 (DIV1, DIV7, DIV16, DIV21, and DIV28).

QC overview

Today we will focus on some quality control steps that are good ideas to do for every RNAseq dataset you encounter, whether produced by yourself or someone else.

Goal

Today we will focus on some Quality Control steps that are good ideas to do for every RNA-seq data set you encounter, whether produced by yourself or someone else.

Quantifying transcript expression with `salmon`

We recently learned about the RNAseq quantification tool salmon. We won’t rehash the details here about how salmon works. For our purposes, we just need to know that salmon reads in a fastq file of sequencing reads and a fasta file of transcript sequences to be quantified. Let’s take a look at this fasta file of transcripts:

head -n 20 data/block-rna/gencodecomprehensive.vM17.allcdna_subset.fa

Looks like we have ensembl transcript IDs, which is a good idea. I can tell because they start with ‘ENS’. Using ensembl IDs as transcript names will allow us to later collate transcript expression levels into gene expression levels using a database that relates transcripts and genes. More on that later.

Making a transcriptome index

The first step in quantifying these transcripts is to make an index from them. This is done as follows:

salmon index -t <transcripts.fa> -i <transcripts.idx> --type quasi -k <k>

Here, transcripts.fa is a path to our fasta file, transcripts.idx is the name of the index that will be created, and k is the length of the kmers that will be used in the hash table related kmers and transcripts. k is the length of the minimum accepted match for a kmer in a read and a kmer in a transcript. Longer kmers (higher values of k) will therefore be more stringent, and lowering k may improve mapping sensitivity at the cost of some specificity. You may also see here how read lengths can influence what value for k you should choose.

Consider an experiment where we had 25 nt reads (this was true wayyyyyy back in the old, dark days of high-throughput sequencing). What’s going to happen if I quantify these reads using an index where the kmer size was set to 29? Well, nothing will align. The index has represented the transcriptome in 29 nt chunks. However, no read will match to these 29mers because there are no 29mers in these reads! As a general rule of thumb, for reads 75 nt and longer (which is the bulk of the data produced nowadays), a good value for k that maximizes both specificity and sensitivity is 31. However, datasets that you may retrieve from the internet, particularly older ones, may have shorter read lengths, so keep this is mind when defining k.

Quantifying reads against this index

Once we have our index, we can quantify transcripts in the index using reads from our fastq files.

salmon quant --libType A -p 8 --seqBias --gcBias --validateMappings -1 <forwardreads.fastq> -2 <reversereads.fastq> -o <outputname> --index <transcripts.idx>

In this command, our forward and reverse read fastq files are supplied to -1 and -2, respectively. If the experiment produced single end reads, -2 is omitted. <transcripts.idx> is the path to the index produced in the previous step. I’m not going to go through the rest of the flags used here, but their meanings as well as other options can be found here.

Salmon outputs

Let’s take a look at what salmon spits out. The first file we will look at is a log that is found at /logs/salmon_quant.log. This file contains info about the quantification, but there’s one line of this file in particular that we are interested in. It lets us know how many of the reads in the fastq file that salmon found a home for in the transcriptome fasta.

less data/block-rna/differentiation_salmonouts/DIV0.Rep1/logs/salmon_quant.log

There are a lot of lines in this file, but really only one that we are interested in. We want the one that tells us the “Mapping rate.” How could we easily and efficiently look at the mapping rates of all our samples? Grep!

#Get the mapping rates for all samples
#In each log file, the line that we are interested in contains the string 'Mapping ' (notice the space)
grep 'Mapping ' data/block-rna/differentiation_salmonouts/*/logs/salmon_quant.log

Moving from transcript quantifications to gene quantifications

As we discussed, salmon quantifies transcripts, not genes. However, genes are made up of transcripts, so we can calculate gene expression values from transcript expression values if we knew which transcripts belonged to which genes.

Relating genes and transcripts

We can get this relationships between transcripts and genes through biomaRt.

biomaRt has many tables that relate genes, transcripts, and other useful data include gene biotypes and gene ontology categories, even across species. Let’s use it here to get a table of genes and transcripts for the mouse genome.

               biomart                version
1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 112
2   ENSEMBL_MART_MOUSE      Mouse strains 112
3     ENSEMBL_MART_SNP  Ensembl Variation 112
4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 112

I encourage you to see what is in each mart, but for now we are only going to use ENSEMBL_MART_ENSEMBL.

Using biomaRt

Alright, we’ve chosen our mart. What data sets are available in this mart?

dataset	description	version
abrachyrhynchus_gene_ensembl	Pink-footed goose genes (ASM259213v1)	ASM259213v1
acalliptera_gene_ensembl	Eastern happy genes (fAstCal1.3)	fAstCal1.3
acarolinensis_gene_ensembl	Green anole genes (AnoCar2.0v2)	AnoCar2.0v2
acchrysaetos_gene_ensembl	Golden eagle genes (bAquChr1.2)	bAquChr1.2
acitrinellus_gene_ensembl	Midas cichlid genes (Midas_v5)	Midas_v5
amelanoleuca_gene_ensembl	Giant panda genes (ASM200744v2)	ASM200744v2
amexicanus_gene_ensembl	Mexican tetra genes (Astyanax_mexicanus-2.0)	Astyanax_mexicanus-2.0
anancymaae_gene_ensembl	Ma's night monkey genes (Anan_2.0)	Anan_2.0
aocellaris_gene_ensembl	Clown anemonefish genes (ASM2253959v1)	ASM2253959v1
apercula_gene_ensembl	Orange clownfish genes (Nemo_v1)	Nemo_v1
aplatyrhynchos_gene_ensembl	Mallard genes (ASM874695v1)	ASM874695v1
apolyacanthus_gene_ensembl	Spiny chromis genes (ASM210954v1)	ASM210954v1
applatyrhynchos_gene_ensembl	Duck genes (CAU_duck1.0)	CAU_duck1.0
atestudineus_gene_ensembl	Climbing perch genes (fAnaTes1.3)	fAnaTes1.3
bbbison_gene_ensembl	American bison genes (Bison_UMD1.0)	Bison_UMD1.0
bgrunniens_gene_ensembl	Domestic yak genes (LU_Bosgru_v3.0)	LU_Bosgru_v3.0
bihybrid_gene_ensembl	Hybrid - Bos Indicus genes (UOA_Brahman_1)	UOA_Brahman_1
bmusculus_gene_ensembl	Blue whale genes (mBalMus1.v2)	mBalMus1.v2
bmutus_gene_ensembl	Wild yak genes (BosGru_v2.0)	BosGru_v2.0
bsplendens_gene_ensembl	Siamese fighting fish genes (fBetSpl5.2)	fBetSpl5.2
btaurus_gene_ensembl	Cow genes (ARS-UCD1.3)	ARS-UCD1.3
cabingdonii_gene_ensembl	Abingdon island giant tortoise genes (ASM359739v1)	ASM359739v1
catys_gene_ensembl	Sooty mangabey genes (Caty_1.0)	Caty_1.0
cauratus_gene_ensembl	Goldfish genes (ASM336829v1)	ASM336829v1
cccarpio_gene_ensembl	Common carp genes (Cypcar_WagV4.0)	Cypcar_WagV4.0
cdromedarius_gene_ensembl	Arabian camel genes (CamDro2)	CamDro2
celegans_gene_ensembl	Caenorhabditis elegans (Nematode, N2) genes (WBcel235)	WBcel235
cgchok1gshd_gene_ensembl	Chinese hamster CHOK1GS genes (CHOK1GS_HDv1)	CHOK1GS_HDv1
cgobio_gene_ensembl	Channel bull blenny genes (fCotGob3.1)	fCotGob3.1
charengus_gene_ensembl	Atlantic herring genes (Ch_v2.0.2v2)	Ch_v2.0.2v2
chircus_gene_ensembl	Goat genes (ARS1)	ARS1
choffmanni_gene_ensembl	Sloth genes (choHof1)	choHof1
chyarkandensis_gene_ensembl	Yarkand deer genes (CEY_v1)	CEY_v1
cimitator_gene_ensembl	Panamanian white-faced capuchin genes (Cebus_imitator-1.0)	Cebus_imitator-1.0
cintestinalis_gene_ensembl	C.intestinalis genes (KH)	KH
cjacchus_gene_ensembl	White-tufted-ear marmoset genes (mCalJac1.pat.X)	mCalJac1.pat.X
cjaponica_gene_ensembl	Japanese quail genes (Coturnix_japonica_2.0)	Coturnix_japonica_2.0
clanigera_gene_ensembl	Long-tailed chinchilla genes (ChiLan1.0)	ChiLan1.0
cldingo_gene_ensembl	Dingo genes (ASM325472v1)	ASM325472v1
clfamiliaris_gene_ensembl	Dog genes (ROS_Cfam_1.0)	ROS_Cfam_1.0
clumpus_gene_ensembl	Lumpfish genes (fCycLum1.pri)	fCycLum1.pri
cmilii_gene_ensembl	Elephant shark genes (Callorhinchus_milii-6.1.3)	Callorhinchus_milii-6.1.3
cpbellii_gene_ensembl	Painted turtle genes (Chrysemys_picta_bellii-3.0.3)	Chrysemys_picta_bellii-3.0.3
cporcellus_gene_ensembl	Guinea Pig genes (Cavpor3.0)	Cavpor3.0
cporosus_gene_ensembl	Australian saltwater crocodile genes (CroPor_comp1)	CroPor_comp1
csabaeus_gene_ensembl	Vervet-AGM genes (ChlSab1.1)	ChlSab1.1
csavignyi_gene_ensembl	C.savignyi genes (CSAV 2.0)	CSAV 2.0
csemilaevis_gene_ensembl	Tongue sole genes (Cse_v1.0)	Cse_v1.0
csyrichta_gene_ensembl	Tarsier genes (Tarsius_syrichta-2.0.1)	Tarsius_syrichta-2.0.1
cvariegatus_gene_ensembl	Sheepshead minnow genes (C_variegatus-1.0)	C_variegatus-1.0
cwagneri_gene_ensembl	Chacoan peccary genes (CatWag_v2_BIUU_UCD)	CatWag_v2_BIUU_UCD
dclupeoides_gene_ensembl	Denticle herring genes (fDenClu1.2)	fDenClu1.2
dlabrax_gene_ensembl	European seabass genes (dlabrax2021)	dlabrax2021
dleucas_gene_ensembl	Beluga whale genes (ASM228892v3)	ASM228892v3
dmelanogaster_gene_ensembl	Drosophila melanogaster (Fruit fly) genes (BDGP6.46)	BDGP6.46
dnovemcinctus_gene_ensembl	Armadillo genes (Dasnov3.0)	Dasnov3.0
dordii_gene_ensembl	Kangaroo rat genes (Dord_2.0)	Dord_2.0
drerio_gene_ensembl	Zebrafish genes (GRCz11)	GRCz11
easinus_gene_ensembl	Donkey genes (ASM1607732v2)	ASM1607732v2
eburgeri_gene_ensembl	Hagfish genes (Eburgeri_3.2)	Eburgeri_3.2
ecaballus_gene_ensembl	Horse genes (EquCab3.0)	EquCab3.0
ecalabaricus_gene_ensembl	Reedfish genes (fErpCal1.1)	fErpCal1.1
eelectricus_gene_ensembl	Electric eel genes (fEleEle1.pri)	fEleEle1.pri
eeuropaeus_gene_ensembl	Hedgehog genes (eriEur1)	eriEur1
elucius_gene_ensembl	Northern pike genes (fEsoLuc1.pri)	fEsoLuc1.pri
etelfairi_gene_ensembl	Lesser hedgehog tenrec genes (TENREC)	TENREC
falbicollis_gene_ensembl	Collared flycatcher genes (FicAlb1.5)	FicAlb1.5
fcatus_gene_ensembl	Cat genes (Felis_catus_9.0)	Felis_catus_9.0
fheteroclitus_gene_ensembl	Mummichog genes (Fundulus_heteroclitus-3.0.2)	Fundulus_heteroclitus-3.0.2
gaculeatus_gene_ensembl	Stickleback genes (GAculeatus_UGA_version5)	GAculeatus_UGA_version5
gevgoodei_gene_ensembl	Goodes thornscrub tortoise genes (rGopEvg1_v1.p)	rGopEvg1_v1.p
gfortis_gene_ensembl	Medium ground-finch genes (GeoFor_1.0)	GeoFor_1.0
ggallus_gene_ensembl	Chicken genes (bGalGal1.mat.broiler.GRCg7b)	bGalGal1.mat.broiler.GRCg7b
ggorilla_gene_ensembl	Gorilla genes (gorGor4)	gorGor4
gmorhua_gene_ensembl	Atlantic cod genes (gadMor3.0)	gadMor3.0
hburtoni_gene_ensembl	Burton's mouthbrooder genes (AstBur1.0)	AstBur1.0
hcomes_gene_ensembl	Tiger tail seahorse genes (H_comes_QL1_v1)	H_comes_QL1_v1
hgfemale_gene_ensembl	Naked mole-rat female genes (Naked_mole-rat_maternal)	Naked_mole-rat_maternal
hhucho_gene_ensembl	Huchen genes (ASM331708v1)	ASM331708v1
hsapiens_gene_ensembl	Human genes (GRCh38.p14)	GRCh38.p14
ipunctatus_gene_ensembl	Channel catfish genes (ASM400665v3)	ASM400665v3
itridecemlineatus_gene_ensembl	Squirrel genes (SpeTri2.0)	SpeTri2.0
jjaculus_gene_ensembl	Lesser Egyptian jerboa genes (JacJac1.0)	JacJac1.0
kmarmoratus_gene_ensembl	Mangrove rivulus genes (ASM164957v1)	ASM164957v1
lafricana_gene_ensembl	Elephant genes (Loxafr3.0)	Loxafr3.0
lbergylta_gene_ensembl	Ballan wrasse genes (BallGen_V1)	BallGen_V1
lcalcarifer_gene_ensembl	Barramundi perch genes (ASB_HGAPassembly_v1)	ASB_HGAPassembly_v1
lchalumnae_gene_ensembl	Coelacanth genes (LatCha1)	LatCha1
lcrocea_gene_ensembl	Large yellow croaker genes (L_crocea_2.0)	L_crocea_2.0
llaticaudata_gene_ensembl	Blue-ringed sea krait genes (latLat_1.0)	latLat_1.0
lleishanense_gene_ensembl	Leishan spiny toad genes (ASM966780v1)	ASM966780v1
loculatus_gene_ensembl	Spotted gar genes (LepOcu1)	LepOcu1
marmatus_gene_ensembl	Zig-zag eel genes (fMasArm1.2)	fMasArm1.2
mauratus_gene_ensembl	Golden Hamster genes (MesAur1.0)	MesAur1.0
mcaroli_gene_ensembl	Ryukyu mouse genes (CAROLI_EIJ_v1.1)	CAROLI_EIJ_v1.1
mdomestica_gene_ensembl	Opossum genes (ASM229v1)	ASM229v1
mfascicularis_gene_ensembl	Crab-eating macaque genes (Macaca_fascicularis_6.0)	Macaca_fascicularis_6.0
mgallopavo_gene_ensembl	Turkey genes (Turkey_5.1)	Turkey_5.1
mleucophaeus_gene_ensembl	Drill genes (Mleu.le_1.0)	Mleu.le_1.0
mlucifugus_gene_ensembl	Microbat genes (Myoluc2.0)	Myoluc2.0
mmmarmota_gene_ensembl	Alpine marmot genes (marMar2.1)	marMar2.1
mmonoceros_gene_ensembl	Narwhal genes (NGI_Narwhal_1)	NGI_Narwhal_1
mmoschiferus_gene_ensembl	Siberian musk deer genes (MosMos_v2_BIUU_UCD)	MosMos_v2_BIUU_UCD
mmulatta_gene_ensembl	Macaque genes (Mmul_10)	Mmul_10
mmurdjan_gene_ensembl	Pinecone soldierfish genes (fMyrMur1.1)	fMyrMur1.1
mmurinus_gene_ensembl	Mouse Lemur genes (Mmur_3.0)	Mmur_3.0
mmusculus_gene_ensembl	Mouse genes (GRCm39)	GRCm39
mnemestrina_gene_ensembl	Pig-tailed macaque genes (Mnem_1.0)	Mnem_1.0
mochrogaster_gene_ensembl	Prairie vole genes (MicOch1.0)	MicOch1.0
mpahari_gene_ensembl	Shrew mouse genes (PAHARI_EIJ_v1.1)	PAHARI_EIJ_v1.1
mpfuro_gene_ensembl	Ferret genes (MusPutFur1.0)	MusPutFur1.0
mspicilegus_gene_ensembl	Steppe mouse genes (MUSP714)	MUSP714
mspretus_gene_ensembl	Algerian mouse genes (SPRET_EiJ_v1)	SPRET_EiJ_v1
mzebra_gene_ensembl	Zebra mbuna genes (M_zebra_UMD2a)	M_zebra_UMD2a
nbrichardi_gene_ensembl	Lyretail cichlid genes (NeoBri1.0)	NeoBri1.0
neugenii_gene_ensembl	Wallaby genes (Meug_1.0)	Meug_1.0
nfurzeri_gene_ensembl	Turquoise killifish genes (Nfu_20140520)	Nfu_20140520
ngalili_gene_ensembl	Upper Galilee mountains blind mole rat genes (S.galili_v1.0)	S.galili_v1.0
nleucogenys_gene_ensembl	Gibbon genes (Nleu_3.0)	Nleu_3.0
nnaja_gene_ensembl	Indian cobra genes (Nana_v5)	Nana_v5
nscutatus_gene_ensembl	Mainland tiger snake genes (TS10Xv2-PRI)	TS10Xv2-PRI
nvison_gene_ensembl	American mink genes (NNQGG.v01)	NNQGG.v01
oanatinus_gene_ensembl	Platypus genes (mOrnAna1.p.v1)	mOrnAna1.p.v1
oarambouillet_gene_ensembl	Sheep genes (ARS-UI_Ramb_v2.0)	ARS-UI_Ramb_v2.0
oaries_gene_ensembl	Sheep (texel) genes (Oar_v3.1)	Oar_v3.1
ocuniculus_gene_ensembl	Rabbit genes (OryCun2.0)	OryCun2.0
odegus_gene_ensembl	Degu genes (OctDeg1.0)	OctDeg1.0
ogarnettii_gene_ensembl	Bushbaby genes (OtoGar3)	OtoGar3
ojavanicus_gene_ensembl	Javanese ricefish genes (OJAV_1.1)	OJAV_1.1
okisutch_gene_ensembl	Coho salmon genes (Okis_V2)	Okis_V2
olatipes_gene_ensembl	Japanese medaka HdrR genes (ASM223467v1)	ASM223467v1
omelastigma_gene_ensembl	Indian medaka genes (Om_v0.7.RACA)	Om_v0.7.RACA
omykiss_gene_ensembl	Rainbow trout genes (USDA_OmykA_1.1)	USDA_OmykA_1.1
oniloticus_gene_ensembl	Nile tilapia genes (O_niloticus_UMD_NMBU)	O_niloticus_UMD_NMBU
oprinceps_gene_ensembl	Pika genes (OchPri2.0-Ens)	OchPri2.0-Ens
osinensis_gene_ensembl	Chinese medaka genes (ASM858656v1)	ASM858656v1
otshawytscha_gene_ensembl	Chinook salmon genes (Otsh_v2.0)	Otsh_v2.0
pabelii_gene_ensembl	Sumatran orangutan genes (Susie_PABv2)	Susie_PABv2
panubis_gene_ensembl	Olive baboon genes (Panubis1.0)	Panubis1.0
pcapensis_gene_ensembl	Hyrax genes (proCap1)	proCap1
pcatodon_gene_ensembl	Sperm whale genes (ASM283717v2)	ASM283717v2
pcinereus_gene_ensembl	Koala genes (phaCin_unsw_v4.1)	phaCin_unsw_v4.1
pcoquereli_gene_ensembl	Coquerel's sifaka genes (Pcoq_1.0)	Pcoq_1.0
pformosa_gene_ensembl	Amazon molly genes (Poecilia_formosa-5.1.2)	Poecilia_formosa-5.1.2
pkingsleyae_gene_ensembl	Paramormyrops kingsleyae genes (PKINGS_0.1)	PKINGS_0.1
platipinna_gene_ensembl	Sailfin molly genes (P_latipinna-1.0)	P_latipinna-1.0
pleo_gene_ensembl	Lion genes (PanLeo1.0)	PanLeo1.0
pmajor_gene_ensembl	Great Tit genes (Parus_major1.1)	Parus_major1.1
pmarinus_gene_ensembl	Lamprey genes (Pmarinus_7.0)	Pmarinus_7.0
pmbairdii_gene_ensembl	Northern American deer mouse genes (HU_Pman_2.1)	HU_Pman_2.1
pmuralis_gene_ensembl	Common wall lizard genes (PodMur_1.0)	PodMur_1.0
pnattereri_gene_ensembl	Red-bellied piranha genes (fPygNat1.pri)	fPygNat1.pri
pnyererei_gene_ensembl	Makobe Island cichlid genes (PunNye1.0)	PunNye1.0
ppaniscus_gene_ensembl	Bonobo genes (panpan1.1)	panpan1.1
ppardus_gene_ensembl	Leopard genes (PanPar1.0)	PanPar1.0
preticulata_gene_ensembl	Guppy genes (Guppy_female_1.0_MT)	Guppy_female_1.0_MT
psimus_gene_ensembl	Greater bamboo lemur genes (Prosim_1.0)	Prosim_1.0
psinensis_gene_ensembl	Chinese softshell turtle genes (PelSin_1.0)	PelSin_1.0
psinus_gene_ensembl	Vaquita genes (mPhoSin1.pri)	mPhoSin1.pri
ptaltaica_gene_ensembl	Tiger genes (PanTig1.0)	PanTig1.0
ptextilis_gene_ensembl	Eastern brown snake genes (EBS10Xv2-PRI)	EBS10Xv2-PRI
ptroglodytes_gene_ensembl	Chimpanzee genes (Pan_tro_3.0)	Pan_tro_3.0
pvampyrus_gene_ensembl	Megabat genes (pteVam1)	pteVam1
rbieti_gene_ensembl	Black snub-nosed monkey genes (ASM169854v1)	ASM169854v1
rferrumequinum_gene_ensembl	Greater horseshoe bat genes (mRhiFer1_v1.p)	mRhiFer1_v1.p
rnorvegicus_gene_ensembl	Rat genes (mRatBN7.2)	mRatBN7.2
rroxellana_gene_ensembl	Golden snub-nosed monkey genes (Rrox_v1)	Rrox_v1
saraneus_gene_ensembl	Shrew genes (sorAra1)	sorAra1
saurata_gene_ensembl	Gilthead seabream genes (fSpaAur1.1)	fSpaAur1.1
sbboliviensis_gene_ensembl	Bolivian squirrel monkey genes (SaiBol1.0)	SaiBol1.0
scanaria_gene_ensembl	Common canary genes (SCA1)	SCA1
scaustralis_gene_ensembl	African ostrich genes (ASM69896v1)	ASM69896v1
scerevisiae_gene_ensembl	Saccharomyces cerevisiae genes (R64-1-1)	R64-1-1
sdumerili_gene_ensembl	Greater amberjack genes (Sdu_1.0)	Sdu_1.0
sformosus_gene_ensembl	Asian bonytongue genes (fSclFor1.1)	fSclFor1.1
shabroptila_gene_ensembl	Kakapo genes (bStrHab1_v1.p)	bStrHab1_v1.p
sharrisii_gene_ensembl	Tasmanian devil genes (mSarHar1.11)	mSarHar1.11
sldorsalis_gene_ensembl	Yellowtail amberjack genes (Sedor1)	Sedor1
slucioperca_gene_ensembl	Pike-perch genes (SLUC_FBN_1)	SLUC_FBN_1
smaximus_gene_ensembl	Turbot genes (ASM1334776v1)	ASM1334776v1
smerianae_gene_ensembl	Argentine black and white tegu genes (HLtupMer3)	HLtupMer3
spartitus_gene_ensembl	Bicolor damselfish genes (Stegastes_partitus-1.0.2)	Stegastes_partitus-1.0.2
spunctatus_gene_ensembl	Tuatara genes (ASM311381v1)	ASM311381v1
ssalar_gene_ensembl	Atlantic salmon genes (Ssal_v3.1)	Ssal_v3.1
ssbamei_gene_ensembl	Pig - Bamei genes (Bamei_pig_v1)	Bamei_pig_v1
ssberkshire_gene_ensembl	Pig - Berkshire genes (Berkshire_pig_v1)	Berkshire_pig_v1
sscrofa_gene_ensembl	Pig genes (Sscrofa11.1)	Sscrofa11.1
sshampshire_gene_ensembl	Pig - Hampshire genes (Hampshire_pig_v1)	Hampshire_pig_v1
ssjinhua_gene_ensembl	Pig - Jinhua genes (Jinhua_pig_v1)	Jinhua_pig_v1
sslandrace_gene_ensembl	Pig - Landrace genes (Landrace_pig_v1)	Landrace_pig_v1
sslargewhite_gene_ensembl	Pig - Largewhite genes (Large_White_v1)	Large_White_v1
ssmeishan_gene_ensembl	Pig - Meishan genes (Meishan_pig_v1)	Meishan_pig_v1
sspietrain_gene_ensembl	Pig - Pietrain genes (Pietrain_pig_v1)	Pietrain_pig_v1
ssrongchang_gene_ensembl	Pig - Rongchang genes (Rongchang_pig_v1)	Rongchang_pig_v1
sstibetan_gene_ensembl	Pig - Tibetan genes (Tibetan_Pig_v2)	Tibetan_Pig_v2
ssusmarc_gene_ensembl	Pig USMARC genes (USMARCv1.0)	USMARCv1.0
sswuzhishan_gene_ensembl	Pig - Wuzhishan genes (minipig_v1.0)	minipig_v1.0
strutta_gene_ensembl	Brown trout genes (fSalTru1.1)	fSalTru1.1
svulgaris_gene_ensembl	Eurasian red squirrel genes (mSciVul1.1)	mSciVul1.1
tbelangeri_gene_ensembl	Tree Shrew genes (tupBel1)	tupBel1
tctriunguis_gene_ensembl	Three-toed box turtle genes (T_m_triunguis-2.0)	T_m_triunguis-2.0
tguttata_gene_ensembl	Zebra finch genes (bTaeGut1_v1.p)	bTaeGut1_v1.p
tnigroviridis_gene_ensembl	Tetraodon genes (TETRAODON 8.0)	TETRAODON 8.0
trubripes_gene_ensembl	Fugu genes (fTakRub1.2)	fTakRub1.2
ttruncatus_gene_ensembl	Dolphin genes (turTru1)	turTru1
uamericanus_gene_ensembl	American black bear genes (ASM334442v1)	ASM334442v1
umaritimus_gene_ensembl	Polar bear genes (UrsMar_1.0)	UrsMar_1.0
uparryii_gene_ensembl	Arctic ground squirrel genes (ASM342692v1)	ASM342692v1
vpacos_gene_ensembl	Alpaca genes (vicPac1)	vicPac1
vursinus_gene_ensembl	Common wombat genes (bare-nosed_wombat_genome_assembly)	bare-nosed_wombat_genome_assembly
vvulpes_gene_ensembl	Red fox genes (VulVul2.2)	VulVul2.2
xcouchianus_gene_ensembl	Monterrey platyfish genes (Xiphophorus_couchianus-4.0.1)	Xiphophorus_couchianus-4.0.1
xmaculatus_gene_ensembl	Platyfish genes (X_maculatus-5.0-male)	X_maculatus-5.0-male
xtropicalis_gene_ensembl	Tropical clawed frog genes (UCB_Xtro_10.0)	UCB_Xtro_10.0

A lot of stuff for a lot of species! Perhaps we want to limit it to see which ones are relevant to mouse.

dataset	description	version
mmusculus_gene_ensembl	Mouse genes (GRCm39)	GRCm39

Using biomaRt

Ah so we probably want the dataset called ‘mmusculus_gene_ensembl’!

Using biomaRt

name	description	page
ensembl_gene_id	Gene stable ID	feature_page
ensembl_gene_id_version	Gene stable ID version	feature_page
ensembl_transcript_id	Transcript stable ID	feature_page
ensembl_transcript_id_version	Transcript stable ID version	feature_page
ensembl_peptide_id	Protein stable ID	feature_page
ensembl_peptide_id_version	Protein stable ID version	feature_page
ensembl_exon_id	Exon stable ID	feature_page
description	Gene description	feature_page
chromosome_name	Chromosome/scaffold name	feature_page
start_position	Gene start (bp)	feature_page
end_position	Gene end (bp)	feature_page
strand	Strand	feature_page
band	Karyotype band	feature_page
transcript_start	Transcript start (bp)	feature_page
transcript_end	Transcript end (bp)	feature_page
transcription_start_site	Transcription start site (TSS)	feature_page
transcript_length	Transcript length (including UTRs and CDS)	feature_page
transcript_tsl	Transcript support level (TSL)	feature_page
transcript_gencode_basic	GENCODE basic annotation	feature_page
transcript_appris	APPRIS annotation	feature_page

Using biomaRt

So there are 2885 rows of goodies about the mouse genome and its relationship to many other genomes. However, you can probably see that the ones that are most useful to us right now are right at the top: ‘ensembl_transcript_id’ and ‘ensembl_gene_id’. We can use those attributes in our mart to make a table relating genes and transcripts.

I’m going to through one more attribute in: external_gene_name. Those are usually more informative than ensembl IDs.

ensembl_transcript_id	ensembl_gene_id	external_gene_name
ENSMUST00000082387	ENSMUSG00000064336	mt-Tf
ENSMUST00000082388	ENSMUSG00000064337	mt-Rnr1
ENSMUST00000082389	ENSMUSG00000064338	mt-Tv
ENSMUST00000082390	ENSMUSG00000064339	mt-Rnr2
ENSMUST00000082391	ENSMUSG00000064340	mt-Tl1
ENSMUST00000082392	ENSMUSG00000064341	mt-Nd1
ENSMUST00000082393	ENSMUSG00000064342	mt-Ti
ENSMUST00000082394	ENSMUSG00000064343	mt-Tq
ENSMUST00000082395	ENSMUSG00000064344	mt-Tm
ENSMUST00000082396	ENSMUSG00000064345	mt-Nd2
ENSMUST00000082397	ENSMUSG00000064346	mt-Tw
ENSMUST00000082398	ENSMUSG00000064347	mt-Ta
ENSMUST00000082399	ENSMUSG00000064348	mt-Tn
ENSMUST00000082400	ENSMUSG00000064349	mt-Tc
ENSMUST00000082401	ENSMUSG00000064350	mt-Ty
ENSMUST00000082402	ENSMUSG00000064351	mt-Co1
ENSMUST00000082403	ENSMUSG00000064352	mt-Ts1
ENSMUST00000082404	ENSMUSG00000064353	mt-Td
ENSMUST00000082405	ENSMUSG00000064354	mt-Co2
ENSMUST00000082406	ENSMUSG00000064355	mt-Tk

Using biomaRt

Alright this looks good! We are going to split this into two tables. One that contains transcript ID and gene ID, and the other that contains gene ID and gene name.

Getting gene level expression data with `tximport`

Now that we have our table relating transcripts and genes, we can give it to tximport to have it calculate gene-level expression data from our transcript-level expression data.

First, we have to tell it where the salmon quantification files (the quant.sf.gz files) are. Here’s what our directory structure that contains these files looks like:

Gene expression data with `tximport`

You can see that we got a list of sample names and the absolute path to each sample’s quantification file.

Now we are ready to run tximport

tximport is going to want paths to all the quantification files (salm_dirs) and a table that relates transcripts to genes (t2g). Luckily, we happen to have those exact two things.

Gene expression data with `tximport`

Notice how we chose lengthscaledTPM for our abundance measurement. This is going to give us TPM values (transcripts per million) for expression in the $abundance slot. Let’s check out what we have now.

ensembl_gene_id	DIV0.Rep1	DIV0.Rep2	DIV0.Rep3	DIV1.Rep1	DIV1.Rep2	DIV1.Rep3	DIV1.Rep4	DIV16.Rep1	DIV16.Rep2	DIV16.Rep3	DIV16.Rep4	DIV21.Rep1	DIV21.Rep2	DIV21.Rep3	DIV21.Rep4	DIV28.Rep1	DIV28.Rep2	DIV28.Rep3	DIV28.Rep4	DIV7.Rep1	DIV7.Rep2	DIV7.Rep3	DIV7.Rep4	DIVminus4.Rep1	DIVminus4.Rep2	DIVminus4.Rep3	DIVminus8.Rep1	DIVminus8.Rep2	DIVminus8.Rep3	DIVminus8.Rep4
ENSMUSG00000000001	148.004511	145.539529	164.680183	142.713005	138.042672	137.027499	123.894516	41.689798	41.554126	41.002841	39.522314	28.162815	29.694158	29.333682	27.956845	17.960041	17.979483	18.102228	19.590551	76.088940	76.669016	79.505738	76.038479	106.634101	106.395094	109.997598	92.799612	93.563524	95.500337	101.951963
ENSMUSG00000000003	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
ENSMUSG00000000028	34.180035	36.215924	39.486489	16.894844	15.391383	18.171282	16.654062	1.067147	1.270428	1.109391	1.136970	1.939558	1.003108	1.245186	1.086912	1.303824	0.935135	0.864451	0.862209	6.582942	6.752942	5.547107	5.510259	45.614077	47.851930	47.516125	59.718765	57.981259	57.935839	41.663343
ENSMUSG00000000031	7.213808	6.431925	13.253322	3.301713	2.539161	3.591509	2.798164	93.312776	56.314466	62.958855	68.565195	32.037271	37.465959	40.581328	24.780757	67.818115	64.333586	30.408290	60.198406	151.244212	137.227925	169.615059	150.599359	0.914481	1.402704	1.317939	0.177770	0.215726	0.320948	36.425779
ENSMUSG00000000037	5.877237	5.001733	7.579149	2.861842	3.176129	2.457556	3.591170	0.319281	0.708167	0.578467	0.513434	0.186198	0.260270	0.258075	0.585622	0.615895	0.133394	0.197050	1.005331	2.750903	1.827190	2.446732	2.894751	2.196733	2.309187	2.213591	3.187886	2.507155	2.624140	2.281871
ENSMUSG00000000049	0.000000	0.046208	0.000000	0.000000	0.000000	0.000000	0.411807	0.213589	0.118890	1.024154	0.628760	0.303218	0.449928	0.262019	0.105047	0.567535	0.649720	0.515246	0.360877	0.188254	0.050654	0.277837	0.189814	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
ENSMUSG00000000056	31.553583	28.610456	19.741875	59.046908	51.324548	54.015893	43.897451	36.575044	47.171198	41.459018	38.403920	55.170239	37.441141	44.043899	45.852392	38.906362	40.362321	44.933634	42.356811	41.828967	46.812515	44.172289	46.834670	14.509458	18.656028	16.335093	10.671419	10.666954	12.386288	8.077939
ENSMUSG00000000058	0.420304	0.507728	0.602757	0.357594	0.504091	0.498006	0.565744	0.931596	0.632998	0.606530	0.503290	0.292719	0.224968	0.401341	0.428359	0.490118	0.200389	0.038911	0.399079	0.194399	0.102993	0.063154	0.130644	0.163821	0.162316	0.251082	0.693347	0.201366	0.527565	0.482299
ENSMUSG00000000078	52.604855	64.802703	76.892914	60.446343	60.861211	61.059568	50.159693	32.474649	28.287563	27.391212	29.511742	19.632508	19.942257	20.182975	19.363735	16.845197	16.128839	18.828771	15.732413	52.939918	52.709453	54.334181	50.804497	51.427093	49.900737	53.274561	19.669879	19.472875	20.216492	21.591729
ENSMUSG00000000085	60.617248	59.182914	61.623794	72.657499	71.506733	75.816856	63.507273	36.498680	31.450681	36.983697	34.558348	31.893252	35.692047	31.095232	34.751542	36.339896	32.474173	38.456832	39.182603	52.494911	43.715591	45.720249	46.351620	9.949776	11.703136	9.320286	13.843137	11.699411	13.486194	10.366294
ENSMUSG00000000088	372.546920	364.764405	381.963748	389.617591	388.204349	413.080094	533.720843	882.254461	918.864762	881.567068	952.656933	1064.561469	1018.564149	1023.266589	935.493316	944.628320	1102.492902	1011.428801	993.369153	703.366663	749.229000	711.055019	744.360315	602.694088	539.095644	568.529023	715.919453	723.040511	713.534783	795.564123
ENSMUSG00000000093	18.748818	26.367729	28.740489	18.171262	17.023064	19.681817	21.696499	1.748123	1.159551	1.244226	1.286737	1.859175	1.536283	1.490581	1.446475	3.138448	3.362518	3.197714	3.445495	1.960168	1.812696	2.626110	2.401155	1.641559	1.120127	1.520160	0.123672	0.064585	0.118030	0.062646
ENSMUSG00000000094	0.114764	0.197628	0.237980	0.067150	0.050183	0.025762	0.000000	0.000000	0.000000	0.020946	0.000000	0.023273	0.000000	0.020613	0.020391	0.000000	0.000000	0.000000	0.000000	0.253309	0.359092	0.191826	0.280499	0.644665	0.508711	0.638906	1.610289	2.043695	1.513517	1.061169
ENSMUSG00000000103	0.000000	0.000000	0.019350	0.019932	0.000000	0.000000	0.000000	0.024280	0.000000	0.000000	0.027595	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.022570	0.020990	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.049946
ENSMUSG00000000120	18.187330	17.621063	16.764680	65.095814	63.742121	70.050947	65.437767	7.138900	6.728019	6.471314	7.243701	3.043655	3.779647	4.010854	3.292345	1.055312	1.336741	1.401802	1.323000	7.879629	8.472306	7.826161	8.344422	1.302789	1.494076	1.100786	2.720008	2.666598	3.080750	7.171139
ENSMUSG00000000125	9.426741	9.885487	5.404755	2.614957	1.888344	2.419112	2.543413	1.836920	2.186343	2.081065	2.033197	1.815480	1.714785	1.591194	1.707502	1.388731	1.150249	1.311930	1.489738	2.613648	2.668347	2.770951	2.336912	1.482033	1.217406	1.473157	0.168666	0.267824	0.162984	0.439311
ENSMUSG00000000126	1.215801	0.746274	0.814387	0.893695	0.726967	0.795409	0.811422	1.698931	0.816782	1.232844	1.175287	0.776041	0.671299	0.527801	0.856885	0.881084	0.587611	0.480369	0.599140	1.920124	1.992736	2.643161	2.442442	1.243716	0.787857	1.154595	1.480120	1.685512	1.662030	1.757216
ENSMUSG00000000127	13.113348	14.260035	14.074681	16.162555	17.427828	17.565289	11.863558	12.271221	14.241964	13.551247	14.347367	12.813797	12.260097	11.685471	12.283357	8.849005	8.757533	8.775956	9.371477	16.306853	15.553249	15.896542	15.505971	11.131771	11.417468	12.892073	12.916885	12.422329	12.138589	11.783150
ENSMUSG00000000131	55.485697	54.448611	62.549063	71.704310	71.301569	71.997625	85.422041	53.653944	54.569370	54.041498	53.203853	56.560171	55.599344	57.503726	56.142399	56.062512	54.730314	56.114317	55.657729	64.772660	60.737488	64.559072	65.908557	73.304601	82.616101	81.113216	72.032652	72.128976	70.421851	64.545350
ENSMUSG00000000134	33.480962	36.683455	38.983454	39.609181	45.133664	47.577079	38.693647	19.425951	20.279171	18.258503	19.466904	20.759852	19.514517	20.453484	23.945999	24.145265	23.148770	24.341792	20.775309	26.254515	25.201055	25.298568	24.448624	70.194375	70.469434	72.039951	62.710030	65.426174	62.960782	52.892113
ENSMUSG00000000142	34.556890	33.051268	31.249676	26.910897	25.255541	26.237510	24.678957	4.314812	3.851946	5.201375	4.297327	3.944854	3.709407	4.698792	4.180144	4.008057	4.408832	5.413859	4.183791	12.235464	12.443578	13.543412	12.406199	5.611935	4.639913	5.140094	3.517411	3.338406	3.652103	5.866998
ENSMUSG00000000148	12.975017	12.265602	13.479207	15.929561	15.998245	15.251781	16.439377	11.187122	11.765327	12.255481	11.805331	9.727571	10.811203	10.842452	9.828969	9.057784	8.261829	8.418036	8.912263	13.212425	11.883479	13.518251	13.482611	10.102323	10.474313	9.593967	7.725926	8.271340	8.221738	8.493223
ENSMUSG00000000149	62.786926	60.575599	64.868495	81.035249	77.612600	80.007932	79.501053	19.769167	19.068923	18.725772	19.826181	17.829466	16.300954	17.513712	15.040879	14.602690	13.900191	14.489153	13.408245	37.005858	35.055680	36.987634	36.668932	36.844849	38.723659	38.975206	27.466288	29.294849	29.622090	30.355845
ENSMUSG00000000154	0.333055	0.115844	0.432376	0.325385	0.332173	0.555266	0.274624	0.315355	0.635393	0.370886	0.117576	0.114637	0.156711	0.361648	0.316280	0.277836	0.297838	0.396606	0.245718	0.372255	0.472991	0.428795	0.411684	1.678056	1.611784	1.901671	3.967049	5.087375	4.540421	5.049930
ENSMUSG00000000157	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.019660	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.038957	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
ENSMUSG00000000159	0.000000	0.000000	0.000000	0.000000	0.066624	0.000000	0.000000	0.000000	0.000000	0.120014	0.192008	0.000000	0.069363	0.465487	0.133256	0.105527	0.609742	0.232339	0.000000	0.000000	0.000000	0.000000	0.034548	0.047125	0.116238	0.039351	0.258603	0.235119	0.181094	0.033496
ENSMUSG00000000167	7.167658	7.111275	4.562918	4.611452	5.165249	3.980233	5.810633	3.014607	2.165885	1.420868	2.402707	1.174127	1.731610	2.442734	1.718371	1.983554	1.487518	0.867104	1.640726	2.877345	4.086346	2.980909	2.788065	2.412113	1.093266	2.134765	3.075103	2.544267	4.048963	1.194254
ENSMUSG00000000168	46.898364	51.544129	50.803506	46.728899	46.416810	43.668901	42.770089	135.597979	142.174183	140.466776	155.267598	176.558772	177.449941	173.533809	177.441385	140.085372	135.955772	149.522280	139.289347	77.093701	65.875673	72.031057	69.113490	65.284363	59.589716	67.122029	66.846252	65.352465	66.011315	60.027737
ENSMUSG00000000171	183.510028	191.339737	185.366419	146.485489	136.055305	134.774362	144.419003	250.624457	258.445965	254.599788	264.936389	297.223823	290.955520	287.995065	284.714197	228.171116	236.310898	224.511224	221.322408	180.849179	173.186896	175.786723	175.691677	147.487631	156.118401	153.095971	219.561253	219.763695	210.634119	196.601985
ENSMUSG00000000182	0.103193	0.032365	0.029772	0.000000	0.000000	0.000000	0.000000	0.109766	0.257429	0.112760	0.211044	0.082172	0.037902	0.110168	0.111682	0.380279	0.139486	0.081007	0.246593	0.169495	0.356141	0.264438	0.269860	0.040837	0.041344	0.036939	0.000000	0.030240	0.000000	0.203803
ENSMUSG00000000183	0.000000	0.000000	0.010340	0.000000	0.010401	0.016126	0.000000	0.000000	0.000000	0.000000	0.015047	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.012499	0.000000	0.000000	0.000000	0.000000	0.012861	0.000000	0.000000	0.000000	0.020984
ENSMUSG00000000184	142.747177	147.826378	149.197724	66.761615	63.335841	65.070470	48.113890	21.348042	17.384228	15.935255	15.653769	16.292530	14.504585	17.797905	15.544638	17.701942	13.309930	15.605609	17.667904	28.101151	33.067019	38.044163	34.866641	7.046037	5.154655	5.428973	9.002990	11.536859	8.437719	2.867368
ENSMUSG00000000194	36.103417	38.818908	42.800890	36.817530	38.667199	36.778395	26.187279	33.002131	37.727627	36.606405	37.474231	36.703396	34.536615	36.214178	34.713799	33.020334	26.550992	28.899042	31.118602	34.300556	32.621947	33.363838	34.012681	30.084083	26.911440	26.198168	28.291703	30.043730	28.726287	29.337298
ENSMUSG00000000197	0.724701	0.864508	0.538785	1.073368	1.074458	0.792837	0.844630	25.556904	27.172721	27.223684	27.307364	23.469370	25.193771	24.809004	25.595933	22.134882	17.538889	21.238627	22.714445	11.528675	11.227085	11.667388	11.717720	0.717834	0.544315	0.843883	0.333168	0.479806	0.295036	0.135287
ENSMUSG00000000202	46.873410	38.860603	20.595529	21.277139	19.304139	22.041307	20.637502	1.998099	2.664860	2.147623	1.671951	0.707986	1.171352	2.805967	0.762410	1.275708	0.275591	0.391964	0.762712	3.703893	4.510447	4.937764	4.553973	1.186139	1.366082	0.695439	1.176641	2.441639	0.983961	3.130113
ENSMUSG00000000204	0.000000	0.077812	0.000000	0.036758	0.087585	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.043964	0.000000	0.000000	0.000000	0.018414	0.000000	0.000000	0.000000	0.038316	0.000000	0.000000	0.050157	0.000000	0.041803	0.000000	0.040884	0.152281
ENSMUSG00000000214	0.316821	0.376734	0.580427	0.074393	0.077525	0.232658	0.047076	0.000000	0.000000	0.000000	0.195838	0.050855	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.059494	0.028719	0.000000	0.080425	0.133600	0.051463	0.000000	0.062615	0.191165	0.206326	0.128695	0.070217
ENSMUSG00000000215	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.201293	0.000000	0.000000	0.079614	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.125601	0.000000	0.060125	0.339217	0.000000	0.526094	0.000000	0.000000	0.000000	0.000000	0.000000
ENSMUSG00000000216	0.055074	0.051534	0.015706	0.000000	0.031705	0.000000	0.000000	0.039904	0.022940	0.019738	0.000000	0.000000	0.020080	0.000000	0.000000	0.000000	0.018491	0.000000	0.038279	0.017680	0.000000	0.017147	0.017771	0.022180	0.000000	0.000000	0.000000	0.065627	0.033124	0.015572
ENSMUSG00000000223	1.490207	1.396739	1.636890	1.736106	1.971587	1.840778	1.365031	2.454484	2.567922	2.499309	2.498017	2.092696	1.983752	2.066465	1.717463	2.653865	1.954384	1.841953	2.723702	2.868183	2.988016	3.715324	3.097974	0.185306	0.237818	0.133284	0.229376	0.237803	0.222297	0.190730
ENSMUSG00000000244	0.083692	0.054689	0.130041	0.090199	0.193026	0.120789	0.083430	0.218939	0.038914	0.099444	0.015106	0.000000	0.000000	0.013459	0.000000	0.154390	0.067568	0.270073	0.127634	0.108335	0.113842	0.088430	0.052551	0.638876	0.595938	0.471887	0.533838	0.596377	0.453464	0.350619
ENSMUSG00000000247	3.581850	3.499372	3.528227	2.710486	3.101592	2.767801	2.565610	1.225779	0.659762	1.060458	0.978868	0.964519	0.809062	0.716354	0.546474	0.468901	0.592905	0.252756	0.383217	9.739157	11.367096	12.204956	9.050891	1.025635	1.058490	0.835800	0.563193	0.906406	0.423688	0.336354
ENSMUSG00000000248	0.000000	0.307460	0.275004	0.000000	0.000000	0.037225	0.000000	0.000000	0.000000	0.000000	0.000000	0.726755	0.000000	0.000000	0.000000	0.000000	0.000000	0.032926	0.094631	0.027475	0.000000	0.264993	0.027614	0.000000	0.035735	0.000000	0.000000	0.000000	0.000000	0.000000
ENSMUSG00000000253	5.002777	4.525335	4.179621	5.048770	4.947710	4.666987	4.505384	23.658215	24.517931	25.751619	26.217720	35.486540	36.812237	35.029205	34.464815	47.977399	51.307180	44.104269	43.421043	18.585852	19.591512	21.542503	19.528295	10.439729	10.042434	12.184047	20.222381	19.607781	19.986088	25.072968
ENSMUSG00000000263	2.981638	1.970632	2.113011	1.671629	1.341986	2.387899	0.558066	43.069555	53.099363	55.631124	57.681339	85.004519	87.323404	79.087829	83.067736	94.715692	60.793459	96.932514	95.108836	3.780805	3.893267	3.860449	4.027712	0.037352	0.037589	0.032486	0.056548	0.079220	0.058177	0.109525
ENSMUSG00000000266	4.207659	3.886346	3.670926	5.967348	5.313591	5.564873	3.908537	18.434130	19.178063	18.998609	19.210836	22.110790	21.232814	21.788512	20.701333	21.877453	18.154458	24.446192	24.809788	14.273999	14.445732	13.674834	13.532920	0.403795	0.353641	0.384367	0.045010	0.072257	0.094877	0.126511
ENSMUSG00000000275	15.337964	15.667032	22.966957	6.220162	6.831170	6.443813	6.385162	1.564104	1.618634	1.642415	1.574792	1.154512	1.511644	1.017289	0.828477	1.030165	0.983866	0.759897	1.061297	3.119988	3.554271	3.629310	3.486077	44.342670	44.274541	45.952267	68.204400	66.534540	67.546482	65.233133
ENSMUSG00000000276	6.707699	6.335487	5.715570	11.190985	11.445650	10.558276	8.857024	23.079722	24.157868	24.513742	24.306458	22.041553	22.124420	21.233254	22.742842	28.209391	23.221775	27.581952	27.546310	20.847164	20.360543	19.165914	19.357199	7.914284	8.048394	8.766309	11.622329	11.302601	11.791095	7.286859
ENSMUSG00000000278	90.617387	96.681487	113.010324	37.035350	35.574970	37.008561	35.981529	53.884610	53.796869	52.070052	55.294585	71.168345	71.952532	69.563003	72.057016	80.149296	90.857958	89.634870	77.269610	38.043631	38.898064	39.409246	38.148701	174.002655	172.278617	171.035085	162.763884	161.548267	159.085443	194.152958
ENSMUSG00000000282	28.352789	28.584065	28.490442	51.536750	48.043432	53.508818	41.034434	34.048141	39.276596	38.718342	42.472822	33.504992	34.355608	38.392612	31.313721	28.164786	28.592877	31.653213	29.223527	46.532177	45.670087	48.495228	50.032227	27.125277	25.598833	24.739579	18.692252	16.789268	15.102999	17.659619

TPM as an expression metric

Alright, not bad!

Let’s stop and think for a minute about what tximport did and the metric we are using (TPM). What does transcripts per million mean? Well, it means pretty much what it sounds like. For every million transcripts in the cell, X of them are this particular transcript. Importantly, this means when this TPM value was calculated from the number of counts a transcript received, this number had to be adjusted for both the total number of counts in the library and the length of a transcript.

If sample A had twice the number of total counts as sample B (i.e. was sequenced twice as deeply), then you would expect every transcript to have approximately twice the number of counts in sample A as it has in sample B. Similarly, if transcript X is twice as long as transcript Y, then you would expect that if they were equally expressed (i.e. the same number of transcript X and transcript Y molecules were present in the sample) that X would have approximately twice the counts that Y does. Working with expression units of TPM incorporates both of these normalizations.

So, if a TPM of X means that for every million transcripts in the sample that X of them were the transcript of interest, then the sum of TPM values across all species should equal one million, right?

Let’s check and see if that’s true.

TPM as an expression metric

[1] 985678.2

[1] 985291.2

[1] 985100.6

OK, not quite one million, but pretty darn close.

This notion that TPMs represent proportions of a whole also leads to another interesting insight into what tximport is doing here. If all transcripts belong to genes, then the TPM for a gene must be the sum of the TPMs of its transcripts. Can we verify that that is true?

TPM as an expression metric

ensembl_transcript_id	DIV0.Rep1	DIV0.Rep2	DIV0.Rep3	DIV1.Rep1	DIV1.Rep2	DIV1.Rep3	DIV1.Rep4	DIV16.Rep1	DIV16.Rep2	DIV16.Rep3	DIV16.Rep4	DIV21.Rep1	DIV21.Rep2	DIV21.Rep3	DIV21.Rep4	DIV28.Rep1	DIV28.Rep2	DIV28.Rep3	DIV28.Rep4	DIV7.Rep1	DIV7.Rep2	DIV7.Rep3	DIV7.Rep4	DIVminus4.Rep1	DIVminus4.Rep2	DIVminus4.Rep3	DIVminus8.Rep1	DIVminus8.Rep2	DIVminus8.Rep3	DIVminus8.Rep4	ensembl_gene_id
ENSMUST00000119854	2.286511	2.475351	2.778492	3.752931	4.290970	3.606815	2.289835	4.753009	6.407206	6.093523	8.073422	4.114218	3.854100	4.504607	4.233634	0.000000	3.062028	2.657892	4.551532	7.708040	5.771621	4.913897	5.733164	0.821894	0.000000	0.791545	0.452225	0.283685	0.253438	0.000000	ENSMUSG00000037736
ENSMUST00000147408	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	ENSMUSG00000052516
ENSMUST00000070080	17.294246	16.642412	20.145840	22.849371	23.111531	23.055949	18.922899	85.742059	92.095921	92.061439	90.117232	80.253664	82.962866	82.837521	83.218155	68.286276	59.179619	65.655743	67.027119	61.090866	60.764495	60.258015	58.545413	22.716225	23.042382	25.299848	18.289952	19.497363	17.597384	22.126199	ENSMUSG00000056124
ENSMUST00000033908	0.277638	0.771701	0.734236	0.835938	1.130580	1.774933	0.262763	0.231432	0.209738	0.302866	0.342360	0.394527	0.196355	0.088545	0.170725	0.088308	0.148182	0.037937	0.361527	0.124041	0.411704	0.216135	0.384765	0.627833	0.412680	0.393967	0.177899	0.097658	0.091573	0.404344	ENSMUSG00000031511
ENSMUST00000174016	8.545914	6.122679	4.978280	13.429257	13.448801	13.789999	9.139241	3.772152	6.493719	6.454788	7.620481	3.964981	3.769810	4.272302	4.711036	4.523138	4.025177	6.303732	5.551957	20.404943	18.886851	26.036222	24.703049	15.936284	17.060844	18.859078	9.623711	12.255722	10.591347	2.846295	ENSMUSG00000036036
ENSMUST00000082868	5.240588	7.435392	21.577246	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	3.157229	8.363363	2.106274	7.631468	4.013754	0.000000	ENSMUSG00000064802
ENSMUST00000187399	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.112263	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	ENSMUSG00000090925
ENSMUST00000159512	1.197789	1.783770	1.592837	3.586005	2.904509	3.830286	1.389749	1.361614	1.953179	1.801017	2.061461	2.504494	1.787667	1.311947	1.985731	2.389696	0.801072	1.741500	1.538186	0.880757	2.094465	0.624705	1.609800	0.248865	0.954488	1.411125	1.029973	0.838168	0.218119	0.135338	ENSMUSG00000029207
ENSMUST00000194793	0.000000	0.068955	0.061078	0.064820	0.000000	0.000000	0.000000	0.078959	0.177114	0.299510	0.089493	0.181351	0.080516	0.310951	0.222825	0.324322	0.070346	0.163243	0.468088	0.067476	0.360582	0.332645	0.068630	0.000000	0.000000	0.000000	0.155780	0.076557	0.000000	0.000000	ENSMUSG00000103497
ENSMUST00000221298	0.077941	0.100916	0.090854	0.309668	0.244962	0.239357	0.299259	0.174734	0.291636	0.193253	0.231279	0.296867	0.117060	0.200170	0.109928	0.266300	0.103878	0.301950	0.395391	0.569643	0.215417	0.462677	0.353049	0.000000	0.000000	0.028680	0.000000	0.000000	0.000000	0.050635	ENSMUSG00000113688
ENSMUST00000138551	0.000000	0.000000	0.119676	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	ENSMUSG00000034164
ENSMUST00000210944	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	ENSMUSG00000110312
ENSMUST00000194146	0.066535	0.185273	0.055779	0.000000	0.000000	0.000000	0.070382	0.000000	0.000000	0.000000	0.078488	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.129887	0.000000	0.000000	0.000000	0.162839	0.143064	0.000000	0.000000	0.058096	0.000000	ENSMUSG00000102375
ENSMUST00000112725	1.270589	1.049486	1.271776	1.294794	1.619005	1.511612	0.893824	0.688353	0.884466	0.635204	0.709094	0.928998	0.846144	0.829175	0.696928	0.989624	1.018670	0.818393	1.100337	1.270363	0.878781	1.156467	1.281586	1.667594	1.633202	1.868757	1.477511	1.274160	1.640671	1.306233	ENSMUSG00000025269
ENSMUST00000181391	2.222061	1.416381	2.497798	3.006018	3.142753	1.571165	3.167750	3.530355	2.965603	4.649879	3.558337	3.465148	3.203126	3.106205	2.862499	0.000000	1.333758	2.039999	3.282519	5.710520	3.937353	6.847600	3.991743	1.733049	0.496494	1.601051	1.270107	1.507055	1.477476	0.000000	ENSMUSG00000030446
ENSMUST00000223290	0.000000	0.000000	0.000000	0.026984	0.000000	0.000000	0.032844	0.033378	0.000000	0.000000	0.000000	0.000000	0.000000	0.033128	0.000000	0.000000	0.031158	0.072496	0.032133	0.000000	0.031972	0.029910	0.000000	0.000000	0.036096	0.000000	0.029574	0.000000	0.000000	0.000000	ENSMUSG00000035954
ENSMUST00000116191	0.000000	0.044746	0.000000	0.000000	0.104528	0.168647	0.046020	0.048814	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.102273	0.042109	0.000000	0.069164	0.000000	0.111104	0.170086	0.000000	0.000000	ENSMUSG00000080078
ENSMUST00000060782	0.267962	0.170311	0.391316	0.446440	0.000000	0.000000	0.389636	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.087233	0.046855	0.000000	0.000000	0.110542	0.000000	0.048208	0.044187	0.083067	0.328807	0.000000	ENSMUSG00000051716
ENSMUST00000141479	0.960027	0.541858	0.000000	0.000000	0.473587	1.080745	0.000000	0.274066	0.000000	0.000000	0.000000	0.000000	0.359059	1.129914	0.300957	0.388096	0.000000	0.000000	0.801308	1.141492	1.285565	0.314952	1.575395	0.000000	0.348874	0.398863	0.000000	0.970236	1.421541	0.351342	ENSMUSG00000049119
ENSMUST00000211096	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	ENSMUSG00000110353

OK so lets look at the expression of ENSMUSG00000020634 in the first sample (DIVminus8.Rep1).

TPM as an expression metric

[1] 46.71414

[1] 46.71414

Basic RNAseq QC

OK now that we’ve got expression values for all genes, we now might want to use these expression values to learn a little bit about our samples. One simple question is > Are replicates similar to each other, or at least more similar to each other than to other samples?

If our data is worth anything at all, we would hope that differences between replicates, which are supposed to be drawn from the same condition, are smaller than differences between samples drawn from different conditions. If that’s not true, it could indicate that one replicate is very different from other replciates (in which case we might want to remove it), or that the data in general is of poor quality.

Another question is:

How similar is each sample to every other sample?

In our timecourse, we might expect that samples drawn from adjacent timepoints might be more similar to each other than samples from more distant timepoints.

Hierarchical clustering

A simple way to think about this is to simply correlate TPM values for genes between samples. For plotting purposes here, let’s plot the log(TPM) of two samples against each other. However, for the actual correlation coefficient we are going to be using the Spearman correlation method, which uses ranks, not absolute values. This means that whether or not you take the log will have no effect on the Spearman correlation coefficient.

# DIVminus8.Rep1 vs DIVminus8.Rep2

# Since we are plotting log TPM values, we need to add a pseudocount to all samples.
# log(0) is a problem.

# Add pseudocounts and take log within ggplot function call
r.spearman <- cor.test(
  tpms$DIVminus8.Rep1, tpms$DIVminus8.Rep2,
  method = "spearman"
)$estimate[[1]]

r.spearman <- signif(r.spearman, 2)

ggplot(tpms, aes(x = log10(DIVminus8.Rep1 + 1e-3), y = log10(DIVminus8.Rep2 + 1e-3))) +
  geom_point() +
  theme_classic() +
  annotate("text", x = 2, y = 0, label = paste0("R = ", r.spearman))

Hierarchical clustering

Hierarchical clustering

With RNAseq data, the variance of a gene’s expression increases as the expression increases. However, using a pseudocount and taking the log of the expression value actually reverses this trend. Now, genes with the lowest expression have the most variance. Why is this a problem? Well, the genes with the most variance are going to be the ones that contribute the most to intersample differences. Ideally, we would like to therefore remove the relationship between expression and variance.

There are transformations, notably rlog and vst, that are made to deal with this, but they are best used when dealing with normalized count data, while here we are dealing with TPMs. We will talk about counts later, but not here.

So, for now, we will take another approach of simply using an expression threshold. Any gene that does not meet our threshold will be excluded from the analysis. Obviously where to set this threshold is a bit subjective. For now, we will set this cutoff at 1 TPM.

Hierarchical clustering

# DIVminus8.Rep1 vs DIVminus8.Rep2

# Since we are plotting log TPM values we will only keep for genes that have expression of at least 1 TPM in both samples
tpms.2samplecor <-
  dplyr::select(tpms, c(ensembl_gene_id, DIVminus8.Rep1, DIVminus8.Rep2)) |>
  filter(DIVminus8.Rep1 >= 1 & DIVminus8.Rep2 >= 1)

# pull the correlation coefficient
r.spearman <- cor.test(
  tpms.2samplecor$DIVminus8.Rep1,
  tpms.2samplecor$DIVminus8.Rep2,
  method = "spearman"
)$estimate[[1]]

# round/set sig digits
r.spearman <- signif(r.spearman, 2)

# plot
ggplot(tpms.2samplecor, aes(x = log10(DIVminus8.Rep1 + 1e-3), y = log10(DIVminus8.Rep2 + 1e-3))) +
  geom_point() +
  theme_classic() +
  annotate("text", x = 2, y = 1, label = paste0("R = ", r.spearman))

Hierarchical clustering

Filtering lowly expressed genes

OK that’s two samples compared to each other, but now we want to see how all samples compare to all other samples. Before we do this we need to decide how to apply our expression cutoff across many samples. Should a gene have to meet the cutoff in only one sample? In all samples? Let’s start by saying it has to meet the cutoff in at least half of the 30 samples.

[1] 50123

[1] 14843

Correlating gene expression values

Now we can use the cor function to calculate pairwise correlations in a .red[matrix] of TPM values.

          DIV0.Rep1 DIV0.Rep2 DIV0.Rep3 DIV1.Rep1 DIV1.Rep2 DIV1.Rep3 DIV1.Rep4
DIV0.Rep1 1.0000000 0.9921992 0.9857075 0.9344949 0.9320084 0.9318094 0.9341577
DIV0.Rep2 0.9921992 1.0000000 0.9895527 0.9290296 0.9269420 0.9270552 0.9301675
DIV0.Rep3 0.9857075 0.9895527 1.0000000 0.9172688 0.9149980 0.9161315 0.9175274
DIV1.Rep1 0.9344949 0.9290296 0.9172688 1.0000000 0.9932976 0.9913331 0.9857878
DIV1.Rep2 0.9320084 0.9269420 0.9149980 0.9932976 1.0000000 0.9911381 0.9851731
DIV1.Rep3 0.9318094 0.9270552 0.9161315 0.9913331 0.9911381 1.0000000 0.9863747
          DIV16.Rep1 DIV16.Rep2 DIV16.Rep3 DIV16.Rep4 DIV21.Rep1 DIV21.Rep2
DIV0.Rep1  0.6674215  0.6543366  0.6549751  0.6557675  0.6430027  0.6437791
DIV0.Rep2  0.6609602  0.6460813  0.6477044  0.6479861  0.6365390  0.6368957
DIV0.Rep3  0.6374190  0.6211743  0.6223612  0.6227213  0.6121500  0.6133398
DIV1.Rep1  0.7547993  0.7419703  0.7432819  0.7436624  0.7234613  0.7235614
DIV1.Rep2  0.7567175  0.7438094  0.7457015  0.7456283  0.7251852  0.7253125
DIV1.Rep3  0.7560271  0.7419976  0.7444082  0.7441764  0.7240899  0.7238994
          DIV21.Rep3 DIV21.Rep4 DIV28.Rep1 DIV28.Rep2 DIV28.Rep3 DIV28.Rep4
DIV0.Rep1  0.6455802  0.6422156  0.6385205  0.6416556  0.6336616  0.6352217
DIV0.Rep2  0.6387583  0.6339425  0.6312565  0.6348317  0.6259258  0.6283085
DIV0.Rep3  0.6138374  0.6091561  0.6080339  0.6112330  0.6026283  0.6047245
DIV1.Rep1  0.7259016  0.7220668  0.7162478  0.7139679  0.7103676  0.7129652
DIV1.Rep2  0.7277639  0.7246262  0.7182024  0.7164955  0.7130165  0.7151487
DIV1.Rep3  0.7259727  0.7221704  0.7165046  0.7140036  0.7110218  0.7129018
          DIV7.Rep1 DIV7.Rep2 DIV7.Rep3 DIV7.Rep4 DIVminus4.Rep1 DIVminus4.Rep2
DIV0.Rep1 0.7738807 0.7788370 0.7768059 0.7753821      0.8313811      0.8285264
DIV0.Rep2 0.7667818 0.7719339 0.7696461 0.7689326      0.8356858      0.8331484
DIV0.Rep3 0.7431343 0.7490053 0.7462518 0.7452457      0.8515066      0.8500630
DIV1.Rep1 0.8501343 0.8518467 0.8519696 0.8513401      0.7365926      0.7348312
DIV1.Rep2 0.8519090 0.8530144 0.8535201 0.8521009      0.7330103      0.7303372
DIV1.Rep3 0.8503243 0.8519535 0.8523296 0.8519729      0.7341659      0.7314245
          DIVminus4.Rep3 DIVminus8.Rep1 DIVminus8.Rep2 DIVminus8.Rep3
DIV0.Rep1      0.8298582      0.7952033      0.7960610      0.7958396
DIV0.Rep2      0.8342924      0.8016864      0.8025006      0.8009213
DIV0.Rep3      0.8516980      0.8181704      0.8185364      0.8187208
DIV1.Rep1      0.7354999      0.6986520      0.6993633      0.6996563
DIV1.Rep2      0.7320226      0.6954396      0.6958901      0.6953545
DIV1.Rep3      0.7338895      0.6967133      0.6970533      0.6969969
          DIVminus8.Rep4
DIV0.Rep1      0.7849066
DIV0.Rep2      0.7897993
DIV0.Rep3      0.8067435
DIV1.Rep1      0.6906591
DIV1.Rep2      0.6866114
DIV1.Rep3      0.6879394

Hierarchical clustering

Now we need to plot these and have similar samples (i.e. those that are highly correlated with each other) be clustered near to each other. We will use pheatmap to do this.

# let's pull information that we want to add as categories
pheatmap(
  tpms.cor,
  annotation_col = metadata[, 2:3],
  fontsize = 7,
  show_colnames = FALSE
)

This looks pretty good! There are two main points to takeaway here. First, all replicates for a given timepoint are clustering with each other. Second, you can kind of derive the order of the timepoints from the clustering. The biggest separation is between early (DIVminus8 to DIV1) and late (DIV7 to DIV28). After that you can then see finer-grained structure.

Hierarchical clustering

PCA analysis

Another way to visualize relationships is using a dimensionality reduction technique called Principal Component Analysis (PCA). Let’s watch this short video. It focuses more on how to interpret them rather than the math behind their creation.

PCA analysis

PCA works best when values are approximately normally distributed, so we will first take the log of our expression values.

With our cutoff as it is now (genes have to have expression of at least 1 TPM in half the samples), it is possible that we will have some 0 values. Taking the log of 0 might cause a problem, so we will add a pseudocount.

PCA analysis

Very similar interpretation as before (heatmap of correlation).

RNAseq QC

Learning Objectives

Rigor & Reproducibility

Overview

Experiment design

QC overview

Goal

Quantifying transcript expression with salmon

Making a transcriptome index

Quantifying reads against this index

Salmon outputs

Moving from transcript quantifications to gene quantifications

Relating genes and transcripts

Using biomaRt

Using biomaRt

Using biomaRt

Using biomaRt

Using biomaRt

Getting gene level expression data with tximport

Gene expression data with tximport

Gene expression data with tximport

Gene expression data with tximport

TPM as an expression metric

TPM as an expression metric

TPM as an expression metric

TPM as an expression metric

Basic RNAseq QC

Hierarchical clustering

Hierarchical clustering

Hierarchical clustering

Hierarchical clustering

Hierarchical clustering

Filtering lowly expressed genes

Correlating gene expression values

Hierarchical clustering

Hierarchical clustering

PCA analysis

PCA analysis

PCA analysis

Quantifying transcript expression with `salmon`

Getting gene level expression data with `tximport`

Gene expression data with `tximport`

Gene expression data with `tximport`

Gene expression data with `tximport`