clustifyrdata provides 42 external data sets for cell-type assignment with clustifyr, along with reproducible scripts used to build the data objects.

Commonly used references:

name                   description                                       cell types  genes  organism  publication
ref_MCA                Mouse Cell Atlas                                  713         8601   mouse     from
ref_tabula_muris_drop  Tabula Muris (10X)                                112         23341  mouse     from
ref_tabula_muris_facs  Tabula Muris (SmartSeq2)                          175         23341  mouse     from
ref_mouse.rnaseq       Mouse RNA-seq from 28 cell types                  28          21214  mouse     from
ref_moca_main          Mouse Organogenesis Cell Atlas (main cell types)  37          26183  mouse     from
ref_immgen             Mouse sorted immune cells                         253         22134  mouse     from
ref_hema_microarray    Human hematopoietic cell microarray               38          21246  human     from
ref_cortex_dev         Human cortex development scRNA-seq                47          56864  human     from
ref_pan_indrop         Human pancreatic cell scRNA-seq (inDrop)          14          20125  human     from
ref_pan_smartseq2      Human pancreatic cell scRNA-seq (SmartSeq2)       12          25525  human     from
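To make the "cell types" and "genes" columns concrete, here is a minimal base-R sketch of the layout a reference matrix is assumed to have (genes as rows, cell types as columns, as clustifyr expects for its reference input). The matrix `ref_toy`, its gene names, and its values are made up for illustration and are not taken from any clustifyrdata object.

```r
# Toy reference matrix in the layout assumed for ref_* objects:
# genes as rows, cell types as columns. All values are invented.
ref_toy <- matrix(
  c(5.1, 0.2, 0.0,
    0.3, 4.8, 0.1,
    0.0, 0.4, 6.2,
    1.0, 1.1, 0.9),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("CD3E", "MS4A1", "LYZ", "ACTB"),   # genes (rows)
    c("T cell", "B cell", "Monocyte")    # cell types (columns)
  )
)

# Under this layout, the "cell types" column of the table corresponds
# to ncol() and the "genes" column to nrow().
ntypes <- ncol(ref_toy)
ngenes <- nrow(ref_toy)

# The expression profile of a single cell type is one column.
ref_toy[, "B cell"]
```

With clustifyrdata installed and loaded, calling `dim()` on one of the references above should show the same genes-by-cell-types shape, if the layout assumption holds.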

See the reference page for the full list of available data sets, and the individual download page for each reference. These data sets will additionally be made available through Bioconductor ExperimentHub as clustifyrdatahub.

Data sets follow uniform naming conventions:

  • ref_* : the prebuilt reference expression matrix.

  • *_matrix : single-cell RNA expression matrix.

  • *_avg : average expression calculated from a single-cell RNA expression matrix.

  • *_meta : metadata from a single-cell RNA-seq experiment.

  • *_vargenes : variable genes used for dimension reduction, as determined by Seurat.

  • *_markers : marker genes determined by Seurat.

  • *_M3Drop : variable genes used for dimension reduction as determined by M3Drop.
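The relationship between a *_matrix, its *_meta, and a *_avg object can be sketched in base R: average each gene's expression over the cells that share a label. This is only an illustration of the idea under assumptions. The toy data, the column name `cluster`, and the helper `avg_by_cluster()` are hypothetical, and the actual *_avg objects may have been built with clustifyr's own `average_clusters()` and different options (for example, normalized data or a different summary statistic).

```r
# Toy single-cell matrix (genes x cells), standing in for a *_matrix object.
mat <- matrix(
  c(2, 4, 0, 0,
    0, 0, 3, 5,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Toy per-cell metadata, standing in for a *_meta object.
# "cluster" is a hypothetical column name.
meta <- data.frame(
  cell = colnames(mat),
  cluster = c("T", "T", "B", "B")
)

# Hypothetical helper: mean expression of each gene per cluster,
# returning one column per cluster, like a *_avg matrix.
avg_by_cluster <- function(mat, clusters) {
  groups <- unique(clusters)
  out <- vapply(
    groups,
    function(g) rowMeans(mat[, clusters == g, drop = FALSE]),
    FUN.VALUE = numeric(nrow(mat))
  )
  colnames(out) <- groups
  rownames(out) <- rownames(mat)
  out
}

# Align metadata rows to matrix columns before averaging.
clusters <- meta$cluster[match(colnames(mat), meta$cell)]
avg <- avg_by_cluster(mat, clusters)
avg["geneA", ]  # T = 3, B = 0
```

Matching the metadata to the matrix by cell name, rather than assuming the rows are already in column order, is a defensive choice for this sketch.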

Installation

N.B.: clustifyrdata is a large data package (nearly 350 MB uncompressed).

# If needed, first install remotes:
# install.packages("remotes")
remotes::install_github("rnabioco/clustifyrdata")
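Once installed, the naming conventions listed earlier make it possible to group the package's data sets programmatically. The sketch below is a hedged example: `dataset_kind()` is a hypothetical helper, not part of clustifyrdata. It only relies on base R's `requireNamespace()` and `data(package = ...)`, and it skips the listing step if the package is not installed.

```r
# Hypothetical helper: classify data set names by the ref_ prefix
# and the documented suffixes. The ref_ prefix takes priority.
dataset_kind <- function(x) {
  kind <- rep(NA_character_, length(x))
  kind[grepl("^ref_", x)] <- "reference"
  for (suffix in c("matrix", "avg", "meta", "vargenes", "markers", "M3Drop")) {
    kind[is.na(kind) & grepl(paste0("_", suffix, "$"), x)] <- suffix
  }
  kind
}

# If clustifyrdata is installed, list its data sets and count them by kind.
if (requireNamespace("clustifyrdata", quietly = TRUE)) {
  items <- data(package = "clustifyrdata")$results[, "Item"]
  print(table(dataset_kind(items), useNA = "ifany"))
} else {
  message("clustifyrdata is not installed; use remotes::install_github() first.")
}
```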