Some comments on using clustifyr with other scRNA-seq analysis-augmenting packages
clustifyr is designed to be compatible with most workflows that find distinct cell clusters, even if the cells are overclustered (as long as each cluster still has at least 15 cells). However, if the data are continuous (such as developmental transitions), other approaches may be better suited.
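Because clustifyr assigns identities at the cluster level, the core idea can be sketched as: average expression within each cluster, then pick the reference cell type whose profile correlates best with that average. The Python below is a minimal illustration, not clustifyr's R implementation. It assumes Spearman correlation as the similarity metric and a 15-cell minimum per cluster; all function names are hypothetical.

```python
from collections import defaultdict


def rank(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks; NaN if either vector is constant."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx and vy else float("nan")


def average_by_cluster(cells, labels, min_cells=15):
    """Mean expression per cluster; clusters with fewer than min_cells cells are skipped."""
    groups = defaultdict(list)
    for vec, lab in zip(cells, labels):
        groups[lab].append(vec)
    return {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
        if len(vecs) >= min_cells
    }


def assign_clusters(cells, labels, reference, min_cells=15):
    """Label each cluster with the reference cell type whose profile correlates best."""
    calls = {}
    for lab, profile in average_by_cluster(cells, labels, min_cells).items():
        scores = {ct: spearman(profile, ref) for ct, ref in reference.items()}
        # treat NaN (constant vector) as the worst possible score
        calls[lab] = max(
            scores, key=lambda ct: scores[ct] if scores[ct] == scores[ct] else float("-inf")
        )
    return calls
```

Averaging first is what makes overclustering tolerable: each small cluster still yields a stable mean profile, provided it has enough cells.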
Using all genes works moderately well, but in various testing scenarios performance improves with some form of feature selection, such as var.genes in seurat, M3Drop, or simply highly variable genes. In general, we see satisfactory results with 500-1000 variable genes, although this number depends on the biology of the target dataset. Note that SeuratV3 stores 2000 variable genes by default, which may be too many.
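A minimal sketch of feature selection, assuming plain per-gene variance as the ranking statistic. This is a simplification: the seurat and M3Drop methods use their own statistics. The function name `select_query_genes` is hypothetical. It keeps only genes that are also present in the reference, since a gene missing from either side cannot contribute to the comparison.

```python
from statistics import pvariance


def select_query_genes(query_expr, ref_genes, n=1000):
    """query_expr: dict gene -> list of (log-normalized) values across cells.
    ref_genes: set of genes present in the reference.
    Returns up to n shared genes, ranked by variance in the query (highest first)."""
    shared = {g: v for g, v in query_expr.items() if g in ref_genes}
    ranked = sorted(shared, key=lambda g: pvariance(shared[g]), reverse=True)
    return ranked[:n]
```

Setting `n` somewhere in the 500-1000 range, rather than a larger default such as 2000, follows the guidance in the paragraph above.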
Imputation methods such as ALRA, which attempt to fill in drop-out signal, are not necessary for cluster-level identity assignment, although per-cell assignment is somewhat improved by imputation. As an alternative, clustifyr offers the rm0 option, which treats genes with 0 counts as missing data rather than as low expression and ignores them.
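The effect of treating zeros as missing can be illustrated with a small sketch, assuming zeros in the query vector are simply dropped before computing similarity. This is only the idea behind an option like rm0, not clustifyr's implementation, and Pearson correlation is used here purely to keep the example short.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN if fewer than 2 points or either vector is constant."""
    n = len(x)
    if n < 2:
        return math.nan
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else math.nan


def similarity(query, ref, rm0=False):
    """query, ref: expression vectors over the same genes.
    With rm0=True, genes with a zero count in the query are treated as missing
    (dropped) rather than as genuinely low expression."""
    if rm0:
        pairs = [(q, r) for q, r in zip(query, ref) if q != 0]
        query, ref = [q for q, _ in pairs], [r for _, r in pairs]
    return pearson(query, ref)
```

With a single drop-out in the first gene, `similarity([0, 2, 4, 6], [9, 1, 2, 3])` is negative, while `rm0=True` compares only the three detected genes and gives a perfect positive correlation.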
clustifyr operates on both raw counts and log-transformed data. A variance-stabilizing transformation (VST), such as the one implemented by sctransform, is acceptable but does not appear to give additional benefits. If one is used, the query and reference matrices should ideally both be transformed in the same way.
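The point about transforming both matrices can be shown with a simple log-normalization applied identically to query and reference. The sketch assumes a common convention (scale each profile to 10,000 total counts, then `log1p`); the function name and example values are hypothetical, and clustifyr itself does not require this particular normalization.

```python
import math


def log_normalize(counts, scale=1e4):
    """counts: dict sample -> list of raw counts over a shared gene order.
    Scales each sample to `scale` total counts, then applies log1p."""
    out = {}
    for name, col in counts.items():
        total = sum(col)
        if total == 0:
            out[name] = [0.0] * len(col)  # empty profile: nothing to scale
            continue
        out[name] = [math.log1p(c * scale / total) for c in col]
    return out


# Apply the same transform to both sides so their values are on one scale.
query = log_normalize({"cluster_1": [10, 0, 90]})
reference = log_normalize({"T cell": [200, 5, 1795]})
```

Although the two raw profiles differ twentyfold in depth, the first gene ends up with the same normalized value on both sides, which is what makes the comparison meaningful.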
Massive cell death can lead to RNA contamination of all sequenced cells, so assessing background contamination, and mitigating it if needed, is important. One simple approach is to build a background reference by averaging the expression of all filtered-out cell IDs.
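The averaging step can be sketched as follows, assuming raw counts are available for every barcode, including the ones removed by cell filtering. The function name is hypothetical. The resulting profile could then be added as an extra column of the reference, so that a cluster matching "background" most closely is flagged as possibly contamination-dominated; that interpretation is a suggestion, not a rule.

```python
def background_profile(counts, kept_barcodes):
    """counts: dict barcode -> list of raw counts over a shared gene order.
    Averages every barcode that was filtered out (not in kept_barcodes)
    into a single 'background' expression profile."""
    dropped = [col for bc, col in counts.items() if bc not in kept_barcodes]
    if not dropped:
        raise ValueError("no filtered-out barcodes to average")
    return [sum(vals) / len(dropped) for vals in zip(*dropped)]


# Hypothetical usage: extend a reference dict with the background profile.
# reference["background"] = background_profile(all_counts, kept)
```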
Marker genes generated by seurat and other methods can be converted into gene-list matrix form for use with clustify_list. By contrast, clustify_nudge works best with normal feature selection plus a short, nonoverlapping list of markers. Both formats can be produced through settings in matrixize_markers.
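The conversion can be sketched as grouping a marker table by cluster and keeping the top-scoring genes, with an optional step that drops genes shared between clusters to produce nonoverlapping lists. This assumes input rows of `(gene, cluster, score)` with higher scores meaning better markers; `markers_to_matrix` and its parameters are hypothetical and do not reproduce the signature of matrixize_markers.

```python
from collections import defaultdict


def markers_to_matrix(rows, n=5, unique=False):
    """rows: iterable of (gene, cluster, score) tuples, higher score = better marker.
    Returns dict cluster -> list of up to n marker genes (one 'column' per cluster),
    best first. With unique=True, genes listed for more than one cluster are dropped,
    giving non-overlapping marker lists."""
    rows = list(rows)  # iterated twice below
    clusters_per_gene = defaultdict(set)
    for gene, cluster, _ in rows:
        clusters_per_gene[gene].add(cluster)

    by_cluster = defaultdict(list)
    for gene, cluster, score in rows:
        if unique and len(clusters_per_gene[gene]) > 1:
            continue  # shared marker: skip it to keep lists non-overlapping
        by_cluster[cluster].append((score, gene))

    matrix = {}
    for cluster, scored in by_cluster.items():
        scored.sort(key=lambda sg: sg[0], reverse=True)
        matrix[cluster] = [gene for _, gene in scored[:n]]
    return matrix
```

A longer, overlapping output suits list-based scoring, while a small `n` with `unique=True` approximates the short, nonoverlapping markers described for clustify_nudge.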
Relatedly, because the significance of ribosomal protein genes (RPL or RPS) as markers is often unclear, possibly due to normalization issues, matrixize_markers offers a remove_rp option to drop them. This and similar filtering can, of course, also be done manually.
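Doing this filtering manually can be as simple as a prefix match on gene names. The sketch below assumes marker lists stored as a dict of cluster to genes, and a case-insensitive `RPL`/`RPS` prefix so that mouse-style names such as `Rps27` are caught too. The function name is hypothetical. Note that this crude prefix also removes non-ribosomal genes such as `RPS6KA1` (a kinase), so a stricter pattern or an explicit gene list may be preferable.

```python
import re

# Case-insensitive prefix match for ribosomal protein gene symbols.
RP_PATTERN = re.compile(r"^RP[LS]", re.IGNORECASE)


def remove_rp(marker_lists):
    """marker_lists: dict cluster -> list of gene names.
    Returns a new dict with RPL/RPS-prefixed genes removed from every list."""
    return {
        cluster: [g for g in genes if not RP_PATTERN.match(g)]
        for cluster, genes in marker_lists.items()
    }
```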