Cluster cells based on CDR3 sequences
Usage
cluster_sequences(
input,
data_col = "cdr3",
chain = NULL,
method = "louvain",
resolution = 0.5,
k = 10,
dist_method = NULL,
run_umap = TRUE,
chain_col = global$chain_col,
prefix = paste0(data_col, "_"),
return_df = FALSE,
sep = global$sep,
...
)
Arguments
- input
Single cell object or data.frame containing V(D)J data. If a data.frame is provided, the cell barcodes should be stored as row names.
- data_col
meta.data column containing sequences to use for calculating Levenshtein distance.
- chain
Chain to use for clustering sequences. Cells with more than one of the provided chain will be excluded from the analysis. If NULL, sequences for cells with multiple chains will be concatenated.
- method
Method to use for clustering, possible values include:
'louvain', multi-level optimization of modality implemented with
igraph::cluster_louvain()
'leiden', the Leiden clustering algorithm implemented with
igraph::cluster_leiden()
- resolution
Resolution (coarseness) of clusters
- k
Number of neighbors for k-nearest neighbors algorithm
- dist_method
Method to use for computing distance between sequences. If NULL, distance is calculated for amino acid sequences using the BLOSUM62 substitution matrix and levenshtein distance is calculated for nucleotide sequences. Other possible values include:
'levenshtein'
'hamming'
The name of a substitution matrix available from the Biostrings package, e.g. 'BLOSUM62'
- run_umap
Should the Uniform Manifold Approximation and Projection (UMAP) dimensional reduction method be performed. This will add UMAP coordinates to the meta.data.
- chain_col
meta.data column containing chains for each cell.
- prefix
Prefix to add to graph name
- return_df
Return results as a data.frame. If FALSE, results will be added to the input object.
- sep
Separator used for storing per cell V(D)J data
- ...
Additional parameters to pass to clustering method
Examples
# Cluster cells based on CDR3 amino acid sequences and plot results
res <- cluster_sequences(
vdj_sce,
data_col = "cdr3",
chain = "IGK",
resolution = c(0.5, 1)
)
plot_scatter(
res,
x = "cdr3_UMAP_1",
y = "cdr3_UMAP_2",
data_col = "cdr3_cluster_0.5"
)
#> Warning: Removed 141 rows containing missing values