Skip to contents

Cluster cells based on CDR3 sequences

Usage

cluster_sequences(
  input,
  data_col = "cdr3",
  chain = NULL,
  method = "louvain",
  resolution = 0.5,
  k = 10,
  dist_method = NULL,
  run_umap = TRUE,
  chain_col = global$chain_col,
  prefix = paste0(data_col, "_"),
  return_df = FALSE,
  sep = global$sep,
  ...
)

Arguments

input

Single cell object or data.frame containing V(D)J data. If a data.frame is provided, the cell barcodes should be stored as row names.

data_col

meta.data column containing sequences to use for calculating Levenshtein distance.

chain

Chain to use for clustering sequences. Cells with more than one of the provided chain will be excluded from the analysis. If NULL, sequences for cells with multiple chains will be concatenated.

method

Method to use for clustering, possible values include:

resolution

Resolution (coarseness) of clusters

k

Number of neighbors for k-nearest neighbors algorithm

dist_method

Method to use for computing distance between sequences. If NULL, distance is calculated for amino acid sequences using the BLOSUM62 substitution matrix and levenshtein distance is calculated for nucleotide sequences. Other possible values include:

  • 'levenshtein'

  • 'hamming'

  • The name of a substitution matrix available from the Biostrings package, e.g. 'BLOSUM62'

run_umap

Should the Uniform Manifold Approximation and Projection (UMAP) dimensional reduction method be performed. This will add UMAP coordinates to the meta.data.

chain_col

meta.data column containing chains for each cell.

prefix

Prefix to add to graph name

return_df

Return results as a data.frame. If FALSE, results will be added to the input object.

sep

Separator used for storing per cell V(D)J data

...

Additional parameters to pass to clustering method

Value

Single cell object or data.frame with clustering results

See also

Examples

# Cluster cells based on CDR3 amino acid sequences and plot results
res <- cluster_sequences(
  vdj_sce,
  data_col   = "cdr3",
  chain      = "IGK",
  resolution = c(0.5, 1)
)

plot_scatter(
  res,
  x = "cdr3_UMAP_1",
  y = "cdr3_UMAP_2",
  data_col = "cdr3_cluster_0.5"
)
#> Warning: Removed 141 rows containing missing values