This vignette provides detailed examples for quantifying differences in clonal abundance. For the examples shown below, we use data for splenocytes from BL6 and MD4 mice collected using the 10X Genomics scRNA-seq platform. MD4 B cells are monoclonal and specifically bind hen egg lysozyme.

library(djvdj)
library(Seurat)
library(ggplot2)

# Add V(D)J data to object
vdj_dirs <- c(
  BL6 = system.file("extdata/splen/BL6_BCR", package = "djvdj"),
  MD4 = system.file("extdata/splen/MD4_BCR", package = "djvdj")
)

so <- splen_so |>
  import_vdj(vdj_dirs, define_clonotypes = "cdr3_gene")


Calculating clonal abundance

Use calc_frequency() to quantify clonotype abundance and store the results in the object meta.data. This adds columns showing the number of occurrences of each clonotype (‘freq’), the percentage of cells sharing the clonotype (‘pct’), and a label that can be used for plotting (‘grp’). Each column name is prefixed with the data_col name, e.g. ‘clonotype_id_freq’. By default, these calculations are performed for all cells in the object.

so_vdj <- so |>
  calc_frequency(data_col = "clonotype_id")
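The ‘freq’ and ‘pct’ values can be reasoned about with a small base-R sketch. This is not djvdj's implementation: the toy vector `clonotype` is hypothetical, and leaving cells without V(D)J data (NA) out of the denominator is an assumption made for illustration.

```r
# Toy clonotype labels for six cells; NA marks a cell without V(D)J data
clonotype <- c("c1", "c1", "c2", NA, "c1", "c3")

# Count occurrences of each clonotype (table() drops NA by default)
counts <- table(clonotype)

# Map the per-clonotype counts back onto each cell; NA cells stay NA
freq <- as.integer(counts)[match(clonotype, names(counts))]

# Percentage of cells sharing the clonotype
# Assumption: only cells with a clonotype contribute to the denominator
pct <- 100 * freq / sum(!is.na(clonotype))

data.frame(clonotype, freq, pct)
```

Cells without a clonotype keep NA in both columns, which mirrors the NA rows in the meta.data shown in this vignette.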

To calculate clonotype abundance separately for each sample or cluster, set the cluster_col argument to the name of the meta.data column containing the sample or cluster ID for each cell.

so_vdj <- so |>
  calc_frequency(
    data_col = "clonotype_id",
    cluster_col = "sample"
  )

When cluster_col is specified, an additional meta.data column (‘shared’, here ‘clonotype_id_shared’) is added indicating whether the clonotype is shared between multiple clusters.

so_vdj |>
  slot("meta.data") |>
  head(2)
#>                        orig.ident nCount_RNA nFeature_RNA RNA_snn_res.1
#> BL6_AAACGGGGTTCTGTTT-1        BL6        666          341             2
#> BL6_AAAGATGCAACAACCT-1        BL6        308          233             0
#>                        seurat_clusters     UMAP_1    UMAP_2
#> BL6_AAACGGGGTTCTGTTT-1               2 -0.2850037 -2.036348
#> BL6_AAAGATGCAACAACCT-1               0  2.2518005 -1.472473
#>                                           type         r cell_type sample
#> BL6_AAACGGGGTTCTGTTT-1 B cells (B.CD19CONTROL) 0.4712686   B cells  BL6-1
#> BL6_AAAGATGCAACAACCT-1 B cells (B.CD19CONTROL) 0.5435733   B cells  BL6-1
#>                        exact_subclonotype_id chains n_chains        cdr3
#> BL6_AAACGGGGTTCTGTTT-1                    NA   <NA>       NA        <NA>
#> BL6_AAAGATGCAACAACCT-1                     1    IGK        1 CFQGSHVPWTF
#>                                                  cdr3_nt cdr3_length
#> BL6_AAACGGGGTTCTGTTT-1                              <NA>        <NA>
#> BL6_AAAGATGCAACAACCT-1 TGCTTTCAAGGTTCACATGTTCCGTGGACGTTC          11
#>                        cdr3_nt_length    v_gene d_gene j_gene c_gene
#> BL6_AAACGGGGTTCTGTTT-1           <NA>      <NA>   <NA>   <NA>   <NA>
#> BL6_AAAGATGCAACAACCT-1             33 IGKV1-117   None  IGKJ1   IGKC
#>                        isotype reads umis productive full_length paired
#> BL6_AAACGGGGTTCTGTTT-1    <NA>  <NA> <NA>       <NA>        <NA>     NA
#> BL6_AAAGATGCAACAACCT-1    None   352   21       TRUE        TRUE  FALSE
#>                        clonotype_id clonotype_id_freq clonotype_id_pct
#> BL6_AAACGGGGTTCTGTTT-1         <NA>                NA               NA
#> BL6_AAAGATGCAACAACCT-1  clonotype34                 1         1.818182
#>                        clonotype_id_shared clonotype_id_grp
#> BL6_AAACGGGGTTCTGTTT-1                  NA             <NA>
#> BL6_AAAGATGCAACAACCT-1                TRUE                1
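To make the per-cluster columns concrete, here is a base-R sketch of the same idea on a toy data frame. The data frame `cells` and its values are hypothetical, and the logic is an illustration of the description above rather than the package's internal code.

```r
# Toy per-cell table: one clonotype label and one sample label per cell
cells <- data.frame(
  clonotype = c("c1", "c1", "c2", "c1", "c3", "c3"),
  sample    = c("s1", "s1", "s1", "s2", "s2", "s2")
)

# Count each clonotype within its own sample
key <- paste(cells$sample, cells$clonotype)
cells$freq <- ave(seq_along(key), key, FUN = length)

# Percentage relative to the number of cells in the same sample
n_sample <- ave(seq_along(cells$sample), cells$sample, FUN = length)
cells$pct <- 100 * cells$freq / n_sample

# A clonotype is 'shared' when it is found in more than one sample
n_found <- tapply(cells$sample, cells$clonotype, function(x) length(unique(x)))
cells$shared <- as.vector(n_found[cells$clonotype]) > 1

cells
```

Here only `c1` appears in both samples, so it is the only clonotype flagged as shared.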


Plotting clonal abundance

djvdj includes the plot_clonal_abundance() function to visualize differences in clonotype frequency between samples or clusters. By default, this produces bar graphs. Plot colors can be adjusted using the plot_colors argument.

so |>
  plot_clonal_abundance(
    clonotype_col = "clonotype_id",
    plot_colors = "#3182bd"
  )

Abundance values can be calculated and plotted separately for each sample or cluster using the cluster_col argument. The panel_scales argument can be used to give each sample its own axis scales, and the panel_nrow argument adjusts the number of rows used to arrange the panels.

As expected, most MD4 B cells share the same clonotype, while BL6 cells have a diverse repertoire.

so |>
  plot_clonal_abundance(
    clonotype_col = "clonotype_id",
    cluster_col   = "orig.ident",
    panel_scales  = "free"
  )

Rank-abundance plots can also be generated by setting the method argument to ‘line’. Most djvdj plotting functions return ggplot objects that can be further modified with ggplot2 functions. Here we log10-transform the y-axis using the ggplot2::scale_y_log10() function.

so |>
  plot_clonal_abundance(
    clonotype_col = "clonotype_id",
    cluster_col   = "orig.ident",
    method        = "line",
    plot_colors   = c(MD4 = "#fec44f", BL6 = "#3182bd")
  ) +
  scale_y_log10()
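The quantity shown in a rank-abundance plot can be sketched in base R: clonotypes are ordered from most to least abundant and abundance is plotted against rank. The toy vector below is hypothetical, and this is an illustration of the concept, not the code djvdj runs internally.

```r
# Toy clonotype labels for seven cells
clonotype <- c("c1", "c1", "c1", "c2", "c2", "c3", "c4")

# Count each clonotype and order from most to least abundant
counts <- sort(table(clonotype), decreasing = TRUE)

# One row per clonotype: its rank and its percentage of all cells
rank_df <- data.frame(
  clonotype = names(counts),
  rank      = seq_along(counts),
  pct       = as.numeric(100 * counts / sum(counts))
)

# A log10 axis spreads out the low-abundance tail of the curve
rank_df$log10_pct <- log10(rank_df$pct)

rank_df
```

A monoclonal sample gives a curve dominated by rank 1, while a diverse repertoire gives a long, flat tail.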


UMAP projections

By default, calc_frequency() divides clonotypes into groups based on abundance and adds a column to the meta.data containing these group labels. Clonotype abundance can be summarized on a UMAP projection by plotting the added ‘grp’ column using the generic plotting function plot_features().

# Create UMAP summarizing samples
mouse_gg <- so |>
  plot_features(
    feature = "orig.ident",
    size = 0.25
  )

# Create UMAP summarizing clonotype abundance
abun_gg <- so |>
  calc_frequency(
    data_col = "clonotype_id",
    cluster_col = "sample"
  ) |>
  plot_features(
    feature = "clonotype_id_grp",
    size = 0.25
  )

mouse_gg + abun_gg
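Grouping clonotypes by abundance amounts to binning the frequency values. The base-R sketch below shows one way to do this with cut(); the break points and labels are hypothetical and are not the defaults used by calc_frequency().

```r
# Toy per-cell clonotype frequencies
freq <- c(1, 1, 3, 12, 40)

# Assumed bins for illustration: 1, 2-5, 6-10, and more than 10 cells
grp <- cut(
  freq,
  breaks = c(0, 1, 5, 10, Inf),
  labels = c("1", "2-5", "6-10", ">10")
)

data.frame(freq, grp)
```

Because the result is a factor with ordered levels, it can be mapped directly to a discrete color scale.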

Highly abundant clonotypes can also be specifically labeled on a UMAP projection. To do this, add a new meta.data column with the desired label using the mutate_vdj() function. This function works similarly to dplyr::mutate(), but specifically modifies the object meta.data and allows the user to parse per-chain information for each cell.

top_gg <- so |>
  mutate_vdj(
    top_clonotype = ifelse(clonotype_id == "clonotype907", clonotype_id, "other")
  ) |>
  plot_features(
    feature     = "top_clonotype",
    size        = 0.25,
    plot_colors = c(other = "#fec44f", clonotype907 = "#3182bd")
  )

mouse_gg + top_gg
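One detail worth keeping in mind when building labels with ifelse() is how missing values behave. The base-R sketch below uses a hypothetical vector of clonotype IDs; it shows general R behavior and makes no claim about how mutate_vdj() or plot_features() treat NA cells.

```r
# Toy clonotype IDs; NA marks a cell without V(D)J data
clonotype_id <- c("clonotype907", "clonotype12", NA, "clonotype907")

# Comparing with == propagates NA, so the NA cell stays NA
lab_eq <- ifelse(clonotype_id == "clonotype907", clonotype_id, "other")

# %in% never returns NA, so the NA cell is labeled "other" instead
lab_in <- ifelse(clonotype_id %in% "clonotype907", clonotype_id, "other")

data.frame(clonotype_id, lab_eq, lab_in)
```

Which form is preferable depends on whether cells lacking V(D)J data should be grouped with ‘other’ or kept separate.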


Other frequency calculations

In addition to clonotype abundance, calc_frequency() can be used to summarize the frequency of any cell label present in the object. In this example we count the number of cells present for each cell type in each sample.

so_vdj <- so |>
  calc_frequency(
    data_col = "cell_type",
    cluster_col = "sample"
  )

To plot the fraction of cells present for each cell type, we can use the generic plotting function plot_frequency(). This creates stacked bar graphs summarizing each cell label present in the data_col column. The color of each group can be specified with the plot_colors argument.

so |>
  plot_frequency(
    data_col    = "cell_type",
    cluster_col = "sample",
    plot_colors = c("#3182bd", "#fec44f", "#31a354")
  )

To summarize the number of cells present for each cell type, set the units argument to ‘frequency’.

so |>
  plot_frequency(
    data_col    = "cell_type",
    cluster_col = "sample",
    units       = "frequency",
    stack       = FALSE
  )
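The difference between the two kinds of summary (fraction of cells versus number of cells) can be illustrated with a base-R contingency table. The toy data frame is hypothetical, and computing percentages within each sample is an assumption about what the stacked bars represent.

```r
# Toy per-cell labels
cells <- data.frame(
  cell_type = c("B cells", "B cells", "T cells",
                "B cells", "T cells", "T cells", "T cells"),
  sample    = c("s1", "s1", "s1", "s2", "s2", "s2", "s2")
)

# Number of cells per cell type (rows) and sample (columns)
freq_tab <- table(cells$cell_type, cells$sample)

# Percentage of each sample made up by each cell type
# Assumption: percentages are computed within each sample (columns sum to 100)
pct_tab <- 100 * prop.table(freq_tab, margin = 2)

freq_tab
pct_tab
```

Raw counts depend on how many cells were captured per sample, so percentages are usually easier to compare across samples of different size.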

Frequency plots can also be separated based on an additional grouping variable such as treatment group (e.g. placebo vs drug) or disease status (e.g. healthy vs disease). This will generate boxplots with each point representing a label present in the cluster_col column. In this example we have 3 BL6 and 3 MD4 samples, so there are 3 points shown for each boxplot.

so |>
  plot_frequency(
    data_col    = "cell_type",
    cluster_col = "sample",
    group_col   = "orig.ident",
    plot_colors = c(MD4 = "#fec44f", BL6 = "#3182bd")
  )
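The data behind such a grouped plot can be sketched in base R: one percentage is computed per sample and cell type, and each sample is then assigned to its group. The toy data frame, with two samples per group, is hypothetical, and this is an illustration rather than the code plot_frequency() runs.

```r
# Toy per-cell labels with a grouping variable
cells <- data.frame(
  cell_type = c("B", "T", "B", "B", "B", "T", "T", "T"),
  sample    = c("BL6-1", "BL6-1", "BL6-2", "BL6-2",
                "MD4-1", "MD4-1", "MD4-2", "MD4-2"),
  group     = c("BL6", "BL6", "BL6", "BL6", "MD4", "MD4", "MD4", "MD4")
)

# Percentage of each sample made up by each cell type
pct <- 100 * prop.table(table(cells$cell_type, cells$sample), margin = 2)

# Long format: one row (one plotted point) per cell type and sample
pct_df <- as.data.frame(pct, responseName = "pct", stringsAsFactors = FALSE)
names(pct_df)[1:2] <- c("cell_type", "sample")

# Attach the group label belonging to each sample
sample_group <- unique(cells[c("sample", "group")])
pct_df$group <- sample_group$group[match(pct_df$sample, sample_group$sample)]

# Mean percentage per cell type and group
tapply(pct_df$pct, list(pct_df$cell_type, pct_df$group), mean)
```

Each row of `pct_df` corresponds to one point in a boxplot, so the number of points per box equals the number of samples in that group.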