Repertoire Diversity

This vignette provides detailed examples for quantifying repertoire diversity. For the examples shown below, we use data for splenocytes from BL6 and MD4 mice collected using the 10X Genomics scRNA-seq platform. MD4 B cells are monoclonal and specifically bind hen egg lysozyme.

library(djvdj)
library(Seurat)
library(ggplot2)

# Load GEX data
data_dir <- system.file("extdata/splen", package = "djvdj")

gex_dirs <- c(
  BL6 = file.path(data_dir, "BL6_GEX/filtered_feature_bc_matrix"),
  MD4 = file.path(data_dir, "MD4_GEX/filtered_feature_bc_matrix")
)

so <- gex_dirs |>
  Read10X() |>
  CreateSeuratObject() |>
  AddMetaData(splen_meta)

# Add V(D)J data to object
vdj_dirs <- c(
  BL6 = system.file("extdata/splen/BL6_BCR", package = "djvdj"),
  MD4 = system.file("extdata/splen/MD4_BCR", package = "djvdj")
)

so <- so |>
  import_vdj(vdj_dirs, define_clonotypes = "cdr3_gene")

Calculating diversity

To calculate repertoire diversity and store the results in the object meta.data, the calc_diversity() function can be used. This function is designed to specifically work with the R package abdiv. The diversity metric can be selected by passing the name of the function to the method argument. Any alpha diversity function from the abdiv package that takes species counts as input can be used. Be sure to read the documentation for the function you are using to ensure it is appropriate for your analysis.

In this example we are calculating the Shannon entropy for BL6 and MD4 samples.

so_vdj <- so |>
  calc_diversity(
    data_col    = "clonotype_id",
    cluster_col = "sample",
    method      = abdiv::shannon
  )

Estimations of species diversity are influenced by sample size. One approach to address this is to equalize the number of cells present in each cluster. The downsample argument will randomly sample cells so each sample being tested has the same number of cells as the smallest cluster. The bootstrapped standard error can also be calculated by setting the number of bootstrap samples with the n_boots argument.

so_vdj <- so |>
  calc_diversity(
    data_col    = "clonotype_id",
    cluster_col = "sample",
    downsample  = TRUE,
    n_boots     = 50
  )

Diversity metrics can also be calculated for a specific chain. To do this, the column passed to the data_col argument must contain per-chain data, such as CDR3 amino acid or nucleotide sequences. In this example diversity is calculated based only on heavy chain CDR3 sequences.

so_vdj <- so |>
  calc_diversity(
    data_col    = "cdr3_nt",
    cluster_col = "sample",
    chain       = "IGH"
  )

Plotting diversity

The plot_diversity() function will create plots summarizing repertoire diversity for each sample. A named list of functions can also be passed to plot multiple metrics. Two metrics for measuring diversity are the Simpson index and Shannon entropy. Both of these metrics are influenced by species richness (number of unique sequences) and evenness (relative abundance of sequences). Pielou’s index will specifically measure species evenness. For these metrics, maximally diverse samples will return a value of 1.

As expected, BL6 B cells have a very diverse repertoire, while MD4 cells have a restricted repertoire.

div_fns <- list(
  "simpson" = abdiv::simpson,
  "shannon" = abdiv::shannon,
  "pielou evenness" = abdiv::pielou_e
)

so |>
  plot_diversity(
    data_col    = "clonotype_id",
    cluster_col = "sample",
    method      = div_fns
  )

Diversity plots can also be separated based on an additional grouping variable such as treatment group (e.g. pacebo vs drug) or disease status (e.g. healthy vs disease). This will generate boxplots with each point representing a label present in the cluster_col column. In this example we have 3 BL6 and 3 MD4 samples, so there should be 5 points shown for each boxplot.

so |>
  plot_diversity(
    data_col    = "clonotype_id",
    cluster_col = "sample",
    group_col   = "orig.ident",
    method      = div_fns
  )

Additional arguments are provided to adjust plot aesthetics. The plot_colors parameter can be used to modify colors, the panel_nrow and panel_scales arguments will adjust the plot scales and number of rows used to arrange plots.

so |>
  plot_diversity(
    data_col    = "clonotype_id",
    cluster_col = "sample",
    group_col   = "orig.ident",
    method      = div_fns,
    plot_colors = c(BL6 = "#3182bd", MD4 = "#fec44f"),
    panel_nrow  = 2
  )

plot_diversity() returns a ggplot object that can be modified with ggplot2 functions such as ggplot2::theme(). Plots can be further adjusted by passing aesthetic parameters directly to ggplot2, e.g. alpha, linetype, color, etc.

so |>
  plot_diversity(
    data_col    = "clonotype_id",
    cluster_col = "sample",
    method      = div_fns,
    
    alpha    = 0.5,         # parameters to pass to ggplot2
    linetype = 2,
    color    = "black"    
  ) +
  theme(strip.text = element_text(face = "bold"))

Rarefaction curves

Another approach to ensure differences in sample size are not having an undue influence on diversity results is to plot rarefaction curves. This method involves calculating species diversity for different sized samples generated by randomly downsampling each cluster. By default the bootstrapped 95% confidence interval will also be plotted.

Calculations used to generate rarefaction curves are performed using the iNEXT package. There are three diversity calculations that can be specified with the method argument:

‘richness’, species richness, this is equivalent to the calculation performed by abdiv::richness()
‘shannon’, the exponential of Shannon entropy
‘invsimpson’, the inverse Simpson index, this is equivalent to the calculation performed by abdiv::invsimpson()

so |>
  plot_rarefaction(
    data_col    = "clonotype_id",
    cluster_col = "orig.ident",
    method      = c("richness", "shannon", "invsimpson"),
    plot_colors = c("#3182bd", "#fec44f")
  )

If the 95% confidence interval is not desired, set n_boots to 0. In this example we also plot a separate line for each BL6 and MD4 sample.

so |>
  plot_rarefaction(
    data_col    = "clonotype_id",
    cluster_col = "sample",
    method      = c("richness", "shannon", "invsimpson"),
    n_boots     = 0
  )

Session info

#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.4.4      SeuratObject_4.1.4 Seurat_4.4.0      
#> [4] djvdj_0.1.0       
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3     jsonlite_1.8.7         magrittr_2.0.3        
#>   [4] spatstat.utils_3.0-3   iNEXT_3.0.0            farver_2.1.1          
#>   [7] rmarkdown_2.25         fs_1.6.3               ragg_1.2.6            
#>  [10] vctrs_0.6.4            ROCR_1.0-11            memoise_2.0.1         
#>  [13] spatstat.explore_3.2-5 htmltools_0.5.6.1      sass_0.4.7            
#>  [16] sctransform_0.4.1      parallelly_1.36.0      KernSmooth_2.23-21    
#>  [19] bslib_0.5.1            htmlwidgets_1.6.2      desc_1.4.2            
#>  [22] ica_1.0-3              plyr_1.8.9             plotly_4.10.3         
#>  [25] zoo_1.8-12             cachem_1.0.8           igraph_1.5.1          
#>  [28] mime_0.12              lifecycle_1.0.3        pkgconfig_2.0.3       
#>  [31] Matrix_1.6-1.1         R6_2.5.1               fastmap_1.1.1         
#>  [34] fitdistrplus_1.1-11    future_1.33.0          shiny_1.7.5.1         
#>  [37] digest_0.6.33          colorspace_2.1-0       patchwork_1.1.3       
#>  [40] rprojroot_2.0.3        tensor_1.5             irlba_2.3.5.1         
#>  [43] textshaping_0.3.7      labeling_0.4.3         abdiv_0.2.0           
#>  [46] progressr_0.14.0       fansi_1.0.5            spatstat.sparse_3.0-2 
#>  [49] httr_1.4.7             polyclip_1.10-6        abind_1.4-5           
#>  [52] compiler_4.3.1         bit64_4.0.5            withr_2.5.1           
#>  [55] MASS_7.3-60            tools_4.3.1            lmtest_0.9-40         
#>  [58] httpuv_1.6.12          future.apply_1.11.0    goftest_1.2-3         
#>  [61] glue_1.6.2             nlme_3.1-162           promises_1.2.1        
#>  [64] grid_4.3.1             Rtsne_0.16             cluster_2.1.4         
#>  [67] reshape2_1.4.4         generics_0.1.3         gtable_0.3.4          
#>  [70] spatstat.data_3.0-1    tzdb_0.4.0             tidyr_1.3.0           
#>  [73] data.table_1.14.8      hms_1.1.3              sp_2.1-1              
#>  [76] utf8_1.2.4             spatstat.geom_3.2-7    RcppAnnoy_0.0.21      
#>  [79] ggrepel_0.9.4          RANN_2.6.1             pillar_1.9.0          
#>  [82] stringr_1.5.0          vroom_1.6.4            later_1.3.1           
#>  [85] splines_4.3.1          dplyr_1.1.3            lattice_0.21-8        
#>  [88] bit_4.0.5              survival_3.5-5         deldir_1.0-9          
#>  [91] tidyselect_1.2.0       miniUI_0.1.1.1         pbapply_1.7-2         
#>  [94] knitr_1.44             gridExtra_2.3          scattermore_1.2       
#>  [97] xfun_0.40              matrixStats_1.0.0      stringi_1.7.12        
#> [100] boot_1.3-28.1          lazyeval_0.2.2         yaml_2.3.7            
#> [103] evaluate_0.22          codetools_0.2-19       tibble_3.2.1          
#> [106] cli_3.6.1              uwot_0.1.16            xtable_1.8-4          
#> [109] reticulate_1.34.0      systemfonts_1.0.5      munsell_0.5.0         
#> [112] jquerylib_0.1.4        Rcpp_1.0.11            globals_0.16.2        
#> [115] spatstat.random_3.2-1  png_0.1-8              parallel_4.3.1        
#> [118] ellipsis_0.3.2         pkgdown_2.0.7          readr_2.1.4           
#> [121] listenv_0.9.0          viridisLite_0.4.2      scales_1.2.1          
#> [124] ggridges_0.5.4         crayon_1.5.2           leiden_0.4.3          
#> [127] purrr_1.0.2            rlang_1.1.1            cowplot_1.1.1

October 24 2023

Calculating diversity

Plotting diversity

Rarefaction curves

Session info