This vignette provides detailed examples for quantifying repertoire diversity. For the examples shown below, we use data for splenocytes from BL6 and MD4 mice collected using the 10X Genomics scRNA-seq platform. MD4 B cells are monoclonal and specifically bind hen egg lysozyme.
library(djvdj)
library(Seurat)
library(ggplot2)
# Load GEX data
data_dir <- system.file("extdata/splen", package = "djvdj")
gex_dirs <- c(
BL6 = file.path(data_dir, "BL6_GEX/filtered_feature_bc_matrix"),
MD4 = file.path(data_dir, "MD4_GEX/filtered_feature_bc_matrix")
)
so <- gex_dirs |>
Read10X() |>
CreateSeuratObject() |>
AddMetaData(splen_meta)
# Add V(D)J data to object
vdj_dirs <- c(
BL6 = system.file("extdata/splen/BL6_BCR", package = "djvdj"),
MD4 = system.file("extdata/splen/MD4_BCR", package = "djvdj")
)
so <- so |>
import_vdj(vdj_dirs, define_clonotypes = "cdr3_gene")
Calculating diversity
To calculate repertoire diversity and store the results in the object
meta.data, the calc_diversity()
function can be used. This
function is designed to specifically work with the R package abdiv. The diversity
metric can be selected by passing the name of the function to the
method
argument. Any alpha diversity function from the
abdiv package that takes species counts as input can be used. Be sure to
read the documentation for the function you are using to ensure it is
appropriate for your analysis.
In this example we are calculating the Shannon entropy for BL6 and MD4 samples.
so_vdj <- so |>
calc_diversity(
data_col = "clonotype_id",
cluster_col = "sample",
method = abdiv::shannon
)
Estimations of species diversity are influenced by sample size. One
approach to address this is to equalize the number of cells present in
each cluster. The downsample
argument will randomly sample
cells so each sample being tested has the same number of cells as the
smallest cluster. The bootstrapped standard error can also be calculated
by setting the number of bootstrap samples with the n_boots
argument.
so_vdj <- so |>
calc_diversity(
data_col = "clonotype_id",
cluster_col = "sample",
downsample = TRUE,
n_boots = 50
)
Diversity metrics can also be calculated for a specific chain. To do
this, the column passed to the data_col
argument must
contain per-chain data, such as CDR3 amino acid or nucleotide sequences.
In this example diversity is calculated based only on heavy chain CDR3
sequences.
so_vdj <- so |>
calc_diversity(
data_col = "cdr3_nt",
cluster_col = "sample",
chain = "IGH"
)
Plotting diversity
The plot_diversity()
function will create plots
summarizing repertoire diversity for each sample. A named list of
functions can also be passed to plot multiple metrics. Two metrics for
measuring diversity are the Simpson index and Shannon entropy. Both of
these metrics are influenced by species richness (number of unique
sequences) and evenness (relative abundance of sequences). Pielou’s
index will specifically measure species evenness. For these metrics,
maximally diverse samples will return a value of 1.
As expected, BL6 B cells have a very diverse repertoire, while MD4 cells have a restricted repertoire.
div_fns <- list(
"simpson" = abdiv::simpson,
"shannon" = abdiv::shannon,
"pielou evenness" = abdiv::pielou_e
)
so |>
plot_diversity(
data_col = "clonotype_id",
cluster_col = "sample",
method = div_fns
)
Diversity plots can also be separated based on an additional grouping
variable such as treatment group (e.g. pacebo vs drug) or disease status
(e.g. healthy vs disease). This will generate boxplots with each point
representing a label present in the cluster_col
column. In
this example we have 3 BL6 and 3 MD4 samples, so there should be 5
points shown for each boxplot.
so |>
plot_diversity(
data_col = "clonotype_id",
cluster_col = "sample",
group_col = "orig.ident",
method = div_fns
)
Additional arguments are provided to adjust plot aesthetics. The
plot_colors
parameter can be used to modify colors, the
panel_nrow
and panel_scales
arguments will
adjust the plot scales and number of rows used to arrange plots.
so |>
plot_diversity(
data_col = "clonotype_id",
cluster_col = "sample",
group_col = "orig.ident",
method = div_fns,
plot_colors = c(BL6 = "#3182bd", MD4 = "#fec44f"),
panel_nrow = 2
)
plot_diversity()
returns a ggplot object that can be
modified with ggplot2 functions such as ggplot2::theme()
.
Plots can be further adjusted by passing aesthetic parameters directly
to ggplot2, e.g. alpha
, linetype
,
color
, etc.
so |>
plot_diversity(
data_col = "clonotype_id",
cluster_col = "sample",
method = div_fns,
alpha = 0.5, # parameters to pass to ggplot2
linetype = 2,
color = "black"
) +
theme(strip.text = element_text(face = "bold"))
Rarefaction curves
Another approach to ensure differences in sample size are not having an undue influence on diversity results is to plot rarefaction curves. This method involves calculating species diversity for different sized samples generated by randomly downsampling each cluster. By default the bootstrapped 95% confidence interval will also be plotted.
Calculations used to generate rarefaction curves are performed using
the iNEXT package. There are three diversity calculations that can be
specified with the method
argument:
- ‘richness’, species richness, this is equivalent to the calculation
performed by
abdiv::richness()
- ‘shannon’, the exponential of Shannon entropy
- ‘invsimpson’, the inverse Simpson index, this is equivalent to the
calculation performed by
abdiv::invsimpson()
so |>
plot_rarefaction(
data_col = "clonotype_id",
cluster_col = "orig.ident",
method = c("richness", "shannon", "invsimpson"),
plot_colors = c("#3182bd", "#fec44f")
)
If the 95% confidence interval is not desired, set
n_boots
to 0. In this example we also plot a separate line
for each BL6 and MD4 sample.
so |>
plot_rarefaction(
data_col = "clonotype_id",
cluster_col = "sample",
method = c("richness", "shannon", "invsimpson"),
n_boots = 0
)
Session info
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_3.4.4 SeuratObject_4.1.4 Seurat_4.4.0
#> [4] djvdj_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 jsonlite_1.8.7 magrittr_2.0.3
#> [4] spatstat.utils_3.0-3 iNEXT_3.0.0 farver_2.1.1
#> [7] rmarkdown_2.25 fs_1.6.3 ragg_1.2.6
#> [10] vctrs_0.6.4 ROCR_1.0-11 memoise_2.0.1
#> [13] spatstat.explore_3.2-5 htmltools_0.5.6.1 sass_0.4.7
#> [16] sctransform_0.4.1 parallelly_1.36.0 KernSmooth_2.23-21
#> [19] bslib_0.5.1 htmlwidgets_1.6.2 desc_1.4.2
#> [22] ica_1.0-3 plyr_1.8.9 plotly_4.10.3
#> [25] zoo_1.8-12 cachem_1.0.8 igraph_1.5.1
#> [28] mime_0.12 lifecycle_1.0.3 pkgconfig_2.0.3
#> [31] Matrix_1.6-1.1 R6_2.5.1 fastmap_1.1.1
#> [34] fitdistrplus_1.1-11 future_1.33.0 shiny_1.7.5.1
#> [37] digest_0.6.33 colorspace_2.1-0 patchwork_1.1.3
#> [40] rprojroot_2.0.3 tensor_1.5 irlba_2.3.5.1
#> [43] textshaping_0.3.7 labeling_0.4.3 abdiv_0.2.0
#> [46] progressr_0.14.0 fansi_1.0.5 spatstat.sparse_3.0-2
#> [49] httr_1.4.7 polyclip_1.10-6 abind_1.4-5
#> [52] compiler_4.3.1 bit64_4.0.5 withr_2.5.1
#> [55] MASS_7.3-60 tools_4.3.1 lmtest_0.9-40
#> [58] httpuv_1.6.12 future.apply_1.11.0 goftest_1.2-3
#> [61] glue_1.6.2 nlme_3.1-162 promises_1.2.1
#> [64] grid_4.3.1 Rtsne_0.16 cluster_2.1.4
#> [67] reshape2_1.4.4 generics_0.1.3 gtable_0.3.4
#> [70] spatstat.data_3.0-1 tzdb_0.4.0 tidyr_1.3.0
#> [73] data.table_1.14.8 hms_1.1.3 sp_2.1-1
#> [76] utf8_1.2.4 spatstat.geom_3.2-7 RcppAnnoy_0.0.21
#> [79] ggrepel_0.9.4 RANN_2.6.1 pillar_1.9.0
#> [82] stringr_1.5.0 vroom_1.6.4 later_1.3.1
#> [85] splines_4.3.1 dplyr_1.1.3 lattice_0.21-8
#> [88] bit_4.0.5 survival_3.5-5 deldir_1.0-9
#> [91] tidyselect_1.2.0 miniUI_0.1.1.1 pbapply_1.7-2
#> [94] knitr_1.44 gridExtra_2.3 scattermore_1.2
#> [97] xfun_0.40 matrixStats_1.0.0 stringi_1.7.12
#> [100] boot_1.3-28.1 lazyeval_0.2.2 yaml_2.3.7
#> [103] evaluate_0.22 codetools_0.2-19 tibble_3.2.1
#> [106] cli_3.6.1 uwot_0.1.16 xtable_1.8-4
#> [109] reticulate_1.34.0 systemfonts_1.0.5 munsell_0.5.0
#> [112] jquerylib_0.1.4 Rcpp_1.0.11 globals_0.16.2
#> [115] spatstat.random_3.2-1 png_0.1-8 parallel_4.3.1
#> [118] ellipsis_0.3.2 pkgdown_2.0.7 readr_2.1.4
#> [121] listenv_0.9.0 viridisLite_0.4.2 scales_1.2.1
#> [124] ggridges_0.5.4 crayon_1.5.2 leiden_0.4.3
#> [127] purrr_1.0.2 rlang_1.1.1 cowplot_1.1.1