This vignette provides detailed examples for quantifying repertoire overlap between samples. For the examples shown below, we use data for splenocytes from BL6 and MD4 mice collected using the 10X Genomics scRNA-seq platform. MD4 B cells are monoclonal and specifically bind hen egg lysozyme.
library(djvdj)
library(Seurat)
library(ggplot2)
library(RColorBrewer)
# Load GEX data
data_dir <- system.file("extdata/splen", package = "djvdj")
gex_dirs <- c(
BL6 = file.path(data_dir, "BL6_GEX/filtered_feature_bc_matrix"),
MD4 = file.path(data_dir, "MD4_GEX/filtered_feature_bc_matrix")
)
so <- gex_dirs |>
Read10X() |>
CreateSeuratObject() |>
AddMetaData(splen_meta)
# Add V(D)J data to object
vdj_dirs <- c(
BL6 = system.file("extdata/splen/BL6_BCR", package = "djvdj"),
MD4 = system.file("extdata/splen/MD4_BCR", package = "djvdj")
)
so <- so |>
import_vdj(vdj_dirs, define_clonotypes = "cdr3_gene")
Calculating repertoire overlap
The calc_similarity()
function will calculate repertoire
overlap between clusters and add the results in the object meta.data.
This function is designed to specifically work with the R package abdiv. The similarity
metric to use for calculations can be selected by passing the name of
the function to the method
argument. Any beta diversity
function from the abdiv package that takes species counts as input can
be used. Be sure to read the documentation for the function you are
using to ensure it is appropriate for your analysis.
In this example we are calculating the Jaccard dissimilarity index for BL6 and MD4 samples. With this metric, a value close to 0 indicates that the samples have a high number of shared clonotypes.
Jaccard dissimilarity is calculated using the following equation, where \(a\) is the number of species present in both x and y, \(b\) is the number of species present in y but not x, and \(c\) is the number of species present in x but not y.
\[ 1 - {a \over a + b + c} \]
so_vdj <- so |>
calc_similarity(
data_col = "clonotype_id",
cluster_col = "sample",
method = abdiv::jaccard
)
Similarity metrics can also be calculated for a specific chain. To do
this, the column passed to the data_col
argument must
contain per-chain data, such as CDR3 amino acid or nucleotide sequences.
In this example similarity is calculated based on heavy chain CDR3
sequences.
so_vdj <- so |>
calc_similarity(
data_col = "cdr3_nt",
cluster_col = "sample",
chain = "IGH"
)
Instead of adding the results to the object meta.data, a matrix can also be returned.
so |>
calc_similarity(
data_col = "clonotype_id",
cluster_col = "sample",
return_mat = TRUE
)
#> BL6-1 BL6-2 BL6-3 MD4-1 MD4-2 MD4-3
#> BL6-1 0.0000000 0.9639640 0.9696970 1.0000000 1.0000000 0.9830508
#> BL6-2 0.9639640 0.0000000 0.9813084 1.0000000 1.0000000 0.9848485
#> BL6-3 0.9696970 0.9813084 0.0000000 0.9807692 1.0000000 1.0000000
#> MD4-1 1.0000000 1.0000000 0.9807692 0.0000000 0.8571429 0.9000000
#> MD4-2 1.0000000 1.0000000 1.0000000 0.8571429 0.0000000 0.8750000
#> MD4-3 0.9830508 0.9848485 1.0000000 0.9000000 0.8750000 0.0000000
Plotting overlap
The plot_similarity()
function will create plots
summarizing repertoire overlap between samples. By default, Jaccard
dissimilarity will be calculated and plotted as a heatmap. For this
example, none of the samples show strong overlap. This may seem
surprising since the MD4 samples are mainly composed of a single
clonotype and should have a very similar repertoire. However, the
Jaccard index is only measuring the number of overlapping clonotypes and
is not influenced by clonotype abundance.
so |>
plot_similarity(
data_col = "clonotype_id",
cluster_col = "sample"
)
A metric that takes into account clonotype abundance is the Horn-Morisita index. This metric measures the probability that clonotypes drawn from each sample will be different. Values close to 0 indicate high similarity. In this example we see that when clonotype abundance is taken into account, the MD4 samples appear very similar to each other.
so |>
plot_similarity(
data_col = "clonotype_id",
cluster_col = "sample",
method = abdiv::horn_morisita
)
The appearance of the heatmap can be modified with additional
arguments. Setting cluster_heatmap
to FALSE
will remove the dendrograph. The remove_upper_triangle
argument can be used to only plot the lower triangle of the heatmap. The
plot_colors
argument will adjust the color gradient.
Additional parameters can be passed directly to
ComplexHeatmap::Heatmap()
.
so |>
plot_similarity(
data_col = "clonotype_id",
cluster_col = "sample",
method = abdiv::horn_morisita,
plot_colors = c("#3182bd", "white", "#fec44f"),
cluster_heatmap = FALSE,
remove_upper_triangle = TRUE,
name = "Horn-Morisita" # parameter to pass to Heatmap()
)
Circos plot
A circos plot can be created by setting the method
argument to ‘circos’. This plot will summarize the number of clonotypes
overlapping between samples. The group_col
argument can be
used to split the graph into distinct sections based on a grouping
variable.
The number of overlapping clonotypes is shown as the axis labels for
each sample. The width of each link reflects the number of clonotypes
shared between the samples. Labels can be rotated to eliminate
overlapping text using the rotate_labels
argument.
so |>
plot_similarity(
data_col = "clonotype_id",
cluster_col = "sample",
group_col = "orig.ident",
method = "circos",
rotate_labels = TRUE
)
Plot colors can be modified using the plot_colors
argument, additional parameters can be passed directly to
circlize::chordDiagram()
. In this example we scale the plot
so each sample is the same width.
so |>
plot_similarity(
data_col = "clonotype_id",
cluster_col = "sample",
group_col = "orig.ident",
method = "circos",
plot_colors = brewer.pal(10, "Spectral"),
scale = TRUE # parameter to pass to chordDiagram()
)
MDS plot
Multidimensional scaling (MDS) can be used to visualize the overall
similarity between repertoires. The calc_mds()
function
will calculate MDS coordinates using either the Jaccard dissimilarity
index or the Horn-Morisita index. The method can be specified using the
method
argument. MDS coordinates will get added to the
meta.data.
so_vdj <- so |>
calc_mds(
data_col = "clonotype_id",
cluster_col = "sample",
method = "horn_morisita"
)
The plot_mds()
function will create an MDS plot with
labels added for each sample.
so |>
plot_mds(
data_col = "clonotype_id",
cluster_col = "sample"
)
ggplot2 functions can be used to further adjust plot aesthetics, or
additional arguments can be passed directly to ggplot2,
e.g. size
, shape
, etc.
so |>
plot_mds(
data_col = "clonotype_id",
cluster_col = "sample",
size = 4, # parameters to pass to ggplot
shape = 2
) +
ggplot2::theme(legend.position = "none")
To remove sample labels, set label_points
to
FALSE
.
so |>
plot_mds(
data_col = "clonotype_id",
cluster_col = "sample",
label_points = FALSE
)
Session info
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] RColorBrewer_1.1-3 ggplot2_3.4.4 SeuratObject_4.1.4
#> [4] Seurat_4.4.0 djvdj_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] shape_1.4.6 jsonlite_1.8.7 magrittr_2.0.3
#> [4] spatstat.utils_3.0-3 farver_2.1.1 rmarkdown_2.25
#> [7] GlobalOptions_0.1.2 fs_1.6.3 ragg_1.2.6
#> [10] vctrs_0.6.4 ROCR_1.0-11 memoise_2.0.1
#> [13] spatstat.explore_3.2-5 htmltools_0.5.6.1 sass_0.4.7
#> [16] sctransform_0.4.1 parallelly_1.36.0 KernSmooth_2.23-21
#> [19] bslib_0.5.1 htmlwidgets_1.6.2 desc_1.4.2
#> [22] ica_1.0-3 plyr_1.8.9 plotly_4.10.3
#> [25] zoo_1.8-12 cachem_1.0.8 igraph_1.5.1
#> [28] iterators_1.0.14 mime_0.12 lifecycle_1.0.3
#> [31] pkgconfig_2.0.3 Matrix_1.6-1.1 R6_2.5.1
#> [34] fastmap_1.1.1 clue_0.3-65 fitdistrplus_1.1-11
#> [37] future_1.33.0 shiny_1.7.5.1 digest_0.6.33
#> [40] colorspace_2.1-0 S4Vectors_0.38.2 patchwork_1.1.3
#> [43] rprojroot_2.0.3 tensor_1.5 irlba_2.3.5.1
#> [46] textshaping_0.3.7 labeling_0.4.3 abdiv_0.2.0
#> [49] progressr_0.14.0 fansi_1.0.5 spatstat.sparse_3.0-2
#> [52] httr_1.4.7 polyclip_1.10-6 abind_1.4-5
#> [55] compiler_4.3.1 doParallel_1.0.17 bit64_4.0.5
#> [58] withr_2.5.1 MASS_7.3-60 rjson_0.2.21
#> [61] tools_4.3.1 lmtest_0.9-40 httpuv_1.6.12
#> [64] future.apply_1.11.0 goftest_1.2-3 glue_1.6.2
#> [67] nlme_3.1-162 promises_1.2.1 grid_4.3.1
#> [70] Rtsne_0.16 cluster_2.1.4 reshape2_1.4.4
#> [73] generics_0.1.3 gtable_0.3.4 spatstat.data_3.0-1
#> [76] tzdb_0.4.0 tidyr_1.3.0 data.table_1.14.8
#> [79] hms_1.1.3 sp_2.1-1 utf8_1.2.4
#> [82] BiocGenerics_0.46.0 spatstat.geom_3.2-7 RcppAnnoy_0.0.21
#> [85] foreach_1.5.2 ggrepel_0.9.4 RANN_2.6.1
#> [88] pillar_1.9.0 stringr_1.5.0 vroom_1.6.4
#> [91] later_1.3.1 circlize_0.4.15 splines_4.3.1
#> [94] dplyr_1.1.3 lattice_0.21-8 bit_4.0.5
#> [97] survival_3.5-5 deldir_1.0-9 tidyselect_1.2.0
#> [100] ComplexHeatmap_2.16.0 miniUI_0.1.1.1 pbapply_1.7-2
#> [103] knitr_1.44 gridExtra_2.3 IRanges_2.34.1
#> [106] scattermore_1.2 stats4_4.3.1 xfun_0.40
#> [109] matrixStats_1.0.0 stringi_1.7.12 lazyeval_0.2.2
#> [112] yaml_2.3.7 evaluate_0.22 codetools_0.2-19
#> [115] tibble_3.2.1 cli_3.6.1 uwot_0.1.16
#> [118] xtable_1.8-4 reticulate_1.34.0 systemfonts_1.0.5
#> [121] munsell_0.5.0 jquerylib_0.1.4 Rcpp_1.0.11
#> [124] globals_0.16.2 spatstat.random_3.2-1 png_0.1-8
#> [127] parallel_4.3.1 ellipsis_0.3.2 pkgdown_2.0.7
#> [130] readr_2.1.4 listenv_0.9.0 viridisLite_0.4.2
#> [133] scales_1.2.1 ggridges_0.5.4 crayon_1.5.2
#> [136] leiden_0.4.3 purrr_1.0.2 GetoptLong_1.0.5
#> [139] rlang_1.1.1 cowplot_1.1.1