Quantify the usage of different V(D)J segments for each cell cluster. The usage of two V(D)J segments can also be calculated for a single chain. For example, calc_gene_usage() can calculate the frequency that different heavy chain V and J segments appear together.
Usage
calc_gene_usage(
input,
data_cols,
cluster_col = NULL,
chain = NULL,
chain_col = global$chain_col,
prefix = paste0(data_cols[1], "_"),
return_df = FALSE,
sep = global$sep
)
Arguments
- input
Object containing V(D)J data. If a data.frame is provided, the cell barcodes should be stored as row names.
- data_cols
meta.data column(s) containing V(D)J genes identified for each clonotype. If multiple columns are provided, paired usage of genes will be calculated.
- cluster_col
meta.data column containing cell clusters to use when calculating gene usage
- chain
Chain(s) to use for calculating gene usage. Set to
NULL
to include all chains.- chain_col
meta.data column containing chains for each cell
- prefix
Prefix to add to new columns
- return_df
Return results as a data.frame. If
FALSE
, results will be added to the input object.- sep
Separator used for storing per cell V(D)J data
Examples
# Calculate V(D)J segment usage for all cells
calc_gene_usage(
vdj_sce,
data_cols = "v_gene"
)
#> class: SingleCellExperiment
#> dim: 200 200
#> metadata(0):
#> assays(1): counts
#> rownames(200): Gm48486 Trav19 ... Hkdc1 Wdr62
#> rowData names(0):
#> colnames(200): 1_AAGCCGCAGCTTATCG-1 1_AATCCAGCATTACGAC-1 ...
#> 2_TTCTACAAGGCCCTTG-1 2_TTTGTCATCCAAAGTC-1
#> colData names(51): orig.ident nCount_RNA ... v_gene_freq
#> v_gene_pct
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
# Calculate gene usage separately for cell clusters
calc_gene_usage(
vdj_sce,
data_cols = "v_gene",
cluster_col = "orig.ident"
)
#> class: SingleCellExperiment
#> dim: 200 200
#> metadata(0):
#> assays(1): counts
#> rownames(200): Gm48486 Trav19 ... Hkdc1 Wdr62
#> rowData names(0):
#> colnames(200): 1_AAGCCGCAGCTTATCG-1 1_AATCCAGCATTACGAC-1 ...
#> 2_TTCTACAAGGCCCTTG-1 2_TTTGTCATCCAAAGTC-1
#> colData names(52): orig.ident nCount_RNA ... v_gene_pct
#> v_gene_shared
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
# Calculate gene usage for a specific chain(s)
calc_gene_usage(
vdj_sce,
data_cols = "v_gene",
chain = c("IGK", "IGL")
)
#> class: SingleCellExperiment
#> dim: 200 200
#> metadata(0):
#> assays(1): counts
#> rownames(200): Gm48486 Trav19 ... Hkdc1 Wdr62
#> rowData names(0):
#> colnames(200): 1_AAGCCGCAGCTTATCG-1 1_AATCCAGCATTACGAC-1 ...
#> 2_TTCTACAAGGCCCTTG-1 2_TTTGTCATCCAAAGTC-1
#> colData names(51): orig.ident nCount_RNA ... v_gene_freq
#> v_gene_pct
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
# Calculate paired usage of V(D)J segments
calc_gene_usage(
vdj_sce,
data_cols = c("v_gene", "j_gene"),
)
#> # A tibble: 41 × 5
#> v_gene j_gene n_cells freq pct
#> <chr> <chr> <int> <int> <dbl>
#> 1 IGKV5-43 IGKJ2 67 36 53.7
#> 2 IGKV6-23 IGKJ5 67 3 4.48
#> 3 IGLV1 IGLJ1 67 3 4.48
#> 4 IGKV8-24 IGKJ5 67 2 2.99
#> 5 IGKV10-96 IGKJ1 67 2 2.99
#> 6 IGKV17-127 IGKJ5 67 2 2.99
#> 7 IGHV5-17 IGHJ4 67 1 1.49
#> 8 IGKV4-72 IGKJ2 67 1 1.49
#> 9 IGKV4-54 IGKJ5 67 1 1.49
#> 10 IGHV5-4 IGHJ4 67 1 1.49
#> # ℹ 31 more rows