Fetch per-chain V(D)J data from a single-cell object or data.frame. Within the object meta.data, each row represents a single cell and can include information for multiple chains. This function can return a data.frame where each row represents a single chain, which is useful for plotting per-chain metrics such as CDR3 length or the number of insertions/deletions.
Usage
fetch_vdj(
  input,
  data_cols = NULL,
  clonotype_col = NULL,
  filter_cells = FALSE,
  per_chain = TRUE,
  unnest = TRUE,
  sep = global$sep
)
Arguments
- input
Single cell object or data.frame containing V(D)J data. If a data.frame is provided, the cell barcodes should be stored as row names.
- data_cols
meta.data columns containing per-chain V(D)J data to unnest. If NULL, V(D)J data are automatically selected by identifying columns that have NAs in the same rows as clonotype_col.
- clonotype_col
meta.data column containing clonotype IDs. This column is used to determine which columns have V(D)J data. If both clonotype_col and data_cols are NULL, all columns are included.
- filter_cells
Remove cells that do not have V(D)J data. clonotype_col must be provided to determine which cells to filter.
- per_chain
If TRUE, return per-chain data, i.e. each row represents a chain.
- unnest
If FALSE, a nested data.frame is returned where each row represents a cell and V(D)J data is stored as list-cols. If TRUE, columns are unnested so each row represents a single chain.
- sep
Separator used for storing per-cell V(D)J data. This is used to identify columns containing per-chain data that can be unnested.
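The column selection and unnesting described in the arguments above can be illustrated with a minimal base R sketch. This is not the implementation used by fetch_vdj; the toy data.frame `meta`, its barcodes, and the ";" separator are all assumptions made for illustration. It shows two ideas: picking V(D)J columns by matching the NA pattern of the clonotype column, and splitting separator-delimited values so that each row represents one chain.

```r
# Hypothetical per-cell metadata: barcodes as row names, per-chain values
# joined by a separator (";" is assumed here, it is not taken from the docs)
meta <- data.frame(
  orig.ident   = c("s1", "s1", "s1"),
  clonotype_id = c("clonotype1", NA, "clonotype2"),
  chains       = c("IGH;IGK", NA, "IGH"),
  cdr3_length  = c("15;11", NA, "12"),
  row.names    = c("AAAC-1", "AAAG-1", "AAAT-1"),
  stringsAsFactors = FALSE
)
sep <- ";"

# Step 1: treat a column as V(D)J data when it has NAs in exactly the same
# rows as the clonotype column
clone_na <- is.na(meta$clonotype_id)
same_na  <- vapply(meta, function(x) identical(is.na(x), clone_na), logical(1))
vdj_cols <- setdiff(names(meta)[same_na], "clonotype_id")

# Step 2: keep only cells with V(D)J data (comparable to filtering cells),
# then split each per-cell string into its per-chain values
cells      <- meta[!clone_na, , drop = FALSE]
split_vals <- lapply(cells[vdj_cols], strsplit, split = sep, fixed = TRUE)
n_chains   <- lengths(split_vals[[1]])

# Step 3: repeat per-cell values once per chain and flatten per-chain values,
# giving two rows for AAAC-1 and one row for AAAT-1
per_chain <- data.frame(
  .cell_id     = rep(rownames(cells), times = n_chains),
  orig.ident   = rep(cells$orig.ident, times = n_chains),
  clonotype_id = rep(cells$clonotype_id, times = n_chains),
  chains       = unlist(split_vals$chains),
  cdr3_length  = as.numeric(unlist(split_vals$cdr3_length)),
  stringsAsFactors = FALSE
)

print(per_chain)
```

In this sketch the per-cell columns (orig.ident, clonotype_id) are duplicated across the rows of a multi-chain cell, which matches the repeated barcodes seen in per-chain output.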
Examples
# Fetch per-chain V(D)J data
fetch_vdj(vdj_sce)
#> # A tibble: 216 × 50
#> .cell_id orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.3
#> <chr> <chr> <dbl> <int> <fct>
#> 1 1_AAGCCGCAGCTTATCG-1 avid_1 0 0 0
#> 2 1_AATCCAGCATTACGAC-1 avid_1 6 4 0
#> 3 1_ACAGCTAGTCTGGTCG-1 avid_1 15 4 0
#> 4 1_ACCAGTAGTGCAGTAG-1 avid_1 4 2 1
#> 5 1_ACCTTTATCGACGGAA-1 avid_1 5 5 0
#> 6 1_ACGAGGAGTGACTCAT-1 avid_1 1 1 1
#> 7 1_ACGCAGCTCGTGACAT-1 avid_1 3 2 1
#> 8 1_ACGCCGACACGTCAGC-1 avid_1 5 2 0
#> 9 1_ACGGAGACATGCTGGC-1 avid_1 5 3 0
#> 10 1_ACTTTCATCGCTAGCG-1 avid_1 1 1 0
#> # ℹ 206 more rows
#> # ℹ 45 more variables: seurat_clusters <fct>, UMAP_1 <dbl>, UMAP_2 <dbl>,
#> # clonotype_id <chr>, exact_subclonotype_id <dbl>, chains <chr>,
#> # n_chains <int>, cdr3 <chr>, cdr3_nt <chr>, cdr3_length <dbl>,
#> # cdr3_nt_length <dbl>, v_gene <chr>, d_gene <chr>, j_gene <chr>,
#> # c_gene <chr>, isotype <chr>, reads <dbl>, umis <dbl>,
#> # productive <lgl>, full_length <lgl>, paired <lgl>, v_ins <dbl>, …
# To increase performance, specify which columns to return as per-chain data;
# per-cell data will be returned for all other columns
fetch_vdj(
  vdj_sce,
  data_cols = c("chains", "reads")
)
#> # A tibble: 216 × 50
#> .cell_id orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.3
#> <chr> <chr> <dbl> <int> <fct>
#> 1 1_AAGCCGCAGCTTATCG-1 avid_1 0 0 0
#> 2 1_AATCCAGCATTACGAC-1 avid_1 6 4 0
#> 3 1_ACAGCTAGTCTGGTCG-1 avid_1 15 4 0
#> 4 1_ACCAGTAGTGCAGTAG-1 avid_1 4 2 1
#> 5 1_ACCTTTATCGACGGAA-1 avid_1 5 5 0
#> 6 1_ACGAGGAGTGACTCAT-1 avid_1 1 1 1
#> 7 1_ACGCAGCTCGTGACAT-1 avid_1 3 2 1
#> 8 1_ACGCCGACACGTCAGC-1 avid_1 5 2 0
#> 9 1_ACGGAGACATGCTGGC-1 avid_1 5 3 0
#> 10 1_ACTTTCATCGCTAGCG-1 avid_1 1 1 0
#> # ℹ 206 more rows
#> # ℹ 45 more variables: seurat_clusters <fct>, UMAP_1 <dbl>, UMAP_2 <dbl>,
#> # clonotype_id <chr>, exact_subclonotype_id <dbl>, chains <chr>,
#> # n_chains <int>, cdr3 <chr>, cdr3_nt <chr>, cdr3_length <chr>,
#> # cdr3_nt_length <chr>, v_gene <chr>, d_gene <chr>, j_gene <chr>,
#> # c_gene <chr>, isotype <chr>, reads <dbl>, umis <chr>,
#> # productive <chr>, full_length <chr>, paired <lgl>, v_ins <chr>, …
# Only include cells that have V(D)J data
# clonotype_col must be specified to identify cells with V(D)J data
fetch_vdj(
  vdj_sce,
  filter_cells = TRUE,
  clonotype_col = "clonotype_id"
)
#> # A tibble: 83 × 50
#> .cell_id orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.3
#> <chr> <chr> <dbl> <int> <fct>
#> 1 1_ACGGAGACATGCTGGC-1 avid_1 5 3 0
#> 2 1_AGGCCACTCGGTCCGA-1 avid_1 3 2 1
#> 3 1_AGGCCACTCGGTCCGA-1 avid_1 3 2 1
#> 4 1_AGGCCACTCGGTCCGA-1 avid_1 3 2 1
#> 5 1_ATCATCTAGGCTAGAC-1 avid_1 3 3 0
#> 6 1_CACAGGCGTGGTCCGT-1 avid_1 9 4 0
#> 7 1_CAGCAGCGTAAAGTCA-1 avid_1 3 3 0
#> 8 1_CAGCTGGGTGCGATAG-1 avid_1 2 2 1
#> 9 1_CAGCTGGGTGCGATAG-1 avid_1 2 2 1
#> 10 1_CAGCTGGGTGCGATAG-1 avid_1 2 2 1
#> # ℹ 73 more rows
#> # ℹ 45 more variables: seurat_clusters <fct>, UMAP_1 <dbl>, UMAP_2 <dbl>,
#> # clonotype_id <chr>, exact_subclonotype_id <dbl>, chains <chr>,
#> # n_chains <int>, cdr3 <chr>, cdr3_nt <chr>, cdr3_length <dbl>,
#> # cdr3_nt_length <dbl>, v_gene <chr>, d_gene <chr>, j_gene <chr>,
#> # c_gene <chr>, isotype <chr>, reads <dbl>, umis <dbl>,
#> # productive <lgl>, full_length <lgl>, paired <lgl>, v_ins <dbl>, …