Fetch V(D)J data from object (fetch_vdj)

Fetch per-chain V(D)J data from an object. In the object's meta.data, each row represents a single cell and can contain information for multiple chains. This function can instead return a data.frame in which each row represents a single chain, which is useful for plotting per-chain metrics such as CDR3 length or the number of insertions/deletions.
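The per-cell to per-chain conversion can be illustrated with a small base R sketch. It assumes per-chain values are stored as strings joined by ";" (the actual separator is whatever `sep` / `global$sep` is set to), and the toy cells and columns are hypothetical. This is not how fetch_vdj is implemented internally. It only shows the idea of splitting each per-chain string and repeating the cell ID once per chain.

```r
# Hypothetical per-cell meta.data: cell_1 has two chains, cell_2 has one.
# Assumption: per-chain values are joined with ";" in each string.
meta <- data.frame(
  chains      = c("IGH;IGK", "IGH"),
  cdr3_length = c("15;11", "14"),
  row.names   = c("cell_1", "cell_2"),
  stringsAsFactors = FALSE
)

sep <- ";"
data_cols <- c("chains", "cdr3_length")

# Split every per-chain column into one character vector per cell.
# This intermediate list is roughly what a "nested" (list-col) view holds.
split_cols <- lapply(meta[data_cols], function(x) strsplit(x, sep, fixed = TRUE))

# Number of chains per cell, taken from the first per-chain column
n_chains <- lengths(split_cols[[1]])

# "Unnest": repeat each cell ID once per chain and flatten each column
per_chain <- data.frame(
  .cell_id    = rep(rownames(meta), n_chains),
  chains      = unlist(split_cols$chains),
  cdr3_length = as.numeric(unlist(split_cols$cdr3_length)),
  row.names   = NULL,
  stringsAsFactors = FALSE
)

per_chain
```

The result has three rows: two for cell_1 (IGH and IGK) and one for cell_2, with cdr3_length converted back to numeric after splitting.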

Usage

fetch_vdj(
  input,
  data_cols = NULL,
  clonotype_col = NULL,
  filter_cells = FALSE,
  per_chain = TRUE,
  unnest = TRUE,
  sep = global$sep
)

Arguments

input: Single-cell object or data.frame containing V(D)J data. If a data.frame is provided, the cell barcodes should be stored as row names.
data_cols: meta.data columns containing per-chain V(D)J data to unnest. If NULL, V(D)J columns are selected automatically by identifying columns that have NAs in the same rows as clonotype_col.
clonotype_col: meta.data column containing clonotype IDs. This column is used to determine which columns have V(D)J data. If both clonotype_col and data_cols are NULL, all columns are included.
filter_cells: Remove cells that do not have V(D)J data. clonotype_col must be provided to determine which cells to filter.
per_chain: If TRUE, return per-chain data, i.e. each row represents a single chain.
unnest: If FALSE, a nested data.frame is returned in which each row represents a cell and V(D)J data are stored as list-cols. If TRUE, these columns are unnested so that each row represents a single chain.
sep: Separator used for storing per-cell V(D)J data. This is used to identify columns containing per-chain data that can be unnested.
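The column auto-detection described for data_cols, and the cell filtering controlled by filter_cells, both rely on clonotype_col being NA for cells without V(D)J data. The base R sketch below illustrates that rule on a hypothetical three-cell meta.data; it is an assumption-laden illustration, not the package's internal code. In particular, excluding clonotype_col itself from the detected columns is a choice made here for clarity.

```r
# Hypothetical meta.data: cell_2 has no V(D)J data, so its clonotype is NA
meta <- data.frame(
  orig.ident   = c("avid_1", "avid_1", "avid_1"),
  nCount_RNA   = c(6, 15, 4),
  clonotype_id = c("clonotype1", NA, "clonotype2"),
  chains       = c("IGH;IGK", NA, "IGH"),
  reads        = c("120;80", NA, "95"),
  row.names    = c("cell_1", "cell_2", "cell_3"),
  stringsAsFactors = FALSE
)

clonotype_col <- "clonotype_id"
no_vdj <- is.na(meta[[clonotype_col]])

# Treat a column as V(D)J data if its NAs fall in exactly the same rows
# as the NAs in clonotype_col
is_vdj_col <- vapply(meta, function(x) identical(is.na(x), no_vdj), logical(1))

# Drop clonotype_col itself from the detected set (illustrative choice)
data_cols <- setdiff(names(meta)[is_vdj_col], clonotype_col)

# Equivalent of filter_cells = TRUE: keep only cells with a clonotype
vdj_cells <- meta[!no_vdj, , drop = FALSE]

data_cols
rownames(vdj_cells)
```

Here data_cols is c("chains", "reads"), because orig.ident and nCount_RNA have no NAs, and only cell_1 and cell_3 are kept after filtering.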

Value

data.frame containing V(D)J data

Examples

# Fetch per-chain V(D)J data
fetch_vdj(vdj_sce)
#> # A tibble: 216 × 50
#>    .cell_id             orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.3
#>    <chr>                <chr>           <dbl>        <int> <fct>          
#>  1 1_AAGCCGCAGCTTATCG-1 avid_1              0            0 0              
#>  2 1_AATCCAGCATTACGAC-1 avid_1              6            4 0              
#>  3 1_ACAGCTAGTCTGGTCG-1 avid_1             15            4 0              
#>  4 1_ACCAGTAGTGCAGTAG-1 avid_1              4            2 1              
#>  5 1_ACCTTTATCGACGGAA-1 avid_1              5            5 0              
#>  6 1_ACGAGGAGTGACTCAT-1 avid_1              1            1 1              
#>  7 1_ACGCAGCTCGTGACAT-1 avid_1              3            2 1              
#>  8 1_ACGCCGACACGTCAGC-1 avid_1              5            2 0              
#>  9 1_ACGGAGACATGCTGGC-1 avid_1              5            3 0              
#> 10 1_ACTTTCATCGCTAGCG-1 avid_1              1            1 0              
#> # ℹ 206 more rows
#> # ℹ 45 more variables: seurat_clusters <fct>, UMAP_1 <dbl>, UMAP_2 <dbl>,
#> #   clonotype_id <chr>, exact_subclonotype_id <dbl>, chains <chr>,
#> #   n_chains <int>, cdr3 <chr>, cdr3_nt <chr>, cdr3_length <dbl>,
#> #   cdr3_nt_length <dbl>, v_gene <chr>, d_gene <chr>, j_gene <chr>,
#> #   c_gene <chr>, isotype <chr>, reads <dbl>, umis <dbl>,
#> #   productive <lgl>, full_length <lgl>, paired <lgl>, v_ins <dbl>, …

# To improve performance, specify which columns should return per-chain data;
# per-cell data will be returned for all other columns
fetch_vdj(
  vdj_sce,
  data_cols = c("chains", "reads")
)
#> # A tibble: 216 × 50
#>    .cell_id             orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.3
#>    <chr>                <chr>           <dbl>        <int> <fct>          
#>  1 1_AAGCCGCAGCTTATCG-1 avid_1              0            0 0              
#>  2 1_AATCCAGCATTACGAC-1 avid_1              6            4 0              
#>  3 1_ACAGCTAGTCTGGTCG-1 avid_1             15            4 0              
#>  4 1_ACCAGTAGTGCAGTAG-1 avid_1              4            2 1              
#>  5 1_ACCTTTATCGACGGAA-1 avid_1              5            5 0              
#>  6 1_ACGAGGAGTGACTCAT-1 avid_1              1            1 1              
#>  7 1_ACGCAGCTCGTGACAT-1 avid_1              3            2 1              
#>  8 1_ACGCCGACACGTCAGC-1 avid_1              5            2 0              
#>  9 1_ACGGAGACATGCTGGC-1 avid_1              5            3 0              
#> 10 1_ACTTTCATCGCTAGCG-1 avid_1              1            1 0              
#> # ℹ 206 more rows
#> # ℹ 45 more variables: seurat_clusters <fct>, UMAP_1 <dbl>, UMAP_2 <dbl>,
#> #   clonotype_id <chr>, exact_subclonotype_id <dbl>, chains <chr>,
#> #   n_chains <int>, cdr3 <chr>, cdr3_nt <chr>, cdr3_length <chr>,
#> #   cdr3_nt_length <chr>, v_gene <chr>, d_gene <chr>, j_gene <chr>,
#> #   c_gene <chr>, isotype <chr>, reads <dbl>, umis <chr>,
#> #   productive <chr>, full_length <chr>, paired <lgl>, v_ins <chr>, …

# Only include cells that have V(D)J data
# clonotype_col must be specified to identify cells with V(D)J data
fetch_vdj(
  vdj_sce,
  filter_cells = TRUE,
  clonotype_col = "clonotype_id"
)
#> # A tibble: 83 × 50
#>    .cell_id             orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.3
#>    <chr>                <chr>           <dbl>        <int> <fct>          
#>  1 1_ACGGAGACATGCTGGC-1 avid_1              5            3 0              
#>  2 1_AGGCCACTCGGTCCGA-1 avid_1              3            2 1              
#>  3 1_AGGCCACTCGGTCCGA-1 avid_1              3            2 1              
#>  4 1_AGGCCACTCGGTCCGA-1 avid_1              3            2 1              
#>  5 1_ATCATCTAGGCTAGAC-1 avid_1              3            3 0              
#>  6 1_CACAGGCGTGGTCCGT-1 avid_1              9            4 0              
#>  7 1_CAGCAGCGTAAAGTCA-1 avid_1              3            3 0              
#>  8 1_CAGCTGGGTGCGATAG-1 avid_1              2            2 1              
#>  9 1_CAGCTGGGTGCGATAG-1 avid_1              2            2 1              
#> 10 1_CAGCTGGGTGCGATAG-1 avid_1              2            2 1              
#> # ℹ 73 more rows
#> # ℹ 45 more variables: seurat_clusters <fct>, UMAP_1 <dbl>, UMAP_2 <dbl>,
#> #   clonotype_id <chr>, exact_subclonotype_id <dbl>, chains <chr>,
#> #   n_chains <int>, cdr3 <chr>, cdr3_nt <chr>, cdr3_length <dbl>,
#> #   cdr3_nt_length <dbl>, v_gene <chr>, d_gene <chr>, j_gene <chr>,
#> #   c_gene <chr>, isotype <chr>, reads <dbl>, umis <dbl>,
#> #   productive <lgl>, full_length <lgl>, paired <lgl>, v_ins <dbl>, …