Version info for alevin run

Start time Tue Oct 17 12:32:33 2023
Salmon version 1.10.0
Index ../index/salmon_index
R1file 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L001_R1_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L002_R1_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L003_R1_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L004_R1_001.fastq.gz
R2file 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L001_R2_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L002_R2_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L003_R2_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L004_R2_001.fastq.gz
tgMap ../index/tx2gene.txt
Library type ISR

Summary tables

Full set of cell barcodes

Total number of processed reads 368191709
Number of reads with Ns 0
Number of reads with valid cell barcode (no Ns) 368191709
Number of mapped reads 177750947
Percent mapped (of all reads) 48.28%
Number of noisy CB reads 37447083
Number of noisy UMI reads 2356
Total number of observed cell barcodes 4667936
Number of used reads 330742270
Percent mapped (of used reads) 53.74%
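The numbers in this table are internally consistent with the arithmetic below. This is a reading of the table rather than a documented alevin formula. The number of used reads equals the processed reads minus the noisy CB and noisy UMI reads, and the two "Percent mapped" rows share the same numerator but use different denominators. A minimal Python check, with values copied from the table:

```python
# Values copied from the "Full set of cell barcodes" table above.
total_reads = 368_191_709
noisy_cb_reads = 37_447_083
noisy_umi_reads = 2_356
mapped_reads = 177_750_947

# Assumption (inferred from the table): used reads = processed reads
# minus noisy cell-barcode reads minus noisy UMI reads.
used_reads = total_reads - noisy_cb_reads - noisy_umi_reads

# Same numerator (mapped reads), two different denominators.
pct_mapped_all = 100 * mapped_reads / total_reads
pct_mapped_used = 100 * mapped_reads / used_reads

print(used_reads)                 # 330742270
print(f"{pct_mapped_all:.2f}%")   # 48.28%
print(f"{pct_mapped_used:.2f}%")  # 53.74%
```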

Initial whitelist

Number of barcodes (initial whitelist) 5138
Number of barcodes with quantification (initial whitelist) 5137
Fraction reads in barcodes (initial whitelist) 91.65%
Mean number of reads per cell (initial whitelist) 65689
Median number of reads per cell (initial whitelist) 62974
Mean number of detected genes per cell (initial whitelist) 2390
Median number of detected genes per cell (initial whitelist) 2410
Mean UMI count per cell (initial whitelist) 8248
Median UMI count per cell (initial whitelist) 7878

Final whitelist

Number of barcodes (final whitelist) 4832
Number of barcodes with quantification (final whitelist) 4832
Fraction reads in barcodes (final whitelist) 90.51%
Mean number of reads per cell (final whitelist) 68964
Median number of reads per cell (final whitelist) 64710
Mean number of detected genes per cell (final whitelist) 2497
Median number of detected genes per cell (final whitelist) 2450
Mean UMI count per cell (final whitelist) 8656
Median UMI count per cell (final whitelist) 8073
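The "Fraction reads in barcodes" rows appear to use the total number of processed reads as the denominator. Multiplying the mean reads per cell by the number of barcodes with quantification, then dividing by the processed-read count, reproduces both reported percentages. This is an inference from the tables, not a documented definition. Because the means are rounded to whole reads, it is only an approximate check. The helper name is hypothetical:

```python
total_reads = 368_191_709  # "Total number of processed reads"

def approx_fraction_in_barcodes(n_barcodes, mean_reads_per_cell, total):
    """Hypothetical helper: percent of all processed reads falling in the
    whitelisted barcodes, reconstructed from the (rounded) per-cell mean."""
    return 100 * n_barcodes * mean_reads_per_cell / total

# "Number of barcodes with quantification" and "Mean number of reads per cell"
initial = approx_fraction_in_barcodes(5137, 65689, total_reads)
final = approx_fraction_in_barcodes(4832, 68964, total_reads)

print(f"initial: {initial:.2f}%")  # initial: 91.65%
print(f"final: {final:.2f}%")      # final: 90.51%
```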

Knee plot

The knee plot displays the number of times each cell barcode is observed, in decreasing order. By finding a ‘knee’ in this plot, Alevin determines a threshold (indicated in the plot) that defines an initial ‘whitelist’, i.e. a set of cell barcodes that likely represent non-empty droplets, and separates these barcodes from the background. The initial whitelisting is only performed if no external whitelist is provided when running alevin. In the figure below, red indicates cell barcodes in the initial whitelist and black indicates all other cell barcodes.
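To make the idea of a knee threshold concrete, the sketch below uses a generic "maximum distance to the chord" heuristic. The barcode frequencies are sorted in decreasing order and placed on log10 axes. The knee is the point lying farthest above the straight line joining the first and last points. This is an illustration of knee detection in general, not alevin's own procedure, and `find_knee` and `initial_whitelist` are hypothetical names:

```python
import math

def find_knee(counts):
    """Index (0-based, decreasing-count order) of the knee of a barcode-frequency
    curve. Generic max-distance-to-chord heuristic on log10 axes; this is an
    illustration, not alevin's knee-finding procedure."""
    ys = sorted(counts, reverse=True)
    if len(ys) < 3:
        return len(ys) - 1
    # log10(rank) vs log10(frequency); observed barcodes have counts >= 1.
    xs = [math.log10(rank) for rank in range(1, len(ys) + 1)]
    ls = [math.log10(c) for c in ys]
    dx, dy = xs[-1] - xs[0], ls[-1] - ls[0]
    norm = math.hypot(dx, dy)
    best_i, best_d = 0, float("-inf")
    for i, (x, y) in enumerate(zip(xs, ls)):
        # Signed distance to the chord from first to last point (positive = above).
        d = (dx * (y - ls[0]) - dy * (x - xs[0])) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i

def initial_whitelist(barcode_counts):
    """Hypothetical helper: barcodes ranked at or above the knee."""
    ranked = sorted(barcode_counts.items(), key=lambda kv: kv[1], reverse=True)
    knee = find_knee([n for _, n in ranked])
    return {bc for bc, _ in ranked[: knee + 1]}

# Toy data: 100 high-count "cells" and 10,000 low-count background barcodes.
cells = {f"cell{i}": 10_000 - i for i in range(100)}
background = {f"bg{i}": 10 + i % 5 for i in range(10_000)}
wl = initial_whitelist({**cells, **background})
print(len(wl))  # 100
```

Working on log axes keeps the long tail of rarely observed barcodes from dominating the geometry. Real data rarely show such a clean step, which is why dedicated tools use more careful procedures.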

Cell barcode error correction and merging with initial whitelist

Once the initial set of whitelisted cell barcodes is defined, Alevin goes through the remaining cell barcodes. If a cell barcode is similar enough to a whitelisted cell barcode, it is corrected and its reads are added to those of the whitelisted barcode. The figure below shows the original frequency of the whitelisted barcodes versus their frequency after this correction. Reads whose cell barcodes cannot be corrected to a whitelisted barcode are discarded.
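The paragraph above does not spell out what "similar enough" means. The sketch below assumes a Hamming distance of 1 over the ACGT alphabet. A non-whitelisted barcode is merged into a whitelisted one only when exactly one whitelisted neighbour exists. Otherwise its reads are discarded. Alevin's exact similarity criterion and tie-breaking rules may differ, and the function names are hypothetical:

```python
from collections import Counter

ALPHABET = "ACGT"

def hamming1_neighbors(bc):
    """All sequences differing from bc at exactly one position (ACGT alphabet)."""
    for i, base in enumerate(bc):
        for b in ALPHABET:
            if b != base:
                yield bc[:i] + b + bc[i + 1:]

def correct_and_merge(counts, whitelist):
    """Hypothetical sketch: fold reads of non-whitelisted barcodes into their
    unique Hamming-distance-1 whitelisted neighbour; discard the rest."""
    merged = Counter({bc: counts.get(bc, 0) for bc in whitelist})
    discarded = 0
    for bc, n in counts.items():
        if bc in whitelist:
            continue
        hits = [nb for nb in hamming1_neighbors(bc) if nb in whitelist]
        if len(hits) == 1:
            merged[hits[0]] += n   # unambiguous correction
        else:
            discarded += n         # no match, or ambiguous match
    return merged, discarded

counts = {"AAAA": 100, "CCCC": 50, "AAAT": 5, "ACCC": 3, "GGGG": 7}
merged, discarded = correct_and_merge(counts, {"AAAA", "CCCC"})
print(merged["AAAA"], merged["CCCC"], discarded)  # 105 53 7
```

Discarding ambiguous barcodes, rather than splitting their reads, is a conservative choice that avoids assigning reads to the wrong cell.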

Quantification

After cell barcode collapsing, Alevin estimates the UMI count for each cell and gene. Following quantification, an additional round of cell barcode whitelisting is performed to retain good-quality cells. This step uses not only the barcode frequency but also other features such as the fraction of mapped reads, the duplication rate and the average gene count. The plots below show the associations among the cell barcode frequency (the number of observed reads corresponding to a cell barcode), the total UMI count and the number of detected genes. The cell barcodes are colored by whether or not they are included in the final whitelist.

These figures can indicate whether the sequenced reads actually align to genes, and can also reveal the duplication rate and the degree of saturation. For many droplet data sets, the association between the barcode frequency and the total UMI count is roughly linear. The association of either of these with the number of detected genes often deviates from linearity, particularly if a small subset of the genes is assigned a large fraction of the UMI counts.
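The per-cell quantities plotted in this section, the total UMI count and the number of detected genes, can be derived from a cell-by-gene UMI table. The sketch below uses naive exact-match UMI deduplication and assumes each mapped read has already been assigned to a single gene. Alevin's own UMI resolution also accounts for sequencing errors in UMIs and for reads compatible with several genes, so this is a simplification. The function names are hypothetical:

```python
from collections import defaultdict

def count_umis(reads):
    """Naive deduplication: number of distinct UMIs per (cell, gene).
    `reads` is an iterable of (cell, gene, umi) tuples for mapped reads."""
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(s) for key, s in umis.items()}

def per_cell_summary(umi_counts):
    """Total UMI count and number of detected genes (UMI count > 0) per cell."""
    total, genes = defaultdict(int), defaultdict(int)
    for (cell, _gene), n in umi_counts.items():
        total[cell] += n
        if n > 0:
            genes[cell] += 1
    return dict(total), dict(genes)

reads = [
    ("c1", "g1", "AAA"), ("c1", "g1", "AAA"),  # PCR duplicate: counted once
    ("c1", "g1", "CCC"), ("c1", "g2", "AAA"), ("c2", "g1", "AAA"),
]
umi = count_umis(reads)
total, genes = per_cell_summary(umi)
print(total)  # {'c1': 3, 'c2': 1}
print(genes)  # {'c1': 2, 'c2': 1}
```

In this toy input, cell `c1` has 4 reads but only 3 UMIs. That gap between reads and UMIs is what the duplication rate and saturation discussed above summarize.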

Knee plot, number of detected genes

Similar to the knee plot used to select the initial set of cell barcodes, the plot below shows the number of detected genes for each cell barcode in the initial whitelist, in decreasing order.

Selected summary distributions

The histograms below show the distributions of the deduplication rates (number of deduplicated UMI counts/number of mapped reads) and the mapping rates, across the cells retained in the initial whitelist.
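The two rates can be computed per cell as sketched below. The deduplication rate follows the definition above (deduplicated UMI count divided by mapped reads). For the mapping rate, the denominator is assumed to be the number of reads assigned to the cell barcode. The binning helper only mimics a histogram over [0, 1], and all names are hypothetical:

```python
def cell_rates(n_reads, n_mapped, n_umis):
    """Return (mapping_rate, deduplication_rate) for one cell barcode.
    deduplication rate = deduplicated UMI count / mapped reads (as defined above);
    mapping rate = mapped reads / reads for the barcode (assumed denominator)."""
    mapping_rate = n_mapped / n_reads if n_reads else 0.0
    dedup_rate = n_umis / n_mapped if n_mapped else 0.0
    return mapping_rate, dedup_rate

def histogram(values, n_bins=10):
    """Counts of values in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)  # v == 1.0 goes in the last bin
        counts[idx] += 1
    return counts

# Hypothetical per-cell data: barcode -> (reads, mapped reads, deduplicated UMIs)
cells = {"AAAC": (1000, 600, 150), "AAAG": (2000, 1500, 300), "AAAT": (500, 100, 90)}
rates = {bc: cell_rates(*v) for bc, v in cells.items()}
mapping_hist = histogram([m for m, _ in rates.values()])
dedup_hist = histogram([d for _, d in rates.values()])
print(rates["AAAC"])  # (0.6, 0.25)
```

A high deduplication rate means most mapped reads carry distinct UMIs, which suggests the library is far from saturation.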

Session info

## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.2.1
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Denver
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
##  [1] alevinQC_1.16.1             ensembldb_2.24.1           
##  [3] AnnotationFilter_1.24.0     GenomicFeatures_1.52.2     
##  [5] AnnotationHub_3.8.0         BiocFileCache_2.8.0        
##  [7] dbplyr_2.3.4                org.Hs.eg.db_3.17.0        
##  [9] AnnotationDbi_1.62.2        eds_1.2.0                  
## [11] Matrix_1.6-1.1              tximport_1.28.0            
## [13] DropletUtils_1.20.0         scater_1.28.0              
## [15] ggplot2_3.4.3               scran_1.28.2               
## [17] scuttle_1.10.2              SingleCellExperiment_1.22.0
## [19] SummarizedExperiment_1.30.2 Biobase_2.60.0             
## [21] GenomicRanges_1.52.0        GenomeInfoDb_1.36.3        
## [23] IRanges_2.34.1              S4Vectors_0.38.2           
## [25] BiocGenerics_0.46.0         MatrixGenerics_1.12.3      
## [27] matrixStats_1.0.0           here_1.0.1                 
## 
## loaded via a namespace (and not attached):
##   [1] ProtGenerics_1.32.0           spatstat.sparse_3.0-2        
##   [3] bitops_1.0-7                  httr_1.4.7                   
##   [5] RColorBrewer_1.1-3            tools_4.3.1                  
##   [7] sctransform_0.4.0             DT_0.29                      
##   [9] utf8_1.2.3                    R6_2.5.1                     
##  [11] HDF5Array_1.28.1              lazyeval_0.2.2               
##  [13] uwot_0.1.16                   rhdf5filters_1.12.1          
##  [15] withr_2.5.1                   sp_2.0-0                     
##  [17] GGally_2.1.2                  prettyunits_1.2.0            
##  [19] gridExtra_2.3                 progressr_0.14.0             
##  [21] cli_3.6.1                     spatstat.explore_3.2-3       
##  [23] labeling_0.4.3                sass_0.4.7                   
##  [25] Seurat_4.4.0                  spatstat.data_3.0-1          
##  [27] ggridges_0.5.4                pbapply_1.7-2                
##  [29] Rsamtools_2.16.0              R.utils_2.12.2               
##  [31] parallelly_1.36.0             limma_3.56.2                 
##  [33] rstudioapi_0.15.0             RSQLite_2.3.1                
##  [35] BiocIO_1.10.0                 generics_0.1.3               
##  [37] ica_1.0-3                     spatstat.random_3.1-6        
##  [39] dplyr_1.1.3                   ggbeeswarm_0.7.2             
##  [41] fansi_1.0.4                   abind_1.4-5                  
##  [43] R.methodsS3_1.8.2             lifecycle_1.0.3              
##  [45] yaml_2.3.7                    edgeR_3.42.4                 
##  [47] rhdf5_2.44.0                  Rtsne_0.16                   
##  [49] grid_4.3.1                    blob_1.2.4                   
##  [51] promises_1.2.1                dqrng_0.3.1                  
##  [53] shinydashboard_0.7.2          crayon_1.5.2                 
##  [55] miniUI_0.1.1.1                lattice_0.21-8               
##  [57] beachmat_2.16.0               cowplot_1.1.1                
##  [59] KEGGREST_1.40.0               pillar_1.9.0                 
##  [61] knitr_1.44                    metapod_1.8.0                
##  [63] rjson_0.2.21                  future.apply_1.11.0          
##  [65] codetools_0.2-19              leiden_0.4.3                 
##  [67] glue_1.6.2                    data.table_1.14.8            
##  [69] vctrs_0.6.3                   png_0.1-8                    
##  [71] gtable_0.3.4                  cachem_1.0.8                 
##  [73] xfun_0.40                     S4Arrays_1.0.6               
##  [75] mime_0.12                     survival_3.5-7               
##  [77] statmod_1.5.0                 bluster_1.10.0               
##  [79] interactiveDisplayBase_1.38.0 ellipsis_0.3.2               
##  [81] fitdistrplus_1.1-11           ROCR_1.0-11                  
##  [83] nlme_3.1-163                  bit64_4.0.5                  
##  [85] progress_1.2.2                filelock_1.0.2               
##  [87] RcppAnnoy_0.0.21              rprojroot_2.0.3              
##  [89] bslib_0.5.1                   irlba_2.3.5.1                
##  [91] vipor_0.4.5                   KernSmooth_2.23-22           
##  [93] colorspace_2.1-0              DBI_1.1.3                    
##  [95] tidyselect_1.2.0              bit_4.0.5                    
##  [97] compiler_4.3.1                curl_5.0.2                   
##  [99] BiocNeighbors_1.18.0          xml2_1.3.5                   
## [101] DelayedArray_0.26.7           plotly_4.10.2                
## [103] rtracklayer_1.60.1            scales_1.2.1                 
## [105] lmtest_0.9-40                 rappdirs_0.3.3               
## [107] stringr_1.5.0                 digest_0.6.33                
## [109] goftest_1.2-3                 spatstat.utils_3.0-3         
## [111] rmarkdown_2.25                XVector_0.40.0               
## [113] htmltools_0.5.6               pkgconfig_2.0.3              
## [115] sparseMatrixStats_1.12.2      fastmap_1.1.1                
## [117] rlang_1.1.1                   htmlwidgets_1.6.2            
## [119] shiny_1.7.5                   DelayedMatrixStats_1.22.6    
## [121] farver_2.1.1                  jquerylib_0.1.4              
## [123] zoo_1.8-12                    jsonlite_1.8.7               
## [125] BiocParallel_1.34.2           R.oo_1.25.0                  
## [127] BiocSingular_1.16.0           RCurl_1.98-1.12              
## [129] magrittr_2.0.3                GenomeInfoDbData_1.2.10      
## [131] patchwork_1.1.3               Rhdf5lib_1.22.1              
## [133] munsell_0.5.0                 Rcpp_1.0.11                  
## [135] viridis_0.6.4                 reticulate_1.32.0            
## [137] stringi_1.7.12                zlibbioc_1.46.0              
## [139] MASS_7.3-60                   plyr_1.8.8                   
## [141] parallel_4.3.1                listenv_0.9.0                
## [143] ggrepel_0.9.3                 deldir_1.0-9                 
## [145] Biostrings_2.68.1             splines_4.3.1                
## [147] tensor_1.5                    hms_1.1.3                    
## [149] locfit_1.5-9.8                igraph_1.5.1                 
## [151] spatstat.geom_3.2-5           reshape2_1.4.4               
## [153] biomaRt_2.56.1                ScaledMatrix_1.8.1           
## [155] BiocVersion_3.17.1            XML_3.99-0.14                
## [157] evaluate_0.21                 SeuratObject_4.1.4           
## [159] renv_1.0.3                    BiocManager_1.30.22          
## [161] httpuv_1.6.11                 RANN_2.6.1                   
## [163] tidyr_1.3.0                   purrr_1.0.2.9000             
## [165] polyclip_1.10-6               reshape_0.8.9                
## [167] future_1.33.0                 scattermore_1.2              
## [169] rsvd_1.0.5                    xtable_1.8-4                 
## [171] restfulr_0.0.15               later_1.3.1                  
## [173] viridisLite_0.4.2             tibble_3.2.1                 
## [175] GenomicAlignments_1.36.0      memoise_2.0.1                
## [177] beeswarm_0.4.0                cluster_2.1.4                
## [179] globals_0.16.2