| Field | Value |
|---|---|
| Start time | Tue Oct 17 12:32:33 2023 |
| Salmon version | 1.10.0 |
| Index | ../index/salmon_index |
| R1file | 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L001_R1_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L002_R1_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L003_R1_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L004_R1_001.fastq.gz |
| R2file | 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L001_R2_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L002_R2_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L003_R2_001.fastq.gz, 5k_pbmc_v3_nextgem_fastqs/5k_pbmc_v3_nextgem_S1_L004_R2_001.fastq.gz |
| tgMap | ../index/tx2gene.txt |
| Library type | ISR |
| Total number of processed reads | 368191709 |
| Number of reads with Ns | 0 |
| Number of reads with valid cell barcode (no Ns) | 368191709 |
| Number of mapped reads | 177750947 |
| Percent mapped (of all reads) | 48.28% |
| Number of noisy CB reads | 37447083 |
| Number of noisy UMI reads | 2356 |
| Total number of observed cell barcodes | 4667936 |
| Number of used reads | 330742270 |
| Percent mapped (of used reads) | 53.74% |
| Number of barcodes (initial whitelist) | 5138 |
| Number of barcodes with quantification (initial whitelist) | 5137 |
| Fraction reads in barcodes (initial whitelist) | 91.65% |
| Mean number of reads per cell (initial whitelist) | 65689 |
| Median number of reads per cell (initial whitelist) | 62974 |
| Mean number of detected genes per cell (initial whitelist) | 2390 |
| Median number of detected genes per cell (initial whitelist) | 2410 |
| Mean UMI count per cell (initial whitelist) | 8248 |
| Median UMI count per cell (initial whitelist) | 7878 |
| Number of barcodes (final whitelist) | 4832 |
| Number of barcodes with quantification (final whitelist) | 4832 |
| Fraction reads in barcodes (final whitelist) | 90.51% |
| Mean number of reads per cell (final whitelist) | 68964 |
| Median number of reads per cell (final whitelist) | 64710 |
| Mean number of detected genes per cell (final whitelist) | 2497 |
| Median number of detected genes per cell (final whitelist) | 2450 |
| Mean UMI count per cell (final whitelist) | 8656 |
| Median UMI count per cell (final whitelist) | 8073 |
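The derived percentages in this table can be re-derived from the raw counts as a consistency check. The sketch below does this in Python. Two relations are assumptions inferred from the numbers, not documented alevin definitions. The first is that "used reads" equals processed reads minus noisy-CB reads and noisy-UMI reads. The second is that "Fraction reads in barcodes" equals the reads in quantified barcodes divided by the total number of processed reads.

```python
# Raw counts copied from the summary table above.
total_reads = 368_191_709
mapped_reads = 177_750_947
noisy_cb_reads = 37_447_083
noisy_umi_reads = 2_356

# Assumption: "used" reads exclude reads with noisy cell barcodes or noisy UMIs.
used_reads = total_reads - noisy_cb_reads - noisy_umi_reads

pct_mapped_all = 100 * mapped_reads / total_reads
pct_mapped_used = 100 * mapped_reads / used_reads

# Assumption: fraction of reads in barcodes = (mean reads per cell x number of
# quantified barcodes) / total processed reads. The means are rounded in the
# table, so this reconstruction is only approximate.
frac_initial = 100 * 65_689 * 5_137 / total_reads
frac_final = 100 * 68_964 * 4_832 / total_reads

print(used_reads)                 # 330742270
print(f"{pct_mapped_all:.2f}%")   # 48.28%
print(f"{pct_mapped_used:.2f}%")  # 53.74%
print(f"{frac_initial:.2f}%")     # 91.65%
print(f"{frac_final:.2f}%")       # 90.51%
```

All five printed values match the table. This supports the assumed relations, but it does not prove them.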
The knee plot displays the number of times each cell barcode is observed, in decreasing order. By finding a ‘knee’ in this plot, Alevin determines a threshold (indicated in the plot) that defines an initial ‘whitelist’, that is, a set of cell barcodes that likely represent non-empty droplets, and separates them from the background. The initial whitelisting is performed only if no external whitelist is provided when running alevin. In the figure below, red indicates cell barcodes in the initial whitelist and black indicates all other cell barcodes.
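The text does not spell out how alevin locates the knee. As a hedged illustration, the sketch below substitutes a generic geometric heuristic, not alevin's algorithm:

1. Sort the barcode frequencies in decreasing order.
2. Move both rank and frequency to log10 scale.
3. Pick the point farthest from the straight line joining the first and last points.

The function name `find_knee` is hypothetical.

```python
import math

def find_knee(counts):
    """Return (n_barcodes_above_knee, threshold_frequency).

    Generic 'max distance from chord' heuristic on a log-log knee plot;
    NOT alevin's actual procedure. All counts must be > 0.
    """
    freqs = sorted(counts, reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    x0, y0 = xs[0], ys[0]
    dx, dy = xs[-1] - x0, ys[-1] - y0
    if dx == 0 and dy == 0:
        return len(freqs), freqs[-1]  # degenerate: a single point
    best_i, best_d = 0, -1.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Unnormalized perpendicular distance from (x, y) to the chord;
        # normalization is unnecessary because only the argmax matters.
        d = abs(dy * (x - x0) - dx * (y - y0))
        if d > best_d:
            best_i, best_d = i, d
    return best_i + 1, freqs[best_i]

# 100 "cells" with 10,000 reads each plus 5,000 background barcodes with 5 reads.
print(find_knee([10_000] * 100 + [5] * 5_000))  # (100, 10000)
```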
Once the initial set of whitelisted cell barcodes is defined, Alevin goes through the remaining cell barcodes. If a cell barcode is similar enough to a whitelisted cell barcode, it is corrected and its reads are added to those of the whitelisted barcode. The figure below shows the frequency of each whitelisted barcode before and after this correction. Reads from cell barcodes that cannot be corrected to a whitelisted barcode are discarded.
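The correction step can be sketched as follows, under two simplifying assumptions that are not taken from alevin:

- "Similar enough" means exactly one mismatch (Hamming distance 1).
- A barcode that is one mismatch away from more than one whitelisted barcode is discarded as ambiguous.

The function names are hypothetical.

```python
from collections import Counter

def hamming1_neighbors(bc, alphabet="ACGT"):
    """Yield every sequence that differs from bc at exactly one position."""
    for i, base in enumerate(bc):
        for b in alphabet:
            if b != base:
                yield bc[:i] + b + bc[i + 1:]

def correct_barcodes(counts, whitelist):
    """Collapse non-whitelisted barcodes into a unique whitelisted neighbor.

    counts: {barcode: read count}. Returns (corrected_counts, discarded_reads).
    """
    wl = set(whitelist)
    corrected = Counter({bc: counts.get(bc, 0) for bc in wl})
    discarded = 0
    for bc, n in counts.items():
        if bc in wl:
            continue
        hits = [nb for nb in hamming1_neighbors(bc) if nb in wl]
        if len(hits) == 1:
            corrected[hits[0]] += n  # reads move to the whitelisted barcode
        else:
            discarded += n  # no whitelisted neighbor, or an ambiguous match
    return corrected, discarded

counts = {"AAAA": 100, "AAAT": 5, "CCCC": 80, "GGGG": 3, "ACCA": 2}
corrected, discarded = correct_barcodes(counts, ["AAAA", "CCCC"])
print(corrected["AAAA"], corrected["CCCC"], discarded)  # 105 80 5
```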
After cell barcode collapsing, Alevin estimates the UMI count for each cell and gene. Following quantification, a second round of cell barcode whitelisting is performed to retain good-quality cells. This round uses not only the barcode frequency but also other features such as the fraction of mapped reads, the duplication rate and the average gene count. The plots below show the associations among the cell barcode frequency (the number of observed reads corresponding to a cell barcode), the total UMI count and the number of detected genes. Cell barcodes are colored by whether they are included in the final whitelist.
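In its simplest form, UMI counting means counting distinct UMIs per (cell, gene) pair. The sketch below assumes that every mapped read is assigned to exactly one gene and that UMIs contain no sequencing errors. Alevin's actual quantification also deals with UMI errors and with reads compatible with several genes; this sketch ignores both. From the UMI counts it derives the per-cell quantities shown in the plots. Because the input holds only mapped reads, the read count here is mapped reads, not all observed reads. The function names are hypothetical.

```python
from collections import defaultdict

def count_umis(reads):
    """reads: iterable of (cell_barcode, gene_id, umi) for mapped reads.

    Returns {(cell_barcode, gene_id): number of distinct UMIs}.
    """
    umis = defaultdict(set)
    for cb, gene, umi in reads:
        umis[(cb, gene)].add(umi)  # duplicate UMIs collapse into one
    return {key: len(s) for key, s in umis.items()}

def per_cell_summary(reads):
    """Returns {cell_barcode: (n_mapped_reads, total_umis, n_detected_genes)}."""
    reads = list(reads)
    n_mapped = defaultdict(int)
    for cb, _gene, _umi in reads:
        n_mapped[cb] += 1
    total_umis = defaultdict(int)
    n_genes = defaultdict(int)
    for (cb, _gene), n in count_umis(reads).items():
        total_umis[cb] += n
        n_genes[cb] += 1  # a gene counts as detected if it has >= 1 UMI
    return {cb: (n_mapped[cb], total_umis[cb], n_genes[cb]) for cb in n_mapped}

reads = [("C1", "G1", "AAA"), ("C1", "G1", "AAA"), ("C1", "G1", "CCC"),
         ("C1", "G2", "AAA"), ("C2", "G1", "AAA")]
print(per_cell_summary(reads))  # {'C1': (4, 3, 2), 'C2': (1, 1, 1)}
```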
These figures indicate whether the sequenced reads actually align to genes, and also give a sense of the duplication rate and the degree of saturation. For many droplet data sets, the association between barcode frequency and total UMI count is roughly linear. The association of either of these with the number of detected genes often deviates from linearity, for example when a small subset of genes is assigned a large fraction of the UMI counts.
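One rough way to put a number on these associations is to fit a least-squares line on log10 scale. This is an illustration, not something alevinQC is described as doing. A slope near 1 means the two quantities grow proportionally. A slope well below 1 means the second quantity grows more slowly than the first, as happens when the number of detected genes levels off. The function name `loglog_slope` is hypothetical.

```python
import math

def loglog_slope(x, y):
    """Least-squares slope of log10(y) regressed on log10(x); values must be > 0."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Proportional relationship: UMIs = reads / 8, so the slope is 1.
print(round(loglog_slope([1e3, 1e4, 1e5], [125, 1250, 12500]), 6))  # 1.0
# Square-root relationship: genes grow as sqrt(UMIs), so the slope is 0.5.
print(loglog_slope([100, 10_000, 1_000_000], [10, 100, 1000]))      # 0.5
```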
As with the knee plot used to select the initial set of cell barcodes, the plot below shows the number of detected genes for each cell barcode in the initial whitelist, in decreasing order.
The histograms below show the distributions of the deduplication rates (number of deduplicated UMI counts/number of mapped reads) and the mapping rates, across the cells retained in the initial whitelist.
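The two per-cell ratios and their histograms can be sketched as below. The deduplication rate follows the definition in the text. Defining the mapping rate as mapped reads divided by all reads observed for that barcode is an assumption. The helper names and the ten equal-width bins are illustrative only.

```python
def cell_rates(stats):
    """stats: {barcode: (n_reads, n_mapped, n_umis)}.

    Returns {barcode: (mapping_rate, dedup_rate)}, where
    mapping_rate = n_mapped / n_reads (assumed definition) and
    dedup_rate = n_umis / n_mapped (definition from the text).
    """
    out = {}
    for bc, (n_reads, n_mapped, n_umis) in stats.items():
        mapping = n_mapped / n_reads if n_reads else 0.0
        dedup = n_umis / n_mapped if n_mapped else 0.0
        out[bc] = (mapping, dedup)
    return out

def histogram(values, n_bins=10):
    """Count values in [0, 1] falling into n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        i = min(int(v * n_bins), n_bins - 1)  # v == 1.0 goes into the last bin
        counts[i] += 1
    return counts

rates = cell_rates({"C1": (1000, 500, 100), "C2": (2000, 1200, 300)})
print(rates)  # {'C1': (0.5, 0.2), 'C2': (0.6, 0.25)}
print(histogram([0.05, 0.55, 1.0]))  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
```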
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.2.1
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Denver
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] alevinQC_1.16.1 ensembldb_2.24.1
## [3] AnnotationFilter_1.24.0 GenomicFeatures_1.52.2
## [5] AnnotationHub_3.8.0 BiocFileCache_2.8.0
## [7] dbplyr_2.3.4 org.Hs.eg.db_3.17.0
## [9] AnnotationDbi_1.62.2 eds_1.2.0
## [11] Matrix_1.6-1.1 tximport_1.28.0
## [13] DropletUtils_1.20.0 scater_1.28.0
## [15] ggplot2_3.4.3 scran_1.28.2
## [17] scuttle_1.10.2 SingleCellExperiment_1.22.0
## [19] SummarizedExperiment_1.30.2 Biobase_2.60.0
## [21] GenomicRanges_1.52.0 GenomeInfoDb_1.36.3
## [23] IRanges_2.34.1 S4Vectors_0.38.2
## [25] BiocGenerics_0.46.0 MatrixGenerics_1.12.3
## [27] matrixStats_1.0.0 here_1.0.1
##
## loaded via a namespace (and not attached):
## [1] ProtGenerics_1.32.0 spatstat.sparse_3.0-2
## [3] bitops_1.0-7 httr_1.4.7
## [5] RColorBrewer_1.1-3 tools_4.3.1
## [7] sctransform_0.4.0 DT_0.29
## [9] utf8_1.2.3 R6_2.5.1
## [11] HDF5Array_1.28.1 lazyeval_0.2.2
## [13] uwot_0.1.16 rhdf5filters_1.12.1
## [15] withr_2.5.1 sp_2.0-0
## [17] GGally_2.1.2 prettyunits_1.2.0
## [19] gridExtra_2.3 progressr_0.14.0
## [21] cli_3.6.1 spatstat.explore_3.2-3
## [23] labeling_0.4.3 sass_0.4.7
## [25] Seurat_4.4.0 spatstat.data_3.0-1
## [27] ggridges_0.5.4 pbapply_1.7-2
## [29] Rsamtools_2.16.0 R.utils_2.12.2
## [31] parallelly_1.36.0 limma_3.56.2
## [33] rstudioapi_0.15.0 RSQLite_2.3.1
## [35] BiocIO_1.10.0 generics_0.1.3
## [37] ica_1.0-3 spatstat.random_3.1-6
## [39] dplyr_1.1.3 ggbeeswarm_0.7.2
## [41] fansi_1.0.4 abind_1.4-5
## [43] R.methodsS3_1.8.2 lifecycle_1.0.3
## [45] yaml_2.3.7 edgeR_3.42.4
## [47] rhdf5_2.44.0 Rtsne_0.16
## [49] grid_4.3.1 blob_1.2.4
## [51] promises_1.2.1 dqrng_0.3.1
## [53] shinydashboard_0.7.2 crayon_1.5.2
## [55] miniUI_0.1.1.1 lattice_0.21-8
## [57] beachmat_2.16.0 cowplot_1.1.1
## [59] KEGGREST_1.40.0 pillar_1.9.0
## [61] knitr_1.44 metapod_1.8.0
## [63] rjson_0.2.21 future.apply_1.11.0
## [65] codetools_0.2-19 leiden_0.4.3
## [67] glue_1.6.2 data.table_1.14.8
## [69] vctrs_0.6.3 png_0.1-8
## [71] gtable_0.3.4 cachem_1.0.8
## [73] xfun_0.40 S4Arrays_1.0.6
## [75] mime_0.12 survival_3.5-7
## [77] statmod_1.5.0 bluster_1.10.0
## [79] interactiveDisplayBase_1.38.0 ellipsis_0.3.2
## [81] fitdistrplus_1.1-11 ROCR_1.0-11
## [83] nlme_3.1-163 bit64_4.0.5
## [85] progress_1.2.2 filelock_1.0.2
## [87] RcppAnnoy_0.0.21 rprojroot_2.0.3
## [89] bslib_0.5.1 irlba_2.3.5.1
## [91] vipor_0.4.5 KernSmooth_2.23-22
## [93] colorspace_2.1-0 DBI_1.1.3
## [95] tidyselect_1.2.0 bit_4.0.5
## [97] compiler_4.3.1 curl_5.0.2
## [99] BiocNeighbors_1.18.0 xml2_1.3.5
## [101] DelayedArray_0.26.7 plotly_4.10.2
## [103] rtracklayer_1.60.1 scales_1.2.1
## [105] lmtest_0.9-40 rappdirs_0.3.3
## [107] stringr_1.5.0 digest_0.6.33
## [109] goftest_1.2-3 spatstat.utils_3.0-3
## [111] rmarkdown_2.25 XVector_0.40.0
## [113] htmltools_0.5.6 pkgconfig_2.0.3
## [115] sparseMatrixStats_1.12.2 fastmap_1.1.1
## [117] rlang_1.1.1 htmlwidgets_1.6.2
## [119] shiny_1.7.5 DelayedMatrixStats_1.22.6
## [121] farver_2.1.1 jquerylib_0.1.4
## [123] zoo_1.8-12 jsonlite_1.8.7
## [125] BiocParallel_1.34.2 R.oo_1.25.0
## [127] BiocSingular_1.16.0 RCurl_1.98-1.12
## [129] magrittr_2.0.3 GenomeInfoDbData_1.2.10
## [131] patchwork_1.1.3 Rhdf5lib_1.22.1
## [133] munsell_0.5.0 Rcpp_1.0.11
## [135] viridis_0.6.4 reticulate_1.32.0
## [137] stringi_1.7.12 zlibbioc_1.46.0
## [139] MASS_7.3-60 plyr_1.8.8
## [141] parallel_4.3.1 listenv_0.9.0
## [143] ggrepel_0.9.3 deldir_1.0-9
## [145] Biostrings_2.68.1 splines_4.3.1
## [147] tensor_1.5 hms_1.1.3
## [149] locfit_1.5-9.8 igraph_1.5.1
## [151] spatstat.geom_3.2-5 reshape2_1.4.4
## [153] biomaRt_2.56.1 ScaledMatrix_1.8.1
## [155] BiocVersion_3.17.1 XML_3.99-0.14
## [157] evaluate_0.21 SeuratObject_4.1.4
## [159] renv_1.0.3 BiocManager_1.30.22
## [161] httpuv_1.6.11 RANN_2.6.1
## [163] tidyr_1.3.0 purrr_1.0.2.9000
## [165] polyclip_1.10-6 reshape_0.8.9
## [167] future_1.33.0 scattermore_1.2
## [169] rsvd_1.0.5 xtable_1.8-4
## [171] restfulr_0.0.15 later_1.3.1
## [173] viridisLite_0.4.2 tibble_3.2.1
## [175] GenomicAlignments_1.36.0 memoise_2.0.1
## [177] beeswarm_0.4.0 cluster_2.1.4
## [179] globals_0.16.2