scraps

scraps extracts mRNA polyadenylation sites from “TVN”-primed single-cell RNA-seq libraries at near-nucleotide resolution.

scraps (Single Cell RNA PolyA Site Discovery) is currently implemented as a Snakemake pipeline for 10X Genomics 3’ end v2/3 libraries (and other platforms with similar library structure, including Drop-seq, Microwell-seq, and BD Rhapsody). If a long Read1 is available (an estimated ~6% of SRA-deposited data, or when planning new experiments), positional information is calculated from paired realignment; otherwise, the less optimal anchored Read2 approach is used. scraps will eventually be expanded to analyze a range of RNA processing changes in single-cell RNA-seq data.

For additional discussion and use cases, please see the bioRxiv preprint.

Example usage
Supported scRNA-seq platforms
Output
Setup
Dependencies
Extended function

Example usage

scraps requires the following as input (defined in config.yaml and sample_fastqs.tsv):

10X Genomics 3’ v2/3 single-cell FASTQs, or FASTQs from other platforms (with names “_R1.fastq.gz” and “_R2.fastq.gz”)
A STAR genome index (must be generated with STAR 2.7.4a or above)
Whitelist for cell barcodes (optional but recommended to speed up run time)
A featureCounts reference (SAF-formatted polya_db; hg38 and mm10 files are included in the ref subdirectory)
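
Before filling in sample_fastqs.tsv, it can help to confirm that every sample has both reads. The following is a minimal sketch and not part of scraps; it assumes file names end in “_R1.fastq.gz” and “_R2.fastq.gz” and that everything before the suffix is the sample name. The function name pair_fastqs is hypothetical.

```python
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"


def pair_fastqs(filenames):
    """Return ({sample: (R1, R2)}, [names that could not be paired]).

    Assumption: the sample name is the file name with the read suffix removed.
    """
    r1, r2, other = {}, {}, []
    for name in filenames:
        base = Path(name).name
        if base.endswith(R1_SUFFIX):
            r1[base[: -len(R1_SUFFIX)]] = name
        elif base.endswith(R2_SUFFIX):
            r2[base[: -len(R2_SUFFIX)]] = name
        else:
            other.append(name)  # does not follow the expected naming

    paired = {s: (r1[s], r2[s]) for s in sorted(r1.keys() & r2.keys())}
    # Reads whose mate is missing
    unpaired = sorted(r1[s] for s in r1.keys() - r2.keys())
    unpaired += sorted(r2[s] for s in r2.keys() - r1.keys())
    return paired, unpaired + other
```

Any name returned in the second list either lacks its mate or does not follow the naming convention.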

To run test data, simply execute:

snakemake --snakefile Snakefile \
  --configfile config.yaml \
  --resources total_impact=5 \
  --keep-going

DAG steps illustration

submit jobs in cluster mode

Note: each sample is assigned a total_impact of 5; change the value passed to --resources total_impact to control how many samples are processed in parallel.

Supported scRNA-seq platforms

| Platform | Library (BC+UMI+A) | Setting | Test data |
| :--- | :--- | :--- | :--- |
| 10x Chromium V3 | 16 + 12 + 30 | chromiumV3 | ✓ |
| 10x V3 - Ultima Genomics | adapter + 16 + 9 + 3 ignored + 8 | chromiumV3UG | |
| 10x Chromium V2 | 16 + 10 + 30 | chromiumV2 | ✓ |
| 10x Chromium Visium | 16 + 10 + 30 | visium | |
| Drop-seq | 12 + 8 + 30 | dropseq | ✓ |
| Microwell-seq | 6x3 + 6 + 30 | microwellseq | ✓ |
| BD Rhapsody | 9x3 + 8 + 18 | bd | |
| inDrop | 8 + 6 + 18 | indrop | |

Custom chemistries are supported by editing chemistry.json. Also see the synthetic FASTQ tool.
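
To illustrate what the “BC+UMI+A” column means, here is a minimal sketch that slices a Read1 sequence into cell barcode, UMI, and the remaining tail. It is not how scraps itself parses reads, and the LAYOUTS dictionary is a hypothetical stand-in that does not reflect the actual structure of chemistry.json. It assumes the barcode and UMI are contiguous and start at the first base of Read1, so split-barcode platforms (Microwell-seq, BD Rhapsody) and adapter-prefixed reads are left out.

```python
# Barcode and UMI lengths copied from the platform table.
# Illustrative only: this is NOT the chemistry.json format.
LAYOUTS = {
    "chromiumV3": (16, 12),
    "chromiumV2": (16, 10),
    "dropseq": (12, 8),
}


def split_read1(seq, setting):
    """Split a Read1 sequence into (barcode, umi, tail) for a simple layout."""
    bc_len, umi_len = LAYOUTS[setting]
    if len(seq) < bc_len + umi_len:
        raise ValueError(
            f"read of length {len(seq)} is shorter than barcode + UMI "
            f"({bc_len + umi_len}) for setting {setting!r}"
        )
    barcode = seq[:bc_len]
    umi = seq[bc_len : bc_len + umi_len]
    tail = seq[bc_len + umi_len :]  # poly(T) stretch and any downstream bases
    return barcode, umi, tail
```

With a long Read1, the tail extends past the poly(T) stretch, which is the portion the paired realignment approach relies on.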

Output

bedgraph : TVN-priming site pileup

chr11   215106  215107  1
chr11   689216  689217  1
chr11   812862  812863  1
chr11   812870  812871  2
chr11   812871  812872  2
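
The pileup can be inspected with a few lines of Python. This is a minimal sketch, assuming the standard four-column bedGraph layout (chromosome, 0-based start, end, value) shown above; parse_bedgraph and total_signal are hypothetical helper names, not scraps functions.

```python
def parse_bedgraph(lines):
    """Parse bedGraph lines into (chrom, start, end, value) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional track/browser/comment lines
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), float(value)))
    return records


def total_signal(records):
    """Sum of value x interval length (intervals are half-open)."""
    return sum((end - start) * value for _, start, end, value in records)


def strongest_site(records):
    """Return the record with the highest value (first one on ties)."""
    return max(records, key=lambda rec: rec[3])
```

Reading from a file is then `parse_bedgraph(open(path))`, where path points to one of the bedgraph outputs.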

count table : ±10 around PolyA_DB sites, by cell barcode

gene    cell    count
AC135178.2_NA_ENSG00000263809_chr17_8377523_-_Intron,RPL26_6154_ENSG00000161970_chr17_8377523_-_3'UTR(M)        AACTCCCGTTCCTCCA        1
AC135178.2_NA_ENSG00000263809_chr17_8377523_-_Intron,RPL26_6154_ENSG00000161970_chr17_8377523_-_3'UTR(M)        CCCATACGTTAAAGAC        1
AC135178.2_NA_ENSG00000263809_chr17_8377523_-_Intron,RPL26_6154_ENSG00000161970_chr17_8377523_-_3'UTR(M)        CGTCCATTCGACAGCC        1
ACTG1_71_ENSG00000184009_chr17_81509999_-_3'UTR(M)      ACATCAGGTGATGTCT        1
ADRM1_11047_ENSG00000130706_chr20_62308862_+_3'UTR(M)   CAGCGACTCTGCCCTA        1
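
For quick checks outside of R, the count table can be summarised directly. This is a minimal sketch, assuming a whitespace-delimited file with the gene, cell, count header shown above and site names that contain no whitespace; read_counts is a hypothetical helper, not part of scraps or scrapR.

```python
from collections import defaultdict


def read_counts(lines):
    """Return (counts per site, counts per cell) from a scraps-style count table."""
    rows = iter(lines)
    header = next(rows).split()
    if header != ["gene", "cell", "count"]:
        raise ValueError(f"unexpected header: {header}")

    per_site = defaultdict(int)
    per_cell = defaultdict(int)
    for line in rows:
        line = line.strip()
        if not line:
            continue
        # Split from the right so the (long) site name stays in one piece
        site, cell, count = line.rsplit(None, 2)
        per_site[site] += int(count)
        per_cell[cell] += int(count)
    return dict(per_site), dict(per_cell)
```

Note that a single entry in the gene column may list several comma-separated annotations for the same position, as in the first rows of the example; the sketch treats the whole string as one site.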

html report : various metrics from steps in the pipeline

R functions are available for importing results into a Seurat object and for finding differential PA site usage. Alternatively, a package containing the same functions can be installed with remotes::install_github("rnabioco/scrapR").

Setup

Clone repository: `git clone https://github.com/rnabioco/scraps`
Check dependencies (ideally with Conda, see below)
Place the appropriate STAR index in the index/ folder, and barcode whitelists in whitelist/
Download links (all files need to be extracted): GRCh38 index; 10x V2 barcodes; 10x V3 barcodes
Edit settings in config.yaml
List files in sample_fastqs.tsv (SRA accessions in the form of SRR9887775 are supported for direct download)
Run! (sample results can be found at inst/test_output/)

Dependencies

scraps requires the following executables in your PATH:

Python 3 (developed with version 3.8.5)
Snakemake (developed with version 3.11.2)
UMI-tools (developed with version 1.1.1)
cutadapt (developed with version 3.4)
STAR (developed with version 2.7.9a)
Samtools (developed with version 1.3.1)
Bedtools (developed with version 2.30.0)
Subread (developed with version 1.6.2)
MultiQC (developed with version 1.9)
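
A quick way to confirm that the tools above are reachable is to look each one up in PATH. This is a minimal sketch, not a scraps utility; the executable names in REQUIRED are the usual command names for these tools (featureCounts is the counting program shipped with Subread) and are an assumption about your installation.

```python
import shutil

# Usual command names for the tools listed above (assumed, adjust if needed)
REQUIRED = [
    "python3",
    "snakemake",
    "umi_tools",
    "cutadapt",
    "STAR",
    "samtools",
    "bedtools",
    "featureCounts",
    "multiqc",
]


def find_missing(executables=REQUIRED, which=shutil.which):
    """Return the executables that cannot be found in PATH."""
    return [exe for exe in executables if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All required executables were found.")
```

The check only confirms presence, not versions.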

Alternatively, we recommend using Conda to manage these dependencies: run conda env create -f scraps_conda.yml and then conda activate scraps_conda

A Docker image for automated deployment can also be found at https://hub.docker.com/r/rnabioco/scraps.

Please also see the Snakemake documentation for general information on executing and manipulating snakemake pipelines.

Extended function

1) Measuring internal priming as an indicator of apoptotic cytoplasmic poly(A) RNA decay

(Based on widespread RNA decay during apoptosis: Liu and Fu et al.) Use the SAF file marking all gene regions (5’UTR, intron, CDS, 3’UTR; hg38 version provided in the ref subdirectory) and the helper R functions to process the output. Please see the Rmarkdown notebook for more.

2) Accurate intron/exon quantification for RNA velocity

(See discussions on quantification approaches and pitfalls: Soneson et al.)

Consideration	scraps
Avoid feature double-counting	✓
Take strandedness into account	✓
Avoid count subtraction	✓
Resolve spliced vs unspliced target	✓
Speed	✓
