
scraps (Single Cell RNA PolyA Site Discovery) is currently implemented as a Snakemake pipeline for 10X Genomics 3’ end v2/v3 libraries (and other platforms with a similar library structure, including Drop-seq, Microwell-seq, and BD Rhapsody). If a long Read1 is available (an estimated ~6% of SRA-deposited data, or when planning new experiments), positional information is calculated from paired realignment; otherwise, the less optimal anchored Read2 approach is used. scraps will eventually be expanded to analyze a range of RNA processing changes in single-cell RNA-seq data.
For additional discussion and use cases, please see the bioRxiv preprint.

scraps requires the following as input (defined in config.yaml):
conda env create -f scraps_conda.yml
conda activate scraps_conda
Configure your samples in config.yaml under the SAMPLES section
snakemake --configfile config.yaml --resources total_impact=5 --keep-going
To run test data, simply execute:
snakemake --snakefile Snakefile \
--configfile config.yaml \
--resources total_impact=5 \
--keep-going
Note: total_impact is set to 5 for each sample; change this value to control how many samples are processed in parallel.
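As a rough illustration of how the resource limit relates to parallelism: Snakemake only starts jobs while the summed `total_impact` of running jobs stays within the value passed to `--resources`. The helper below is a hypothetical sketch (the function name is not part of scraps) and assumes each sample costs 5, as stated in the note above; the number of cores given with `-j` can limit parallelism further.

```python
def max_parallel_samples(total_impact_limit, impact_per_sample=5):
    """Estimate how many samples can run at once under a total_impact limit.

    Hypothetical helper for illustration; assumes every sample consumes
    `impact_per_sample` units of the total_impact resource.
    """
    if impact_per_sample <= 0:
        raise ValueError("impact_per_sample must be positive")
    return total_impact_limit // impact_per_sample


if __name__ == "__main__":
    for limit in (5, 10, 20):
        print(f"--resources total_impact={limit}: "
              f"{max_parallel_samples(limit)} sample(s) at a time")
```

Under these assumptions, `--resources total_impact=5` processes one sample at a time, and `total_impact=10` allows two.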
scraps uses two main configuration files for flexible pipeline setup:
Main pipeline configuration file containing:
- ref/)
- platform: Default sequencing platform (e.g., illumina, element, ultima)
- chemistry: Default chemistry type (e.g., chromiumV3, chromiumV2, dropseq)
- alignments: Which alignment modes to run ([R1, R2, paired])

SAMPLES:
  sample_name:
    basename: sample-       # FASTQ file prefix
    platform: illumina      # Sequencing platform
    chemistry: chromiumV3   # Platform chemistry
    alignments:             # Optional: override default alignments
      - R2
      - paired
Platform and chemistry-specific parameters organized hierarchically. Each chemistry type (chromiumV3, chromiumV2, dropseq, microwellseq, bd, indrop) contains:
- cutadapt_R1 / cutadapt_paired: Adapter trimming parameters
- STAR_R1 / STAR_R2: STAR alignment parameters (UMI/barcode positions)

The pipeline uses hierarchical configuration lookup to determine parameters for each sample:
┌─────────────────────────────────────────────────────────┐
│ 1. Sample-specific settings (config.yaml SAMPLES)       │
│    Highest priority - overrides everything              │
└────────────────────┬────────────────────────────────────┘
                     │ If not found ↓
┌─────────────────────────────────────────────────────────┐
│ 2. Chemistry + Platform (chemistry.yaml)                │
│    e.g., chromiumV3 → illumina → STAR_R1                │
└────────────────────┬────────────────────────────────────┘
                     │ If not found ↓
┌─────────────────────────────────────────────────────────┐
│ 3. Chemistry defaults (chemistry.yaml)                  │
│    e.g., chromiumV3 → bc_whitelist                      │
└────────────────────┬────────────────────────────────────┘
                     │ If not found ↓
┌─────────────────────────────────────────────────────────┐
│ 4. Global defaults (config.yaml DEFAULTS)               │
│    Lowest priority - fallback values                    │
└─────────────────────────────────────────────────────────┘
This allows platform-specific customization (e.g., Illumina vs Ultima Genomics) while maintaining chemistry-specific defaults.
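The four-level lookup order can be sketched in Python roughly as follows. This is an illustrative sketch, not the pipeline's actual code: the function name `lookup_param`, the nesting of platform blocks inside each chemistry entry, and the placeholder STAR value are assumptions based on the diagram.

```python
def lookup_param(key, sample, config, chemistry_cfg):
    """Return the first value found for `key`, from highest to lowest priority.

    Hypothetical helper; `config` mimics config.yaml and `chemistry_cfg`
    mimics chemistry.yaml, both already loaded into dictionaries.
    """
    sample_cfg = config.get("SAMPLES", {}).get(sample, {})
    defaults = config.get("DEFAULTS", {})

    # Chemistry and platform are themselves resolved sample-first.
    chemistry = sample_cfg.get("chemistry", defaults.get("chemistry"))
    platform = sample_cfg.get("platform", defaults.get("platform"))
    chem = chemistry_cfg.get(chemistry, {})

    levels = [
        sample_cfg,              # 1. sample-specific settings
        chem.get(platform, {}),  # 2. chemistry + platform
        chem,                    # 3. chemistry defaults
        defaults,                # 4. global defaults
    ]
    for level in levels:
        if isinstance(level, dict) and key in level:
            return level[key]
    raise KeyError(f"{key!r} not found for sample {sample!r}")


# Example data (assumed layout; the STAR_R1 value is a placeholder).
config = {
    "DEFAULTS": {"platform": "illumina", "chemistry": "chromiumV3",
                 "alignments": ["R1", "R2", "paired"]},
    "SAMPLES": {"my_sample": {"basename": "sample-",
                              "alignments": ["R2", "paired"]}},
}
chemistry_cfg = {
    "chromiumV3": {
        "bc_whitelist": "ref/3M-february-2018.txt",
        "illumina": {"STAR_R1": "<STAR parameters>"},
    },
}

if __name__ == "__main__":
    for key in ("alignments", "STAR_R1", "bc_whitelist", "platform"):
        print(key, "->", lookup_param(key, "my_sample", config, chemistry_cfg))
```

In this example `alignments` comes from the sample entry, `STAR_R1` from the chemistry + platform block, `bc_whitelist` from the chemistry defaults, and `platform` from the global defaults.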
| Platform | Library (BC+UMI+A) | Setting | Test data |
|:---|:---|:---|:---|
| 10x Chromium V3 | 16 + 12 + 30 | chromiumV3 | ✓ |
| 10x V3 - Ultima Genomics | adapter + 16 + 9 + 3 ignored + 8 | chromiumV3UG | |
| 10x Chromium V2 | 16 + 10 + 30 | chromiumV2 | ✓ |
| 10x Chromium Visium | 16 + 10 + 30 | visium | |
| Drop-seq | 12 + 8 + 30 | dropseq | ✓ |
| Microwell-seq | 6x3 + 6 + 30 | microwellseq | ✓ |
| BD Rhapsody | 9x3 + 8 + 18 | bd | |
| inDrop | 8 + 6 + 18 | indrop | |
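To make the BC + UMI notation concrete, the sketch below splits a Read1 sequence into cell barcode and UMI for the simple layouts in the table. It is a hypothetical helper, not part of scraps: it assumes the barcode starts at the first base of Read1 and is immediately followed by the UMI, and it deliberately leaves out the composite-barcode and Ultima layouts. In the pipeline itself these positions are handled through the STAR parameters in chemistry.yaml.

```python
# (barcode length, UMI length) taken from the table; simple layouts only.
LAYOUTS = {
    "chromiumV3": (16, 12),
    "chromiumV2": (16, 10),
    "dropseq": (12, 8),
}


def split_read1(seq, chemistry):
    """Split a Read1 sequence into (barcode, UMI).

    Assumes the barcode starts at position 0 and the UMI follows directly.
    """
    bc_len, umi_len = LAYOUTS[chemistry]
    if len(seq) < bc_len + umi_len:
        raise ValueError("Read1 is shorter than barcode + UMI")
    return seq[:bc_len], seq[bc_len:bc_len + umi_len]


if __name__ == "__main__":
    # Made-up read: 16 nt barcode, 12 nt UMI, then a poly(T) stretch.
    read1 = "AACTCCCGTTCCTCCA" + "ACGTACGTACGT" + "T" * 30
    barcode, umi = split_read1(read1, "chromiumV3")
    print(barcode, umi)
```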
Custom chemistries are supported by editing chemistry.yaml. Also see the synthetic FASTQ tool.
chr11 215106 215107 1
chr11 689216 689217 1
chr11 812862 812863 1
chr11 812870 812871 2
chr11 812871 812872 2
gene cell count
AC135178.2_NA_ENSG00000263809_chr17_8377523_-_Intron,RPL26_6154_ENSG00000161970_chr17_8377523_-_3'UTR(M) AACTCCCGTTCCTCCA 1
AC135178.2_NA_ENSG00000263809_chr17_8377523_-_Intron,RPL26_6154_ENSG00000161970_chr17_8377523_-_3'UTR(M) CCCATACGTTAAAGAC 1
AC135178.2_NA_ENSG00000263809_chr17_8377523_-_Intron,RPL26_6154_ENSG00000161970_chr17_8377523_-_3'UTR(M) CGTCCATTCGACAGCC 1
ACTG1_71_ENSG00000184009_chr17_81509999_-_3'UTR(M) ACATCAGGTGATGTCT 1
ADRM1_11047_ENSG00000130706_chr20_62308862_+_3'UTR(M) CAGCGACTCTGCCCTA 1
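For downstream scripting outside R, a count table like the one above could be read with a few lines of Python. This is a sketch under assumptions: the meaning of the underscore-separated fields (gene name, a gene ID, Ensembl ID, chromosome, position, strand, region) is inferred from the example rows rather than from a format specification, and chromosome names containing underscores would break the right-anchored split. The function names are hypothetical.

```python
from collections import defaultdict


def parse_site(site):
    """Split one PA site label into its (assumed) fields."""
    name, gene_id, ensembl, chrom, pos, strand, region = site.rsplit("_", 6)
    return {"name": name, "gene_id": gene_id, "ensembl": ensembl,
            "chrom": chrom, "pos": int(pos), "strand": strand,
            "region": region}


def counts_per_feature(lines):
    """Sum counts over cells for each feature in a gene/cell/count table."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if not fields or fields == ["gene", "cell", "count"]:
            continue  # skip blank lines and the header row
        feature, _cell, count = fields
        totals[feature] += int(count)
    return dict(totals)


ROWS = [
    "gene cell count",
    "ACTG1_71_ENSG00000184009_chr17_81509999_-_3'UTR(M) ACATCAGGTGATGTCT 1",
    "ADRM1_11047_ENSG00000130706_chr20_62308862_+_3'UTR(M) CAGCGACTCTGCCCTA 1",
]

if __name__ == "__main__":
    for feature, total in counts_per_feature(ROWS).items():
        # A feature may list several comma-separated sites.
        sites = [parse_site(s) for s in feature.split(",")]
        print([(s["name"], s["chrom"], s["pos"]) for s in sites], total)
```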
R functions are available for importing results into a Seurat object and for finding differential PA site usage. Alternatively, a package containing the same functions can be installed with remotes::install_github("rnabioco/scrapR").
git clone https://github.com/rnabioco/scraps
cd scraps
conda env create -f scraps_conda.yml
conda activate scraps_conda
Alternatively, ensure all dependencies are installed and available in your PATH.
Place the STAR index in the ref/ directory, or specify a custom path in config.yaml (STAR_INDEX).
Download link (extract after download):
Whitelist paths are configured per chemistry in chemistry.yaml. Place downloaded whitelists in the ref/ directory.
Download links (extract after download):
- ref/737K-august-2016.txt
- ref/3M-february-2018.txt

Update chemistry.yaml with the correct paths:
chromiumV3:
  bc_whitelist: ref/3M-february-2018.txt
chromiumV2:
  bc_whitelist: ref/737K-august-2016.txt
Edit config.yaml to specify:
- *_R1.fastq.gz, *_R2.fastq.gz)
- ref/polyadb32.hg38.saf.gz or ref/polyadb32.mm10.saf.gz)
- basename: FASTQ filename prefix
- chemistry: Platform chemistry type (chromiumV2, chromiumV3, dropseq, microwellseq, bd, indrop)
- platform: Sequencing platform (illumina, element, ultima)
- alignments: Optional list of alignment modes to run

Example:
SAMPLES:
  my_sample:
    basename: SRR9887775_   # Matches SRR9887775_R1.fastq.gz, SRR9887775_R2.fastq.gz
    chemistry: chromiumV3
    platform: illumina
Note: SRA accessions (e.g., SRR9887775) can be used directly as basenames for automatic download.
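Before launching a run, it can help to check which FASTQ files each basename is expected to match. The sketch below assumes, based only on the example above, that file names are formed by appending R1.fastq.gz and R2.fastq.gz directly to the basename; the function names and the `data_dir` argument are hypothetical. A missing file is not necessarily an error, since SRA accessions may be downloaded automatically.

```python
from pathlib import Path


def expected_fastqs(basename, data_dir="."):
    """Return the Read1/Read2 paths implied by a basename (assumed naming)."""
    return [Path(data_dir) / f"{basename}{read}.fastq.gz"
            for read in ("R1", "R2")]


def missing_fastqs(samples, data_dir="."):
    """Map each sample name to the expected FASTQ files not found on disk."""
    missing = {}
    for name, settings in samples.items():
        absent = [str(p) for p in expected_fastqs(settings["basename"], data_dir)
                  if not p.exists()]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    samples = {"my_sample": {"basename": "SRR9887775_",
                             "chemistry": "chromiumV3",
                             "platform": "illumina"}}
    for name, files in missing_fastqs(samples).items():
        print(f"{name}: not found locally: {', '.join(files)}")
```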
# Dry-run to check configuration
snakemake -npr --configfile config.yaml
# Run pipeline
snakemake --configfile config.yaml --resources total_impact=5 --keep-going
# Or with specific core count
snakemake -j 8 --configfile config.yaml
Sample test results can be found at inst/test_output/
scraps requires the following executables in your PATH:
Recommended: Use Conda to manage these dependencies:
conda env create -f scraps_conda.yml
conda activate scraps_conda
All required dependencies (including zsh) will be installed automatically.
A Docker image for automated deployment is also available at https://hub.docker.com/r/rnabioco/scraps.
Please also see the Snakemake documentation for general information on executing and manipulating snakemake pipelines.
For detailed development guidelines including code style conventions, testing procedures, and instructions for adding new rules or chemistry configurations, see AGENTS.md.
Key resources:

1) Measuring internal priming as an indicator of apoptotic cytoplasmic poly(A) RNA decay
(Based on widespread RNA decay during apoptosis: Liu and Fu et al.) Use a SAF file marking all gene regions (5’UTR, intron, CDS, 3’UTR; an hg38 version is provided in the ref subdirectory) together with the helper R functions to process the output. Please see the R Markdown notebook for more.
2) Accurate intron/exon quantification for RNA velocity
(See discussions on quantification approaches and pitfalls: Soneson et al.)
| Consideration | scraps |
|---|---|
| Avoid feature double-counting | ✓ |
| Take strandedness into account | ✓ |
| Avoid count subtraction | ✓ |
| Resolve spliced vs unspliced target | ✓ |
| Speed | ✓ |