aa-tRNA-seq Pipeline
A Snakemake pipeline for analyzing Oxford Nanopore direct RNA sequencing of aminoacylated tRNAs.
Overview
This pipeline processes Oxford Nanopore Technologies (ONT) aa-tRNA-seq data to distinguish between charged (aminoacylated) and uncharged tRNA molecules. It uses Remora machine learning models trained on nanopore signal data over the CCA 3' end of tRNA molecules.
```mermaid
flowchart TD
    subgraph Input
        POD5[POD5 files]
    end
    subgraph Demux [Optional Demultiplexing]
        W[warpdemux<br/>barcode classification]
    end
    subgraph Processing
        A[merge_pods] --> B[rebasecall<br/>Dorado + move tables]
        B --> C[ubam_to_fastq]
        C --> D[bwa_align<br/>tRNA + adapter reference]
    end
    subgraph Classification
        D --> F[classify_charging<br/>Remora ML model]
        B -.-> F
        A -.-> F
        F --> G[transfer_bam_tags]
    end
    subgraph Outputs
        G --> H[charging_prob<br/>per-read ML scores]
        G --> I[get_cca_trna_cpm<br/>CPM counts]
        G --> J[bcerror<br/>basecalling errors]
        G --> K[align_stats]
        G --> L[modkit pileups]
        L -.-> M[odds_ratios<br/>pairwise mod ORs]
        H -.-> M
        K -.-> N[qc_report<br/>Quarto HTML]
        H -.-> N
    end
    POD5 -.-> W
    W -.-> A
    POD5 --> A
```
Pipeline Steps
Given a directory of POD5 files, this pipeline:
- Merges all POD5 files per sample into a single file
- Rebasecalls with Dorado to generate unmapped BAM with move tables (required for Remora)
- Converts BAM to FASTQ and aligns to tRNA + adapter reference with BWA MEM
- Classifies charged vs. uncharged reads using a Remora model trained on nanopore signal over the CCA 3' end
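The first three steps above can be sketched as a sequence of command lines. This is an illustration only, not the pipeline's actual Snakemake rules: the `build_commands` helper, the output file names, and the `MODEL` placeholder are hypothetical, and the real rules probably pass additional options. Each tool is shown in its most basic invocation.

```python
from typing import Optional

# (argv, file that captures stdout, or None if the tool writes its own output)
Command = tuple[list[str], Optional[str]]


def build_commands(
    pod5_files: list[str], sample: str, model: str, reference: str
) -> list[Command]:
    """Return merge, rebasecall, FASTQ conversion, and alignment commands in order.

    All file names below are hypothetical; the pipeline's own layout may differ.
    """
    merged = f"{sample}.pod5"
    ubam = f"{sample}.unmapped.bam"
    fastq = f"{sample}.fastq"
    sam = f"{sample}.sam"
    return [
        # 1. Merge all POD5 files for the sample into a single file.
        (["pod5", "merge", *pod5_files, "--output", merged], None),
        # 2. Rebasecall with Dorado, keeping move tables for Remora.
        (["dorado", "basecaller", model, merged, "--emit-moves"], ubam),
        # 3a. Convert the unmapped BAM to FASTQ.
        (["samtools", "fastq", ubam], fastq),
        # 3b. Align to the tRNA + adapter reference with BWA MEM.
        (["bwa", "mem", reference, fastq], sam),
    ]


if __name__ == "__main__":
    commands = build_commands(
        ["run1.pod5", "run2.pod5"], "sample1", "MODEL", "trna_ref.fa"
    )
    for argv, stdout_path in commands:
        redirect = f" > {stdout_path}" if stdout_path else ""
        print(" ".join(argv) + redirect)
```

The sketch only builds and prints the commands rather than running them, so it can be inspected without the tools installed.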
The classification generates ML tag values (0-255) indicating the likelihood of aminoacylation. By default, ML values ≥200 are treated as charged, and values <200 as uncharged.
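The default cutoff described above amounts to a simple threshold on the per-read ML value. The following is a minimal sketch of that rule; the function names `call_charging` and `charged_fraction` are hypothetical and are not part of the pipeline.

```python
DEFAULT_THRESHOLD = 200  # default cutoff: ML >= 200 is treated as charged


def call_charging(ml_value: int, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Label one read as 'charged' or 'uncharged' from its ML value (0-255)."""
    if not 0 <= ml_value <= 255:
        raise ValueError(f"ML value out of range: {ml_value}")
    return "charged" if ml_value >= threshold else "uncharged"


def charged_fraction(ml_values: list[int], threshold: int = DEFAULT_THRESHOLD) -> float:
    """Fraction of reads whose ML value meets the threshold."""
    if not ml_values:
        raise ValueError("no reads supplied")
    charged = sum(1 for v in ml_values if call_charging(v, threshold) == "charged")
    return charged / len(ml_values)


if __name__ == "__main__":
    reads = [255, 210, 200, 199, 12]
    print([call_charging(v) for v in reads])
    print(charged_fraction(reads))  # 3 of 5 reads are at or above 200 -> 0.6
```

Passing a different `threshold` shows how sensitive the charged fraction is to the chosen cutoff.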
Key Features
- Charging Classification: ML-based classification of charged vs uncharged tRNAs using Remora
- Modification Calling: Detection of RNA modifications (pseU, m5C, m6A, inosine) via Dorado and Modkit
- Full-Length Filtering: Only full-length tRNA reads with proper adapters are analyzed
- Barcode Demultiplexing: Optional WarpDemuX support for pooled/multiplexed samples
- Cluster Support: Optimized profiles for LSF and SLURM schedulers
- Reproducibility: Git commit tracking and locked dependencies via Pixi
Quick Start
See Installation for detailed setup instructions.
Documentation Sections
- Getting Started: Install the pipeline and run your first analysis
- User Guide: Configure samples and parameters, and understand the outputs
- Workflow: Detailed documentation of all rules and scripts
- Cluster Setup: Configure LSF, SLURM, or other HPC schedulers
Output Overview
The pipeline produces several key output files per sample:
| Output | Description |
|---|---|
| bam/final/{sample}/{sample}.bam | Final BAM with charging tags (CL/CM/PT) |
| summary/tables/{sample}/{sample}.charging.cpm.tsv.gz | CPM-normalized charging counts per tRNA |
| summary/tables/{sample}/{sample}.charging_prob.tsv.gz | Per-read charging probabilities |
| summary/modkit/{sample}/{sample}.pileup.bed.gz | Modification pileup consensus |
See Output Files for complete documentation.
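For readers unfamiliar with CPM (counts per million), the normalization behind the CPM table can be sketched as follows. This shows the generic formula only: the `counts_to_cpm` helper and the tRNA names are hypothetical, and the exact denominator the pipeline uses (for example, which reads count toward the total) is not stated here.

```python
def counts_to_cpm(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so they sum to one million.

    CPM = count * 1_000_000 / total counts in the sample.
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for an empty sample.
        return {name: 0.0 for name in counts}
    return {name: n * 1_000_000 / total for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical read counts for two tRNAs in one sample.
    raw = {"tRNA-A": 750, "tRNA-B": 250}
    print(counts_to_cpm(raw))
```

Because CPM values are relative to the sample total, they can be compared across samples with different sequencing depths.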
Downstream Analysis
Code for the downstream analysis that generated the figures in the initial preprint is available at: https://github.com/rnabioco/aa-tRNA-seq
Citation
If you use this pipeline, please cite:
White LK, Radakovic A, Sajek MP, Dobson K, Riemondy KA, Del Pozo S, Szostak JW, Hesselberth JR. Nanopore sequencing of intact aminoacylated tRNAs. Nat Commun. 2025;16:7781. doi:10.1038/s41467-025-62545-9
License
This project is licensed under the MIT License - see the LICENSE file for details.