aa-tRNA-seq Pipeline¶

A Snakemake pipeline for analyzing Oxford Nanopore direct RNA sequencing data from aminoacylated tRNAs.

Overview¶

This pipeline processes Oxford Nanopore Technologies (ONT) aa-tRNA-seq data to distinguish between charged (aminoacylated) and uncharged tRNA molecules. It uses Remora machine learning models trained on nanopore signal data over the CCA 3' end of tRNA molecules.

flowchart TD
    subgraph Input
        POD5[POD5 files]
    end

    subgraph Demux [Optional Demultiplexing]
        W[warpdemux<br/>barcode classification]
    end

    subgraph Processing
        A[merge_pods] --> B[rebasecall<br/>Dorado + move tables]
        B --> C[ubam_to_fastq]
        C --> D[bwa_align<br/>tRNA + adapter reference]
    end

    subgraph Classification
        D --> F[classify_charging<br/>Remora ML model]
        B -.-> F
        A -.-> F
        F --> G[transfer_bam_tags]
    end

    subgraph Outputs
        G --> H[charging_prob<br/>per-read ML scores]
        G --> I[get_cca_trna_cpm<br/>CPM counts]
        G --> J[bcerror<br/>basecalling errors]
        G --> K[align_stats]
        G --> L[modkit pileups]
        L -.-> M[odds_ratios<br/>pairwise mod ORs]
        H -.-> M
        K -.-> N[qc_report<br/>Quarto HTML]
        H -.-> N
    end

    POD5 -.-> W
    W -.-> A
    POD5 --> A

Pipeline Steps¶

Given a directory of POD5 files, this pipeline:

1. Merges all POD5 files per sample into a single file
2. Rebasecalls with Dorado to generate unmapped BAM with move tables (required for Remora)
3. Converts BAM to FASTQ and aligns to tRNA + adapter reference with BWA MEM
4. Classifies charged vs. uncharged reads using a Remora model trained on nanopore signal over the CCA 3' end

The classification generates ML tag values (0-255) indicating the likelihood of aminoacylation. By default, ML values ≥200 are treated as charged, and values <200 as uncharged.
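The default cutoff described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's own code: the function names and example values are hypothetical. The probability conversion assumes the ML values follow the usual SAM convention, in which a value N covers the probability interval [N/256, (N+1)/256).

```python
from collections import Counter

CHARGED_THRESHOLD = 200  # default cutoff stated in the text above


def classify_ml(ml_value, threshold=CHARGED_THRESHOLD):
    """Label one read as charged or uncharged from its ML value (0-255)."""
    if not 0 <= ml_value <= 255:
        raise ValueError(f"ML value out of range: {ml_value}")
    return "charged" if ml_value >= threshold else "uncharged"


def ml_to_probability_bin(ml_value):
    """Return the [low, high) probability interval for an ML value.

    Assumes SAM-style scaling of ML values (256 equal-width bins).
    """
    return ml_value / 256, (ml_value + 1) / 256


def charging_fraction(ml_values, threshold=CHARGED_THRESHOLD):
    """Fraction of reads labeled charged; NaN when there are no reads."""
    labels = Counter(classify_ml(v, threshold) for v in ml_values)
    total = sum(labels.values())
    return labels["charged"] / total if total else float("nan")


if __name__ == "__main__":
    example = [12, 199, 200, 255]  # made-up values, not pipeline output
    print([classify_ml(v) for v in example])
    # ['uncharged', 'uncharged', 'charged', 'charged']
    print(charging_fraction(example))  # 0.5
    print(ml_to_probability_bin(200))  # (0.78125, 0.78515625)
```

Under that scaling assumption, the default threshold of 200 corresponds to a charging probability of roughly 0.78 or higher.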

Key Features¶

Charging Classification: ML-based classification of charged vs. uncharged tRNAs using Remora
Modification Calling: Detection of RNA modifications (pseU, m5C, m6A, inosine) via Dorado and Modkit
Full-Length Filtering: Only full-length tRNA reads with proper adapters are analyzed
Barcode Demultiplexing: Optional WarpDemuX support for pooled/multiplexed samples
Cluster Support: Optimized profiles for LSF and SLURM schedulers
Reproducibility: Git commit tracking and locked dependencies via Pixi

Quick Start¶

Bash
# Clone repository
git clone https://github.com/rnabioco/aa-tRNA-seq-pipeline.git
cd aa-tRNA-seq-pipeline

# Install environment
pixi install

# One-time setup: download tools, models, and test data
pixi run setup
pixi run dl-test-data

# Run test pipeline
pixi run dry-run   # Preview what will run
pixi run test      # Execute with test data

See Installation for detailed setup instructions.

Documentation Sections¶

Getting Started: Install the pipeline and run your first analysis. See Installation.
User Guide: Configure samples and parameters, and understand the outputs. See Configuration.
Workflow: Detailed documentation of all rules and scripts. See Overview.
Cluster Setup: Configure LSF, SLURM, or other HPC schedulers. See LSF Setup.

Output Overview¶

The pipeline produces several key output files per sample:

Output	Description
`bam/final/{sample}/{sample}.bam`	Final BAM with charging tags (CL/CM/PT)
`summary/tables/{sample}/{sample}.charging.cpm.tsv.gz`	CPM-normalized charging counts per tRNA
`summary/tables/{sample}/{sample}.charging_prob.tsv.gz`	Per-read charging probabilities
`summary/modkit/{sample}/{sample}.pileup.bed.gz`	Modification pileup consensus

See Output Files for complete documentation.
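For readers unfamiliar with the CPM values in the charging counts table, the normalization can be illustrated as follows. This is a generic counts-per-million sketch. The function name and tRNA labels are hypothetical, and the choice of denominator (all counted reads in one sample) is an assumption about how the pipeline normalizes, not something confirmed here.

```python
def counts_to_cpm(counts):
    """Scale raw read counts so they sum to one million.

    `counts` maps a tRNA name to its raw read count for one sample.
    Assumes the denominator is the total over all entries in `counts`.
    """
    total = sum(counts.values())
    if total == 0:
        # No reads at all: report zeros rather than dividing by zero.
        return {name: 0.0 for name in counts}
    return {name: n * 1_000_000 / total for name, n in counts.items()}


if __name__ == "__main__":
    raw = {"tRNA-Ala-AGC": 750, "tRNA-Gly-GCC": 250}  # made-up counts
    print(counts_to_cpm(raw))
    # {'tRNA-Ala-AGC': 750000.0, 'tRNA-Gly-GCC': 250000.0}
```

Because CPM values are relative to sequencing depth, they can be compared across samples with different read totals, which raw counts cannot.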

Downstream Analysis¶

Code for the downstream analysis that generated the figures in the initial preprint is available at: https://github.com/rnabioco/aa-tRNA-seq

Citation¶

If you use this pipeline, please cite:

White LK, Radakovic A, Sajek MP, Dobson K, Riemondy KA, Del Pozo S, Szostak JW, Hesselberth JR. Nanopore sequencing of intact aminoacylated tRNAs. Nat Commun. 2025;16:7781. doi:10.1038/s41467-025-62545-9

License¶

This project is licensed under the MIT License - see the LICENSE file for details.