aa-tRNA-seq Pipeline

A Snakemake pipeline for analyzing Oxford Nanopore direct RNA sequencing of aminoacylated tRNAs.

Overview

This pipeline processes Oxford Nanopore Technologies (ONT) aa-tRNA-seq data to distinguish between charged (aminoacylated) and uncharged tRNA molecules. It uses Remora machine learning models trained on nanopore signal data over the CCA 3' end of tRNA molecules.

flowchart TD
    subgraph Input
        POD5[POD5 files]
    end

    subgraph Demux [Optional Demultiplexing]
        W[warpdemux<br/>barcode classification]
    end

    subgraph Processing
        A[merge_pods] --> B[rebasecall<br/>Dorado + move tables]
        B --> C[ubam_to_fastq]
        C --> D[bwa_align<br/>tRNA + adapter reference]
    end

    subgraph Classification
        D --> F[classify_charging<br/>Remora ML model]
        B -.-> F
        A -.-> F
        F --> G[transfer_bam_tags]
    end

    subgraph Outputs
        G --> H[charging_prob<br/>per-read ML scores]
        G --> I[get_cca_trna_cpm<br/>CPM counts]
        G --> J[bcerror<br/>basecalling errors]
        G --> K[align_stats]
        G --> L[modkit pileups]
        L -.-> M[odds_ratios<br/>pairwise mod ORs]
        H -.-> M
        K -.-> N[qc_report<br/>Quarto HTML]
        H -.-> N
    end

    POD5 -.-> W
    W -.-> A
    POD5 --> A

Pipeline Steps

Given a directory of POD5 files, this pipeline:

  1. Merges all POD5 files per sample into a single file
  2. Rebasecalls with Dorado to generate unmapped BAM with move tables (required for Remora)
  3. Converts BAM to FASTQ and aligns to tRNA + adapter reference with BWA MEM
  4. Classifies charged vs. uncharged reads using a Remora model trained on nanopore signal over the CCA 3' end

The classification generates ML tag values (0-255) indicating the likelihood of aminoacylation. By default, ML values ≥200 are treated as charged, and values <200 as uncharged.
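The default cutoff described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's own code: the function names are hypothetical, and the probability mapping assumes the ML values follow the SAM-tags convention, in which integer N covers the probability range N/256 to (N+1)/256.

```python
from typing import Iterable, Tuple

# Default cutoff stated in the text: ML >= 200 is treated as charged.
CHARGED_THRESHOLD = 200


def is_charged(ml_value: int, threshold: int = CHARGED_THRESHOLD) -> bool:
    """Return True if an ML value (0-255) meets the charged cutoff."""
    if not 0 <= ml_value <= 255:
        raise ValueError(f"ML value out of range 0-255: {ml_value}")
    return ml_value >= threshold


def ml_to_probability_range(ml_value: int) -> Tuple[float, float]:
    """Map an ML value to a probability interval.

    Assumption: SAM-tags convention, where N covers [N/256, (N+1)/256).
    """
    if not 0 <= ml_value <= 255:
        raise ValueError(f"ML value out of range 0-255: {ml_value}")
    return ml_value / 256, (ml_value + 1) / 256


def charged_fraction(ml_values: Iterable[int],
                     threshold: int = CHARGED_THRESHOLD) -> float:
    """Fraction of reads called charged; NaN if there are no reads."""
    values = list(ml_values)
    if not values:
        return float("nan")
    return sum(is_charged(v, threshold) for v in values) / len(values)
```

Under that assumed convention, the default cutoff of 200 corresponds to a charging probability of roughly 0.78 or higher. Passing a different `threshold` shows how sensitive the charged fraction is to the cutoff.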

Key Features

  • Charging Classification: ML-based classification of charged vs. uncharged tRNAs using Remora
  • Modification Calling: Detection of RNA modifications (pseU, m5C, m6A, inosine) via Dorado and Modkit
  • Full-Length Filtering: Only full-length tRNA reads with proper adapters are analyzed
  • Barcode Demultiplexing: Optional WarpDemuX support for pooled/multiplexed samples
  • Cluster Support: Optimized profiles for LSF and SLURM schedulers
  • Reproducibility: Git commit tracking and locked dependencies via Pixi

Quick Start

Bash
# Clone repository
git clone https://github.com/rnabioco/aa-tRNA-seq-pipeline.git
cd aa-tRNA-seq-pipeline

# Install environment
pixi install

# One-time setup: download tools, models, and test data
pixi run setup
pixi run dl-test-data

# Run test pipeline
pixi run dry-run   # Preview what will run
pixi run test      # Execute with test data

See Installation for detailed setup instructions.

Documentation Sections

  • Getting Started: Install the pipeline and run your first analysis (see Installation)
  • User Guide: Configure samples and parameters, and understand the outputs (see Configuration)
  • Workflow: Detailed documentation of all rules and scripts (see Overview)
  • Cluster Setup: Configure LSF, SLURM, or other HPC schedulers (see LSF Setup)

Output Overview

The pipeline produces several key output files per sample:

| Output | Description |
| --- | --- |
| bam/final/{sample}/{sample}.bam | Final BAM with charging tags (CL/CM/PT) |
| summary/tables/{sample}/{sample}.charging.cpm.tsv.gz | CPM-normalized charging counts per tRNA |
| summary/tables/{sample}/{sample}.charging_prob.tsv.gz | Per-read charging probabilities |
| summary/modkit/{sample}/{sample}.pileup.bed.gz | Modification pileup consensus |
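
The path patterns listed above can be expanded per sample to check that a run produced everything. The sketch below copies the four patterns verbatim from the listing; the dictionary keys, the function names, and the `base_dir` argument are hypothetical, and it assumes the patterns are relative to whatever output directory the run was configured to use.

```python
from pathlib import Path
from typing import Dict, List

# Path patterns copied from the output listing; the keys are illustrative labels.
OUTPUT_TEMPLATES = {
    "final_bam": "bam/final/{sample}/{sample}.bam",
    "charging_cpm": "summary/tables/{sample}/{sample}.charging.cpm.tsv.gz",
    "charging_prob": "summary/tables/{sample}/{sample}.charging_prob.tsv.gz",
    "modkit_pileup": "summary/modkit/{sample}/{sample}.pileup.bed.gz",
}


def expected_outputs(sample: str, base_dir: str = ".") -> Dict[str, Path]:
    """Expand each pattern for one sample under base_dir (assumed output root)."""
    root = Path(base_dir)
    return {
        name: root / template.format(sample=sample)
        for name, template in OUTPUT_TEMPLATES.items()
    }


def missing_outputs(sample: str, base_dir: str = ".") -> List[str]:
    """Return the labels of expected files that do not exist on disk."""
    return sorted(
        name
        for name, path in expected_outputs(sample, base_dir).items()
        if not path.exists()
    )
```

For example, `missing_outputs("sample1", "results")` returns an empty list when all four files exist for a sample named `sample1` under a directory named `results`; both names are placeholders.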

See Output Files for complete documentation.
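
For readers unfamiliar with CPM (counts per million), the normalization used for the charging counts can be illustrated as follows. This sketch assumes the standard definition, where each count is divided by the sample total and scaled by one million; the pipeline's exact denominator and filtering are not specified here, and the function name and tRNA labels are hypothetical.

```python
from typing import Dict, Mapping


def counts_to_cpm(counts: Mapping[str, int]) -> Dict[str, float]:
    """Scale raw read counts to counts per million (standard definition).

    Assumption: the denominator is the sum of all counts passed in.
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for an empty or all-zero sample.
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Illustrative input: made-up read counts for two hypothetical tRNA labels.
example = counts_to_cpm({"tRNA-A": 1, "tRNA-B": 3})
```

Because CPM values are relative to the sample total, they allow comparison of tRNA abundance between samples sequenced to different depths.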

Downstream Analysis

Code for the downstream analysis that generates the figures in the initial preprint is available at: https://github.com/rnabioco/aa-tRNA-seq

Citation

If you use this pipeline, please cite:

White LK, Radakovic A, Sajek MP, Dobson K, Riemondy KA, Del Pozo S, Szostak JW, Hesselberth JR. Nanopore sequencing of intact aminoacylated tRNAs. Nat Commun. 2025;16:7781. doi:10.1038/s41467-025-62545-9

License

This project is licensed under the MIT License - see the LICENSE file for details.