Configuration
The pipeline uses YAML configuration files with a hierarchical inheritance system.
Configuration Hierarchy
flowchart TB
A[config/config-base.yml<br/>Default parameters] --> B[Your config.yml<br/>Project overrides]
B --> C[Command line<br/>--config key=value]
Your project config inherits all defaults from config-base.yml and only needs to specify overrides.
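To make the layering concrete, here is a minimal Python sketch of how three layers of settings could be merged so that later layers win. It illustrates the idea only and is not Snakemake's own merge code; the `deep_merge` helper and the sample values (including `results/default`) are hypothetical.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` replace those in `base`.

    Nested dicts are merged key by key; any other value is replaced whole.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Illustrative values only; the real defaults live in config/config-base.yml.
base = {
    "output_directory": "results/default",
    "modkit": {"filter_threshold": 0.5, "mod_thresholds": {"a": 0.99, "m": 0.99}},
}
project = {
    "output_directory": "results/myproject",
    "modkit": {"mod_thresholds": {"a": 0.995}},
}
cli = {"output_directory": "results/test2"}

# Apply the layers in order: base defaults, project overrides, command line.
config = deep_merge(deep_merge(base, project), cli)
print(config["output_directory"])  # results/test2
print(config["modkit"]["mod_thresholds"])  # {'a': 0.995, 'm': 0.99}
```

Only the keys named in a later layer change; everything else keeps its default, which is why a project config can stay short.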
Configuration Files
| File | Purpose |
|---|---|
| config/config-base.yml | Base defaults (included by Snakefile) |
| config/config-test.yml | Test data configuration |
| config/config-preprint.yml | Preprint analysis configuration |
| config/config-demux-test.yml | Demultiplexing test configuration |
Required Parameters
These parameters must be set in your config:
| YAML |
|---|
| # Path to sample file (TSV or YAML)
samples: config/samples-myproject.tsv
# Output directory for all pipeline results
output_directory: "results/myproject"
|
Reference Files
Basecalling Model
| YAML |
|---|
| # Path to Dorado model directory or model name for auto-download
base_calling_model: "resources/models/rna004_130bps_sup@v5.3.0"
|
If you give a model name rather than a path, the model is downloaded automatically.
Reference FASTA
| YAML |
|---|
| # tRNA reference with adapters for BWA alignment
fasta: "resources/ref/sacCer3-mature-tRNAs-dual-adapt-v2.fa"
|
A BWA index is built automatically if it doesn't exist.
Adapter Sequences
The pipeline uses adapter sequences for reference validation and building. These must match what the Remora charging model was trained on:
| YAML |
|---|
| adapters:
  # 5' adapter prepended to tRNA (23bp)
  five_prime: "CCTAAGAGCAAGAAGAAGCCTGG"
  # 3' adapter appended after tRNA CCA end (40bp)
  three_prime: "GGCTTCTTCTTGCTCTTCCAACCTTGCCTTAAAAAAAAAA"
|
CCAGGC Junction
The charging classification uses the CCAGGC 6-mer junction where:
- CCA = last 3 bases of mature tRNA
- GGC = first 3 bases of 3' adapter
The 3' adapter must start with GGC for classification to work correctly.
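The junction rule can be checked directly from the configured adapter strings. The sketch below uses the adapter values shown above; the `junction_of` helper and the toy tRNA sequence are hypothetical and exist only for illustration.

```python
FIVE_PRIME = "CCTAAGAGCAAGAAGAAGCCTGG"
THREE_PRIME = "GGCTTCTTCTTGCTCTTCCAACCTTGCCTTAAAAAAAAAA"
JUNCTION = "CCAGGC"


def junction_of(trna: str, three_prime: str) -> str:
    """Return the 6-mer spanning the tRNA 3' end and the start of the 3' adapter."""
    return trna[-3:] + three_prime[:3]


# Hypothetical toy sequence ending in CCA (not a real reference entry).
toy_trna = "GCGGATTTAGCTCAGTTGGGAGAGCGCCA"

print(len(FIVE_PRIME), len(THREE_PRIME))  # 23 40
print(THREE_PRIME.startswith("GGC"))  # True
print(junction_of(toy_trna, THREE_PRIME) == JUNCTION)  # True
```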
Reference Validation and Building
The pipeline validates that the reference FASTA has proper adapter structure before alignment:
| YAML |
|---|
| reference:
  # Mode: "validate" (default) or "build"
  mode: "validate"
  # For build mode: path to raw tRNA FASTA (without adapters)
  raw_fasta: null
|
| Mode | Description |
|---|---|
| validate | Check existing adapted reference has correct structure |
| build | Create adapted reference from raw tRNA sequences |
Validate Mode (Default)
Checks that each sequence in your reference has:
- Correct 5' adapter prefix
- tRNA portion ending with CCA
- Correct 3' adapter suffix (starting with GGC)
- Valid CCAGGC junction for charging classification
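The four checks above can be sketched for a single sequence as follows. This is an illustration of the listed rules, not the pipeline's actual validator; `validate_record` and the toy sequences are hypothetical names and data.

```python
FIVE_PRIME = "CCTAAGAGCAAGAAGAAGCCTGG"
THREE_PRIME = "GGCTTCTTCTTGCTCTTCCAACCTTGCCTTAAAAAAAAAA"


def validate_record(seq: str, five_prime: str, three_prime: str) -> list:
    """Return a list of problems found; an empty list means the sequence passes."""
    problems = []
    seq = seq.upper()
    has_prefix = seq.startswith(five_prime)
    has_suffix = seq.endswith(three_prime)
    if not has_prefix:
        problems.append("missing 5' adapter prefix")
    if not has_suffix:
        problems.append("missing 3' adapter suffix")
    if not three_prime.startswith("GGC"):
        problems.append("3' adapter does not start with GGC")
    if has_prefix and has_suffix:
        # The tRNA portion is whatever remains after stripping both adapters.
        trna = seq[len(five_prime):len(seq) - len(three_prime)]
        if not trna.endswith("CCA"):
            problems.append("tRNA portion does not end with CCA")
        elif trna[-3:] + three_prime[:3] != "CCAGGC":
            problems.append("CCAGGC junction not present")
    return problems


# Hypothetical toy sequences: one well-formed, one whose tRNA lacks the CCA end.
good = FIVE_PRIME + "GCGGATTTAGCTCAGTTGGGAGAGCGCCA" + THREE_PRIME
bad = FIVE_PRIME + "GCGGATTTAGCTCAGTTGGGAGAGCG" + THREE_PRIME

print(validate_record(good, FIVE_PRIME, THREE_PRIME))  # []
print(validate_record(bad, FIVE_PRIME, THREE_PRIME))  # ['tRNA portion does not end with CCA']
```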
Build Mode
Creates an adapted reference from raw tRNA sequences:
- Reads raw tRNA FASTA (without adapters)
- Adds CCA to sequences missing it (with warning)
- Prepends 5' adapter
- Appends 3' adapter after CCA
- Verifies CCAGGC junction is created
| YAML |
|---|
| # Example: building reference from raw tRNAs
reference:
  mode: "build"
  raw_fasta: "resources/ref/my_raw_trnas.fa"
|
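The build steps can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the pipeline's implementation: `build_adapted` is a hypothetical helper, the records are toy data held in memory instead of a FASTA file, and it assumes that a full `CCA` is appended whenever a sequence does not already end in `CCA`.

```python
FIVE_PRIME = "CCTAAGAGCAAGAAGAAGCCTGG"
THREE_PRIME = "GGCTTCTTCTTGCTCTTCCAACCTTGCCTTAAAAAAAAAA"


def build_adapted(records, five_prime, three_prime):
    """Yield (name, adapted_sequence, cca_added) for each raw (name, sequence) pair."""
    for name, seq in records:
        seq = seq.upper()
        # Assumption: append a full CCA if the raw sequence does not end with it.
        cca_added = not seq.endswith("CCA")
        if cca_added:
            seq += "CCA"
        # Verify the junction that the charging classification depends on.
        junction = seq[-3:] + three_prime[:3]
        if junction != "CCAGGC":
            raise ValueError(f"{name}: junction is {junction}, expected CCAGGC")
        yield name, five_prime + seq + three_prime, cca_added


# Hypothetical toy records: the second one lacks the CCA end.
raw = [
    ("tRNA-toy-1", "GCGGATTTAGCTCAGTTGGGAGAGCGCCA"),
    ("tRNA-toy-2", "GCGGATTTAGCTCAGTTGGGAGAGCG"),
]

built = list(build_adapted(raw, FIVE_PRIME, THREE_PRIME))
for name, adapted, cca_added in built:
    note = " (CCA added)" if cca_added else ""
    print(f">{name}{note}\n{adapted}")
```

A 3' adapter that does not start with `GGC` makes the junction check fail for every record, which surfaces a misconfigured adapter early.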
Custom References
To use a custom reference, either:
- Use mode: "validate" with a pre-adapted FASTA
- Use mode: "build" with raw tRNA sequences (a missing CCA ending is added automatically, with a warning)
Remora Models
| YAML |
|---|
| # Kmer level table for signal extraction (from ONT kmer_models repo)
remora_kmer_table: "resources/kmers/9mer_levels_v1.txt"
# Trained ML model for charging classification
remora_cca_classifier: "resources/models/cca_classifier.pt"
|
| YAML |
|---|
| # Dorado basecaller version
dorado_version: 1.4.0
dorado_model: rna004_130bps_sup@v5.3.0
|
Modkit is managed via Pixi and specified in pixi.toml.
Modkit Thresholds
Optimized thresholds for modification calling based on ModkitOpt:
| YAML |
|---|
| modkit:
  # Global canonical base confidence threshold
  filter_threshold: 0.5
  # Per-modification pass thresholds
  mod_thresholds:
    a: 0.99 # m6A (N6-methyladenosine)
    m: 0.99 # m5C (5-methylcytosine)
    "17802": 0.995 # pseU (pseudouridine)
    "17596": 0.99 # inosine
|
These thresholds improve F1 scores by 51% (m6A) and 1251% (pseU) compared to defaults.
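As a rough illustration of how this block maps onto Modkit's command line, the sketch below turns the `modkit` section into `--filter-threshold` and `--mod-thresholds` arguments. How the pipeline actually passes these values to Modkit is an assumption here, and `modkit_threshold_args` is a hypothetical helper.

```python
def modkit_threshold_args(modkit_cfg: dict) -> list:
    """Translate the `modkit` config section into threshold arguments for Modkit."""
    args = ["--filter-threshold", str(modkit_cfg["filter_threshold"])]
    for code, threshold in modkit_cfg["mod_thresholds"].items():
        # One --mod-thresholds flag per modification code, as CODE:VALUE.
        args += ["--mod-thresholds", f"{code}:{threshold}"]
    return args


# Values copied from the config block above.
modkit_cfg = {
    "filter_threshold": 0.5,
    "mod_thresholds": {"a": 0.99, "m": 0.99, "17802": 0.995, "17596": 0.99},
}

args = modkit_threshold_args(modkit_cfg)
print(" ".join(args))
```

Note that the numeric modification codes are quoted in the YAML so they stay strings rather than being parsed as integers.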
Command-Line Options
Dorado Options
| YAML |
|---|
| opts:
  dorado: " --modified-bases pseU m5C inosine_m6A --emit-moves "
|
| Option | Description |
|---|---|
| --modified-bases | Modifications to call during basecalling |
| --emit-moves | Output move tables (required for Remora) |
BWA Options
| YAML |
|---|
| opts:
  bwa: " -W 13 -k 6 -T 20 -x ont2d"
|
| Option | Description |
|---|---|
| -W 13 | Discard chains with fewer than 13 seeded bases (band width is the lowercase -w option) |
| -k 6 | Minimum seed length |
| -T 20 | Minimum alignment score |
| -x ont2d | ONT 2D read preset |
These parameters are optimized for tRNA alignment based on Novoa lab research.
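An `opts` string like the one above is a plain fragment of command-line arguments. The sketch below shows one way such a string could be split and placed into a `bwa mem` call; the `bwa_mem_command` helper and the reads path are hypothetical, and the pipeline's real invocation may differ.

```python
import shlex


def bwa_mem_command(opts: str, reference: str, reads: str) -> list:
    """Build an argument list for `bwa mem` from an options string and two paths."""
    # shlex.split handles surrounding whitespace and quoted values safely.
    return ["bwa", "mem", *shlex.split(opts), reference, reads]


cmd = bwa_mem_command(
    " -W 13 -k 6 -T 20 -x ont2d",
    "resources/ref/sacCer3-mature-tRNAs-dual-adapt-v2.fa",
    "reads.fastq",  # hypothetical input file
)
print(shlex.join(cmd))
```

Leading or trailing spaces in the option string are harmless because they disappear when the string is split.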
BAM Filtering
| YAML |
|---|
| opts:
  bam_filter: "-5 24 -3 23 -s"
|
| Option | Description |
|---|---|
| -5 24 | Allow up to 24bp 5' truncation |
| -3 23 | Require at least 23bp 3' adapter |
| -s | Require positive strand |
Coverage Options
| YAML |
|---|
| opts:
  coverage: "--filterRNAstrand 'reverse' --samFlagExclude 256"
|
| Option | Description |
|---|---|
| --filterRNAstrand 'reverse' | Filter by RNA strand |
| --samFlagExclude 256 | Exclude non-primary alignments |
WarpDemuX Demultiplexing
See WarpDemuX for more information.
For multiplexed samples, enable barcode demultiplexing:
| YAML |
|---|
| warpdemux:
  enabled: true
  barcode_kit: "WDX4_tRNA_rna004_v1_0"
  save_boundaries: true
  threads: 8
|
Available Barcode Kits
| Kit | Barcodes | Notes |
|---|---|---|
| WDX4_tRNA_rna004_v1_0 | bc03, bc04, bc05, bc07 | Recommended, +3-7% recovery |
| WDX4b_tRNA_rna004_v1_0 | bc04, bc05, bc07, bc11 | Alternative |
Protocol Compatibility
WarpDemuX-tRNA models work only with the Nano-tRNAseq protocol. They do NOT work with Thomas splint adapter data.
See Demultiplexing for detailed setup.
Example Configurations
Minimal Config
| YAML |
|---|
| samples: config/samples.tsv
output_directory: "results/analysis"
|
Custom Reference
| YAML |
|---|
| samples: config/samples.tsv
output_directory: "results/analysis"
fasta: "path/to/my/reference.fa"
remora_cca_classifier: "path/to/my/model.pt"
|
With Demultiplexing
| YAML |
|---|
| samples: config/samples.yml # YAML format required
output_directory: "results/analysis"
warpdemux:
  enabled: true
  barcode_kit: "WDX4_tRNA_rna004_v1_0"
|
High-Stringency Modification Calling
| YAML |
|---|
| samples: config/samples.tsv
output_directory: "results/analysis"
modkit:
  filter_threshold: 0.7
  mod_thresholds:
    a: 0.995
    m: 0.995
    "17802": 0.999
    "17596": 0.995
|
Command-Line Overrides
Override any config parameter at runtime:
| Bash |
|---|
| pixi run snakemake --configfile=config/config.yml \
  --config output_directory="results/test2"
|
Next Steps