Rules Reference¶
Complete documentation for all Snakemake rules in the pipeline.
Processing Rules¶
These rules form the core data processing pipeline.
merge_pods¶
Merge all POD5 files for a sample into a single file.
File: workflow/rules/aatrnaseq-process.smk
| Property | Value |
|---|---|
| Input | All POD5 files for sample |
| Output | pod5/{sample}/{sample}.pod5 |
| Threads | 12 |
| GPU | No |
Command:
| Bash | |
|---|---|
1 | |
rebasecall¶
Re-basecall POD5 files with Dorado, emitting move tables for Remora.
File: workflow/rules/aatrnaseq-process.smk
| Property | Value |
|---|---|
| Input | Merged POD5, mod model sentinel |
| Output | bam/rebasecall/{sample}/{sample}.rbc.bam |
| GPU | Yes |
| Parameters | base_calling_model, opts.dorado, models_dir |
Command:
| Bash | |
|---|---|
1 | |
Notes:
- Depends on
download_mod_modelsrule to pre-download modification models - Respects
CUDA_VISIBLE_DEVICESenvironment variable - Default options include
--modified-bases pseU m5C inosine_m6A --emit-moves
ubam_to_fastq¶
Extract reads from unmapped BAM to FASTQ format.
File: workflow/rules/aatrnaseq-process.smk
| Property | Value |
|---|---|
| Input | Rebasecalled BAM |
| Output | fq/{sample}/{sample}.fq.gz |
Command:
| Bash | |
|---|---|
1 | |
bwa_idx¶
Build BWA index for reference FASTA.
File: workflow/rules/aatrnaseq-process.smk
| Property | Value |
|---|---|
| Input | Reference FASTA |
| Output | .amb, .ann, .bwt, .pac, .sa files |
Command:
| Bash | |
|---|---|
1 | |
Notes:
- Only runs once per reference
- Index files are stored alongside the FASTA
bwa_align¶
Align reads to tRNA reference with BWA MEM.
File: workflow/rules/aatrnaseq-process.smk
| Property | Value |
|---|---|
| Input | FASTQ (EDX-filtered for EDX samples), BWA index |
| Output | bam/aln/{sample}/{sample}.aln.bam, .bai |
| Threads | 12 |
| Parameters | fasta, opts.bwa |
Command:
| Bash | |
|---|---|
1 2 3 4 | |
Filtering:
-F 20: Remove unmapped reads (0x4) and reverse-strand reads (0x10)
Default BWA options:
-W 13 -k 6 -T 20 -x ont2d(RNA-optimized)
classify_charging¶
Run Remora ML model to classify charged vs uncharged reads.
File: workflow/rules/aatrnaseq-process.smk
| Property | Value |
|---|---|
| Input | POD5, aligned BAM |
| Output | bam/charging/{sample}/{sample}.charging.bam, .bai |
| Threads | 8 |
| GPU | No (CPU) |
| Parameters | remora_cca_classifier |
Command:
| Bash | |
|---|---|
1 2 3 4 5 6 7 8 9 10 | |
Output tags:
ML: Modification likelihood (0-255)MM: Modification metadata
transfer_bam_tags¶
Transfer and rename charging tags from Remora output to classified BAM.
File: workflow/rules/aatrnaseq-process.smk
| Property | Value |
|---|---|
| Input | Charging BAM, aligned BAM |
| Output | bam/classified/{sample}/{sample}.bam, .bai |
Command:
| Bash | |
|---|---|
1 2 3 4 5 6 7 | |
Tag renaming:
ML→CL: Charging likelihoodMM→CM: Charging metadata
This prevents interference with standard SAM modification tags.
add_adapter_tags¶
Detect adapter positions using parasail alignment and add PT tags to create final BAM.
File: workflow/rules/aatrnaseq-process.smk
| Property | Value |
|---|---|
| Input | Classified BAM |
| Output | bam/adapter_tagged/{sample}/{sample}.bam, .bai |
| Parameters | adapters.* config options |
Command:
| Bash | |
|---|---|
1 2 3 4 5 6 7 8 | |
PT tag format:
| Text Only | |
|---|---|
1 2 | |
Notes:
- Uses parasail Smith-Waterman alignment to find adapter positions
- Can infer 5' adapter presence from alignment position when adapter is truncated
- Output goes to
bam/adapter_tagged/; the downstreamfinalize_bamrule produces the final BAM atbam/final/
finalize_bam¶
Produce the final BAM for downstream analysis. Symlinks the adapter-tagged BAM as the final output. EDX filtering now happens early in the pipeline (before alignment) via the detect_edx_adapters / filter_fastq_by_edx / filter_pod5_by_edx rules.
File: workflow/rules/aatrnaseq-process.smk
| Property | Value |
|---|---|
| Input | bam/adapter_tagged/{sample}/{sample}.bam, .bai |
| Output | bam/final/{sample}/{sample}.bam, .bai |
Notes:
- Creates symlinks to the adapter-tagged BAM (zero-copy passthrough)
- This is the final BAM with all tags: CL/CM (charging) and PT (adapters)
Charging Analysis Rules¶
get_cca_trna¶
Extract charging probability (CL tag) per read to TSV.
File: workflow/rules/aatrnaseq-charging.smk
| Property | Value |
|---|---|
| Input | Final BAM |
| Output | summary/tables/{sample}/{sample}.charging_prob.tsv.gz |
Command:
| Bash | |
|---|---|
1 | |
Output columns:
| Column | Description |
|---|---|
| read_id | Nanopore read identifier |
| tRNA | Aligned reference name |
| charging_likelihood | CL tag value (0-255) |
get_cca_trna_cpm¶
Calculate CPM-normalized charging counts per tRNA.
File: workflow/rules/aatrnaseq-charging.smk
| Property | Value |
|---|---|
| Input | Charging probability TSV |
| Output | summary/tables/{sample}/{sample}.charging.cpm.tsv.gz |
| Parameters | ml_thresh=200 (hardcoded) |
Command:
| Bash | |
|---|---|
1 2 3 4 | |
Classification:
- Score ≥ 200: Charged
- Score < 200: Uncharged
Output columns:
| Column | Description |
|---|---|
| tRNA | Reference name |
| counts_charged | Charged read count |
| counts_uncharged | Uncharged read count |
| cpm_charged | Charged CPM |
| cpm_uncharged | Uncharged CPM |
Quality Control Rules¶
compute_reference_similarity¶
Compute pairwise sequence similarity matrix for the reference FASTA.
File: workflow/rules/aatrnaseq-qc.smk
| Property | Value |
|---|---|
| Input | Reference FASTA |
| Output | summary/qc/reference_similarity.tsv |
| GPU | No |
Command:
| Bash | |
|---|---|
1 | |
Notes:
- Uses Needleman-Wunsch global alignment to compute all-vs-all pairwise similarity
- Percent identity = matches / max(len_seq1, len_seq2) * 100
- Identifies potential cross-mapping issues from homologous tRNA sequences
base_calling_error¶
Extract per-position basecalling error metrics.
File: workflow/rules/aatrnaseq-qc.smk
| Property | Value |
|---|---|
| Input | Final BAM, reference FASTA |
| Output | summary/tables/{sample}/{sample}.bcerror.tsv.gz |
Command:
| Bash | |
|---|---|
1 | |
Output columns:
| Column | Description |
|---|---|
| Position | Reference position |
| Coverage | Read coverage |
| A_Freq, T_Freq, G_Freq, C_Freq | Base frequencies |
| MismatchFreq | Mismatch rate |
| InsertionFreq | Insertion rate |
| DeletionFreq | Deletion rate |
| BCErrorFreq | Combined error rate |
| MeanQual | Mean base quality |
align_stats¶
Summarize alignment statistics across pipeline stages.
File: workflow/rules/aatrnaseq-qc.smk
| Property | Value |
|---|---|
| Input | Unmapped BAM, aligned BAM, classified BAM |
| Output | summary/tables/{sample}/{sample}.align_stats.tsv.gz |
Command:
| Bash | |
|---|---|
1 2 3 4 5 | |
Output columns:
| Column | Description |
|---|---|
| bam_file | Source BAM path |
| id | Sample identifier |
| info | Pipeline stage |
| n_reads | Total reads |
| pct_mapped | Percent mapped |
| mean_length | Mean read length |
| mean_bq | Mean base quality |
| mean_mapq | Mean mapping quality |
remora_signal_stats¶
Extract raw signal metrics using Remora API.
File: workflow/rules/aatrnaseq-qc.smk
| Property | Value |
|---|---|
| Input | Final BAM, POD5 |
| Output | summary/tables/{sample}/{sample}.remora.tsv.gz |
| Parameters | remora_kmer_table, opts.remora |
Command:
| Bash | |
|---|---|
1 2 3 4 5 6 | |
Notes:
- Only runs if
remora_kmer_tableis configured - Uses custom Remora fork for metrics extraction
Modification Rules¶
bam_to_coverage¶
Generate BedGraph coverage tracks.
File: workflow/rules/aatrnaseq-modifications.smk
| Property | Value |
|---|---|
| Input | Final BAM |
| Output | summary/tables/{sample}/{sample}.{cpm,counts}.bg.gz (protected) |
| Threads | 4 |
| Parameters | opts.coverage |
Command:
| Bash | |
|---|---|
1 2 3 | |
modkit_pileup¶
Generate per-site modification consensus.
File: workflow/rules/aatrnaseq-modifications.smk
| Property | Value |
|---|---|
| Input | Final BAM |
| Output | summary/modkit/{sample}/{sample}.pileup.bed.gz |
| Parameters | fasta, modkit thresholds |
Command:
| Bash | |
|---|---|
1 | |
modkit_extract_calls¶
Extract per-read modification calls.
File: workflow/rules/aatrnaseq-modifications.smk
| Property | Value |
|---|---|
| Input | Final BAM |
| Output | summary/modkit/{sample}/{sample}.mod_calls.tsv.gz |
| Memory | 96 GB |
| Parameters | fasta, modkit thresholds |
Command:
| Bash | |
|---|---|
1 2 3 4 5 6 7 | |
modkit_extract_full¶
Export comprehensive modification information.
File: workflow/rules/aatrnaseq-modifications.smk
| Property | Value |
|---|---|
| Input | Final BAM |
| Output | summary/modkit/{sample}/{sample}.mod_full.tsv.gz |
| Threads | 12 |
| Memory | 48 GB |
Command:
| Bash | |
|---|---|
1 2 3 4 5 6 7 8 | |
Utility Rules¶
generate_squiggy_session¶
Generate a Squiggy session JSON file for loading pipeline outputs in Positron IDE.
File: workflow/rules/common.smk
| Property | Value |
|---|---|
| Input | All final BAMs, all merged POD5s, reference FASTA |
| Output | squiggy-session.json (at output root) |
| GPU | No |
Command:
| Bash | |
|---|---|
1 2 3 4 5 | |
Notes:
- Generates relative paths to POD5, BAM, and FASTA files for each sample
- Computes MD5 checksums for file integrity verification
- Includes default plot options for the Squiggy viewer (eventalign mode, z-normalization)
Odds Ratio Rules¶
compute_odds_ratios¶
Compute per-tRNA pairwise modification odds ratios.
File: workflow/rules/aatrnaseq-odds-ratios.smk
| Property | Value |
|---|---|
| Input | Modkit extract calls TSV, charging probability TSV |
| Output | summary/tables/{sample}/{sample}.odds_ratios.tsv.gz |
| Parameters | odds_ratios.ml_threshold (default: 200), odds_ratios.min_coverage (default: 10) |
Command:
| Bash | |
|---|---|
1 2 3 4 5 6 | |
Notes:
- For each tRNA, tests whether modification at position X is correlated with modification at position Y (and with charging status) via 2x2 contingency tables
- Uses Haldane correction for zero cells and Fisher's exact test
- Applies BH correction across all results
- Charging status is represented as position 999
Output columns:
| Column | Description |
|---|---|
tRNA |
Reference tRNA name |
pos1 |
First position |
pos2 |
Second position (999 = charging) |
n00, n01, n10, n11 |
Contingency table counts |
total_obs |
Total observations |
odds_ratio |
Odds ratio |
log_odds_ratio |
Log odds ratio |
se_log_or |
Standard error of log OR |
ci_lower, ci_upper |
95% confidence interval |
fisher_or |
Fisher's exact test OR |
p_value |
Fisher's exact test p-value |
p_adjusted |
BH-adjusted p-value |
Report Rules¶
render_combined_qc_report¶
Render a combined Quarto QC report with per-sample tabs.
File: workflow/rules/aatrnaseq-report.smk
| Property | Value |
|---|---|
| Input | Alignment stats, charging probabilities, charging CPM, basecalling errors (all samples) |
| Output | reports/qc_report.html |
| Parameters | ml-threshold, report.custom_include |
Command:
| Bash | |
|---|---|
1 2 3 4 | |
Notes:
- Requires the
reportpixi environment:pixi run -e report snakemake render_combined_qc_report - Generates faceted QC plots with per-sample patchwork tabs
- Supports optional custom Quarto include via
report.custom_includeconfig
Demultiplexing Rules¶
See Demultiplexing for detailed documentation.
warpdemux¶
Run WarpDemuX barcode prediction.
| Output | demux/warpdemux_output/{run_id}/ |
parse_warpdemux¶
Parse predictions to barcode mapping file.
| Output | demux/read_ids/{run_id}/barcode_mapping.tsv.gz |
extract_sample_reads¶
Filter read IDs for specific sample's barcode.
| Output | demux/read_ids/{sample}.txt |
split_pod5¶
Split merged POD5 by sample using read ID list.
| Output | demux/pod5/{sample}.pod5 |
detect_edx_adapters¶
Detect 3' adapter identity per read on unaligned BAM (before alignment). Only runs for EDX samples.
| Output | demux/edx/{sample}/{sample}.edx_adapters.tsv.gz |
extract_edx_read_ids¶
Extract read IDs matching the sample's EDX adapter assignment.
| Output | demux/edx/{sample}/{sample}.edx_read_ids.txt |
filter_fastq_by_edx¶
Extract FASTQ for reads matching the sample's EDX adapter.
| Output | demux/edx/fq/{sample}/{sample}.fq.gz |
filter_pod5_by_edx¶
Filter POD5 to keep only reads matching the sample's EDX adapter.
| Output | demux/edx/pod5/{sample}/{sample}.pod5 |
edx_concordance¶
Build concordance table of WDX vs EDX adapter identity from adapter detection TSVs.
| Output | summary/edx/edx_concordance.tsv.gz |
Rule Dependencies¶
flowchart LR
merge_pods --> rebasecall
rebasecall --> ubam_to_fastq
rebasecall --> detect_edx_adapters
detect_edx_adapters --> extract_edx_read_ids
extract_edx_read_ids --> filter_fastq_by_edx
extract_edx_read_ids --> filter_pod5_by_edx
ubam_to_fastq -.-> bwa_align
filter_fastq_by_edx -.-> bwa_align
bwa_align --> inject_ubam_tags
inject_ubam_tags --> classify_charging
filter_pod5_by_edx -.-> classify_charging
classify_charging --> transfer_bam_tags
transfer_bam_tags --> add_adapter_tags
add_adapter_tags --> finalize_bam
finalize_bam --> get_cca_trna
finalize_bam --> base_calling_error
finalize_bam --> align_stats
finalize_bam --> bam_to_coverage
finalize_bam --> modkit_pileup
finalize_bam --> modkit_extract_calls
get_cca_trna --> get_cca_trna_cpm
get_cca_trna --> compute_odds_ratios
modkit_extract_calls --> compute_odds_ratios
finalize_bam --> generate_squiggy_session
merge_pods --> generate_squiggy_session
finalize_bam --> compute_reference_similarity
align_stats --> render_combined_qc_report
get_cca_trna --> render_combined_qc_report
get_cca_trna_cpm --> render_combined_qc_report
base_calling_error --> render_combined_qc_report
Note: Dashed lines (-.->) indicate conditional paths. For EDX samples,filter_fastq_by_edxfeedsbwa_alignandfilter_pod5_by_edxfeedsclassify_charging. For non-EDX samples,ubam_to_fastqfeedsbwa_align` directly.