Output Files
This guide documents all output files produced by the pipeline.
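Output Directory Structure
The top-level layout of the output directory, as documented in the sections of this guide:
- pod5/: merged POD5 files
- bam/: rebasecall/, aln/, charging/, and final/ BAM files
- fq/: FASTQ files
- summary/: tables/, modkit/, and qc/ outputs
- reports/: QC report
- logs/: per-rule log files
- demux/: demultiplexing outputs (only when demultiplexing is enabled)
- squiggy-session.json: Squiggy session file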
Data Flow and Outputs
flowchart TB
subgraph Input
A[POD5 files]
end
subgraph Processing
B[pod5/{sample}/{sample}.pod5<br/>Merged POD5]
C[bam/rebasecall/{sample}/{sample}.rbc.bam<br/>Basecalled]
D[fq/{sample}/{sample}.fq.gz<br/>FASTQ]
E[bam/aln/{sample}/{sample}.aln.bam<br/>Aligned]
F[bam/charging/{sample}/{sample}.charging.bam<br/>Classified]
G[bam/final/{sample}/{sample}.bam<br/>Final BAM]
end
subgraph Outputs
H[summary/tables/<br/>Charging & Stats]
I[summary/modkit/<br/>Modifications]
J[summary/qc/<br/>Reference similarity]
K[summary/tables/<br/>Odds ratios]
L[reports/<br/>QC report]
end
A --> B --> C --> D --> E --> F --> G
G --> H
G --> I
G --> J
I --> K
H --> K
H --> L
Core Outputs
Final BAM
bam/final/{sample}/{sample}.bam
The final BAM file with charging classification and adapter position tags.
Tags:
| Tag | Type | Description |
|---|---|---|
| CL | B:C | Charging likelihood (0-255 scale) |
| CM | Z | Charging model metadata |
| PT | Z | Adapter positions (5' and 3' boundaries) |
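The tag layout in the table above can be inspected once a record is in SAM text form (for example, one line printed by `samtools view`). The following is a minimal sketch that assumes only the standard SAM optional-field syntax `TAG:TYPE:VALUE`. The helper name `parse_sam_tags` and the sample record are hypothetical, and the `PT` value shown is a placeholder, not the pipeline's actual encoding.

```python
def parse_sam_tags(sam_line):
    """Return a dict of optional tags from one SAM text record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are mandatory SAM fields; optional tags follow.
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "B":
            # Array type: the first element is the subtype (C = uint8),
            # the remaining elements are the values.
            subtype, *items = value.split(",")
            convert = float if subtype == "f" else int
            tags[tag] = [convert(x) for x in items]
        elif typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            # Z, A, and H values are kept as strings.
            tags[tag] = value
    return tags


# Hypothetical record; all values are made up for illustration.
record = "\t".join([
    "read1", "0", "tRNA-Ala-AGC", "1", "60", "4M", "*", "0", "0",
    "ACGT", "IIII", "CL:B:C,231", "PT:Z:example",
])
tags = parse_sam_tags(record)
print(tags["CL"])  # list of integers on the 0-255 scale
```

Because `CL` is typed `B:C`, it parses to a list of unsigned 8-bit integers, while `CM` and `PT` come back as plain strings.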
Tag Renaming
The original Remora tags ML/MM are renamed to CL/CM to avoid conflicts with standard SAM modification tags.
Charging Probability Table
summary/tables/{sample}/{sample}.charging_prob.tsv.gz
Per-read charging likelihood scores.
| Column | Description |
|---|---|
| read_id | Nanopore read identifier |
| tRNA | Aligned tRNA reference |
| charging_likelihood | ML score (0-255) |
Interpretation:
- Score ≥ 200: Charged (aminoacylated)
- Score < 200: Uncharged
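The threshold rule above can be applied to the per-read table with a few lines of Python. This is a sketch, not the pipeline's own code: it assumes the file has a header row with the column names listed in the table, and the function names are hypothetical.

```python
import csv
import gzip
import io

CHARGED_THRESHOLD = 200  # from the interpretation rule: score >= 200 is charged


def classify_reads(handle, threshold=CHARGED_THRESHOLD):
    """Yield (read_id, tRNA, is_charged) from a tab-separated text handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        score = float(row["charging_likelihood"])
        yield row["read_id"], row["tRNA"], score >= threshold


def classify_file(path, threshold=CHARGED_THRESHOLD):
    """Classify every read in a gzipped charging_prob table."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(classify_reads(handle, threshold))


# In-memory demonstration with made-up rows.
demo = io.StringIO(
    "read_id\ttRNA\tcharging_likelihood\n"
    "r1\ttRNA-Ala\t250\n"
    "r2\ttRNA-Ala\t12\n"
)
calls = list(classify_reads(demo))
```

For a real sample, `classify_file("summary/tables/{sample}/{sample}.charging_prob.tsv.gz")` would be called with the sample name substituted into the path.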
Charging CPM Table
summary/tables/{sample}/{sample}.charging.cpm.tsv.gz
Per-tRNA aggregated charging counts, normalized to CPM (counts per million).
| Column | Description |
|---|---|
| tRNA | tRNA reference name |
| counts_charged | Number of charged reads |
| counts_uncharged | Number of uncharged reads |
| cpm_charged | Charged CPM |
| cpm_uncharged | Uncharged CPM |
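As a rough illustration of how these columns relate, the sketch below computes CPM and the charged fraction per tRNA from raw counts. The denominator the pipeline uses for CPM is not stated here, so `library_size` (all charged plus uncharged reads in the table) is an assumption, and both function names are hypothetical.

```python
import csv
import io


def cpm(count, library_size):
    """Counts per million: count scaled to a library of one million reads."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return count / library_size * 1_000_000


def fraction_charged(charged, uncharged):
    """Share of reads for one tRNA that were classified as charged."""
    total = charged + uncharged
    return charged / total if total else float("nan")


# Made-up counts using the column names from the table above.
demo = io.StringIO(
    "tRNA\tcounts_charged\tcounts_uncharged\n"
    "tRNA-Ala\t30\t10\n"
    "tRNA-Gly\t5\t15\n"
)
rows = list(csv.DictReader(demo, delimiter="\t"))

# Assumed denominator: every classified read in the table.
library_size = sum(
    int(r["counts_charged"]) + int(r["counts_uncharged"]) for r in rows
)
summary = {
    r["tRNA"]: fraction_charged(int(r["counts_charged"]), int(r["counts_uncharged"]))
    for r in rows
}
```

The charged fraction does not depend on the CPM denominator, so it can be computed from either the count or the CPM columns.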
Quality Control Outputs
Alignment Statistics
summary/tables/{sample}/{sample}.align_stats.tsv.gz
Read counts through pipeline stages.
| Column | Description |
|---|---|
| bam_file | BAM file path |
| id | Sample identifier |
| info | Pipeline stage |
| n_reads | Total reads |
| pct_mapped | Percent mapped |
| mapped_reads | Number mapped |
| pos_reads | Positive strand reads |
| mapq0_reads | MAPQ 0 reads |
| mean_length | Mean read length |
| mean_bq | Mean base quality |
| mean_mapq | Mean mapping quality |
Base Calling Errors
summary/tables/{sample}/{sample}.bcerror.tsv.gz
Per-position base calling error metrics.
| Column | Description |
|---|---|
| Position | Reference position |
| Coverage | Read coverage |
| A_Freq, T_Freq, G_Freq, C_Freq | Base frequencies |
| MismatchFreq | Mismatch frequency |
| InsertionFreq | Insertion frequency |
| DeletionFreq | Deletion frequency |
| BCErrorFreq | Combined error frequency |
| MeanQual | Mean base quality |
Coverage Tracks
summary/tables/{sample}/{sample}.{cpm,counts}.bg.gz
BedGraph coverage tracks for visualization.
- .cpm.bg.gz: CPM-normalized coverage
- .counts.bg.gz: Raw count coverage
These tracks can be loaded in IGV.
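Outside a genome browser, the same tracks can be summarized directly. The sketch below assumes the standard four-column bedGraph layout (chrom, start, end, value) and computes a length-weighted mean per reference. The function name is hypothetical, and the mean covers only the intervals present in the file, not the full reference length.

```python
import gzip
from collections import defaultdict


def mean_coverage(lines):
    """Length-weighted mean bedGraph value per reference sequence."""
    covered = defaultdict(int)     # bases spanned by intervals, per reference
    weighted = defaultdict(float)  # sum of (interval length * value)
    for line in lines:
        # Skip blank lines and optional bedGraph header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        span = int(end) - int(start)
        covered[chrom] += span
        weighted[chrom] += span * float(value)
    return {c: weighted[c] / covered[c] for c in covered if covered[c]}


def mean_coverage_file(path):
    """Apply mean_coverage to a gzipped bedGraph file."""
    with gzip.open(path, "rt") as handle:
        return mean_coverage(handle)


# Made-up intervals for illustration.
demo_lines = ["tRNA-Ala\t0\t10\t5", "tRNA-Ala\t10\t20\t15"]
result = mean_coverage(demo_lines)
```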
Modification Outputs
Modification Pileup
summary/modkit/{sample}/{sample}.pileup.bed.gz
Per-site modification consensus in BED format.
| Column | Description |
|---|---|
| chrom | Reference name |
| start | Start position |
| end | End position |
| mod | Modification type |
| score | Modification score |
| strand | Strand |
| Additional | Modkit-specific columns |
Per-Read Modification Calls
summary/modkit/{sample}/{sample}.mod_calls.tsv.gz
Individual modification calls per read.
Full Modification Export
summary/modkit/{sample}/{sample}.mod_full.tsv.gz
Comprehensive modification information including all modkit fields.
Reference Similarity Matrix
summary/qc/reference_similarity.tsv
Pairwise sequence similarity matrix for the reference FASTA, useful for identifying potential cross-mapping issues.
Separate invocation
This rule is not part of the default pipeline outputs and must be run explicitly.
Format: Square TSV matrix with sequence names as row and column headers, values are percent identity (0-100).
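To flag potential cross-mapping, the matrix can be scanned for highly similar reference pairs. This sketch assumes the square layout described above; whether the first header cell is empty or absent is not specified, so both cases are handled. The function name and the 90% cutoff are hypothetical choices, not pipeline defaults.

```python
import csv
import io


def similar_pairs(handle, min_identity=90.0):
    """Return (ref_a, ref_b, identity) for pairs at or above min_identity."""
    rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]
    # If the header is as wide as the data rows, its first cell labels the
    # row-name column; otherwise the header holds only the column names.
    col_names = header[1:] if body and len(header) == len(body[0]) else header
    pairs = []
    for row in body:
        row_name = row[0]
        for col_name, cell in zip(col_names, row[1:]):
            # The name comparison keeps each unordered pair once and
            # skips the diagonal (self-identity).
            if row_name < col_name and float(cell) >= min_identity:
                pairs.append((row_name, col_name, float(cell)))
    return sorted(pairs, key=lambda p: -p[2])


# Made-up 3x3 matrix for illustration.
demo = io.StringIO(
    "\tA\tB\tC\n"
    "A\t100\t95\t50\n"
    "B\t95\t100\t92\n"
    "C\t50\t92\t100\n"
)
flagged = similar_pairs(demo)
```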
Modification Odds Ratios
summary/tables/{sample}/{sample}.odds_ratios.tsv.gz
Per-tRNA pairwise modification odds ratios testing whether modification at one position is correlated with modification at another position (or with charging status).
Separate invocation
This rule is not part of the default pipeline outputs and must be run explicitly.
| Column | Description |
|---|---|
| tRNA | Reference tRNA name |
| pos1 | First position |
| pos2 | Second position (999 = charging) |
| n00, n01, n10, n11 | 2x2 contingency table counts |
| total_obs | Total observations |
| odds_ratio | Odds ratio |
| log_odds_ratio | Log odds ratio |
| se_log_or | Standard error of log OR |
| ci_lower, ci_upper | 95% confidence interval |
| fisher_or | Fisher's exact test OR |
| p_value | Fisher's exact test p-value |
| p_adjusted | BH-adjusted p-value |
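The relationship between the count columns and the derived statistics can be sketched with the textbook formulas for a 2x2 table: the sample odds ratio, the Wald standard error of its logarithm, and a normal-approximation 95% interval, followed by a Benjamini-Hochberg adjustment. Whether the pipeline applies a continuity correction for zero cells, or computes these values in exactly this way, is not stated here, so treat this as an assumption. Both function names are hypothetical.

```python
import math


def odds_ratio_summary(n00, n01, n10, n11, z=1.96):
    """Sample odds ratio with a Wald confidence interval (no zero-cell correction)."""
    if 0 in (n00, n01, n10, n11):
        raise ValueError("zero cell: the uncorrected odds ratio is undefined")
    odds_ratio = (n11 * n00) / (n10 * n01)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / n00 + 1 / n01 + 1 / n10 + 1 / n11)
    return {
        "odds_ratio": odds_ratio,
        "log_odds_ratio": log_or,
        "se_log_or": se,
        "ci_lower": math.exp(log_or - z * se),
        "ci_upper": math.exp(log_or + z * se),
    }


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up counts and p-values for illustration.
stats = odds_ratio_summary(n00=10, n01=5, n10=5, n11=10)
adj = bh_adjust([0.01, 0.04, 0.03])
```

The odds ratio is unchanged if `n01` and `n10` are swapped, so the sketch does not depend on which index refers to `pos1`.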
QC Report
reports/qc_report.html
A combined Quarto HTML report with per-sample QC tabs, including alignment statistics, charging distributions, and basecalling error metrics.
Separate invocation
This report is not part of the default pipeline outputs and requires the report pixi environment.
Squiggy Session File
squiggy-session.json
A JSON session file generated at the root of the output directory for loading pipeline outputs in the Squiggy extension for Positron IDE.
Contents:
- Relative paths to POD5, BAM, and reference FASTA files for each sample
- MD5 checksums and file metadata for integrity verification
- Default plot options (eventalign mode, z-normalization)
Usage:
Open the squiggy-session.json file in Positron to load all samples with their associated POD5, BAM, and reference files.
Intermediate Files
These files are produced but typically not used directly:
Merged POD5
pod5/{sample}/{sample}.pod5
Merged POD5 file containing all raw signal data for the sample.
Rebasecalled BAM
bam/rebasecall/{sample}/{sample}.rbc.bam
Dorado output with basecalls and move tables.
Aligned BAM
bam/aln/{sample}/{sample}.aln.bam
BWA MEM alignment output.
Charging BAM
bam/charging/{sample}/{sample}.charging.bam
Remora classification output with ML/MM tags (before renaming).
FASTQ
fq/{sample}/{sample}.fq.gz
Extracted reads for alignment.
Demultiplexing Outputs
When demultiplexing is enabled:
Barcode Mapping
demux/read_ids/{run_id}/barcode_mapping.tsv.gz
Read ID to barcode assignments.
Per-Sample Read Lists
demux/read_ids/{sample}.txt
Read IDs belonging to each sample.
Split POD5
demux/pod5/{sample}.pod5
Per-sample POD5 files after demultiplexing.
Signal Metrics (Optional)
If remora_kmer_table is configured:
summary/tables/{sample}/{sample}.remora.tsv.gz
Remora signal metrics per read per position.
Log Files
logs/{rule}/{sample}.log
Standard output and error for each rule execution.
File Sizes
Approximate file sizes for a typical sample:
| File | Size |
|---|---|
| Merged POD5 | 5-50 GB |
| Final BAM | 100-500 MB |
| Charging CPM | 10-50 KB |
| Charging Prob | 1-10 MB |
| Modkit pileup | 1-5 MB |
| Odds ratios | 100 KB-1 MB |
| Reference similarity | 10-500 KB |
| QC report (HTML) | 1-5 MB |
Cleanup
Intermediate files can be removed to save space.
Keep Final Outputs
Do not delete bam/final/, summary/, or pod5/ directories - these are primary outputs.
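One cautious way to script the cleanup is sketched below. The directory list is an assumption drawn from the Intermediate Files section, with pod5/ left out because the note above says to keep it. The function name is hypothetical, and the dry-run default is a design choice so that nothing is deleted until the plan has been reviewed.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed intermediate directories (from the Intermediate Files section).
INTERMEDIATE_DIRS = ["bam/rebasecall", "bam/aln", "bam/charging", "fq"]
# Directories the note above says to keep.
PROTECTED_DIRS = ["bam/final", "summary", "pod5"]


def clean_intermediates(output_root, dry_run=True):
    """Return the intermediate directories found; delete them unless dry_run."""
    root = Path(output_root)
    found = []
    for rel in INTERMEDIATE_DIRS:
        target = root / rel
        if target.is_dir():
            found.append(target)
            if not dry_run:
                shutil.rmtree(target)
    return found


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for rel in INTERMEDIATE_DIRS + PROTECTED_DIRS:
        (Path(tmp) / rel).mkdir(parents=True)
    planned = clean_intermediates(tmp)       # dry run: only lists targets
    clean_intermediates(tmp, dry_run=False)  # actually removes them
    remaining = sorted(p.name for p in Path(tmp).iterdir())
    final_kept = (Path(tmp) / "bam" / "final").is_dir()
```

Running with `dry_run=True` first shows exactly which directories would be removed before committing to the deletion.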
Next Steps
- Workflow Overview - Understand the pipeline stages
- Rules Reference - Detailed rule documentation