Demultiplexing¶

A guide to WarpDemuX barcode demultiplexing for pooled (multiplexed) samples.

Overview¶

WarpDemuX enables barcode demultiplexing for pooled Nano-tRNAseq sequencing runs. Multiple samples can be sequenced together and separated computationally based on barcode signal patterns.

flowchart LR
    A[Pooled POD5<br/>4 barcoded samples] --> B[WarpDemuX]
    B --> C[Sample A<br/>barcode03]
    B --> D[Sample B<br/>barcode04]
    B --> E[Sample C<br/>barcode05]
    B --> F[Sample D<br/>barcode07]

When to Use Demultiplexing¶

Use WarpDemuX when:

Multiple samples were pooled in a single sequencing run
Samples were prepared with WarpDemuX barcodes
Libraries were prepared with the Nano-tRNAseq protocol

Do not use when:

Samples were sequenced individually (1 sample per run)
Libraries were prepared with the Thomas splint adapter (incompatible)
Barcodes were not used during library prep

Setup¶

1. Install WarpDemuX¶

Bash
pixi run setup

This installs WarpDemuX along with other pipeline tools (dorado, remora).

2. Create YAML Sample File¶

Create a sample file in YAML format (required for demultiplexing):

config/samples-demux.yml

YAML
runs:
  # Pooled run with 4 barcoded samples
  - path: /data/sequencing/pooled_run
    barcode_kit: "WDX4_tRNA_rna004_v1_0"
    samples:
      charged_rep1: "barcode03"
      uncharged_rep1: "barcode04"
      charged_rep2: "barcode05"
      uncharged_rep2: "barcode07"

3. Enable in Configuration¶

Create a config file with demux enabled:

config/config-demux.yml

YAML
samples: config/samples-demux.yml
output_directory: "results/demux"

warpdemux:
    enabled: true
    barcode_kit: "WDX4_tRNA_rna004_v1_0"
    save_boundaries: true
    threads: 8

Barcode Kits¶

WarpDemuX provides adapter-based barcode demultiplexing for Oxford Nanopore direct RNA sequencing. This pipeline uses tRNA-specific WarpDemuX models trained for the Nano-tRNAseq protocol.

Naming Convention¶

Model names follow the format: WDX[n_barcodes][alt_set]_tRNA_rna004_v1_0

WDX — WarpDemuX prefix
[n_barcodes] — number of barcodes in the set (e.g., 4)
[alt_set] — optional letter for alternative adapter sets (e.g., b)
_tRNA_ — indicates tRNA-specific model
rna004_v1_0 — ONT RNA004 chemistry version
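The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the pipeline: the function name `parse_kit_name` is hypothetical, and the pattern assumes the alternative-set marker is a single lowercase letter, as in the `b` example.

```python
import re
from typing import NamedTuple

# Pattern mirrors the documented format WDX[n_barcodes][alt_set]_tRNA_rna004_v1_0.
# Assumption: the alt-set marker is at most one lowercase letter.
KIT_NAME = re.compile(r"^WDX(?P<n>\d+)(?P<alt>[a-z]?)_tRNA_rna004_v1_0$")


class KitName(NamedTuple):
    n_barcodes: int
    alt_set: str  # empty string when no alternative-set letter is present


def parse_kit_name(name: str) -> KitName:
    """Split a tRNA kit name into barcode count and optional alt-set letter."""
    match = KIT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a tRNA RNA004 kit name: {name!r}")
    return KitName(int(match["n"]), match["alt"])


print(parse_kit_name("WDX4_tRNA_rna004_v1_0"))
print(parse_kit_name("WDX4b_tRNA_rna004_v1_0"))
```

A check like this catches typos in `barcode_kit` before a run starts, although it only validates the name's shape and not whether the model is installed.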

Available Kits¶

Kit	# Barcodes	Barcode IDs	Notes
`WDX4_tRNA_rna004_v1_0`	4	barcode03, barcode04, barcode05, barcode07	Recommended, +3-7% recovery
`WDX4b_tRNA_rna004_v1_0`	4	barcode04, barcode05, barcode07, barcode11	Alternative adapter set
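Because the two kits contain different barcode IDs, a sample file can reference a barcode that its kit does not provide. A small sketch of that check follows; the barcode sets are copied from the table above, while `invalid_barcodes` is a hypothetical helper and not a pipeline function.

```python
# Barcode IDs per kit, copied from the table above.
KIT_BARCODES = {
    "WDX4_tRNA_rna004_v1_0": {"barcode03", "barcode04", "barcode05", "barcode07"},
    "WDX4b_tRNA_rna004_v1_0": {"barcode04", "barcode05", "barcode07", "barcode11"},
}


def invalid_barcodes(kit: str, samples: dict[str, str]) -> dict[str, str]:
    """Return the sample -> barcode pairs whose barcode is not part of the kit."""
    allowed = KIT_BARCODES[kit]
    return {name: bc for name, bc in samples.items() if bc not in allowed}


# barcode11 exists only in the WDX4b set, so it is flagged for the WDX4 kit.
samples = {"charged_rep1": "barcode03", "uncharged_rep1": "barcode11"}
print(invalid_barcodes("WDX4_tRNA_rna004_v1_0", samples))
```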

Standard RNA004 Models

WarpDemuX also offers standard RNA004 models (WDX4, WDX6, WDX10) for mRNA and other direct RNA applications. See the WarpDemuX README for details. This pipeline requires the _tRNA_ variants.

Protocol Compatibility

WarpDemuX-tRNA models are developed specifically for the Nano-tRNAseq protocol. They do NOT work with data using the Thomas splint adapter.

Sample File Format¶

YAML Structure¶

YAML
runs:
  - path: /path/to/run           # Run directory
    barcode_kit: "kit_name"      # Optional, uses config default
    samples:
      sample_name: "barcode_id"  # Map sample to barcode

Multiple Runs¶

YAML
runs:
  # First pooled run
  - path: /data/run1
    samples:
      sample1: "barcode03"
      sample2: "barcode04"

  # Second pooled run
  - path: /data/run2
    samples:
      sample3: "barcode03"
      sample4: "barcode04"

Mixed Runs (Demux + Direct)¶

YAML
runs:
  # Pooled run
  - path: /data/pooled_run
    samples:
      pooled_sample1: "barcode03"
      pooled_sample2: "barcode04"

  # Direct sequencing (no demux)
  - path: /data/direct_run
    samples:
      direct_sample: ~  # null = skip demux

Dual Barcoding (WDX + EDX)¶

The pipeline supports dual barcoding: WDX (5' signal-based) and EDX (3' adapter sequence-based) barcodes are combined for two-axis demultiplexing.

WDX (WarpDemuX): 5' signal barcode that WarpDemuX predicts from the raw nanopore signal. This is the primary demultiplexing barcode used to split POD5 reads into samples.
EDX: 3' adapter sequence variant (e.g., edx01, edx02). The adapter sequence at the 3' end identifies which adapter was used during library prep. EDX filtering happens right after basecalling and before alignment, so downstream rules process only matching reads.

When to Use Dual Barcoding¶

Use dual barcoding when samples are multiplexed with both WDX adapters at the 5' end and different EDX adapter sequences at the 3' end. This enables true two-axis demultiplexing: WDX splits reads at the POD5 level, then EDX splits both FASTQ and POD5 before alignment based on 3' adapter identity. An optional concordance analysis can verify agreement between the two axes.

Dict Format for Samples¶

When using dual barcoding, specify sample values as a dict with wdx and edx keys instead of a plain barcode string. The edx value must match a name from adapters.three_prime in the config (e.g., edx01, edx02):

YAML
runs:
  - path: /data/pooled_run
    barcode_kit: "WDX4_tRNA_rna004_v1_0"
    samples:
      # Dict format: wdx + edx
      # edx values must match adapter names from adapters.three_prime config
      sample_bc03:
        wdx: "barcode03"
        edx: "edx01"
      sample_bc04:
        wdx: "barcode04"
        edx: "edx02"
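The sample file therefore allows three value forms per sample: a plain barcode string, a null, and a dict with `wdx` and `edx` keys. The sketch below shows one way to reduce them to a single shape after the YAML has been loaded. It is an illustration only; `Barcodes` and `normalize_sample` are hypothetical names, and the pipeline's own parsing may differ.

```python
from typing import NamedTuple, Optional, Union

SampleValue = Union[str, dict, None]


class Barcodes(NamedTuple):
    wdx: Optional[str]  # None means the run is not demultiplexed by WDX
    edx: Optional[str]  # None means no 3' adapter filtering


def normalize_sample(value: SampleValue) -> Barcodes:
    """Map the three documented sample value forms onto one (wdx, edx) pair."""
    if value is None:            # `sample: ~` -> direct run, skip demux
        return Barcodes(None, None)
    if isinstance(value, str):   # `sample: "barcode03"` -> WDX only
        return Barcodes(value, None)
    if isinstance(value, dict):  # `sample: {wdx: ..., edx: ...}` -> dual barcoding
        return Barcodes(value.get("wdx"), value.get("edx"))
    raise TypeError(f"unsupported sample value: {value!r}")


print(normalize_sample("barcode03"))
print(normalize_sample({"wdx": "barcode04", "edx": "edx02"}))
print(normalize_sample(None))
```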

EDX Early Splitting¶

When a sample has an edx assignment, the pipeline detects 3' adapter identity on the unaligned BAM right after basecalling, then splits both FASTQ and POD5 by adapter before alignment. This avoids redundant processing when two samples share a WDX barcode but have different EDX adapters.

The EDX splitting flow:

Text Only
rebasecall → uBAM → detect_edx_adapters → extract_edx_read_ids
                                            ├── filter_fastq_by_edx → bwa_align → ...
                                            └── filter_pod5_by_edx → classify_charging

Reads with no detected 3' adapter are recorded as "none" in the adapter detection TSV and are excluded from all samples. For samples without an edx assignment, the pipeline flow is unchanged.
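The selection step in this flow amounts to a filter on the adapter detection table. Here is a minimal sketch under stated assumptions: the column names `read_id` and `adapter` are guesses at the TSV header, the example rows are synthetic, and `matching_read_ids` is a hypothetical helper rather than the pipeline's script.

```python
import csv
import io


def matching_read_ids(tsv_text: str, expected_adapter: str) -> list[str]:
    """Return read IDs whose detected 3' adapter equals the expected one."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Column names `read_id` and `adapter` are assumed for this sketch.
    return [row["read_id"] for row in reader if row["adapter"] == expected_adapter]


# Synthetic detection table: read-c has no detectable adapter.
detection_tsv = (
    "read_id\tadapter\n"
    "read-a\tedx01\n"
    "read-b\tedx02\n"
    "read-c\tnone\n"
    "read-d\tedx01\n"
)
print(matching_read_ids(detection_tsv, "edx01"))
```

Reads labeled "none" never equal a named adapter, so they drop out of every sample without special handling.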

EDX Concordance Output (QC)¶

When samples have EDX assignments and edx.enabled: true, the edx_concordance rule writes a QC concordance table to summary/edx/edx_concordance.tsv.gz. The table shows how the reads assigned to each WDX sample are distributed across EDX adapter identities, which helps verify demultiplexing accuracy. The concordance is computed from the pre-alignment adapter detection TSVs (which contain ALL reads), not from the final BAMs.

Output columns:

Column	Description
`sample`	WDX sample name
`edx_adapter`	3' adapter identity detected (e.g., `edx01`, `edx02`, `none`)
`n_reads`	Number of reads with this adapter
`pct`	Percentage of the sample's reads with this adapter
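To make the `n_reads` and `pct` columns concrete, the sketch below tabulates (sample, adapter) pairs the way the column descriptions suggest. It is not the code in `workflow/scripts/edx_concordance.py`; the function name, the rounding to two decimals, and the input rows are assumptions for illustration.

```python
from collections import Counter


def concordance(assignments: list[tuple[str, str]]) -> list[dict]:
    """Tabulate (sample, edx_adapter) pairs into n_reads and pct per sample."""
    pair_counts = Counter(assignments)
    sample_totals = Counter(sample for sample, _ in assignments)
    rows = []
    for (sample, adapter), n_reads in sorted(pair_counts.items()):
        # pct is relative to all reads of the same sample, including "none".
        pct = 100.0 * n_reads / sample_totals[sample]
        rows.append({"sample": sample, "edx_adapter": adapter,
                     "n_reads": n_reads, "pct": round(pct, 2)})
    return rows


# Synthetic input: three of four reads carry the expected adapter.
reads = [("sample_bc03", "edx01")] * 3 + [("sample_bc03", "edx02")]
for row in concordance(reads):
    print(row)
```

In a table like this, a sample whose reads fall mostly under its expected adapter indicates that the two barcode axes agree.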

Enable EDX concordance

EDX concordance output requires edx.enabled: true in the pipeline config. The rule runs automatically when enabled and at least one sample has an edx assignment.

Debugging unmatched reads

The full adapter detection TSV at demux/edx/{sample}/{sample}.edx_adapters.tsv.gz records every read's adapter assignment including "none", useful for debugging.

Pipeline Flow¶

With demultiplexing enabled, the pipeline adds these steps before standard processing:

flowchart TB
    subgraph Input
        A[Pooled POD5 files]
    end

    subgraph Demux[Demultiplexing Steps]
        B[warpdemux<br/>Predict barcodes]
        C[parse_warpdemux<br/>Create mapping]
        D[extract_sample_reads<br/>Filter by barcode]
        E[split_pod5<br/>Split per sample]
    end

    subgraph Standard[Standard Pipeline]
        F[rebasecall]
        G[bwa_align]
        H[classify_charging]
        I[...]
    end

    A --> B --> C --> D --> E --> F --> G --> H --> I

Demux Rules¶

warpdemux¶

Runs WarpDemuX barcode prediction directly on raw POD5 files.

Property	Value
Input	Raw POD5 files from run directory
Output	`demux/warpdemux_output/{run_id}/`
Threads	Configurable (default: 8)

parse_warpdemux¶

Parses WarpDemuX predictions to create a read-to-barcode mapping.

Property	Value
Input	WarpDemuX output directory
Output	`demux/read_ids/{run_id}/barcode_mapping.tsv.gz`

Output format:

Column	Description
`read_id`	Nanopore read identifier
`predicted_barcode`	Assigned barcode (e.g., `barcode03`)

extract_sample_reads¶

Extracts read IDs for a specific sample's barcode.

Property	Value
Input	Barcode mapping file
Output	`demux/read_ids/{sample}.txt`
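Taken together, parse_warpdemux and extract_sample_reads turn a read-to-barcode table into one read ID list per sample. A rough sketch of that grouping follows, using the documented `read_id` and `predicted_barcode` columns. The helper name and the synthetic rows are hypothetical, and the real rule writes one file per sample instead of returning a dict.

```python
import csv
import io
from collections import defaultdict


def read_ids_by_sample(mapping_tsv: str, samples: dict[str, str]) -> dict[str, list[str]]:
    """Group read IDs by predicted barcode, then look up each sample's barcode."""
    by_barcode = defaultdict(list)
    for row in csv.DictReader(io.StringIO(mapping_tsv), delimiter="\t"):
        by_barcode[row["predicted_barcode"]].append(row["read_id"])
    # A sample whose barcode was never predicted gets an empty list.
    return {sample: by_barcode.get(barcode, []) for sample, barcode in samples.items()}


mapping_tsv = (
    "read_id\tpredicted_barcode\n"
    "r1\tbarcode03\n"
    "r2\tbarcode04\n"
    "r3\tbarcode03\n"
)
samples = {"charged_rep1": "barcode03", "uncharged_rep1": "barcode04", "charged_rep2": "barcode05"}
print(read_ids_by_sample(mapping_tsv, samples))
```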

split_pod5¶

Filters the run's raw POD5 files down to one sample's reads using that sample's read ID list.

Property	Value
Input	Raw POD5 files from run, read ID list
Output	`demux/pod5/{sample}.pod5`

detect_edx_adapters¶

Detects 3' adapter identity per read on the unaligned BAM (before alignment). Produces a gzipped TSV mapping each read_id to its best-matching 3' adapter name. Only runs for samples with an edx assignment.

Property	Value
Input	Rebasecalled uBAM
Output	`demux/edx/{sample}/{sample}.edx_adapters.tsv.gz`
Script	`workflow/scripts/detect_3p_adapters.py`

extract_edx_read_ids¶

Extracts read IDs matching the sample's expected EDX adapter from the detection TSV.

Property	Value
Input	Adapter detection TSV
Output	`demux/edx/{sample}/{sample}.edx_read_ids.txt`

filter_fastq_by_edx¶

Extracts FASTQ for reads matching the sample's EDX adapter from the uBAM.

Property	Value
Input	Rebasecalled uBAM + read IDs
Output	`demux/edx/fq/{sample}/{sample}.fq.gz`

filter_pod5_by_edx¶

Filters POD5 to keep only reads matching the sample's EDX adapter.

Property	Value
Input	WDX-split (or merged) POD5 + read IDs
Output	`demux/edx/pod5/{sample}/{sample}.pod5`

edx_concordance¶

Builds a concordance table of WDX sample assignment vs EDX (3' adapter) identity. Uses pre-alignment adapter detection TSVs (which contain ALL reads) rather than final BAMs. Only runs when edx.enabled: true and samples have EDX assignments.

Property	Value
Input	Adapter detection TSVs for all EDX-assigned samples
Output	`summary/edx/edx_concordance.tsv.gz`
Script	`workflow/scripts/edx_concordance.py`

Running¶

Dry Run¶

Bash
pixi run snakemake -n --configfile=config/config-demux.yml

Execute¶

Bash
# Local
pixi run snakemake --cores 12 --configfile=config/config-demux.yml

# Cluster
pixi run snakemake --profile cluster/lsf --configfile=config/config-demux.yml

Output Structure¶

With demultiplexing, outputs include:

Text Only
{output_directory}/
├── demux/
│   ├── warpdemux_output/{run_id}/
│   │   └── warpdemux_*/            # WarpDemuX results
│   ├── read_ids/
│   │   ├── {run_id}/
│   │   │   ├── barcode_mapping.tsv.gz
│   │   │   └── demux_summary.tsv.gz
│   │   └── {sample}.txt            # Per-sample WDX read IDs
│   ├── pod5/
│   │   └── {sample}.pod5           # Per-sample WDX POD5
│   └── edx/                        # EDX early splitting (if edx assigned)
│       ├── {sample}/
│       │   ├── {sample}.edx_adapters.tsv.gz  # All reads → adapter mapping
│       │   └── {sample}.edx_read_ids.txt     # Matching read IDs
│       ├── fq/{sample}/
│       │   └── {sample}.fq.gz      # EDX-filtered FASTQ
│       └── pod5/{sample}/
│           └── {sample}.pod5       # EDX-filtered POD5
├── bam/
│   └── ...                         # Standard outputs
└── summary/
    ├── edx/
    │   └── edx_concordance.tsv.gz  # EDX concordance (if edx.enabled)
    └── ...                         # Standard outputs
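When scripting against these outputs, it can help to build the paths from the tree above rather than hard-coding them. The following sketch does only that; `demux_outputs` and the run ID `run1` are hypothetical, and how the pipeline derives `run_id` is not covered here.

```python
from pathlib import PurePosixPath


def demux_outputs(output_directory: str, run_id: str, sample: str,
                  edx: bool = False) -> dict[str, PurePosixPath]:
    """Build the documented demux output paths for one run and one sample."""
    root = PurePosixPath(output_directory) / "demux"
    paths = {
        "barcode_mapping": root / "read_ids" / run_id / "barcode_mapping.tsv.gz",
        "demux_summary": root / "read_ids" / run_id / "demux_summary.tsv.gz",
        "read_ids": root / "read_ids" / f"{sample}.txt",
        "pod5": root / "pod5" / f"{sample}.pod5",
    }
    if edx:  # EDX early-splitting outputs exist only for samples with an edx assignment
        edx_root = root / "edx"
        paths.update({
            "edx_adapters": edx_root / sample / f"{sample}.edx_adapters.tsv.gz",
            "edx_read_ids": edx_root / sample / f"{sample}.edx_read_ids.txt",
            "edx_fastq": edx_root / "fq" / sample / f"{sample}.fq.gz",
            "edx_pod5": edx_root / "pod5" / sample / f"{sample}.pod5",
        })
    return paths


for name, path in demux_outputs("results/demux", "run1", "charged_rep1", edx=True).items():
    print(f"{name}: {path}")
```

Note that with `output_directory: "results/demux"` the tree places the files under `results/demux/demux/...`, because the `demux/` subdirectory is appended to the output directory.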

Configuration Options¶

YAML
warpdemux:
    enabled: true                        # Enable/disable demux
    barcode_kit: "WDX4_tRNA_rna004_v1_0" # Default kit
    save_boundaries: true                # Save boundary info
    threads: 8                           # Worker threads

Option	Description	Default
`enabled`	Enable demultiplexing	`false`
`barcode_kit`	Default barcode kit	`WDX4_tRNA_rna004_v1_0`
`save_boundaries`	Save demux boundaries	`true`
`threads`	WarpDemuX threads	`8`
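The defaults in this table, combined with the optional per-run `barcode_kit` in the sample file, imply a simple precedence: run-level kit first, then the `warpdemux` block, then the built-in default. A sketch of that resolution, assuming the config and a run entry are already loaded as dicts (both function names are hypothetical):

```python
# Defaults copied from the options table above.
DEFAULTS = {
    "enabled": False,
    "barcode_kit": "WDX4_tRNA_rna004_v1_0",
    "save_boundaries": True,
    "threads": 8,
}


def effective_warpdemux_config(user_config: dict) -> dict:
    """Overlay the user's `warpdemux` block on the documented defaults."""
    return {**DEFAULTS, **user_config.get("warpdemux", {})}


def kit_for_run(run: dict, warpdemux_config: dict) -> str:
    """A run-level `barcode_kit` wins; otherwise fall back to the config value."""
    return run.get("barcode_kit") or warpdemux_config["barcode_kit"]


config = effective_warpdemux_config({"warpdemux": {"enabled": True, "threads": 4}})
print(config)
print(kit_for_run({"path": "/data/run1"}, config))
print(kit_for_run({"path": "/data/run2", "barcode_kit": "WDX4b_tRNA_rna004_v1_0"}, config))
```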

Troubleshooting¶

No Reads for Sample¶

If a sample has zero reads after demux:

Check the barcode assignment in the sample file
Verify that the barcode kit matches the library prep
Check demux_summary.tsv.gz for the barcode distribution

Bash
zcat {output_directory}/demux/read_ids/{run_id}/demux_summary.tsv.gz

WarpDemuX Fails¶

Common issues:

Memory: Increase mem_mb in the cluster profile for the warpdemux rule
Model not found: Verify that the barcode_kit name is correct
Incompatible data: WarpDemuX-tRNA models only work with the Nano-tRNAseq protocol

Unbalanced Barcodes¶

If barcode distribution is very unbalanced:

Check library prep QC
Review loading concentrations
Consider if samples have different RNA amounts
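To quantify how unbalanced a run is, the per-barcode read fractions can be computed directly from barcode_mapping.tsv.gz, whose `read_id` and `predicted_barcode` columns are documented above. This is a sketch on synthetic data; `barcode_fractions` is a hypothetical helper, and no particular threshold for "unbalanced" is implied.

```python
import csv
import gzip
import os
import tempfile
from collections import Counter


def barcode_fractions(mapping_path: str) -> dict[str, float]:
    """Return the fraction of reads assigned to each predicted barcode."""
    with gzip.open(mapping_path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        counts = Counter(row["predicted_barcode"] for row in reader)
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()} if total else {}


# Demo on a tiny synthetic mapping file (not real data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "barcode_mapping.tsv.gz")
    with gzip.open(demo_path, "wt", newline="") as handle:
        handle.write("read_id\tpredicted_barcode\n")
        handle.write("r1\tbarcode03\nr2\tbarcode03\nr3\tbarcode03\nr4\tbarcode04\n")
    fractions = barcode_fractions(demo_path)

print(fractions)
```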

Best Practices¶

Verify the barcode distribution before running the full pipeline:

Bash
pixi run snakemake demux/read_ids/{run_id}/demux_summary.tsv.gz \
    --cores 12 --configfile=config/config-demux.yml

Use the recommended kit (WDX4_tRNA_rna004_v1_0) for best recovery
Check the sample file format carefully; YAML indentation matters
Monitor memory; WarpDemuX can require 32 GB or more for large runs