Sample Files
The pipeline supports two sample file formats: TSV (simple) and YAML (with demultiplexing).
TSV Format
Use TSV format for non-multiplexed samples. Each line holds a sample ID and a run directory path, separated by a tab.
Example
[Example listing not preserved: a three-line TSV file.]
Rules
- No header row - data starts on line 1
- Tab-separated - use actual tab characters, not spaces
- Two columns: sample ID and run directory path
- Same sample ID can appear multiple times - POD5 files are merged
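The rules above can be sketched as a small parser. This is a minimal illustration, not the pipeline's own code: the function name `parse_samples_tsv`, the sample IDs, and the paths are hypothetical, and skipping blank lines is an assumption rather than a documented behavior.

```python
from collections import defaultdict
from pathlib import Path


def parse_samples_tsv(text: str) -> dict[str, list[Path]]:
    """Group run directories by sample ID from headerless, tab-separated text."""
    runs_by_sample: dict[str, list[Path]] = defaultdict(list)
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 tab-separated columns, "
                f"got {len(fields)}"
            )
        sample_id, run_dir = fields
        # A repeated sample ID accumulates run directories, so their
        # POD5 files can later be merged into one sample.
        runs_by_sample[sample_id].append(Path(run_dir))
    return dict(runs_by_sample)


# Hypothetical sample IDs and paths, for illustration only.
example = "wt_rep1\t/data/run_a\nwt_rep1\t/data/run_b\nko_rep1\t/data/run_c\n"
samples = parse_samples_tsv(example)
print(samples)
```

Here `wt_rep1` appears twice, so it maps to two run directories, while `ko_rep1` maps to one.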
POD5 Discovery
The pipeline searches each run directory for POD5 files in:
flowchart LR
A[run_directory/] --> B[pod5_pass/]
A --> C[pod5_fail/]
A --> D[pod5/]
B --> E[*.pod5 files]
C --> E
D --> E
All discovered POD5 files are merged per sample before processing.
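As a rough sketch of the discovery step, the following collects `*.pod5` files from the three subdirectories named in the diagram. The function name `find_pod5_files` is hypothetical, and the non-recursive search is an assumption; the pipeline's actual lookup may differ.

```python
import tempfile
from pathlib import Path

# Subdirectory names taken from the diagram above.
POD5_SUBDIRS = ("pod5_pass", "pod5_fail", "pod5")


def find_pod5_files(run_dir: Path) -> list[Path]:
    """Return POD5 files found directly inside the known subdirectories."""
    found: list[Path] = []
    for name in POD5_SUBDIRS:
        subdir = run_dir / name
        if subdir.is_dir():
            # Assumption: only files directly inside the subdirectory count.
            found.extend(sorted(subdir.glob("*.pod5")))
    return found


# Build a throwaway run directory to show the behavior.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "pod5_pass").mkdir()
    (run_dir / "pod5_pass" / "a.pod5").touch()
    (run_dir / "pod5_fail").mkdir()
    (run_dir / "pod5_fail" / "b.pod5").touch()
    (run_dir / "pod5_fail" / "notes.txt").touch()  # ignored: wrong extension
    demo_names = [p.name for p in find_pod5_files(run_dir)]

print(demo_names)
```

An empty result for a run directory corresponds to the "No POD5 files found" case described under Common Issues.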
YAML Format
Use YAML format for multiplexed samples with barcode demultiplexing. The file has a top-level runs list; each run gives a path, an optional barcode_kit, and a samples map from sample ID to barcode value.
Example
[Example listing not preserved: a 20-line YAML file.]
Dual Barcoded Samples (WDX + EDX)
When samples use both WDX (5' signal) and EDX (3' adapter) barcodes, specify each sample value as a dict with wdx and edx keys, for example {wdx: "barcode03", edx: "edx1"}.
Both formats (plain string and dict) can be mixed across runs within the same file. See the Demultiplexing guide for details on EDX concordance analysis.
Structure
| Field | Required | Description |
|---|---|---|
| runs | Yes | List of sequencing runs |
| runs[].path | Yes | Path to run directory |
| runs[].barcode_kit | No | Barcode kit (uses config default if omitted) |
| runs[].samples | Yes | Map of sample_id → barcode value (string or dict) |
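The required and optional fields in the table can be checked with a few lines of Python once the YAML has been loaded into plain dicts and lists. This is a sketch under assumptions: `check_sample_sheet` is a hypothetical helper, the example values are invented, and treating an empty `runs` list or empty `samples` map as an error is an assumption, not a documented rule.

```python
def check_sample_sheet(data: dict) -> list[str]:
    """Return a list of problems found in an already-parsed sample sheet."""
    runs = data.get("runs")
    if not isinstance(runs, list) or not runs:
        return ["'runs' must be a non-empty list"]
    errors = []
    for i, run in enumerate(runs):
        if not isinstance(run, dict):
            errors.append(f"runs[{i}] must be a mapping")
            continue
        if not run.get("path"):
            errors.append(f"runs[{i}].path is required")
        if not isinstance(run.get("samples"), dict) or not run["samples"]:
            errors.append(f"runs[{i}].samples is required and must be a map")
        # barcode_kit is optional, so its absence is not an error.
    return errors


# Hypothetical data mirroring a parsed YAML file; the second run is invalid.
sheet = {
    "runs": [
        {"path": "/data/run_a", "samples": {"wt_rep1": "barcode03"}},
        {"barcode_kit": "WDX4_tRNA_rna004_v1_0", "samples": {}},
    ]
}
problems = check_sample_sheet(sheet)
print(problems)
```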
Sample values can be:
- String: WDX barcode name only (e.g., "barcode03")
- Dict: wdx and/or edx keys (e.g., {wdx: "barcode03", edx: "edx1"})
- Null (~): Skip demultiplexing for this sample
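One way to handle the three value forms uniformly is to normalize each into a single shape before further processing. The sketch below is illustrative only: `normalize_sample_value` is a hypothetical name, and rejecting unknown keys or an empty dict is an assumption about sensible validation, not documented pipeline behavior.

```python
from typing import Optional, Union

SampleValue = Union[str, dict, None]


def normalize_sample_value(value: SampleValue) -> Optional[dict]:
    """Return {'wdx': ..., 'edx': ...}, or None when demultiplexing is skipped."""
    if value is None:
        return None  # YAML null (~): no demultiplexing for this sample
    if isinstance(value, str):
        return {"wdx": value, "edx": None}  # plain string: WDX barcode only
    if isinstance(value, dict):
        unknown = set(value) - {"wdx", "edx"}
        if unknown:
            raise ValueError(f"unexpected keys: {sorted(map(str, unknown))}")
        if not value:
            raise ValueError("dict value needs a wdx and/or edx key")
        return {"wdx": value.get("wdx"), "edx": value.get("edx")}
    raise TypeError(f"unsupported sample value: {value!r}")


print(normalize_sample_value("barcode03"))
print(normalize_sample_value({"wdx": "barcode03", "edx": "edx1"}))
print(normalize_sample_value(None))
```

Downstream code then only needs to distinguish `None` (skip demultiplexing) from a dict with `wdx` and `edx` entries.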
Barcode Names
For WarpDemuX-tRNA kits:
| Kit | Available Barcodes |
|---|---|
| WDX4_tRNA_rna004_v1_0 | barcode03, barcode04, barcode05, barcode07 |
| WDX4b_tRNA_rna004_v1_0 | barcode04, barcode05, barcode07, barcode11 |
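The kit table lends itself to a quick pre-flight check that every barcode in a run belongs to the chosen kit. The lookup below copies the two kits listed above; the helper name `invalid_barcodes` is hypothetical, and the pipeline may perform its own validation differently.

```python
# Barcodes per kit, copied from the table above.
KIT_BARCODES = {
    "WDX4_tRNA_rna004_v1_0": {"barcode03", "barcode04", "barcode05", "barcode07"},
    "WDX4b_tRNA_rna004_v1_0": {"barcode04", "barcode05", "barcode07", "barcode11"},
}


def invalid_barcodes(kit: str, barcodes: list[str]) -> list[str]:
    """Return the barcodes that the given kit does not provide."""
    allowed = KIT_BARCODES[kit]  # raises KeyError for kits not in the table
    return [b for b in barcodes if b not in allowed]


print(invalid_barcodes("WDX4_tRNA_rna004_v1_0", ["barcode03", "barcode11"]))
# → ['barcode11']
```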
Non-Multiplexed Samples in YAML
Use ~ (YAML null) as the value in the samples map for any sample that doesn't need demultiplexing.
Data Flow
Without Demultiplexing (TSV)
flowchart LR
A[samples.tsv] --> B[parse_samples]
B --> C[find_raw_inputs]
C --> D[POD5 files per sample]
D --> E[merge_pods rule]
With Demultiplexing (YAML)
flowchart LR
A[samples.yml] --> B[parse_samples]
B --> C[find_raw_inputs per run]
C --> D[warpdemux]
D --> E[split_pod5 per sample]
E --> F[Continue to rebasecall...]
Validation
Check Sample Detection
Run a dry-run to verify that samples are detected. In the dry-run output, the merge_pods count should match your number of samples.
Common Issues
No POD5 files found
Verify:
- Run directory path is correct
- POD5 files exist in a pod5_pass/, pod5_fail/, or pod5/ subdirectory
- Files have the .pod5 extension
Sample not detected
Check for:
- Extra whitespace in the TSV file
- Spaces used in place of tab characters in the TSV file
- Missing or inconsistent YAML indentation
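The TSV checks above are easy to automate before running the pipeline. The following is a small, hypothetical lint helper (`lint_tsv_line` is not part of the pipeline) that flags missing tabs, wrong column counts, and stray whitespace on a single line; the sample ID and path are invented.

```python
def lint_tsv_line(line: str) -> list[str]:
    """Return human-readable problems for one line of a sample TSV file."""
    problems = []
    has_tab = "\t" in line
    if not has_tab:
        problems.append("no tab character (columns may be separated by spaces)")
    fields = line.rstrip("\n").split("\t")
    if has_tab and len(fields) != 2:
        problems.append(f"expected 2 columns, found {len(fields)}")
    for field in fields:
        if field != field.strip():
            problems.append(f"extra whitespace around {field!r}")
    return problems


print(lint_tsv_line("wt_rep1\t/data/run_a"))     # clean line
print(lint_tsv_line("wt_rep1    /data/run_a"))   # spaces instead of a tab
print(lint_tsv_line("wt_rep1 \t/data/run_a"))    # trailing space in sample ID
```

Running it over every line of the file and printing any non-empty result gives a quick report of the lines that are likely to cause a sample to go undetected.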
Mixing Formats
You cannot mix TSV and YAML formats. Choose one based on your needs:
| Scenario | Format |
|---|---|
| Simple, non-multiplexed | TSV |
| Barcoded/pooled samples | YAML |
| Mix of multiplexed and direct | YAML (use ~ for direct) |
Next Steps
- Configuration - Configure demultiplexing options
- Running Pipeline - Execute the pipeline
- Demultiplexing - Detailed demux guide