Multi-Sample Comparison Guide

Complete guide to comparing multiple POD5 datasets using the Squiggy multi-sample comparison feature.

Overview

The multi-sample comparison feature allows you to:

Load two or more datasets (2-6 or more) simultaneously in a single session
Compare signal characteristics between samples
Visualize delta tracks showing differences in aggregate statistics
Analyze read overlaps to find common and unique reads
Explore signal distributions across multiple basecallers or conditions

Use Cases

Basecaller Comparison: Compare output from different Guppy versions (v3.0, v5.0, v6.0)
Model Evaluation: Test multiple pre-trained models on the same flowcell data
QC Between Runs: Compare signal quality across different sequencing runs
Protocol Optimization: Evaluate performance of different library prep methods
Multi-Condition Analysis: Compare treated vs control samples

Key Concepts

Sample

A sample is a complete analysis unit consisting of:

POD5 file (required): Raw nanopore signal data
BAM file (optional): Aligned reads with base annotations
FASTA file (optional): Reference sequences for motif analysis
Sample name (user-defined): Unique identifier (e.g., "guppy_v5.0", "model_a")

Session

The session is a container managing all loaded samples. You can have one session with multiple samples, or close everything and start fresh.
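The sample and session concepts above map naturally onto a small data model. The sketch below is a hypothetical illustration (the `Sample` and `Session` classes are not Squiggy's internal types): a sample holds a required POD5 path plus optional BAM and FASTA paths, and a session is a dictionary of samples keyed by their unique names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Sample:
    """One analysis unit: a required POD5 plus optional BAM and FASTA files."""
    name: str
    pod5_path: str
    bam_path: Optional[str] = None
    fasta_path: Optional[str] = None


@dataclass
class Session:
    """Container for loaded samples, keyed by their unique names."""
    samples: Dict[str, Sample] = field(default_factory=dict)

    def load(self, sample: Sample) -> None:
        # Sample names must be unique within a session
        if sample.name in self.samples:
            raise ValueError(f"Sample name already in use: {sample.name!r}")
        self.samples[sample.name] = sample

    def unload(self, name: str) -> None:
        # Removes the sample from the session only; files on disk are untouched
        del self.samples[name]

    def clear(self) -> None:
        # Close everything and start fresh
        self.samples.clear()
```

For example, `Session().load(Sample("v5.0", "data/run1.pod5", bam_path="data/run1.bam"))` registers one sample; loading a second sample under the same name raises an error instead of silently replacing the first.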

Delta Track

A delta track visualizes the difference between two samples:

Delta Signal: B - A (sample B minus sample A)
Color Coding:
🔴 Red: Sample B has higher signal (B > A)
🔵 Blue: Sample A has higher signal (A > B)
⚫ Gray: No significant difference (≈0)
Confidence Bands: Show uncertainty range (±1 std dev)
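The color coding can be illustrated with a minimal sketch. It assumes per-read signal values have already been grouped by reference position for each sample, and it uses a hypothetical fixed `threshold` to decide what counts as "≈0"; Squiggy's actual significance criterion may differ.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def delta_track(signal_a: Sequence[Sequence[float]],
                signal_b: Sequence[Sequence[float]],
                threshold: float = 0.1) -> List[Tuple[float, str]]:
    """Per-position delta (B - A) of mean signal, with a color label.

    signal_a[i] and signal_b[i] hold the per-read signal values observed at
    reference position i in each sample. `threshold` is a hypothetical cutoff
    below which a delta is treated as "no significant difference".
    """
    track = []
    for values_a, values_b in zip(signal_a, signal_b):
        delta = mean(values_b) - mean(values_a)  # sample B minus sample A
        if delta > threshold:
            color = "red"    # sample B has higher signal
        elif delta < -threshold:
            color = "blue"   # sample A has higher signal
        else:
            color = "gray"   # approximately zero
        track.append((delta, color))
    return track
```

With three positions where B is higher, roughly equal, and lower, `delta_track([[1.0, 1.2], [2.0, 2.0], [3.0, 3.0]], [[1.5, 1.7], [2.0, 2.04], [2.0, 2.2]])` labels them red, gray, and blue.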

Quick Start

1. Load First Sample

Command Palette → "Load Sample (Multi-Sample Comparison)"
├─ Enter sample name: "v5.0"
├─ Select POD5 file: data/run1.pod5
├─ Select BAM file (optional): data/run1.bam
└─ Select FASTA file (optional): skip

2. Load Second Sample

Command Palette → "Load Sample (Multi-Sample Comparison)"
├─ Enter sample name: "v6.0"
├─ Select POD5 file: data/run2.pod5
├─ Select BAM file (optional): data/run2.bam
└─ Confirm

3. View Samples

In the Squiggy sidebar, find the Sample Comparison Manager panel showing:

Loaded Samples (2)

☑ v5.0
  POD5: /data/run1.pod5
  Reads: 1,234
  BAM FASTA
  [Unload]

☑ v6.0
  POD5: /data/run2.pod5
  Reads: 1,234
  BAM FASTA
  [Unload]

Selected: 2 sample(s)
[Start Comparison]

4. Run Comparison

Click the "Start Comparison" button. The delta plot appears in the Plots pane.

Sample Management

Loading Samples

Method 1: Command Palette

Command Palette (Cmd/Ctrl+Shift+P)
→ Search: "Load Sample"
→ Enter name: e.g., "basecaller_v5"
→ Select POD5 file
→ (Optional) Select BAM file
→ (Optional) Select FASTA file

Method 2: Keyboard Shortcut

Add a binding for the load command to your keybindings.json, for example:

{
  "key": "cmd+shift+l",
  "command": "squiggy.loadSample"
}

Sample Requirements

File	Required	Format	Notes
POD5	✅ Yes	`.pod5`	Raw signal data from Oxford Nanopore
BAM	❌ No	`.bam` + `.bai`	Must be indexed; contains alignments
FASTA	❌ No	`.fa`, `.fasta`, `.fna`	Reference for motif analysis
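The requirements in this table can be checked before loading a sample. The helper below is a hypothetical pre-flight check, not part of Squiggy: it only looks at file extensions and whether files exist, and it accepts either of the two common BAM index names (`sample.bam.bai`, as written by `samtools index`, or `sample.bai`).

```python
from pathlib import Path
from typing import List, Optional

FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def check_sample_files(pod5: str, bam: Optional[str] = None,
                       fasta: Optional[str] = None) -> List[str]:
    """Return a list of problems with a sample's files (empty if none found)."""
    problems = []
    pod5_path = Path(pod5)
    if pod5_path.suffix != ".pod5":
        problems.append(f"POD5 file should end in .pod5: {pod5}")
    if not pod5_path.is_file():
        problems.append(f"POD5 file not found: {pod5}")
    if bam is not None:
        bam_path = Path(bam)
        if not bam_path.is_file():
            problems.append(f"BAM file not found: {bam}")
        # samtools index writes sample.bam.bai; sample.bai is also common
        index_candidates = [Path(bam + ".bai"), bam_path.with_suffix(".bai")]
        if not any(p.is_file() for p in index_candidates):
            problems.append(f"BAM index (.bai) not found for: {bam}")
    if fasta is not None:
        fasta_path = Path(fasta)
        if fasta_path.suffix not in FASTA_SUFFIXES:
            problems.append(f"Unexpected FASTA extension: {fasta}")
        if not fasta_path.is_file():
            problems.append(f"FASTA file not found: {fasta}")
    return problems
```

An empty list from `check_sample_files("data/run1.pod5", "data/run1.bam")` means the files look loadable; a missing index shows up as its own message, which is the most common cause of BAM-related errors.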

Viewing Sample Details

The Sample Comparison Manager panel shows:

Sample name: User-defined identifier
POD5 path: Location of signal file
Read count: Number of reads in the POD5
BAM badge: Present if BAM file loaded
FASTA badge: Present if FASTA file loaded

Removing Samples

Sample Comparison Manager panel
→ Click [Unload] button next to sample name
→ Confirm in dialog
→ Sample removed from session

Note: Unloading a sample does not delete its files; it only closes them in the current session.

Running Comparisons

Selecting Samples

In the Sample Comparison Manager panel:

Check the box next to each sample you want to compare
Select at least 2 samples
You can select as many loaded samples as you like (2, 3, 4, etc.)

☐ v5.0
☑ v6.0     ← Selected
☑ v7.0     ← Selected
☐ v8.0

Generating Delta Plot

⚠️ Requirements:

All selected samples must have BAM files loaded
BAM files are required to align signals to reference positions for a meaningful comparison
If you get a "BAM files are required" error, reload the samples with their BAM files

Option 1: Via Panel

Sample Comparison Manager
→ Check 2+ samples
→ Click [Start Comparison]
→ Plot appears in Plots pane

Option 2: Via Command Palette

Command Palette → "Plot Delta Comparison"
→ Multi-select samples in the Quick Pick list
→ Confirm selection
→ Plot appears

Plot Customization

Open the Plot Options panel to adjust:

Normalization: ZNORM (default), MAD, MEDIAN, NONE
Theme: Light or dark, auto-detected from VS Code
Export Format: PNG, SVG, or HTML
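The normalization options change the units of the plotted signal. The sketch below uses common textbook definitions as an assumption: ZNORM as a z-score, MAD as median-centering scaled by the median absolute deviation, and MEDIAN as median-centering only. Squiggy's exact formulas (for example, whether MAD is scaled by 1.4826) may differ.

```python
from statistics import mean, median, pstdev
from typing import List, Sequence


def normalize(signal: Sequence[float], method: str = "ZNORM") -> List[float]:
    """Illustrative versions of the four normalization options (assumed definitions)."""
    if method == "NONE":
        return list(signal)                       # raw values, in pA
    if method == "ZNORM":
        # Assumes the signal is not constant (non-zero standard deviation)
        mu, sd = mean(signal), pstdev(signal)
        return [(x - mu) / sd for x in signal]    # unitless z-scores
    med = median(signal)
    if method == "MEDIAN":
        return [x - med for x in signal]          # centered on the median, still pA
    if method == "MAD":
        mad = median(abs(x - med) for x in signal)
        return [(x - med) / mad for x in signal]  # unitless, robust to outliers
    raise ValueError(f"Unknown normalization: {method}")
```

Under these definitions, deltas computed after ZNORM or MAD are unitless, so a difference is only expressed in pA when normalization is NONE (or MEDIAN, which shifts but does not rescale).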

Interpreting Results

Delta Signal Track

What it shows: Difference in aggregate signal between samples

Δ Signal = Signal_B - Signal_A

Reading the plot:

Red region (positive): Sample B has higher signal at this position
May indicate higher amplitude, cleaner signal, or different pore characteristics
Blue region (negative): Sample A has higher signal
May indicate that Sample A's basecaller or model is more sensitive
Gray region (near zero): Similar signal between samples
Suggests good agreement or similar basecaller characteristics

Confidence bands: Shaded area around the delta line

Shows variability in the difference
Wider bands = more inconsistent differences across reads
Narrow bands = consistent differences (likely systematic)

Delta Statistics Track

Shows coverage comparison:

Coverage A: Number of reads mapped at each position in sample A
Coverage B: Number of reads mapped at each position in sample B
Difference: Large differences may indicate alignment differences or variation in sample quality
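Per-position coverage can be computed from each read's aligned interval. The sketch below assumes 0-based, half-open `[start, end)` reference intervals (the hypothetical input a BAM parser would produce) and uses a difference array so each read costs constant work; the delta follows the same B - A convention as the signal track.

```python
from typing import List, Sequence, Tuple


def coverage(intervals: Sequence[Tuple[int, int]], length: int) -> List[int]:
    """Reads covering each reference position, from [start, end) alignments with end <= length."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        diff[start] += 1  # coverage rises where a read starts...
        diff[end] -= 1    # ...and falls just past where it ends
    cov, running = [], 0
    for i in range(length):
        running += diff[i]
        cov.append(running)
    return cov


def coverage_delta(intervals_a: Sequence[Tuple[int, int]],
                   intervals_b: Sequence[Tuple[int, int]],
                   length: int) -> List[int]:
    """Coverage B minus coverage A at each position."""
    cov_a = coverage(intervals_a, length)
    cov_b = coverage(intervals_b, length)
    return [b - a for a, b in zip(cov_a, cov_b)]
```

For instance, two overlapping reads in A, `[(0, 5), (2, 8)]`, against one read spanning `(0, 10)` in B give a negative delta where A's reads overlap and a positive delta at positions 8-9 that only B covers.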

Example Interpretation

Comparing: v5.0 vs v6.0

Signal:   v6.0 shows +0.2 to +0.5 pA higher signal (red region, normalization: NONE)
          → the v6.0 sample has consistently higher aggregate signal at these positions

Coverage: Both samples have similar coverage
          → Comparable sequencing depth

Conclusion: v6.0 shows higher signal than v5.0 without losing reads

Advanced Usage

Python API

For notebook-based analysis:

from squiggy import load_sample, compare_samples, plot_delta_comparison

# Load samples
load_sample("model_a", "/data/model_a.pod5", bam_path="/data/model_a.bam")
load_sample("model_b", "/data/model_b.pod5", bam_path="/data/model_b.bam")

# Get comparison statistics
comparison = compare_samples(["model_a", "model_b"])
print(f"Common reads: {len(comparison['common_reads'])}")
print(f"Unique to A: {len(comparison['unique_to_a'])}")
print(f"Unique to B: {len(comparison['unique_to_b'])}")

# Generate delta plot
html = plot_delta_comparison(
    sample_names=["model_a", "model_b"],
    normalization="ZNORM",
    theme="LIGHT"
)

Read Overlap Analysis

Check which reads are shared between samples and which are unique to each:

from squiggy import get_common_reads, get_unique_reads

# Reads in both samples
common = get_common_reads("model_a", "model_b")
print(f"{len(common)} reads in both samples")

# Reads only in sample A
unique_a = get_unique_reads("model_a", "model_b")
print(f"{len(unique_a)} reads only in A")

# Reads only in sample B
unique_b = get_unique_reads("model_b", "model_a")
print(f"{len(unique_b)} reads only in B")
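Conceptually, read overlap is set arithmetic on read IDs. The sketch below is a hypothetical stand-in that reuses the key names returned by `compare_samples` in the example above; it is not Squiggy's implementation.

```python
from typing import Dict, Iterable, Set


def read_overlap(reads_a: Iterable[str], reads_b: Iterable[str]) -> Dict[str, Set[str]]:
    """Split read IDs into shared and sample-specific sets."""
    a, b = set(reads_a), set(reads_b)
    return {
        "common_reads": a & b,  # present in both samples
        "unique_to_a": a - b,   # only in sample A
        "unique_to_b": b - a,   # only in sample B
    }
```

Because nanopore read IDs are UUIDs assigned to each read at acquisition, reads typically overlap only when both samples come from the same raw data (for example, the same flowcell basecalled with different models). Two independent sequencing runs will share few or no read IDs.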

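Signal Distribution Comparison

Compare statistical properties of two signal arrays:

from squiggy import compare_signal_distributions

# signal_a and signal_b: signal values (pA) for sample A and sample B, prepared beforehand
dist = compare_signal_distributions(signal_a, signal_b)
print(f"Mean A: {dist['mean_a']:.2f} pA")
print(f"Mean B: {dist['mean_b']:.2f} pA")
# Report B - A to match the delta track convention
print(f"Difference (B - A): {dist['mean_b'] - dist['mean_a']:.2f} pA")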
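To cross-check these numbers without the extension, a standard-library summary is enough. This is a hypothetical helper, not part of Squiggy; it reports differences as B - A, matching the delta track.

```python
from statistics import mean, median, stdev
from typing import Dict, Sequence


def summarize_pair(signal_a: Sequence[float], signal_b: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for two signal arrays (at least 2 values each) plus B - A differences."""
    stats = {
        "mean_a": mean(signal_a), "mean_b": mean(signal_b),
        "median_a": median(signal_a), "median_b": median(signal_b),
        "std_a": stdev(signal_a), "std_b": stdev(signal_b),
    }
    stats["mean_diff"] = stats["mean_b"] - stats["mean_a"]        # B - A
    stats["median_diff"] = stats["median_b"] - stats["median_a"]  # B - A
    return stats
```

Comparing both the mean and the median difference helps separate a true shift in signal level from a few outlier reads pulling the mean.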

Batch Comparison Workflow

from squiggy import load_sample, plot_delta_comparison

# Load multiple basecaller versions; delta plots need a BAM file for every sample
versions = {
    "v5.0": ("/data/guppy_v5.0.pod5", "/data/guppy_v5.0.bam"),
    "v6.0": ("/data/guppy_v6.0.pod5", "/data/guppy_v6.0.bam"),
    "v7.0": ("/data/guppy_v7.0.pod5", "/data/guppy_v7.0.bam"),
}

for name, (pod5_path, bam_path) in versions.items():
    load_sample(name, pod5_path, bam_path=bam_path)

# Compare v6.0 vs v5.0 (delta = v6.0 - v5.0)
plot_delta_comparison(["v5.0", "v6.0"], theme="LIGHT")

# Compare v7.0 vs v6.0 (delta = v7.0 - v6.0)
plot_delta_comparison(["v6.0", "v7.0"], theme="LIGHT")
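With more than a few versions, the consecutive pairs can be generated instead of written out by hand. The helper below is a hypothetical sketch; it relies on the names being listed oldest to newest.

```python
from typing import List, Sequence, Tuple


def consecutive_pairs(names: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each version with the next one: (older, newer)."""
    return list(zip(names, names[1:]))
```

Each pair can then be passed as `plot_delta_comparison([older, newer], theme="LIGHT")`, so every plot shows newer minus older, consistent with the B - A convention.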

Troubleshooting

"Delta comparison requires at least 2 loaded samples"

Problem: You tried to start a comparison with fewer than 2 samples

Solution:

1. Load at least one more sample
2. Use a different POD5 file for each sample (loading the same file twice is allowed but unusual)

Plot appears empty or shows no data

Problem: The delta plot is displayed but contains no data

Causes & Solutions:

Cause	Solution
No common reads between samples	Verify BAM files are mapped to same reference
Samples too different in depth	Try samples with similar read counts
Very short region	Pan/zoom the plot to see detail

Check sample compatibility:

from squiggy import compare_samples

comparison = compare_samples(["sample_a", "sample_b"])

# Should have reasonable overlap
if len(comparison['common_reads']) < 10:
    print("⚠️ Warning: Very few common reads!")
    print(f"Sample A: {comparison['total_reads_a']} reads")
    print(f"Sample B: {comparison['total_reads_b']} reads")

Red outline around delta values indicates alignment issues

Check:

BAM files must be indexed (.bai file present)
BAM files should be aligned to the same reference
CIGAR strings must be valid

Verify BAM files:

# Index the BAM if no .bai exists (BAM must be coordinate-sorted; creates sample.bam.bai)
samtools index sample.bam

# Check reference compatibility: @SQ lines should match across all samples' BAMs
samtools view -H sample.bam | grep @SQ
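Comparing @SQ lines by eye is error-prone when there are many contigs. The sketch below parses the header text printed by `samtools view -H` into a name-to-length mapping so two BAMs can be compared with `==`. The helper names are hypothetical; `bam_references` assumes `samtools` is on your PATH.

```python
import subprocess
from typing import Dict


def parse_sq(header_text: str) -> Dict[str, int]:
    """Map reference name (SN) to length (LN) from SAM header @SQ lines."""
    refs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        refs[fields["SN"]] = int(fields["LN"])
    return refs


def bam_references(bam_path: str) -> Dict[str, int]:
    """Read a BAM's @SQ lines via `samtools view -H` (requires samtools on PATH)."""
    header = subprocess.run(["samtools", "view", "-H", bam_path],
                            capture_output=True, text=True, check=True).stdout
    return parse_sq(header)
```

`bam_references("a.bam") == bam_references("b.bam")` is true only when both files list the same reference names with the same lengths; the order of @SQ lines does not matter for this check.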

Unload not working or gives an error

Solution:

Close the sample in the panel

If stuck, clear all state:

Command Palette → "Clear All State (After Kernel Restart)"
→ Restart Python kernel in Positron

Positron hangs or becomes unresponsive

Symptom: Positron hangs or the extension becomes unresponsive

Solutions:

For POD5 files larger than 1 GB: Load in batches
Close other samples before loading new ones
Use a subset of reads if possible
For BAM files: Ensure they're indexed
```
samtools index large.bam
```

Restart: Restart the Python kernel with the kernel restart button (🔄) in Positron, then reload the extension

Performance Tips

Best Practices

Start with 2-3 samples before loading more
Use BAM files from same reference for consistency
Keep individual files under 500 MB when possible
Close unused samples to free memory

Memory Usage

Approximate memory per sample:

File Type	Typical Size	Memory Impact
POD5	100-500 MB	~200 MB in session
BAM (indexed)	50-200 MB	~100 MB in session
FASTA	<10 MB	Negligible

Total for 3 samples, each with a POD5 and a BAM file: ~900 MB RAM (varies by dataset)
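The total above is just the per-file figures from the table summed over samples: 3 × (200 + 100) MB = 900 MB. A tiny estimator makes this easy to redo for other combinations; the numbers are the table's rough approximations, not measurements, and the helper is hypothetical.

```python
from typing import Iterable

# Approximate in-session memory per file type (MB), taken from the table above
MEMORY_MB = {"pod5": 200, "bam": 100, "fasta": 0}  # FASTA treated as negligible


def estimate_memory_mb(samples: Iterable[Iterable[str]]) -> int:
    """Each sample is given as the file types it has loaded, e.g. {"pod5", "bam"}."""
    return sum(MEMORY_MB[kind] for sample in samples for kind in sample)
```

For example, `estimate_memory_mb([{"pod5", "bam"}] * 3)` reproduces the ~900 MB figure, while five POD5-only samples come to about 1,000 MB.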

For Large Datasets

If you are working with very large POD5 files (>500 MB):

from squiggy import load_sample

# Load the full file
load_sample("full", "/data/large.pod5")

# Or work with a subset of reads: this requires custom Python code to
# select the reads (and write them to a smaller POD5 file) before loading
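One way to pick that subset is to sample read IDs reproducibly and write them to a text file. This is a sketch under stated assumptions: the input read IDs would come from the large file (for example, by iterating over reads with the `pod5` package's `Reader`), and the resulting ID list would be handed to a subsetting tool such as the `pod5` command-line `filter` subcommand (check its `--help` for the exact options in your version) before loading the smaller file with `load_sample`. Both helper names are hypothetical.

```python
import random
from pathlib import Path
from typing import Iterable, List


def sample_read_ids(read_ids: Iterable[str], n: int, seed: int = 0) -> List[str]:
    """Reproducibly pick up to n distinct read IDs at random."""
    ids = sorted(set(read_ids))  # sort so the result doesn't depend on input order
    rng = random.Random(seed)    # fixed seed gives the same subset on every run
    return rng.sample(ids, min(n, len(ids)))


def write_read_ids(read_ids: Iterable[str], out_path: str) -> None:
    """Write one read ID per line, a common input format for subsetting tools."""
    Path(out_path).write_text("".join(f"{r}\n" for r in read_ids))
```

Using the same `seed` for every sample being compared keeps the selection reproducible; for basecaller comparisons on the same raw data, applying one shared ID list to all samples also preserves read overlap.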

See Also

User Guide - Basic Squiggy usage
API Reference - Python API documentation
Developer Guide - Architecture and extension development
Issue #61 - Feature implementation details