Streaming alignment with bwa
sracha can stream FASTQ directly to an aligner via -Z, eliminating
intermediate files and reducing disk I/O.
Prerequisites
Install bwa, samtools, and ucsc-twobittofa:
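One possible route is Bioconda, where all three tools are packaged under the names above; the choice of package manager is an assumption, not a requirement of this guide. The sketch below installs nothing. It only reports which executables are missing from PATH, and `check_tools` is a hypothetical helper name.

```shell
# Possible install via Bioconda (assumes conda with the bioconda channel available):
#   conda install -c conda-forge -c bioconda bwa samtools ucsc-twobittofa

# check_tools is a hypothetical helper: print each named executable
# that cannot be found on PATH, one per line.
check_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The ucsc-twobittofa package provides the twoBitToFa executable.
missing=$(check_tools sracha bwa samtools twoBitToFa)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing >&2
else
    echo "all tools found"
fi
```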
Quick start: align to chr22
Stream chr22 from the hs1 (T2T-CHM13v2.0) 2bit file, gzip it, and index:
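A minimal sketch of this step, assuming the hs1 2bit file is served from the UCSC download server at the URL below and that twoBitToFa can read it remotely. The URL, the output file name `chr22.fa.gz`, and the wrapper name `prepare_chr22` are illustrative assumptions.

```shell
# Assumed location of the hs1 2bit file on the UCSC download server.
HS1_2BIT="https://hgdownload.soe.ucsc.edu/goldenPath/hs1/bigZips/hs1.2bit"

# prepare_chr22 is a hypothetical wrapper; nothing runs until it is called.
prepare_chr22() {
    # Extract only chr22 from the remote 2bit file and compress it on the fly.
    twoBitToFa -seq=chr22 "$HS1_2BIT" stdout | gzip > chr22.fa.gz
    # Build the BWA index; bwa reads gzip-compressed FASTA directly.
    bwa index chr22.fa.gz
}
```

Calling `prepare_chr22` in an empty working directory should leave `chr22.fa.gz` and its bwa index files next to each other.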
Stream directly from NCBI to a sorted BAM — no intermediate files:
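A sketch of such a pipeline, under two assumptions that should be checked against `sracha --help`: that `get` is the subcommand that downloads and converts in one step, and that it accepts `-Z` and `--split interleaved` as described in this guide. `stream_align` is a hypothetical wrapper name.

```shell
# stream_align is a hypothetical wrapper.
# Arguments: run accession, bwa-indexed reference, output BAM path.
stream_align() {
    acc=$1; ref=$2; out=$3
    # `sracha get` is assumed to be the download-and-convert subcommand.
    sracha get "$acc" -Z --split interleaved \
        | bwa mem -p -t 8 "$ref" - \
        | samtools sort -o "$out" -
}
```

It would be called as `stream_align ACCESSION chr22.fa.gz aligned.bam`, with ACCESSION replaced by a real paired-end run accession.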
sracha downloads the SRA file to a hidden temp file, streams
interleaved FASTQ to stdout, then auto-deletes the temp file.
No .sra or .fastq.gz files are left on disk.
Verify:
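One way to check the result, sketched with two standard samtools subcommands. The wrapper name `verify_bam` is hypothetical.

```shell
# verify_bam is a hypothetical wrapper. Argument: path to the BAM file.
verify_bam() {
    bam=$1
    # quickcheck exits non-zero if the file looks truncated or lacks an EOF block.
    samtools quickcheck "$bam" || return 1
    # flagstat prints read counts, including mapped and properly paired reads.
    samtools flagstat "$bam"
}
```

A non-zero exit status from `verify_bam aligned.bam` suggests the pipeline was interrupted before samtools finished writing.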
Full human genome workflow
Download and index hs1:
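A sketch of this step, assuming the full hs1 FASTA is available from the UCSC download server at the URL below. The URL, the use of curl, and the wrapper name `prepare_hs1` are assumptions. Indexing a whole human genome takes considerably longer and needs more memory than indexing chr22 alone.

```shell
# Assumed location of the gzip-compressed hs1 FASTA on the UCSC download server.
HS1_FASTA_URL="https://hgdownload.soe.ucsc.edu/goldenPath/hs1/bigZips/hs1.fa.gz"

# prepare_hs1 is a hypothetical wrapper; nothing runs until it is called.
prepare_hs1() {
    # Download the reference, following redirects.
    curl -L -o hs1.fa.gz "$HS1_FASTA_URL"
    # Build the BWA index next to the FASTA.
    bwa index hs1.fa.gz
}
```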
Stream alignment:
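A sketch of the whole-genome run, assuming a 12-core machine and the 4/8 thread split suggested under Performance tips. As in the quick start, `sracha get` is assumed to be the download-and-convert subcommand, and `align_to_hs1` and `REF` are hypothetical names.

```shell
# Path to the bwa-indexed hs1 reference (assumed file name).
REF=hs1.fa.gz

# align_to_hs1 is a hypothetical wrapper. Arguments: run accession, output BAM path.
align_to_hs1() {
    acc=$1; out=$2
    # 4 threads for sracha and 8 for bwa, on the assumption of 12 cores.
    sracha get "$acc" -Z --split interleaved -t 4 \
        | bwa mem -p -t 8 "$REF" - \
        | samtools sort -o "$out" -
}
```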
Two-step workflow
If you already have an SRA file on disk, use fastq -Z:
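A sketch of the second step, assuming `sracha fastq` takes the local SRA path as a positional argument. The wrapper name `align_local_sra` is hypothetical.

```shell
# align_local_sra is a hypothetical wrapper.
# Arguments: local SRA file, bwa-indexed reference, output BAM path.
align_local_sra() {
    sra=$1; ref=$2; out=$3
    # Convert the on-disk SRA file to interleaved FASTQ on stdout and align it.
    sracha fastq "$sra" -Z --split interleaved \
        | bwa mem -p -t 8 "$ref" - \
        | samtools sort -o "$out" -
}
```

Unlike the one-step workflow, the SRA file stays on disk afterwards, so it can be realigned against another reference without downloading it again.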
How it works
- -Z streams uncompressed FASTQ to stdout; pair it with --split interleaved so paired reads come out as a single interleaved stream
- bwa mem -p reads interleaved paired-end FASTQ from stdin
- samtools sort reads SAM from stdin and writes a coordinate-sorted BAM
- Backpressure flows naturally through the Unix pipe
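To make "interleaved" concrete: each fragment contributes two consecutive four-line FASTQ records, mate 1 followed by mate 2. The sketch below is a hypothetical sanity check, not part of sracha. It assumes unwrapped four-line records and only verifies that the line count is consistent with complete pairs.

```shell
# count_pairs is a hypothetical helper: read interleaved FASTQ on stdin,
# print the number of read pairs, and fail if the line count is not a
# multiple of 8 (two 4-line records per pair).
count_pairs() {
    awk 'END {
        if (NR % 8 != 0) {
            print "not interleaved: " NR " lines" > "/dev/stderr"
            exit 1
        }
        print NR / 8
    }'
}

# Two mates of one fragment, as they would appear in an interleaved stream.
printf '@r1\nACGT\n+\nIIII\n@r1\nTTGA\n+\nIIII\n' | count_pairs   # prints 1
```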
Performance tips
- Split threads between sracha and bwa (e.g., -t 4 for sracha, -t 8 for bwa on 12 cores)
- Streaming avoids writing intermediate FASTQ files (saves disk I/O and space)
- For maximum throughput, use --format sralite if quality scores aren't critical
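The thread split can be derived from the core count. The sketch below generalizes the 12-core example by giving roughly one third of the cores to sracha and the rest to bwa. That ratio is an assumption extrapolated from a single example rather than a measured optimum, and `split_threads` is a hypothetical helper name.

```shell
# split_threads is a hypothetical helper: given a total core count, print
# "<sracha_threads> <bwa_threads>", giving roughly one third to sracha.
split_threads() {
    total=$1
    sracha_t=$(( total / 3 ))
    # Each tool gets at least one thread, even on very small machines.
    if [ "$sracha_t" -lt 1 ]; then sracha_t=1; fi
    bwa_t=$(( total - sracha_t ))
    if [ "$bwa_t" -lt 1 ]; then bwa_t=1; fi
    echo "$sracha_t $bwa_t"
}

split_threads 12   # prints "4 8"
```

The two numbers can then be passed as the -t values for sracha and bwa mem respectively.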