Arrest Motif Analysis
C-terminal arrest motifs in prokaryotic and viral proteomes
Overview
Arrest peptides are short sequences that cause ribosomes to stall during translation. Unlike 2A peptides, which cause ribosomal skipping, arrest peptides physically pause the ribosome, often to regulate downstream gene expression. This analysis examines the distribution of C-terminal arrest motifs across prokaryotic and viral proteomes.
Motif Patterns Analyzed
Eight arrest motif patterns, based on known stalling sequences, were searched for:
| Motif | Pattern | Known Examples |
|---|---|---|
| RAGP | R-A-G-P | SecM (E. coli) |
| RAPG | R-A-P-G | VemP-like |
| QAPP | Q-A-P-P | aaeB, ydhK |
| QGPP | Q-G-P-P | tcyC |
| HAPP | H-A-P-P | - |
| HGPP | H-G-P-P | yaaX |
| RAPP | R-A-P-P | yhfW |
| RPPP | R-P-P-P | - |
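Each of the eight patterns is a fixed four-residue string, so a single regular-expression alternation can locate all of them. The sketch below is a minimal illustration and not the pipeline's actual code. The names `ARREST_MOTIFS` and `find_motifs` are hypothetical.

```python
from __future__ import annotations

import re

# The eight four-residue arrest motifs from the table above.
ARREST_MOTIFS = ["RAGP", "RAPG", "QAPP", "QGPP", "HAPP", "HGPP", "RAPP", "RPPP"]
MOTIF_RE = re.compile("|".join(ARREST_MOTIFS))


def find_motifs(seq: str) -> list[tuple[str, int]]:
    """Return (motif, 0-based start) for every motif occurrence in a protein sequence."""
    return [(m.group(), m.start()) for m in MOTIF_RE.finditer(seq.upper())]


print(find_motifs("MKTRAGPLLQAPPE"))  # [('RAGP', 3), ('QAPP', 9)]
```

Every motif starts with R, Q or H and ends with P or G, so no motif's last residue can begin another motif. Occurrences therefore never overlap, and the non-overlapping scan done by `re.finditer` misses no hits.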
Methods
Extraction Parameters
```yaml
parameters:
  upstream_residues: 40      # Context upstream of motif
  max_c_term_distance: 50    # Maximum distance from C-terminus
  min_sequence_length: 45    # Minimum protein length
```

Analysis Pipeline
- Extraction: C-terminal motifs extracted from proteomes
- Clustering: MMseqs2 clustering at 70% identity to reduce redundancy
- Alignment: Multiple sequence alignment per cluster
- HMM Building: Profile HMMs built from alignments
- Cross-database Search: Host HMMs searched against viral databases
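The extraction step can be sketched with the parameters listed above. The sketch makes two assumptions that the parameters do not state. First, `max_c_term_distance` is taken to count the residues between the end of the motif and the protein's C-terminus. Second, the upstream context is truncated at the N-terminus rather than padded. `MotifHit` and `extract_c_terminal_motifs` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

MOTIF_RE = re.compile("RAGP|RAPG|QAPP|QGPP|HAPP|HGPP|RAPP|RPPP")

# Values from the extraction parameters above.
UPSTREAM_RESIDUES = 40
MAX_C_TERM_DISTANCE = 50
MIN_SEQUENCE_LENGTH = 45


@dataclass
class MotifHit:
    protein_id: str
    motif: str
    start: int            # 0-based start of the motif in the protein
    c_term_distance: int  # residues after the motif (assumed definition of distance)
    context: str          # up to 40 upstream residues followed by the motif


def extract_c_terminal_motifs(protein_id: str, seq: str) -> list[MotifHit]:
    """Return motif hits lying near the C-terminus of one protein."""
    seq = seq.upper()
    if len(seq) < MIN_SEQUENCE_LENGTH:
        return []  # too short to analyse
    hits = []
    for m in MOTIF_RE.finditer(seq):
        distance = len(seq) - m.end()
        if distance > MAX_C_TERM_DISTANCE:
            continue  # motif is too far from the C-terminus
        ctx_start = max(0, m.start() - UPSTREAM_RESIDUES)  # truncate at the N-terminus
        hits.append(MotifHit(protein_id, m.group(), m.start(), distance,
                             seq[ctx_start:m.end()]))
    return hits
```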
Distribution Analysis
Motifs by Database
Motif Type Distribution
Summary Statistics
Host vs Phage Comparison
Phage/viral proteomes contain about 23 times as many arrest motifs as host (bacterial/archaeal) proteomes (16,094 vs 694 motifs across the databases searched). This striking enrichment suggests that phage may encode ribosome-manipulating peptides to hijack host translation.
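The 23x figure matches the ratio of the raw motif totals in the Databases Searched table. The sketch below reproduces that arithmetic. It compares raw counts only and does not adjust for differences in the number or size of proteomes in each database.

```python
# Motif counts per database, copied from the "Databases Searched" table.
MOTIF_COUNTS = {
    "uniprot_viruses": ("phage", 14_750),
    "ncbi_viral_refseq": ("phage", 1_344),
    "bacteria": ("host", 628),
    "archaea": ("host", 66),
}


def group_totals(counts):
    """Sum motif counts per group (phage vs host)."""
    totals = {}
    for group, n in counts.values():
        totals[group] = totals.get(group, 0) + n
    return totals


totals = group_totals(MOTIF_COUNTS)
ratio = totals["phage"] / totals["host"]
print(totals, f"{ratio:.1f}x")  # {'phage': 16094, 'host': 694} 23.2x
```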
Enrichment by Motif Type
Enrichment Table
Top Enriched Motifs
The three most enriched motifs in phage:
- QAPP (96x enrichment): The most dramatically enriched, with 7,509 instances in phage vs 78 in hosts
- RAPP (68x enrichment): 2,111 phage instances vs 30 in hosts
- RPPP (47x enrichment): 1,864 phage instances vs 39 in hosts
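The text does not state how the per-motif fold values were computed. The sketch below applies two common conventions to the counts listed above. The first is a plain phage/host ratio. The second adds a pseudocount of 1, which keeps the value finite when a motif is absent from hosts. Neither convention is confirmed as the one behind the reported values, and normalizing by proteome size would give different numbers again.

```python
PER_MOTIF = {"QAPP": (7_509, 78), "RAPP": (2_111, 30), "RPPP": (1_864, 39)}


def fold_enrichment(phage: int, host: int, pseudocount: float = 0.0) -> float:
    """Phage/host count ratio, optionally smoothed with a pseudocount."""
    return (phage + pseudocount) / (host + pseudocount)


for motif, (phage, host) in PER_MOTIF.items():
    raw = fold_enrichment(phage, host)
    smoothed = fold_enrichment(phage, host, pseudocount=1)
    print(f"{motif}: raw {raw:.1f}x, pseudocount=1 {smoothed:.1f}x")
```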
Motif Heatmap
Validation Against Known Arrest Peptides
The arrest motif HMMs were validated by searching for known stalling peptides characterized in the literature.
Validation Results
Known Peptides Recovered
Notable Recoveries
- SecM (RAGP): The canonical E. coli stalling peptide, which regulates secA translation, was correctly identified
- yaaX (HGPP): Strong hit with E-value 1.3e-17
- yhfW (RAPP): E-value 1.3e-11
- tcyC (QGPP): B. subtilis cysteine transporter subunit
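The reported E-values suggest a HMMER search. Assuming the hits were written with `hmmsearch --tblout`, the sketch below shows one way to collect and rank them. The E-value cutoff of 1e-5 is an arbitrary placeholder, not a threshold from this analysis, and the test rows are invented for illustration.

```python
from __future__ import annotations


def parse_tblout(lines, max_evalue: float = 1e-5) -> list[tuple[str, str, float]]:
    """Return (target, query, full-sequence E-value) for hits at or below max_evalue.

    Assumes the HMMER3 --tblout layout: whitespace-delimited columns with the
    target name in column 1, the query name in column 3 and the full-sequence
    E-value in column 5. Comment lines start with '#'.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=18)  # column 19 is a free-text description
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits.append((target, query, evalue))
    return sorted(hits, key=lambda hit: hit[2])  # best (lowest) E-value first
```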
Biological Implications
Why Are Arrest Motifs Enriched in Phage?
Several hypotheses could explain the 23x enrichment of arrest motifs in phage proteomes:
- Translation Regulation: Phage may use arrest peptides to coordinate temporal gene expression during infection
- Host Ribosome Manipulation: Arrest peptides could stall host ribosomes on phage mRNAs, giving phage transcripts priority access to translation machinery
- Anti-Defense Mechanism: Ribosome stalling may help phage evade host defense systems that target actively translating foreign mRNAs
- Lysis Timing: Controlled ribosome stalling could help coordinate the timing of host cell lysis
Motif-Specific Patterns
The differential enrichment across motif types suggests functional specialization:
- QAPP/RAPP/RPPP (highest enrichment): May represent phage-specific regulatory elements
- RAPG/RAGP (lower enrichment): Less skewed toward phage, possibly reflecting functions shared by host and phage arrest peptides
Output Files
```
results/prokaryotic/
├── analysis/
│   ├── arrest_distribution_matrix.tsv    # Motif counts per database
│   ├── host_phage_comparison.tsv         # Enrichment analysis
│   └── arrest_distribution_summary.txt   # Summary statistics
│
├── arrest_analysis/{database}/
│   ├── arrest_motifs.tsv.gz              # All extracted motifs
│   ├── arrest_sequences.fasta.gz         # Motif sequences
│   ├── motif_summary.tsv                 # Per-motif statistics
│   ├── arrest_motifs.hmm                 # Profile HMMs
│   └── known_peptide_validation.tsv      # Validation results
│
└── arrest_analysis/refined/
    └── arrest_motifs.hmm                 # Refined HMMs from cross-search
```
Methods Summary
Databases Searched
| Database | Type | Motifs Found |
|---|---|---|
| uniprot_viruses | Phage/Viral | 14,750 |
| ncbi_viral_refseq | Phage/Viral | 1,344 |
| bacteria | Host | 628 |
| archaea | Host | 66 |
| Total | | 16,788 |
Analysis Parameters
- Motif patterns: 8 C-terminal arrest sequences (RAGP, RAPG, QAPP, QGPP, HAPP, HGPP, RAPP, RPPP)
- Upstream context: 40 residues
- C-terminal distance: Maximum 50 residues from protein end
- Clustering: 70% sequence identity (MMseqs2)
- Alignment: MAFFT
- HMM construction: HMMER3 hmmbuild
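The tool chain above can be driven from Python with `subprocess`. The sketch below only assembles command lines for MMseqs2 `easy-cluster`, MAFFT, `hmmbuild` and `hmmsearch`. All file names are hypothetical, and only the 70% identity threshold comes from this analysis. Running MAFFT with `--auto` and using `hmmsearch` for the cross-database step are assumptions. The source does not specify further options such as coverage thresholds or E-value cutoffs.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def cluster_cmd(fasta: str | Path, prefix: str | Path, tmpdir: str | Path,
                min_seq_id: float = 0.7) -> list[str]:
    """MMseqs2 easy-cluster at the given minimum sequence identity."""
    return ["mmseqs", "easy-cluster", str(fasta), str(prefix), str(tmpdir),
            "--min-seq-id", str(min_seq_id)]


def align_cmd(cluster_fasta: str | Path) -> list[str]:
    """MAFFT with automatic strategy selection; the alignment is written to stdout."""
    return ["mafft", "--auto", str(cluster_fasta)]


def hmmbuild_cmd(hmm_out: str | Path, alignment: str | Path) -> list[str]:
    """Build a profile HMM from one multiple sequence alignment."""
    return ["hmmbuild", str(hmm_out), str(alignment)]


def hmmsearch_cmd(tblout: str | Path, hmm: str | Path, target_db: str | Path) -> list[str]:
    """Search profile HMMs against a sequence database, writing a per-target table."""
    return ["hmmsearch", "--tblout", str(tblout), str(hmm), str(target_db)]


def run_alignment(cluster_fasta: str | Path, alignment_out: str | Path) -> None:
    """Run MAFFT and capture its stdout in alignment_out (requires mafft on PATH)."""
    with open(alignment_out, "w") as fh:
        subprocess.run(align_cmd(cluster_fasta), stdout=fh, check=True)
```

With `easy-cluster`, MMseqs2 writes `<prefix>_cluster.tsv` (representative-to-member pairs) and `<prefix>_rep_seq.fasta`. The member sequences of each cluster would still need to be written to separate FASTA files before per-cluster alignment.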