Arrest Motif Analysis

C-terminal arrest motifs in prokaryotic and viral proteomes

Published

January 18, 2026

Overview

Arrest peptides are short sequences that cause ribosomes to stall during translation. Unlike 2A peptides, which cause ribosomal skipping, arrest peptides physically pause the ribosome and often regulate downstream gene expression. This analysis examines the distribution of C-terminal arrest motifs across prokaryotic and viral proteomes.

Motif Patterns Analyzed

Eight arrest motif patterns were searched based on known stalling sequences:

Motif   Pattern    Known Examples
RAGP    R-A-G-P    SecM (E. coli)
RAPG    R-A-P-G    VemP-like
QAPP    Q-A-P-P    aaeB, ydhK
QGPP    Q-G-P-P    tcyC
HAPP    H-A-P-P    -
HGPP    H-G-P-P    yaaX
RAPP    R-A-P-P    yhfW
RPPP    R-P-P-P    -

Methods

Extraction Parameters

parameters:
  upstream_residues: 40    # Context upstream of motif
  max_c_term_distance: 50  # Maximum distance from C-terminus
  min_sequence_length: 45  # Minimum protein length
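
The parameters above could be applied roughly as in the following sketch. It is not the pipeline's actual extraction code: the function name and the output fields are hypothetical, and it assumes that the C-terminal distance is counted from the last residue of the motif to the end of the protein, which this report does not state explicitly.

```python
import re

# Motif list and parameter values are taken from this report.
MOTIFS = ["RAGP", "RAPG", "QAPP", "QGPP", "HAPP", "HGPP", "RAPP", "RPPP"]
UPSTREAM_RESIDUES = 40
MAX_C_TERM_DISTANCE = 50
MIN_SEQUENCE_LENGTH = 45

# A lookahead is used so that overlapping occurrences are all reported.
MOTIF_RE = re.compile("(?=(" + "|".join(MOTIFS) + "))")


def extract_arrest_motifs(seq):
    """Return one record per motif found near the C-terminus of `seq`.

    Assumption: distance = residues between the motif end and the protein end.
    """
    seq = seq.upper().rstrip("*")  # drop a trailing stop-codon symbol, if any
    if len(seq) < MIN_SEQUENCE_LENGTH:
        return []
    hits = []
    for match in MOTIF_RE.finditer(seq):
        motif = match.group(1)
        start = match.start()
        end = start + len(motif)
        distance = len(seq) - end
        if distance > MAX_C_TERM_DISTANCE:
            continue
        hits.append({
            "motif": motif,
            "start": start,  # 0-based position of the motif
            "c_term_distance": distance,
            "upstream": seq[max(0, start - UPSTREAM_RESIDUES):start],
        })
    return hits
```

Calling `extract_arrest_motifs` on each protein sequence yields the motif, its position, and up to 40 residues of upstream context; proteins shorter than 45 residues are skipped entirely.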

Analysis Pipeline

  1. Extraction: C-terminal motifs extracted from proteomes
  2. Clustering: MMseqs2 clustering at 70% identity to reduce redundancy
  3. Alignment: Multiple sequence alignment per cluster
  4. HMM Building: Profile HMMs built from alignments
  5. Cross-database Search: Host HMMs searched against viral databases
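
Steps 2 to 5 might map onto command-line calls along the lines of the sketch below. The tools are the ones named in this report (MMseqs2, MAFFT, HMMER3), but the exact subcommands and options used by the original pipeline are not given, so `easy-cluster`, `--auto`, `hmmsearch --tblout`, and all file names here are assumptions. The function only builds the argument lists and does not run anything.

```python
from pathlib import Path


def pipeline_commands(motif_fasta, viral_db_fasta, workdir, min_seq_id=0.7):
    """Build (but do not run) one command per pipeline step for one cluster.

    All paths are hypothetical; splitting the MMseqs2 output into
    per-cluster FASTA files is not covered by this sketch.
    """
    work = Path(workdir)
    cluster_fasta = work / "cluster_0.fasta"   # hypothetical per-cluster input
    alignment = work / "cluster_0.aln.fasta"   # MAFFT stdout saved here
    hmm = work / "cluster_0.hmm"
    hits = work / "cluster_0_vs_viral.tbl"
    return [
        # Step 2: cluster at 70% identity to reduce redundancy.
        ["mmseqs", "easy-cluster", str(motif_fasta), str(work / "clusters"),
         str(work / "tmp"), "--min-seq-id", str(min_seq_id)],
        # Step 3: align one cluster (MAFFT writes the alignment to stdout).
        ["mafft", "--auto", str(cluster_fasta)],
        # Step 4: build a profile HMM from the alignment.
        ["hmmbuild", str(hmm), str(alignment)],
        # Step 5: search the host-derived HMM against a viral database.
        ["hmmsearch", "--tblout", str(hits), str(hmm), str(viral_db_fasta)],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`; for the MAFFT step, stdout would need to be redirected to the alignment file.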

Distribution Analysis

Motifs by Database

Motif Type Distribution

Summary Statistics

Host vs Phage Comparison

Key Finding: 23x Enrichment in Phage

Arrest motifs are 23 times more abundant in phage/viral proteomes (16,094 instances) than in host (bacterial/archaeal) proteomes (694 instances). This striking enrichment suggests that phage may encode ribosome-manipulating peptides to hijack host translation.

Enrichment by Motif Type

Enrichment Table

Top Enriched Motifs

The three most enriched motifs in phage:

  1. QAPP (96x enrichment): The most strongly enriched motif, with 7,509 instances in phage vs 78 in hosts
  2. RAPP (68x enrichment): 2,111 phage instances vs 30 in hosts
  3. RPPP (47x enrichment): 1,864 phage instances vs 39 in hosts
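
As a quick check on the counts quoted above, the raw phage-to-host ratio can be computed as in this sketch. It assumes that enrichment is simply the phage count divided by the host count. The raw ratio need not reproduce the reported fold values exactly (for example, 2,111 / 30 is about 70 rather than 68), so the original analysis may have applied a normalization or pseudocount that this report does not describe.

```python
# Counts quoted in the list above: (phage instances, host instances).
COUNTS = {
    "QAPP": (7509, 78),
    "RAPP": (2111, 30),
    "RPPP": (1864, 39),
}


def fold_enrichment(phage, host):
    """Raw count ratio; an assumed definition, not necessarily the report's."""
    if host == 0:
        raise ValueError("host count is zero; ratio is undefined")
    return phage / host


RAW_RATIOS = {motif: fold_enrichment(p, h) for motif, (p, h) in COUNTS.items()}

for motif, ratio in sorted(RAW_RATIOS.items(), key=lambda kv: -kv[1]):
    print(f"{motif}\t{ratio:.1f}x")
```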

Motif Heatmap

Validation Against Known Arrest Peptides

The arrest motif HMMs were validated by searching for known stalling peptides characterized in the literature.

Validation Results

Known Peptides Recovered

Notable Recoveries

  • SecM (RAGP): The canonical SecA-regulated stalling peptide from E. coli was correctly identified
  • yaaX (HGPP): Strong hit with E-value 1.3e-17
  • yhfW (RAPP): E-value 1.3e-11
  • tcyC (QGPP): B. subtilis cysteine transporter subunit
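
E-values such as those listed above can be read from HMMER's tabular output. The sketch below assumes that validation hits were written with `hmmsearch --tblout`; the layout of `known_peptide_validation.tsv` itself is not described in this report. In the `--tblout` format the target name, query name, full-sequence E-value, and score are the 1st, 3rd, 5th, and 6th whitespace-separated fields. The function name and the `max_evalue` cutoff are hypothetical.

```python
def parse_tblout(lines, max_evalue=1e-5):
    """Parse HMMER --tblout lines and keep hits at or below `max_evalue`.

    The default cutoff is illustrative only, not the report's threshold.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete hit line
        evalue = float(fields[4])
        if evalue <= max_evalue:
            hits.append({
                "target": fields[0],
                "query": fields[2],
                "evalue": evalue,
                "score": float(fields[5]),
            })
    # Most significant hits first.
    return sorted(hits, key=lambda hit: hit["evalue"])
```

With a file on disk, `parse_tblout(open(path))` returns the retained hits sorted from most to least significant.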

Biological Implications

Why Are Arrest Motifs Enriched in Phage?

Several hypotheses could explain the 23x enrichment of arrest motifs in phage proteomes:

  1. Translation Regulation: Phage may use arrest peptides to coordinate temporal gene expression during infection

  2. Host Ribosome Manipulation: Arrest peptides could stall host ribosomes on phage mRNAs, giving phage transcripts priority access to translation machinery

  3. Anti-Defense Mechanism: Ribosome stalling may help phage evade host defense systems that target actively translating foreign mRNAs

  4. Lysis Timing: Controlled ribosome stalling could help coordinate the timing of host cell lysis

Motif-Specific Patterns

The differential enrichment across motif types suggests functional specialization:

  • QAPP/RAPP/RPPP (highest enrichment): May represent phage-specific regulatory elements
  • RAPG/RAGP (lower enrichment): Closer to host-proteome frequencies, possibly reflecting functions shared with cellular proteins

Output Files

results/prokaryotic/
├── analysis/
│   ├── arrest_distribution_matrix.tsv    # Motif counts per database
│   ├── host_phage_comparison.tsv         # Enrichment analysis
│   └── arrest_distribution_summary.txt   # Summary statistics
│
├── arrest_analysis/{database}/
│   ├── arrest_motifs.tsv.gz              # All extracted motifs
│   ├── arrest_sequences.fasta.gz         # Motif sequences
│   ├── motif_summary.tsv                 # Per-motif statistics
│   ├── arrest_motifs.hmm                 # Profile HMMs
│   └── known_peptide_validation.tsv      # Validation results
│
└── arrest_analysis/refined/
    └── arrest_motifs.hmm                 # Refined HMMs from cross-search

Methods Summary

Databases Searched

Database            Type          Motifs Found
uniprot_viruses     Phage/Viral   14,750
ncbi_viral_refseq   Phage/Viral   1,344
bacteria            Host          628
archaea             Host          66
Total                             16,788
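
The per-database counts above can be aggregated by type to recover the overall figure, as in this small sketch. It assumes the headline enrichment is the raw ratio of phage/viral to host motif counts, with no normalization for database size.

```python
# (database, type, motifs found), copied from the table above.
DATABASE_COUNTS = [
    ("uniprot_viruses", "Phage/Viral", 14750),
    ("ncbi_viral_refseq", "Phage/Viral", 1344),
    ("bacteria", "Host", 628),
    ("archaea", "Host", 66),
]

# Sum motif counts per database type.
totals = {}
for _name, kind, count in DATABASE_COUNTS:
    totals[kind] = totals.get(kind, 0) + count

grand_total = sum(totals.values())
overall_ratio = totals["Phage/Viral"] / totals["Host"]

print(totals, grand_total, f"{overall_ratio:.1f}x")
```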

Analysis Parameters

  • Motif patterns: 8 C-terminal arrest sequences (RAGP, RAPG, QAPP, QGPP, HAPP, HGPP, RAPP, RPPP)
  • Upstream context: 40 residues
  • C-terminal distance: Maximum 50 residues from protein end
  • Clustering: 70% sequence identity (MMseqs2)
  • Alignment: MAFFT
  • HMM construction: HMMER3 hmmbuild