2A Peptide Discovery

Systematic identification of viral 2A self-cleaving peptides and prokaryotic ribosome stalling sequences

Author

Hesselberth Lab

Published

January 18, 2026

Overview

This project identifies 2A peptides across protein sequence databases using profile hidden Markov models (HMMs). 2A peptides are short (~15-20 residue) cis-acting oligopeptides that cause ribosomal skipping during translation at conserved Gly-Pro (GP) dipeptides.

What are 2A Peptides?

2A peptides were first discovered in picornaviruses (foot-and-mouth disease virus), where they enable production of multiple proteins from a single open reading frame without requiring a stop codon. The ribosome “skips” formation of the peptide bond between Gly and Pro, releasing the upstream protein while continuing translation of the downstream sequence.

Note: Key Features of 2A Peptides
  • Conserved C-terminal motif: DxExNPGP
  • Self-cleavage occurs between Gly and Pro
  • No external protease required
  • Widely used in biotechnology for multi-cistronic expression
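
The DxExNPGP motif can be scanned for with a regular expression. The following is a minimal sketch, not the pipeline's own code: it assumes "x" means any single residue, and the function name `find_2a_sites` and the toy sequence are hypothetical.

```python
import re

# DxExNPGP with "x" read as any single residue (an assumption).
# The lookahead lets overlapping matches be reported too.
MOTIF_2A = re.compile(r"(?=(D.E.NPGP))")


def find_2a_sites(seq):
    """Return (motif_start, skip_index) pairs, 0-based.

    skip_index is the index of the final Pro, i.e. the first residue
    of the downstream product after ribosomal skipping.
    """
    sites = []
    for m in MOTIF_2A.finditer(seq.upper()):
        start = m.start(1)
        sites.append((start, start + 7))
    return sites


# Toy polyprotein: arbitrary flanks around a T2A-like core (illustrative only).
toy = "MKTAYIAKQR" + "EGRGSLLTCGDVEENPGP" + "MSKGEELFTG"
for start, skip in find_2a_sites(toy):
    print(toy[start:start + 8], "downstream product starts at index", skip)
```

A motif match alone is a weak signal in large databases, which is one reason to prefer profile HMMs for the actual search.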

Two Classes of 2A Peptides

We distinguish two classes based on conserved sequence features:

| Class | N-terminal | Central Motif | C-terminal |
|---|---|---|---|
| Class 1 | Leucine stretch | GDVE | NPGP |
| Class 2 | Invariant Trp | EEGIE | PNPGP/PHPGP |
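
As a rough illustration of how the two classes could be told apart by sequence alone, the sketch below applies the central and C-terminal features from the table as literal string checks. This is an assumption-laden simplification: the function `classify_2a` is hypothetical, the Class 1 leucine stretch is not checked because its exact definition is not given here, and the Class 2 test peptides are synthetic.

```python
def classify_2a(peptide):
    """Assign a candidate peptide to "Class 1", "Class 2", or None.

    Uses only literal motif checks; a profile HMM would tolerate
    substitutions that these exact-match tests reject.
    """
    pep = peptide.upper()
    # Class 2 is tested first because PNPGP also ends in NPGP.
    core = pep.rfind("EEGIE")
    if pep.endswith(("PNPGP", "PHPGP")) and core != -1 and "W" in pep[:core]:
        return "Class 2"
    if pep.endswith("NPGP") and "GDVE" in pep:
        return "Class 1"
    return None


print(classify_2a("EGRGSLLTCGDVEENPGP"))  # T2A-like peptide
print(classify_2a("AWLTEEGIEPNPGP"))      # synthetic Class 2-like peptide
```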

Pipeline Overview

flowchart TB
    subgraph inputs[" "]
        direction TB
        Seeds["🧬 Seed Alignments<br/><small>Curated 2A sequences</small>"]
        style Seeds fill:#e1f5fe
    end

    subgraph databases[" "]
        direction TB
        UniProt[(UniProt)]
        RefProt[(RefProt)]
        UniParc[(UniParc)]
        MGnify[(MGnify)]
        IMGVR[(IMG/VR)]
        style UniProt fill:#fff3e0
        style RefProt fill:#fff3e0
        style UniParc fill:#fff3e0
        style MGnify fill:#fff3e0
        style IMGVR fill:#fff3e0
    end

    subgraph iteration1["Iteration 1"]
        direction TB
        HMM1["hmmbuild<br/><small>Seed HMMs</small>"]
        Search1["hmmsearch<br/><small>All databases</small>"]
        Filter1["Filter<br/><small>E-value, length</small>"]
        style HMM1 fill:#e8f5e9
        style Search1 fill:#e8f5e9
        style Filter1 fill:#e8f5e9
    end

    subgraph iteration2["Iteration 2"]
        direction TB
        HMM2["hmmbuild<br/><small>Refined HMMs</small>"]
        Search2["hmmsearch<br/><small>Expanded search</small>"]
        Curate["Auto-curate<br/><small>Motif validation</small>"]
        style HMM2 fill:#fce4ec
        style Search2 fill:#fce4ec
        style Curate fill:#fce4ec
    end

    subgraph outputs["Outputs"]
        direction TB
        FinalHMM["📊 Final Models<br/><small>Class 1 & 2 HMMs</small>"]
        Logos["🎨 Sequence Logos"]
        Report["📝 Analysis Report"]
        style FinalHMM fill:#f3e5f5
        style Logos fill:#f3e5f5
        style Report fill:#f3e5f5
    end

    Seeds --> HMM1
    HMM1 --> Search1
    databases --> Search1
    Search1 --> Filter1
    Filter1 --> HMM2
    HMM2 --> Search2
    databases --> Search2
    Search2 --> Curate
    Curate --> FinalHMM
    FinalHMM --> Logos
    FinalHMM --> Report
Figure 1: Complete workflow for 2A peptide and ribosomal stalling peptide discovery
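
The "Filter" step in the workflow (E-value, length) can be sketched as a small parser over HMMER's per-domain table, as written by `hmmsearch --domtblout`. Whether the pipeline uses this output format is an assumption, as are the cutoff values, the function name `filter_domtblout`, and the toy rows, which are made up to exercise the code.

```python
def filter_domtblout(text, max_evalue=1e-3, min_len=15, max_len=30):
    """Keep domain hits passing E-value and aligned-length cutoffs.

    Column indices follow the HMMER domtblout layout:
    0 target name, 3 query name, 12 independent E-value,
    17 ali from, 18 ali to.
    """
    kept = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        if len(fields) < 22:
            continue  # not a complete domtblout row
        i_evalue = float(fields[12])
        ali_len = int(fields[18]) - int(fields[17]) + 1
        if i_evalue <= max_evalue and min_len <= ali_len <= max_len:
            kept.append((fields[0], fields[3], i_evalue, ali_len))
    return kept


# Made-up rows for illustration; not real search results.
TOY_TABLE = """\
# target name  accession  tlen  query name ...
seqA - 250 class1_2A - 19 1.2e-08 35.1 0.1 1 1 3.0e-10 1.2e-08 35.1 0.1 1 19 101 119 100 120 0.95 toy hit
seqB - 400 class1_2A - 19 0.5 8.2 0.0 1 1 0.01 0.5 8.2 0.0 3 19 40 56 38 58 0.70 weak
seqC - 90 class1_2A - 19 2e-05 20.0 0.0 1 1 1e-06 2e-05 20.0 0.0 10 19 5 14 3 16 0.80 short
"""

for hit in filter_domtblout(TOY_TABLE):
    print(hit)
```

Filtering on the aligned region rather than the full target length keeps hits where a short 2A-like segment sits inside a long polyprotein.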

Databases Searched

Eukaryotic Pipeline

| Database | Description | Size |
|---|---|---|
| UniProt Swiss-Prot | Curated protein sequences | ~570k |
| Reference Proteomes | Representative proteomes | Medium |
| UniParc | Non-redundant archive | Very large |
| MGnify | Metagenomic proteins | ~300M |

Prokaryotic Pipeline

| Database | Description | Size |
|---|---|---|
| UniProt Bacteria | Swiss-Prot bacterial proteins | ~500k |
| UniProt Archaea | Swiss-Prot archaeal proteins | ~50k |
| NCBI Viral RefSeq | Viral protein sequences | ~100MB |
| IMG/VR | Phage proteins | Millions |
| INPHARED | Curated phage genomes | ~19k genomes |

Key Analyses

  • Eukaryotic 2A Peptides
  • Prokaryotic Discovery
  • Methods

Background

Ribosomal Stalling in Prokaryotes

Prokaryotes use related but distinct mechanisms for ribosomal stalling:

  • SecM: Stalls at FxxxxWIxxxxGIRAGP (x = any residue) to regulate expression of the secretion factor SecA
  • TnaC: Tryptophan-dependent stalling
  • MifM: Regulates membrane protein insertion in B. subtilis

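SecM ends in a Gly-Pro (GP) motif like 2A peptides, but stalling peptides do not all share it (TnaC, for example, arrests at a C-terminal Pro that is not preceded by Gly), and their upstream contexts and biological functions differ.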
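
The SecM arrest pattern listed above translates directly into a regular expression. This is a minimal sketch that assumes each X stands for any single residue; the function name `find_secm_like` is hypothetical, and the test sequence embeds a 17-residue core modeled on the E. coli SecM arrest sequence between arbitrary flanks.

```python
import re

# F-x4-WI-x4-GIRAGP, with x read as any single residue (an assumption).
SECM_ARREST = re.compile(r"F.{4}WI.{4}GIRAGP")


def find_secm_like(seq):
    """Return (start, matched_text) for each SecM-like arrest motif."""
    return [(m.start(), m.group()) for m in SECM_ARREST.finditer(seq.upper())]


# Arbitrary flanks around a SecM-like core (illustrative only).
toy = "MKK" + "FSTPVWISQAQGIRAGP" + "QRLT"
print(find_secm_like(toy))
```

An exact-spacing pattern like this will miss diverged homologs, so it is better suited to validating hits than to discovering them.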

Biotechnology Applications

2A peptides are widely used for:

  • Multi-gene expression from single vectors
  • Stoichiometric co-expression of protein complexes
  • CAR-T cell engineering
  • Vaccine development

Resources

  • Source Code: GitHub repository
  • Seed Alignments: resources/seed-alignments/
  • Final Models: results/models/final/
Tip: Running the Pipeline
# Full pipeline
pixi run snakemake --cores 12

# Prokaryotic discovery only
pixi run snakemake prokaryotic_discovery --cores 12