Evo 2 AI model analyzes trillions of DNA bases to decode complex genomes

Ars Technica March 05, 2026

⚡ Open-source AI trained on 8.8 trillion DNA bases identifies regulatory sequences that humans struggle to spot.

Deep Dive

The research team behind the original Evo AI system has released Evo 2, an open-source model trained on genomic data from bacteria, archaea, and eukaryotes. This is a significant step beyond the team's previous work, which focused solely on bacterial genomes, and it directly targets the challenge of analyzing complex eukaryotic DNA. The model was trained on the OpenGenome2 dataset, which contains 8.8 trillion DNA bases; viruses that infect eukaryotes were deliberately excluded to reduce the risk of the model being misused to create threats to humans.

The system uses a StripedHyena 2 convolutional neural network trained in two stages: first on 8,000-base chunks to learn local features, then on million-base sequences to recognize large-scale patterns. The researchers built two versions: a 7-billion-parameter model trained on 2.4 trillion bases and a full 40-billion-parameter model trained on the complete dataset. By learning patterns of evolutionary conservation across species, Evo 2 can identify subtle genomic features, such as weakly defined splice sites and regulatory sequences, that are notoriously difficult for humans to detect in complex eukaryotic genomes.
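
To make the two context lengths concrete, here is a minimal Python sketch of splitting one sequence into fixed-size windows at each scale. It is an illustration only and not the Evo 2 training pipeline: the `chunk_sequence` helper, the constant names, the non-overlapping windowing, and the toy genome are all assumptions made for this example. Only the 8,000-base and million-base figures come from the description above.

```python
from typing import Iterator


def chunk_sequence(seq: str, window: int) -> Iterator[str]:
    """Yield consecutive, non-overlapping windows of at most `window` bases."""
    if window <= 0:
        raise ValueError("window must be positive")
    for start in range(0, len(seq), window):
        yield seq[start:start + window]


# Context lengths quoted in the article; the names are hypothetical.
SHORT_CONTEXT = 8_000      # stage 1: local features
LONG_CONTEXT = 1_000_000   # stage 2: large-scale patterns

# Toy stand-in for a genome: 2.5 million bases of a repeating motif.
toy_genome = "ACGT" * 625_000

short_chunks = list(chunk_sequence(toy_genome, SHORT_CONTEXT))
long_chunks = list(chunk_sequence(toy_genome, LONG_CONTEXT))

print(len(short_chunks), len(long_chunks))
```

On this toy input the same 2.5 million bases become 313 short windows but only 3 long ones, which is one way to see why the longer context is needed to capture patterns that span large stretches of a genome.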

Key Points

Trained on 8.8 trillion DNA bases from the OpenGenome2 dataset, covering all three domains of life
Uses a StripedHyena 2 convolutional neural network with 40 billion parameters in its full version
Identifies complex eukaryotic features, such as regulatory sequences and splice sites, that challenge existing tools

Why It Matters

Enables more accurate analysis of human and other complex genomes, which could accelerate biomedical research and therapeutic development.
