Research & Papers

Analysis and Explainability of LLMs Via Evolutionary Methods

New method treats LLM weights like DNA to reconstruct training family trees...

Deep Dive

A team led by Shannon Gallagher at Carnegie Mellon University (arXiv:2605.02930) has repurposed techniques from evolutionary biology to analyze and explain large language models. Drawing an analogy between neural network weights and DNA genotypes, and between generated text and biological phenotypes, the researchers built phylogenetic trees that map the lineage of models trained on different datasets. In a controlled experiment, the estimated evolutionary trees reliably recovered the true training tree topology, validating the method. The team also used weight differences to identify the most influential layers, and found that one specific training dataset contributed more useful information than the others.
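The paper's exact distance metric and tree-building algorithm aren't detailed here, but the genotype side of the analogy can be sketched in a few lines: flatten each model's weights into a single "genome" vector, compute pairwise distances, and cluster. The Euclidean metric and UPGMA-style average linkage below are illustrative assumptions, not necessarily the authors' choices.

    # Sketch: infer a "family tree" from model weights (the genotype analogy).
    # Assumes each model is a dict of NumPy arrays with identical shapes;
    # the distance metric and clustering method are illustrative stand-ins.
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage, dendrogram

    def flatten_weights(state_dict):
        """Concatenate all weight tensors into one 'genome' vector."""
        return np.concatenate([np.ravel(w) for w in state_dict.values()])

    def build_tree(models):
        """models: dict mapping model name -> dict of NumPy weight arrays."""
        names = list(models)
        genomes = np.stack([flatten_weights(models[n]) for n in names])
        dists = pdist(genomes, metric="euclidean")  # pairwise weight distances
        return names, linkage(dists, method="average")  # UPGMA-style tree

    # Toy usage: four models derived from one ancestor by growing perturbations.
    rng = np.random.default_rng(0)
    base = {"layer0": rng.normal(size=(8, 8)), "layer1": rng.normal(size=(8,))}
    models = {
        f"child_{i}": {k: v + 0.1 * i * rng.normal(size=v.shape)
                       for k, v in base.items()}
        for i in range(4)
    }
    names, tree = build_tree(models)
    dendrogram(tree, labels=names)  # plots the inferred lineage (needs matplotlib)

Rendering the linkage matrix as a dendrogram is what produces the kind of lineage visualization the paper emphasizes.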

Beyond controlled setups, the team generated an unsupervised evolutionary tree of several black-box foundation models (such as GPT-4 and Claude), revealing surprising relationships and data dependencies. The approach offers a powerful new tool for LLM explainability, helping developers understand which datasets and training steps matter most. Visualizations produced by the method give a clear, intuitive view of how models relate to one another, opening the door to better auditing, fairer comparison, and more informed model selection in enterprise deployments.
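For black-box models, only the phenotype side of the analogy is available: distances must come from generated text rather than weights. A minimal sketch, assuming a shared prompt set and using TF-IDF plus cosine distance as stand-in featurization (the paper's actual text-comparison method may differ):

    # Sketch: phenotype distances for black-box models. Each model's responses
    # to the *same* prompts are treated as its observable phenotype;
    # TF-IDF features and cosine distance are illustrative stand-ins.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_distances

    def phenotype_distances(outputs):
        """outputs: dict mapping model name -> list of responses to a
        shared prompt set. Returns (names, pairwise distance matrix)."""
        names = list(outputs)
        docs = [" ".join(outputs[n]) for n in names]  # one document per model
        tfidf = TfidfVectorizer().fit_transform(docs)
        return names, cosine_distances(tfidf)

    # Toy usage with canned responses (real use would query each model's API):
    outputs = {
        "model_a": ["The capital of France is Paris.", "Water boils at 100 C."],
        "model_b": ["Paris is the capital of France.", "Water boils at 100 C."],
        "model_c": ["I cannot answer that.", "Boiling point: 212 F."],
    }
    names, D = phenotype_distances(outputs)
    print(names)
    print(D.round(2))  # smaller distance = more similar behavior

The resulting distance matrix can be fed to the same tree-building step as the weight-based sketch above, yielding a lineage estimate without any access to model internals.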

Key Points
  • Controlled experiments accurately reconstruct ground-truth training tree topology from weight and output comparisons.
  • Method identifies the most important weight layers by measuring weight differences across models (see the sketch after this list).
  • Unsupervised evolutionary tree of black-box foundation models reveals unexpected lineage and data influence patterns.
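The layer-importance idea in the second bullet reduces to a per-layer comparison between related models. As a minimal sketch, the normalized L2 difference below is an assumed statistic, not the paper's published one:

    # Sketch: rank layers by how far their weights diverge between two
    # related models; normalized L2 difference is an illustrative choice.
    import numpy as np

    def layer_divergence(model_a, model_b):
        """model_a, model_b: dicts mapping layer name -> NumPy weight arrays
        with matching shapes. Returns layers sorted by divergence, largest first."""
        scores = {
            name: np.linalg.norm(model_a[name] - model_b[name])
                  / (np.linalg.norm(model_a[name]) + 1e-12)
            for name in model_a
        }
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    # Layers that moved most during training surface at the top, e.g.:
    # for name, score in layer_divergence(parent_weights, child_weights):
    #     print(f"{name}: {score:.3f}")  # parent_weights/child_weights: hypothetical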

Why It Matters

Brings rigorous explainability to LLM lineages, helping teams audit training data impact and model relationships at scale.