Research & Papers

Ekka diagnoses silent LLM errors with 80% accuracy

New system finds hidden bugs in LLM serving frameworks by comparing execution states.

Deep Dive

Large language model serving frameworks are built on complex software stacks with aggressive optimizations, making them prone to silent errors—bugs that degrade output quality without any explicit error signals. These are notoriously hard to diagnose because high-level symptoms (e.g., worse accuracy) are far removed from low-level root causes (e.g., incorrect tensor operations). A new paper from researchers at University of Michigan, Microsoft, and other institutions proposes Ekka, an automated diagnosis system that reframes silent error detection as a differential debugging problem. By leveraging semantically correct reference implementations (e.g., Hugging Face’s transformers), Ekka systematically aligns and compares intermediate execution states between a target framework (like vLLM or TensorRT-LLM) and the reference, pinpointing where deviations occur.

In experiments on a benchmark of real-world silent errors from popular serving frameworks, Ekka achieved 80% pass@1 diagnosis accuracy (correct root cause in first guess) and 88% pass@5 accuracy, outperforming state-of-the-art baselines. More importantly, it identified 4 previously unknown silent errors in production frameworks, all of which have been acknowledged by developers. The work, accepted at ICML 2026, highlights a practical approach to maintaining model quality as LLM deployment scales. For teams running LLMs in production, Ekka offers a way to catch subtle bugs early—before they impact user trust.

Key Points
  • Ekka uses differential debugging by comparing intermediate execution states between a target framework and a correct reference implementation.
  • Achieves 80% pass@1 and 88% pass@5 diagnosis accuracy on a benchmark of real-world silent errors.
  • Discovered 4 new silent errors in popular serving frameworks, all confirmed by developers.

Why It Matters

As LLM deployment scales, Ekka automates detection of silent bugs that degrade output quality, preventing cascading failures.