Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures
New open-source dataset shows optimal GPU selection can slash AI energy costs by up to 70%.
A team of researchers has launched Watt Counts, an open-source benchmark and dataset designed to tackle the massive energy footprint of large language models. The project addresses a critical gap: while the AI community acknowledges LLMs' high energy consumption, system operators lack concrete data for making energy-efficient deployment decisions across different hardware. Watt Counts fills this void with the largest publicly available dataset of its kind, featuring over 5,000 experiments that measure the energy use of 50 LLMs across 10 NVIDIA GPU architectures in both batch and server inference scenarios.
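The article does not detail the measurement pipeline, but GPU energy for an inference run is typically obtained by polling instantaneous board power (e.g., via NVML) and integrating the samples over time, then normalizing by output tokens. A minimal sketch of that accounting, where the function names, sampling cadence, and figures are our own illustrative assumptions rather than the project's actual tooling:

```python
def energy_joules(samples):
    """Integrate (timestamp_s, power_w) samples into joules via the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total

def joules_per_token(samples, tokens_generated):
    """Normalize run energy by generated tokens, a common efficiency metric."""
    return energy_joules(samples) / tokens_generated

# Hypothetical trace: a GPU drawing a steady 300 W for 3 seconds -> 900 J
samples = [(0.0, 300.0), (1.0, 300.0), (2.0, 300.0), (3.0, 300.0)]
```

With a real GPU, the `(timestamp, power)` pairs would come from a sampling loop over an interface such as NVML's power readings; the integration step is the same regardless of the source.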
Leveraging this extensive data, the researchers conducted a system-level study revealing that GPU selection is paramount for energy efficiency. Their key finding is that the optimal hardware choice varies dramatically depending on the specific LLM and the deployment context—whether it's a batch job or an interactive server. This underscores the necessity of hardware-aware deployment strategies in heterogeneous computing environments. The practical payoff is substantial: by using the insights from Watt Counts to guide hardware selection, practitioners can achieve energy reductions of up to 70% for server-based inference with negligible effect on user experience, and up to 20% for batch processing tasks.
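In practice, the hardware-aware strategy described above amounts to a per-(model, scenario) lookup over measured records: among the GPUs that keep latency within an acceptable bound (so user experience is preserved), pick the one with the lowest measured energy. A minimal sketch, with the record schema and all values entirely hypothetical rather than taken from the dataset:

```python
def pick_gpu(records, model, scenario, max_latency_s=None):
    """Return the lowest-energy GPU for a given (model, scenario) pair.

    records: dicts with keys 'model', 'gpu', 'scenario',
    'energy_j' (measured joules) and 'latency_s' (per-request latency).
    If max_latency_s is given, GPUs exceeding it are excluded, keeping
    the impact on user experience negligible.
    """
    candidates = [
        r for r in records
        if r["model"] == model and r["scenario"] == scenario
        and (max_latency_s is None or r["latency_s"] <= max_latency_s)
    ]
    if not candidates:
        raise ValueError("no GPU satisfies the constraints")
    return min(candidates, key=lambda r: r["energy_j"])["gpu"]

# Hypothetical measurements in the spirit of the dataset
records = [
    {"model": "llama-8b", "scenario": "server", "gpu": "H100",
     "energy_j": 900.0, "latency_s": 0.8},
    {"model": "llama-8b", "scenario": "server", "gpu": "A100",
     "energy_j": 1400.0, "latency_s": 1.1},
    {"model": "llama-8b", "scenario": "batch", "gpu": "A100",
     "energy_j": 700.0, "latency_s": 5.0},
]
```

Note how the answer can differ by scenario for the same model, which is exactly the heterogeneity the study highlights.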
- Created the largest open-access LLM energy dataset: 5,000+ experiments across 50 models and 10 NVIDIA GPUs.
- Reveals optimal GPU choice is model and scenario-dependent, critical for cutting energy use in heterogeneous systems.
- Enables up to 70% energy savings in server inference and 20% in batch processing with minimal performance loss.
Why It Matters
Provides data-driven guidance to slash the massive energy costs and environmental impact of running AI models at scale.