AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework
Treating AI queries as movable electricity loads could cut costs and carbon emissions.
In a new preprint on arXiv (2604.27855), researchers Xubin Luo and Yang Cheng introduce a framework that reframes AI inference as a form of relocatable electricity demand. Unlike traditional electrical loads, inference workloads can be executed away from the user-facing service location, as long as latency, state locality, capacity, and regulatory constraints are met. The paper develops a three-layer architecture of clients, service nodes, and compute nodes, and formulates inference placement as a constrained optimization problem over multiple variables: electricity prices, marginal carbon intensity, power usage effectiveness (PUE), compute capacity, network latency, and migration frictions. The central concept is the energy-latency frontier — the marginal cost and carbon benefit unlocked by relaxing inference latency budgets.
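To make the formulation concrete, here is a minimal sketch of what a feasibility-masked placement objective of this kind might look like. The node names, numbers, and cost weights below are illustrative assumptions, not the authors' actual model or parameters.

```python
from dataclasses import dataclass

# Hypothetical per-region data; the paper's actual model and inputs may differ.
@dataclass
class ComputeNode:
    name: str
    price_usd_per_kwh: float   # local electricity price
    carbon_kg_per_kwh: float   # marginal carbon intensity of the grid
    pue: float                 # power usage effectiveness of the facility
    latency_ms: float          # network round-trip from the service node
    free_capacity: float       # spare compute capacity (0..1)
    allowed: bool              # regulatory / data-residency feasibility

def placement_cost(node, energy_kwh, carbon_price=0.05, migration_friction=0.002):
    """Illustrative objective: energy cost + priced carbon + a fixed migration friction."""
    facility_energy = energy_kwh * node.pue
    return (facility_energy * node.price_usd_per_kwh
            + facility_energy * node.carbon_kg_per_kwh * carbon_price
            + migration_friction)

def place(nodes, energy_kwh, latency_budget_ms, min_capacity=0.05):
    """Pick the cheapest node that passes the feasibility mask (latency, capacity, regulation)."""
    feasible = [n for n in nodes
                if n.allowed
                and n.latency_ms <= latency_budget_ms
                and n.free_capacity >= min_capacity]
    if not feasible:
        return None  # fall back to local execution
    return min(feasible, key=lambda n: placement_cost(n, energy_kwh))

nodes = [
    ComputeNode("local",              0.18, 0.40, 1.4,   5, 0.10, True),
    ComputeNode("hydro-region",       0.06, 0.02, 1.2,  80, 0.50, True),
    ComputeNode("cheap-restricted",   0.04, 0.30, 1.5, 120, 0.60, False),
]

for budget in (10, 50, 150):  # tight vs. relaxed latency budgets
    choice = place(nodes, energy_kwh=0.003, latency_budget_ms=budget)
    print(budget, "ms ->", choice.name if choice else "local fallback")
```

Running the loop with progressively larger latency budgets mimics the energy-latency frontier: as the budget relaxes, lower-price and lower-carbon regions enter the feasible set, while the regulatory mask and capacity floor keep some of them excluded regardless of latency.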
The paper makes four key contributions. First, it clearly distinguishes physical electricity transmission from digital relocation of electricity-consuming computation. Second, it introduces a geo-distributed inference placement model with feasibility masks and migration frictions. Third, it defines operational metrics such as relocatable inference demand, energy return on latency, and carbon return on latency, plus a relocation break-even condition. Fourth, it provides a stylized simulation over representative global compute regions, showing how heterogeneous latency tolerance separates workloads into local, regional, and energy-oriented execution layers. Results indicate that relaxing latency expands feasible geography, but migration frictions, egress costs, state locality, legal constraints, and capacity limits can sharply reduce the realized benefits. The framework offers a theoretical foundation for making AI inference more energy-efficient and carbon-aware.
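As a rough illustration of how the operational metrics and the break-even test might be computed, here is a minimal sketch; the function names, formulas, and numbers are assumptions for illustration, not the authors' exact definitions.

```python
def energy_return_on_latency(local_kwh, remote_kwh, added_latency_ms):
    """Energy saved per extra millisecond of latency budget (illustrative definition)."""
    return (local_kwh - remote_kwh) / added_latency_ms

def carbon_return_on_latency(local_kg, remote_kg, added_latency_ms):
    """Carbon avoided per extra millisecond of latency budget (illustrative definition)."""
    return (local_kg - remote_kg) / added_latency_ms

def relocation_breaks_even(local_cost, remote_cost, egress_cost, migration_friction):
    """Relocate only if the cost saved at the remote site exceeds the frictions of moving."""
    return (local_cost - remote_cost) > (egress_cost + migration_friction)

# Example: a query using 0.0008 kWh locally vs. 0.0006 kWh remotely (lower PUE, cleaner grid),
# bought at the price of 70 ms of extra latency plus a small egress fee.
print(energy_return_on_latency(0.0008, 0.0006, 70.0))               # kWh saved per ms
print(relocation_breaks_even(0.00015, 0.00005, 0.00002, 0.00003))   # True -> worth relocating
```

The break-even check captures the result highlighted in the simulation: even when a remote region is cheaper and cleaner per kilowatt-hour, egress costs and migration frictions can flip the comparison and keep the workload local.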
- Introduces the energy-latency frontier: the marginal cost and carbon benefit from relaxing inference latency budgets
- Formulates inference placement as an optimization problem over electricity prices, carbon intensity, PUE, and network latency
- Simulation shows latency relaxation expands geography but frictions like egress costs and capacity limits sharply reduce benefits
Why It Matters
This framework could enable AI companies to lower energy costs and carbon footprints by shifting latency-tolerant inference workloads toward cheaper, lower-carbon grid regions while keeping latency-sensitive queries close to users.