Research & Papers

HAF Lifts AI-RAN SLO Fulfillment by 20.5%

LLM-driven scheduler reaches 90% SLO fulfillment and raises AI service request fulfillment from 51% to 85.3%

Deep Dive

AI-RAN runs AI services and Radio Access Network (RAN) functions on shared GPU-accelerated edge infrastructure, but coordinating real-time RAN tasks with diverse AI workloads is difficult because the two operate on mismatched timescales. A new paper from Haiyuan Li, Yulei Wu, and Dimitra Simeonidou introduces the Hierarchical Agentic Framework (HAF) to address this. HAF uses a large language model (LLM) agent for slow-timescale placement decisions (which services go where) and a closed-form, deadline-aware convex algorithm for fast-timescale GPU/CPU allocation. A predictive critic evaluates proposed migrations and filters out those where service interruption outweighs the expected service-level objective (SLO) benefit.
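To make the two-timescale split concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the names `Task`, `Migration`, `filter_migrations`, and `allocate_gpu` are hypothetical, the critic's estimates are supplied as plain numbers rather than predicted by a model, and the allocation rule is a stand-in. That rule gives each task a share proportional to its minimum required rate (work divided by time to deadline), which is the closed-form maximizer of sum_i d_i log x_i subject to sum_i x_i = C. The paper's actual objective and constraints may differ.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work: float      # remaining compute, in GPU-seconds (hypothetical unit)
    deadline: float  # seconds left before the deadline, assumed > 0


@dataclass
class Migration:
    service: str
    target_node: str
    predicted_slo_gain: float  # critic's estimate of the SLO benefit (assumed given)
    interruption_cost: float   # estimated SLO loss while the service moves (assumed given)


def filter_migrations(proposals):
    """Slow timescale: keep only migrations whose predicted gain exceeds the interruption cost."""
    return [m for m in proposals if m.predicted_slo_gain > m.interruption_cost]


def allocate_gpu(tasks, capacity=1.0):
    """Fast timescale: split capacity in proportion to each task's minimum required rate."""
    # Minimum rate needed to finish on time: work / time remaining.
    demand = {t.name: t.work / t.deadline for t in tasks}
    total = sum(demand.values())
    if total == 0:
        return {name: 0.0 for name in demand}
    # Proportional rule: x_i = C * d_i / sum(d). If total demand fits within
    # capacity, every task receives at least its minimum rate.
    return {name: capacity * d / total for name, d in demand.items()}


# Hypothetical proposals, standing in for placements suggested by the LLM agent.
proposals = [
    Migration("video-analytics", "edge-2", predicted_slo_gain=0.08, interruption_cost=0.03),
    Migration("chatbot", "edge-1", predicted_slo_gain=0.01, interruption_cost=0.05),
]
accepted = filter_migrations(proposals)

tasks = [
    Task("ran-phy", work=2.0, deadline=4.0),
    Task("llm-inference", work=1.0, deadline=4.0),
]
shares = allocate_gpu(tasks)
print([m.service for m in accepted], shares)
```

In this sketch the second proposal is rejected because its estimated interruption cost exceeds its predicted gain, which mirrors the filtering role the summary attributes to the critic. The allocation step involves no iteration, which is the practical appeal of a closed-form rule on the fast timescale.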

In the reported experiments, HAF achieves 90.0% overall SLO fulfillment, a 20.5% improvement over the strongest baseline. AI service request fulfillment rises from 51% to 85.3%, a gain of 34.3 percentage points. The framework retains its advantage under diverse load conditions, and the critic consistently improves SLO fulfillment across multiple open-source LLM agents. For practitioners, the results point to more reliable and efficient edge AI deployment, with real-time AI applications running alongside critical network functions on the same hardware.

Key Points
  • Combines an LLM-based agent for slow-timescale placement with a convex algorithm for fast-timescale GPU/CPU allocation
  • Predictive critic filters out migrations when service interruption outweighs the expected SLO benefit, improving SLO fulfillment across multiple open-source LLM agents
  • Achieves 90% SLO fulfillment (a 20.5% improvement over the strongest baseline) and raises AI service request fulfillment from 51% to 85.3%

Why It Matters

Points toward more reliable AI-RAN edge deployment, where real-time AI services and network functions share GPU infrastructure while still meeting their SLOs.