Research & Papers

AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators

New baremetal AI framework eliminates OS overhead, achieving 9.2x higher compute efficiency (throughput per AI Engine tile) than a Linux-based deployment.

Deep Dive

A team of researchers has unveiled AEG, a baremetal framework that runs AI inference directly on heterogeneous hardware accelerators such as AI Engine (AIE) arrays, with no operating system underneath. Its core idea is a 'Control as Data' paradigm: complex control logic is flattened into simple, linear Runtime Control Blocks (RCBs), which a generic engine executes through a minimal Runtime Hardware Abstraction Layer (RHAL). This decouples the software runtime from hardware specifics. The framework also includes Runtime Platform Management (RTPM) for system orchestration and a Runtime In-Memory File System (RIMFS) for data handling in OS-free environments.
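
To make 'Control as Data' concrete, here is a minimal Python sketch under stated assumptions. The opcode set (LOAD, DMA, RUN, WAIT), the RCB argument layout, the RHAL method names, the SimHAL simulator, and the modeling of RIMFS as a name-to-bytes table are all hypothetical illustrations, not AEG's actual interfaces. What it shows is the shape of the idea: once control is flattened into a list of data records, the engine is a single linear loop, and everything device-specific sits behind a handful of RHAL calls.

```python
# Hypothetical sketch: all names below are illustrative, not AEG's real API.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Protocol, Sequence


class Op(Enum):
    # Hypothetical opcode set, not AEG's actual RCB encoding.
    LOAD = auto()  # copy a named RIMFS entry into device memory
    DMA = auto()   # copy bytes between two device addresses
    RUN = auto()   # start the kernel mapped to a tile
    WAIT = auto()  # spin until that tile reports completion


@dataclass(frozen=True)
class RCB:
    """One flattened control step: an opcode plus its arguments."""
    op: Op
    args: tuple


class RHAL(Protocol):
    """The only hardware calls the generic engine is allowed to make."""
    def write(self, addr: int, data: bytes) -> None: ...
    def read(self, addr: int, n: int) -> bytes: ...
    def start(self, tile: int) -> None: ...
    def done(self, tile: int) -> bool: ...


def execute(program: Sequence[RCB], hal: RHAL, rimfs: Dict[str, bytes]) -> None:
    """Generic engine: walk the RCB list once, in order, with no branching."""
    for rcb in program:
        if rcb.op is Op.LOAD:
            name, addr = rcb.args
            hal.write(addr, rimfs[name])
        elif rcb.op is Op.DMA:
            src, dst, n = rcb.args
            hal.write(dst, hal.read(src, n))
        elif rcb.op is Op.RUN:
            hal.start(rcb.args[0])
        elif rcb.op is Op.WAIT:
            while not hal.done(rcb.args[0]):
                pass
        else:
            raise ValueError(f"unknown opcode {rcb.op}")


class SimHAL:
    """Software stand-in for a device: flat memory plus one kernel per tile."""

    def __init__(self, size: int, kernels: Dict[int, Callable[[bytearray], None]]):
        self.mem = bytearray(size)
        self.kernels = kernels
        self.finished = set()

    def write(self, addr: int, data: bytes) -> None:
        if addr < 0 or addr + len(data) > len(self.mem):
            raise ValueError("write out of bounds")
        self.mem[addr:addr + len(data)] = data

    def read(self, addr: int, n: int) -> bytes:
        return bytes(self.mem[addr:addr + n])

    def start(self, tile: int) -> None:
        self.kernels[tile](self.mem)  # runs synchronously in this simulation
        self.finished.add(tile)

    def done(self, tile: int) -> bool:
        return tile in self.finished


def add_one(mem: bytearray) -> None:
    # Toy kernel: read 4 bytes at offset 16, write each value + 1 at offset 32.
    mem[32:36] = bytes((b + 1) % 256 for b in mem[16:20])


rimfs = {"input.bin": bytes([1, 2, 3, 4])}  # RIMFS modeled as name -> bytes
program = [
    RCB(Op.LOAD, ("input.bin", 0)),
    RCB(Op.DMA, (0, 16, 4)),
    RCB(Op.RUN, (0,)),
    RCB(Op.WAIT, (0,)),
    RCB(Op.DMA, (32, 48, 4)),
]
hal = SimHAL(64, {0: add_one})
execute(program, hal, rimfs)
print(list(hal.read(48, 4)))  # [2, 3, 4, 5]
```

In this model, moving to different hardware would mean supplying another RHAL implementation while the RCB list and the engine loop stay unchanged, which is the kind of decoupling the paragraph above describes.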

The performance gains are substantial. In a ResNet-18 image classification benchmark on Xilinx/Kria hardware, AEG achieved 9.2x higher compute efficiency (throughput per AIE tile) than a standard Linux-based deployment using AMD's Vitis AI stack. It also reduced data movement overhead by 3-7x and showed near-zero latency variance. AEG reached 68.78% Top-1 accuracy on ImageNet using only 28 AI Engine tiles, whereas the Vitis AI implementation required 304, roughly 90% fewer tiles for comparable accuracy. These results suggest the framework can deliver high-performance inference on far less hardware.
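
Because the headline figure is per tile, it helps to check what it implies in absolute terms. The back-of-the-envelope calculation below assumes that compute efficiency is throughput divided by the number of AIE tiles each deployment uses, and that the 9.2x ratio and the 28/304 tile counts come from the same ResNet-18 configuration; the article does not state either point explicitly.

```python
# Figures reported in the article (ResNet-18, ImageNet).
EFFICIENCY_RATIO = 9.2  # AEG throughput-per-tile / Vitis AI throughput-per-tile
AEG_TILES = 28
VITIS_TILES = 304

# Fraction of tiles saved by AEG relative to Vitis AI.
tile_reduction = 1 - AEG_TILES / VITIS_TILES

# Assumption: efficiency E = T / N for both systems, so
# T_aeg / T_vitis = (E_aeg * N_aeg) / (E_vitis * N_vitis).
throughput_ratio = EFFICIENCY_RATIO * AEG_TILES / VITIS_TILES

print(f"tile reduction: {tile_reduction:.1%}")                      # 90.8%
print(f"implied absolute throughput ratio: {throughput_ratio:.2f}")  # 0.85
```

Under those assumptions, AEG's absolute throughput would be about 0.85x that of the Vitis AI deployment: the gain is in efficiency per tile rather than raw speed. If the efficiency metric was defined or measured differently, this inference does not hold.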

The implications are significant for the edge AI and embedded systems space, where resources are constrained and every milliwatt and millisecond counts. By eliminating the overhead and complexity of a Real-Time Operating System (RTOS) or general-purpose OS, AEG opens the door for more powerful AI models to run on smaller, cheaper, and more power-efficient devices. This could accelerate the deployment of advanced computer vision, sensor fusion, and real-time decision-making in everything from industrial IoT and robotics to autonomous vehicles and smart sensors.

Key Points
  • Achieved 9.2x higher compute efficiency (throughput per AIE tile) than a Linux-based Vitis AI deployment on AMD/Xilinx AI Engines.
  • Ran ResNet-18 on about 90% fewer AI Engine tiles (28 vs. 304) while reaching 68.78% Top-1 ImageNet accuracy.
  • Uses a 'Control as Data' paradigm built on Runtime Control Blocks (RCBs) to eliminate OS overhead, and reduced data movement by 3-7x.

Why It Matters

Enables powerful, efficient AI on ultra-constrained edge devices, reducing cost and power for real-time inference in IoT and robotics.