CARMEN chip cuts AI inference cycles by 33% with adaptive precision
New CORDIC-based engine cuts power per MAC stage by 21% while reaching 4.83 TOPS/mm².
A research team led by Sonu Kumar, Mukul Lokhande, Santosh Kumar Vishvakarma, and Adam Teman has unveiled CARMEN, an inference engine that uses CORDIC (Coordinate Rotation Digital Computer) arithmetic to deliver runtime-adaptive multi-precision computation for deep learning. The key idea is to control precision through the CORDIC iteration depth: running fewer iterations gives a faster, approximate result, while running more gives a more accurate one, so the chip can switch between modes at runtime without any hardware reconfiguration. The architecture combines a low-resource iterative CORDIC-based MAC unit with a time-multiplexed multi-activation function block, supports flexible 8/16-bit precision, and achieves high hardware utilization.
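To illustrate how iteration depth can trade accuracy for cycles, here is a minimal sketch of a generic linear-mode CORDIC multiply-accumulate. It is not CARMEN's actual datapath, which this summary does not describe; the function name `cordic_mac`, its parameters, and the use of floating point in place of fixed-point shifts are all assumptions made for illustration.

```python
def cordic_mac(acc: float, x: float, w: float, iterations: int) -> float:
    """Approximate acc + x * w with linear-mode CORDIC (hypothetical helper).

    Each iteration adds or subtracts a shifted copy of x, so hardware needs
    only shifters and adders. The weight w must satisfy |w| <= 2 for the
    iteration to converge; after n iterations the absolute error is at most
    |x| * 2**-(n - 1).
    """
    if abs(w) > 2.0:
        raise ValueError("linear-mode CORDIC requires |w| <= 2")
    y, z = acc, w
    for i in range(iterations):
        step = 2.0 ** -i                 # in hardware: a right shift by i bits
        d = 1.0 if z >= 0 else -1.0      # drive the residual z toward zero
        y += d * x * step                # accumulate the partial product
        z -= d * step                    # consume part of the weight
    return y


# Fewer iterations give a coarse result; more iterations tighten the bound.
approx_fast = cordic_mac(0.0, 0.75, 0.6, iterations=4)
approx_accurate = cordic_mac(0.0, 0.75, 0.6, iterations=16)
```

In this sketch the iteration count is an ordinary runtime argument, which mirrors the idea that precision can be changed per operation without altering the hardware itself.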
Implemented in 28nm CMOS, a 256-processing-element configuration of CARMEN achieves up to a 33% reduction in computation cycles and 21% power savings per MAC stage compared to conventional designs. The engine reaches a compute density of 4.83 TOPS/mm² and an energy efficiency of 11.67 TOPS/W. In a separate deployment on a Pynq-Z2 FPGA, the system ran real-time object detection with a latency of 154.6 ms at just 0.43 W. These results position CARMEN as a promising candidate for edge AI applications that demand both performance and low energy use, especially in robotics and embedded vision systems.
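The FPGA figures can be turned into a rough per-frame energy budget. The short calculation below is a back-of-the-envelope sketch, assuming that 0.43 W is the average power drawn over the whole 154.6 ms inference and that frames are processed one at a time with no pipelining; neither assumption is stated in the source, and the variable names are illustrative.

```python
# Reported Pynq-Z2 figures for the object detection workload.
latency_s = 154.6e-3   # latency per inference, in seconds
power_w = 0.43         # reported power, assumed here to be the average draw

# Energy per inference = average power x time (assumption: constant power).
energy_per_inference_j = power_w * latency_s

# Throughput if frames are processed strictly one after another.
frames_per_second = 1.0 / latency_s

print(f"Energy per inference: {energy_per_inference_j * 1e3:.1f} mJ")
print(f"Throughput: {frames_per_second:.2f} frames/s")
```

Under these assumptions the budget comes to roughly 66 mJ per frame at about 6.5 frames per second.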
- Dynamic precision control via adjustable CORDIC iteration depth enables seamless switching between approximate and accurate computation modes.
- 28nm CMOS implementation with 256 processing elements achieves up to 33% fewer cycles and 21% power savings per MAC stage, with 4.83 TOPS/mm² compute density and 11.67 TOPS/W energy efficiency.
- FPGA deployment on Pynq-Z2 confirms 154.6 ms latency at 0.43 W for real-time object detection.
Why It Matters
CARMEN brings efficient, runtime-adaptive deep learning inference to energy-constrained edge devices such as drones and robots.