CARMEN chip cuts AI inference cycles by 33% with adaptive precision

arXiv eess.IV May 11, 2026

⚡ New CORDIC-based engine cuts per-MAC power by 21% while hitting 4.83 TOPS/mm².

Deep Dive

A research team led by Sonu Kumar, Mukul Lokhande, Santosh Kumar Vishvakarma, and Adam Teman has unveiled CARMEN, a novel inference engine that leverages CORDIC (Coordinate Rotation Digital Computer) arithmetic to deliver runtime-adaptive multi-precision computation for deep learning. The key innovation lies in controlling precision by adjusting the CORDIC iteration depth, allowing the chip to seamlessly switch between high-accuracy and approximate modes without any hardware reconfiguration. The architecture integrates a low-resource iterative CORDIC-based MAC unit with a time-multiplexed multi-activation function block, supporting flexible 8/16-bit precision and achieving high hardware utilization.
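To make the precision knob concrete, here is a minimal sketch of a multiply-accumulate built from linear-mode CORDIC, where the number of iterations sets the accuracy. It is an illustration only, not the authors' design: the article does not say which CORDIC mode CARMEN uses, the function name `cordic_mac` and its parameters are hypothetical, and floating-point stands in for the fixed-point shifts and adds a hardware unit would use.

```python
def cordic_mac(acc, x, w, iterations):
    """Approximate acc + x * w using linear-mode CORDIC (shift-and-add only).

    Assumption: |w| <= 2, which keeps the iteration within its convergence
    range. After n iterations the error is at most |x| * 2**-(n - 1).
    """
    y, z = acc, w
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # steer the residual z toward zero
        y += d * x * 2.0 ** -i        # in hardware: add/subtract (x >> i)
        z -= d * 2.0 ** -i            # consume the matching part of w
    return y


if __name__ == "__main__":
    exact = 0.5 + 0.75 * 0.6
    for n in (4, 8, 16):
        approx = cordic_mac(0.5, 0.75, 0.6, n)
        print(f"{n:2d} iterations: result={approx:.6f} error={abs(approx - exact):.2e}")
```

Each extra iteration costs one more cycle and roughly halves the worst-case error, so a shallow depth gives an approximate mode and a deeper one an accurate mode on the same datapath.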

Implemented in 28nm CMOS, a 256-processing-element configuration of CARMEN cuts computation cycles by up to 33% and saves 21% power per MAC stage compared to conventional designs. The engine achieves a compute density of 4.83 TOPS/mm² and an energy efficiency of 11.67 TOPS/W. Deployed on a PynqZ2 FPGA, the system ran real-time object detection with 154.6 ms latency at just 0.43 W. These results position CARMEN as a promising candidate for edge AI applications that demand both performance and low energy use, especially in robotics and embedded vision systems.
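As a back-of-the-envelope reading of the FPGA figures, the short sketch below converts the reported latency and power into energy per inference and an upper-bound frame rate. It assumes that 0.43 W is the average power over the whole inference and that 154.6 ms is the latency for one frame; neither detail is stated in the article.

```python
latency_s = 154.6e-3   # reported FPGA latency (assumed to be per frame)
power_w = 0.43         # reported FPGA power (assumed average over the inference)

energy_mj = latency_s * power_w * 1e3   # energy per inference, in millijoules
throughput_fps = 1.0 / latency_s        # frames per second if run back to back

print(f"{energy_mj:.1f} mJ per inference, {throughput_fps:.2f} frames/s")
```

Under those assumptions this works out to roughly 66.5 mJ per inference and about 6.5 frames per second.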

Key Points

Dynamic precision control via adjustable CORDIC iteration depth enables seamless switching between approximate and accurate computation modes.
28nm CMOS implementation with 256 processing elements achieves up to 33% fewer computation cycles and 21% power savings per MAC stage, with 4.83 TOPS/mm² compute density and 11.67 TOPS/W energy efficiency.
FPGA deployment on PynqZ2 confirms 154.6 ms latency at 0.43 W for real-time object detection.

Why It Matters

Brings efficient, adaptive deep learning inference to energy-constrained edge devices like drones and robots.
