
CORVET: A CORDIC-Powered, Resource-Frugal Mixed-Precision Vector Processing Engine for High-Throughput AIoT applications

New ASIC design achieves 4.83 TOPS/mm² density and 11.67 TOPS/W efficiency for AIoT devices.

Deep Dive

A team of researchers has unveiled CORVET, a novel vector processing engine designed specifically for high-throughput AIoT applications. The architecture introduces a runtime-adaptive, CORDIC-based MAC unit that enables dynamic switching between approximate and accurate computational modes, allowing developers to exploit the latency-accuracy trade-off for different workloads. This flexibility is crucial for edge devices where power and computational resources are constrained.
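The approximate/accurate mode switch can be pictured with a minimal sketch of linear-mode CORDIC, where a multiply-accumulate is built from shift-and-add iterations and truncating the loop early gives the approximate mode. This is a floating-point illustration under assumed inputs (|z| ≤ 1), not CORVET's fixed-point hardware, and `cordic_mac` is an illustrative name:

```python
def cordic_mac(acc, x, z, iterations=16):
    """Linear-mode CORDIC rotation: drive z toward 0 with shift-add
    steps while accumulating y, so y converges to acc + x*z.
    Stopping the loop early is the 'approximate' mode: fewer cycles
    (less latency and power) in exchange for a bounded residual error."""
    y = acc
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0   # rotation direction
        y += d * x * 2.0 ** (-i)        # hardware: shift and add
        z -= d * 2.0 ** (-i)            # drive the residual term to 0
    return y
```

With |z| ≤ 1, n iterations leave a residual of roughly 2^-(n-1), so dropping from 16 iterations to, say, 6 trades accuracy for cycles, which is the kind of runtime latency-accuracy knob the article describes.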

The implementation shows significant efficiency gains: each MAC stage saves up to 33% in processing time and 21% in power consumption compared to conventional designs. The 256-PE (Processing Element) configuration achieves a compute density of 4.83 TOPS/mm² and an energy efficiency of 11.67 TOPS/W, outperforming previous state-of-the-art solutions. The engine leverages vectorized, time-multiplexed execution and flexible precision scaling (4/8/16-bit) to achieve up to 4x the throughput within the same hardware footprint.
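The 4x figure follows the usual SIMD-within-a-register arithmetic: a 16-bit datapath can carry four 4-bit lanes (or two 8-bit lanes) per cycle. A hedged sketch of the lane-packing idea, with illustrative names (`pack4`, `unpack4`, `lanewise_add`), unsigned lanes, and inter-lane carries simply masked off rather than modeling CORVET's actual datapath:

```python
def pack4(vals):
    """Pack four unsigned 4-bit lanes into one 16-bit word (lane 0 = LSBs)."""
    assert len(vals) == 4 and all(0 <= v < 16 for v in vals)
    word = 0
    for i, v in enumerate(vals):
        word |= v << (4 * i)
    return word

def unpack4(word):
    """Split a 16-bit word back into its four 4-bit lanes."""
    return [(word >> (4 * i)) & 0xF for i in range(4)]

def lanewise_add(a, b):
    """Add two packed words lane by lane, masking carries so lanes stay
    independent -- the SWAR idea behind running four 4-bit operations
    on one 16-bit datapath."""
    return pack4([(x + y) & 0xF for x, y in zip(unpack4(a), unpack4(b))])
```

Note that an overflowing lane wraps within its own 4 bits instead of bleeding a carry into its neighbor, which is exactly the isolation a mixed-precision datapath must enforce.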

The design includes a time-multiplexed multi-AF (Activation Function) block and a lightweight pooling/normalization unit, supporting a hardware-software co-design methodology validated on the PYNQ-Z2 platform for object detection and classification tasks. This approach demonstrates how specialized architectures can overcome the limitations of general-purpose processors in edge AI scenarios where real-time processing, energy efficiency, and cost constraints are paramount.
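The resource sharing behind a time-multiplexed multi-AF block can be sketched as one arithmetic core reused for several activation functions, selected at runtime. This is an assumption-laden illustration, not the paper's circuit: `shared_af_unit` is a made-up name, and it shares an exp-based tanh core with sigmoid via the identity sigmoid(x) = (1 + tanh(x/2)) / 2:

```python
import math

def shared_af_unit(x, mode):
    """One arithmetic core reused across activation functions, selected
    per call -- the time-multiplexed multi-AF idea. ReLU bypasses the
    core; tanh and sigmoid share it."""
    def tanh_core(v):
        # In hardware this would be CORDIC iterations; math.exp stands in.
        e = math.exp(-2.0 * abs(v))
        return math.copysign((1.0 - e) / (1.0 + e), v)

    if mode == "relu":
        return max(0.0, x)
    if mode == "tanh":
        return tanh_core(x)
    if mode == "sigmoid":
        return 0.5 * (1.0 + tanh_core(0.5 * x))  # sigmoid via the shared core
    raise ValueError(f"unsupported activation: {mode}")
```

Serving one function per cycle from a single shared unit costs throughput on activation-heavy layers but saves the area of parallel per-function blocks, the trade a resource-frugal design favors.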

CORVET represents a significant advancement in edge AI hardware, offering a scalable solution that bridges the gap between computational demands and resource limitations in IoT devices. Its mixed-precision capabilities and adaptive computation modes make it particularly suitable for applications ranging from smart sensors to autonomous edge devices that require efficient AI inference without cloud dependency.

Key Points
  • Achieves 4.83 TOPS/mm² compute density and 11.67 TOPS/W energy efficiency with 256-PE configuration
  • Saves up to 33% in time and 21% in power per MAC stage using CORDIC-based iterative computation
  • Enables 4x throughput improvement via vectorized execution and 4/8/16-bit mixed-precision scaling

Why It Matters

Enables powerful AI on resource-constrained edge devices, reducing cloud dependency and latency for real-time IoT applications.