Inside the M4 Apple Neural Engine, Part 1: Reverse Engineering
A human-AI team cracked Apple's private ANE framework, achieving direct hardware access and revealing true performance metrics.
A collaborative research team comprising a human engineer and Anthropic's Claude Opus 4.6 AI has successfully reverse-engineered Apple's M4 Neural Engine (ANE), bypassing the official CoreML framework to achieve direct hardware access. Over several days, they mapped the entire software stack from CoreML down to the IOKit kernel driver, discovered how to compile and execute neural network programs directly on the ANE without CoreML's overhead, and cracked the binary program format. This work, documented as Part 1 of a three-part series, represents the first publicly documented direct access to the _ANEClient API on the M4, the first cracking of the in-memory MIL compilation path, and the first measurement of true peak throughput free of CoreML's optimization layers.
The technical breakthrough centers on the _ANEClient class within Apple's private AppleNeuralEngine.framework, which provides a direct compile → load → evaluate pipeline that CoreML merely sits atop. The team discovered over 40 private classes and confirmed that the M4's ANE (codename H16G) is a 16-core, fixed-function graph execution engine, not a GPU or CPU, that processes entire neural networks as atomic operations. A scaling analysis used to infer the hardware topology revealed that Apple's marketed "38 TOPS" figure is misleading when compared against throughput measured directly on the hardware, without CoreML abstraction. This foundational work sets up the subsequent parts, which explore true performance benchmarking and, ultimately, training neural networks on a chip Apple designed exclusively for inference.
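To see why a marketed TOPS figure needs checking against the hardware, it helps to work backward from it: given a core count and an assumed clock rate, the figure implies a specific number of MAC units per core, which direct microbenchmarks can then confirm or contradict. The sketch below does that arithmetic; the clock rate is a placeholder assumption, not a figure from the article, and only the 16-core count and 38 TOPS claim come from the source.

```python
# Hypothetical sanity check: what per-core hardware would a marketed
# TOPS figure imply? The clock rate below is an illustrative
# assumption, not measured M4 data.

MARKETED_TOPS = 38.0  # Apple's advertised figure for the M4 ANE
CORES = 16            # core count confirmed by the reverse engineering

def implied_macs_per_core(marketed_tops, cores, clock_ghz):
    """MAC units each core would need to sustain the marketed figure.

    Each multiply-accumulate counts as 2 ops (one multiply, one add),
    the usual convention behind TOPS marketing numbers.
    """
    ops_per_second = marketed_tops * 1e12
    ops_per_core_per_cycle = ops_per_second / (cores * clock_ghz * 1e9)
    return ops_per_core_per_cycle / 2

# At an assumed ~1.0 GHz clock (placeholder), 38 TOPS across 16 cores
# implies roughly 1,188 MAC units per core.
macs = implied_macs_per_core(MARKETED_TOPS, CORES, clock_ghz=1.0)
print(f"Implied MACs per core at 1.0 GHz: {macs:.0f}")
```

If direct benchmarks sustain far less than this implied capacity on real workloads, the marketed figure likely reflects a narrow best case (such as low-precision operands on ideally shaped tensors), which is the kind of gap the series' benchmarking work is positioned to expose.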
- First direct _ANEClient API access on M4 ANE, bypassing CoreML's abstraction and overhead layers.
- Mapped 40+ private classes in AppleNeuralEngine.framework and cracked the in-memory MIL compilation path.
- Revealed that the M4's ANE (H16G) is a 16-core fixed-function graph engine and that Apple's "38 TOPS" marketing figure is misleading.
Why It Matters
Direct ANE access enables deeper hardware optimization and independent performance verification, and could unlock on-device training on silicon Apple designed for inference only.