Apple's CoreAI replaces CoreML for on-device inference with 20B model support
CoreAI supports up to 20B parameter MoE models on phones and tablets.
Apple quietly unveiled CoreAI at WWDC, a new on-device inference engine for Apple Silicon designed to replace CoreML. CoreAI supports models of up to 20B parameters (via lazy Mixture-of-Experts) and significantly expands the set of operations that can run on the Apple Neural Engine (ANE). It is positioned as an alternative to existing frameworks such as MLX, llama.cpp, and PyTorch. As with CoreML, model weights must first be converted using Python scripts.
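The source does not say what "lazy" means here. One plausible reading is that expert weights are loaded only when the router first selects them, so most of the model never has to sit in memory at once. The toy sketch below illustrates that idea in plain Python. It is an assumption for illustration, not CoreAI's design or API: the class `LazyMoELayer`, its parameters, and the random "weights" are all hypothetical.

```python
import math
import random


class LazyMoELayer:
    """Toy Mixture-of-Experts layer that materializes an expert only when routed to.

    Hypothetical illustration; not based on any published CoreAI interface.
    """

    def __init__(self, num_experts, dim, top_k=2, seed=0):
        self.num_experts = num_experts
        self.dim = dim
        self.top_k = top_k
        self.seed = seed
        rng = random.Random(seed)
        # The router is small, so it stays resident at all times.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Expert index -> weights; starts empty and fills on demand.
        self.loaded = {}

    def _load_expert(self, idx):
        # Stand-in for reading one expert's weights from storage.
        if idx not in self.loaded:
            rng = random.Random(self.seed * 1000 + idx + 1)
            self.loaded[idx] = [rng.gauss(0, 1) for _ in range(self.dim)]
        return self.loaded[idx]

    def forward(self, x):
        # Score every expert, then keep only the top-k.
        scores = [sum(w * v for w, v in zip(row, x)) for row in self.router]
        top = sorted(range(self.num_experts),
                     key=lambda i: scores[i], reverse=True)[:self.top_k]
        # Softmax over the selected experts' scores gives the gate weights.
        peak = max(scores[i] for i in top)
        exps = [math.exp(scores[i] - peak) for i in top]
        total = sum(exps)
        gates = [e / total for e in exps]
        # Each toy expert is an elementwise scaling of the input.
        out = [0.0] * self.dim
        for gate, idx in zip(gates, top):
            weights = self._load_expert(idx)
            for j in range(self.dim):
                out[j] += gate * weights[j] * x[j]
        return out, top


if __name__ == "__main__":
    layer = LazyMoELayer(num_experts=8, dim=4, top_k=2)
    _, chosen = layer.forward([1.0, 0.5, -0.3, 2.0])
    print(f"experts routed to: {chosen}, experts in memory: {len(layer.loaded)} of 8")
```

In this sketch only the routed experts ever occupy memory, which is the general property that would let a large MoE model fit on a memory-constrained device.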
CoreAI currently supports only a limited set of models dating from mid-2025, although Apple's third-generation Foundation Models (a 20B-parameter MoE) are already being touted for on-device deployment. No performance data has been published yet, and early speculation suggests that pure GPU inference via MLX may still outperform CoreAI. The update implies a major revamp of Apple Neural Engine (ANE) operations, which would let apps ship larger models without relying on the cloud.
- CoreAI replaces CoreML and supports 20B-parameter MoE models on-device
- Expanded ANE ops enable models beyond CoreML's previous limit of a few billion parameters
- Currently compatible only with a limited set of mid-2025 models; no performance benchmarks released yet
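A back-of-the-envelope estimate of weight memory shows why parameter count is the constraint on phones and tablets. The sketch below is plain arithmetic, not CoreAI data: the bit widths and the 3B "active parameters" figure are hypothetical values chosen for illustration, and the helper `weight_footprint_gb` is not part of any Apple framework.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Counts weights only; activations and caches are ignored.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # Full 20B-parameter model at a few common precisions.
    for bits in (16, 8, 4):
        print(f"20B params @ {bits}-bit: {weight_footprint_gb(20, bits):.1f} GB")

    # Hypothetical: if only 3B parameters are active per token in an MoE,
    # the resident working set at 4-bit is far smaller than the full model.
    print(f"3B active @ 4-bit: {weight_footprint_gb(3, 4):.1f} GB")
```

At 16 bits per weight, 20B parameters come to about 40 GB, and about 10 GB at 4 bits, so keeping only part of the model resident is what makes such a model plausible on a mobile device.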
Why It Matters
CoreAI enables larger AI models to run locally on iPhones and iPads, reducing cloud dependency and latency.