Developer Tools

Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon

A complete on-device STT+LLM+TTS pipeline runs natively on Apple Silicon with 38 macOS voice actions.

Deep Dive

RunAnywhere, a Y Combinator W26 company, has launched RCLI, a terminal-based voice AI application that brings a complete on-device AI pipeline to Apple Silicon Macs. The tool combines speech-to-text (STT), a large language model (LLM), and text-to-speech (TTS) into a single local application, powered by the company's proprietary MetalRT GPU inference engine. This architecture enables sub-200ms end-to-end latency for voice interactions and eliminates any dependency on cloud services or external API keys, offering full privacy and offline functionality.

RCLI's core feature set includes voice control for 38 macOS actions (controlling Spotify, adjusting system volume, opening apps, and more) via natural conversation. It also includes a local RAG (retrieval-augmented generation) system for querying personal documents, using hybrid vector+BM25 retrieval with ~4 ms latency. The application requires macOS 13+ on Apple Silicon; the MetalRT engine specifically requires an M3 chip or later, with M1/M2 Macs falling back to llama.cpp. Installation is available via a one-line curl command or Homebrew, with an initial ~1 GB model download.
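"Hybrid vector+BM25" retrieval generally means blending a lexical BM25 score with an embedding-similarity score for each document. The sketch below illustrates that blending in pure Python; the function names, the fixed 50/50 blend, and the use of bag-of-words cosine as a stand-in for real learned embeddings are all illustrative assumptions, not RunAnywhere's actual implementation:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Classic Okapi BM25 over whitespace-tokenized documents."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter()                      # document frequency per term
    for t in tokenized:
        df.update(set(t))
    scores = []
    for t in tokenized:
        tf = Counter(t)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

def cosine_scores(query, docs):
    """Toy 'vector' scores: cosine over bag-of-words counts.
    A real system would use learned embeddings here."""
    qv = Counter(query.lower().split())
    out = []
    for d in docs:
        dv = Counter(d.lower().split())
        dot = sum(qv[w] * dv[w] for w in qv)
        norm = (math.sqrt(sum(v * v for v in qv.values()))
                * math.sqrt(sum(v * v for v in dv.values())))
        out.append(dot / norm if norm else 0.0)
    return out

def hybrid_search(query, docs, alpha=0.5):
    """Blend max-normalized BM25 and vector scores; rank best-first."""
    def norm(xs):
        hi = max(xs) or 1.0
        return [x / hi for x in xs]
    bm = norm(bm25_scores(query, docs))
    vec = norm(cosine_scores(query, docs))
    blended = [alpha * b_ + (1 - alpha) * v for b_, v in zip(bm, vec)]
    return sorted(zip(docs, blended), key=lambda p: -p[1])

docs = [
    "toggle dark mode in system settings",
    "play music on spotify at full volume",
    "adjust system volume to fifty percent",
]
print(hybrid_search("dark mode", docs)[0][0])
# → toggle dark mode in system settings
```

Blending the two signals is what makes hybrid retrieval robust: BM25 rewards exact keyword matches while the vector score catches paraphrases, so a document scoring well on either channel still surfaces.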

The technical pipeline is sophisticated: three concurrent threads on the Metal GPU handle voice activity detection and streaming STT, LLM inference with KV cache and Flash Attention, and double-buffered TTS. The interactive TUI (terminal user interface) provides push-to-talk functionality, live hardware monitoring, and model management. For developers and power users, this represents a significant step toward making performant, private, and actionable AI a standard feature of the macOS environment.
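A staged pipeline like this can be pictured as a chain of worker threads passing data through queues, so each stage streams output to the next without blocking the others. The sketch below is a minimal illustration of that structure only; the stage functions, queue layout, and string payloads are hypothetical stand-ins, not RunAnywhere's code:

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream

def stt_stage(audio_in, text_out):
    """Stub streaming STT: consumes audio chunks, emits transcripts."""
    for chunk in iter(audio_in.get, DONE):
        text_out.put(f"transcript({chunk})")
    text_out.put(DONE)

def llm_stage(text_in, reply_out):
    """Stub LLM: consumes transcripts, emits replies."""
    for text in iter(text_in.get, DONE):
        reply_out.put(f"reply-to[{text}]")
    reply_out.put(DONE)

def tts_stage(reply_in, speech_out):
    """Stub TTS: consumes replies, emits synthesized audio."""
    for reply in iter(reply_in.get, DONE):
        speech_out.put(f"speech({reply})")
    speech_out.put(DONE)

def run_pipeline(chunks):
    """Wire the three stages together and drain the final queue."""
    q_audio, q_text, q_reply, q_speech = (queue.Queue() for _ in range(4))
    stages = [
        threading.Thread(target=stt_stage, args=(q_audio, q_text)),
        threading.Thread(target=llm_stage, args=(q_text, q_reply)),
        threading.Thread(target=tts_stage, args=(q_reply, q_speech)),
    ]
    for t in stages:
        t.start()
    for c in chunks:
        q_audio.put(c)
    q_audio.put(DONE)
    out = list(iter(q_speech.get, DONE))
    for t in stages:
        t.join()
    return out

print(run_pipeline(["turn on dark mode"]))
# → ['speech(reply-to[transcript(turn on dark mode)])']
```

The point of the structure is pipelining: while the LLM works on one utterance, the STT stage can already be transcribing the next audio chunk, which is how low end-to-end latency is achievable even though each stage takes nonzero time.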

Key Points
  • Full local AI pipeline: Runs STT (Zipformer/Whisper), LLM (Qwen3/LFM2), and TTS entirely on-device with sub-200ms latency.
  • Proprietary MetalRT engine: A custom GPU inference engine for Apple Silicon, with STT reportedly 714x faster than real-time on M3 Max.
  • 38 actionable voice commands: Control macOS apps (Spotify, Messages), system functions (volume, dark mode), and perform local RAG document Q&A.
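For a sense of scale, a 714x real-time factor means a full minute of audio transcribes in well under a tenth of a second:

```python
audio_seconds = 60.0   # one minute of speech
speedup = 714          # reported STT real-time factor on M3 Max
transcribe_ms = audio_seconds / speedup * 1000
print(f"{transcribe_ms:.0f} ms")  # → 84 ms
```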

Why It Matters

It enables fast, private, and actionable AI assistants on personal devices, moving complex interactions from the cloud to your laptop.