Phosphene: local video and audio generation for Apple Silicon (LTX 2.3)
Runs LTX 2.3 natively on MLX with frame-synced audio and one-click install.
Phosphene brings native, local video and audio generation to Apple Silicon Macs by packaging Lightricks' LTX 2.3 model on Apple's MLX framework. Unlike other leading local models (Wan, Hunyuan, Mochi) that output silent video requiring post-hoc audio syncing, LTX 2.3 generates both video and audio in a single forward pass. This means footsteps land on the exact frame, lip movements match dialogue, and ambient sound is conditioned on visual content. The tool is completely free and offers a one-click install via Pinokio.
Phosphene supports four generation modes: text-to-video, image-to-video, first-frame/last-frame interpolation, and extension of existing clips. Quality is adjustable per job across three tiers: Draft (half resolution, ~2 min), Standard (1280x704, ~7 min, Q4 distilled, ~25 GB), and High (Q8 two-stage with TeaCache, ~12 min, adds another ~25 GB). It also includes local prompt rewriting via a Gemma 3 12B 4-bit text encoder. Phosphene runs only on Apple Silicon and detects installed RAM to gate features automatically, from a compact tier at 32 GB up to a pro tier at 128 GB or more. Audio quality improves dramatically when prompts include explicit sound cues (e.g., "whispered chant, ember crackle"). Intel Macs and other platforms are unsupported because MLX targets Apple Silicon only.
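To make the modes, quality tiers, and sound-cue advice concrete, here is a minimal Python sketch of how a batch of jobs might be described and timed. It is not Phosphene's API: the `Job` class, `make_job`, `estimated_minutes`, the mode identifiers, and the "Sound:" prompt suffix are all hypothetical names chosen for illustration. Only the tier names, the per-tier minutes, and the example sound cues come from the description above.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the four modes described above.
MODES = ("text-to-video", "image-to-video", "first-last-frame", "extend")

# Approximate minutes per job, taken from the figures quoted above.
APPROX_MINUTES = {"draft": 2, "standard": 7, "high": 12}


@dataclass(frozen=True)
class Job:
    mode: str
    quality: str
    prompt: str


def make_job(mode, quality, visual_prompt, sound_cues=()):
    """Validate the mode and tier, then append explicit sound cues to the prompt."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if quality not in APPROX_MINUTES:
        raise ValueError(f"unknown quality: {quality}")
    prompt = visual_prompt.strip()
    if sound_cues:
        # The "Sound:" suffix is an assumed convention, not a documented format.
        prompt = f"{prompt.rstrip('.')}. Sound: {', '.join(sound_cues)}"
    return Job(mode, quality, prompt)


def estimated_minutes(jobs):
    """Sum the rough per-tier render times for a batch of jobs."""
    return sum(APPROX_MINUTES[job.quality] for job in jobs)


jobs = [
    make_job("text-to-video", "draft", "A hooded figure beside a campfire",
             ["whispered chant", "ember crackle"]),
    make_job("image-to-video", "high", "Slow push-in on a rainy street"),
]
print(jobs[0].prompt)
print(estimated_minutes(jobs), "minutes (rough estimate)")
```

A draft pass followed by one High render is a natural workflow under these numbers: the cheap tier checks composition and sound cues before the slower tier is spent on the final clip.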
- Generates video and audio together in one diffusion pass, so footsteps, lip movements, and ambient sound stay synced at the frame level.
- Four modes: text-to-video, image-to-video, first-frame/last-frame interpolation, and extension of existing clips.
- RAM-gated quality tiers (Draft, Standard, High) with local prompt rewriting via Gemma 3 12B, all running offline on Apple Silicon.
- Free, with one-click install via Pinokio; built on the MLX framework, so Intel Macs and other platforms are not supported.
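The Apple Silicon requirement can be checked before installing. The sketch below uses only Python's standard `platform` module; the function name `is_apple_silicon` is hypothetical, and this is not the check that Phosphene or Pinokio actually performs.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when the OS is macOS (Darwin) and the CPU architecture is arm64.

    Arguments default to the values reported for the current interpreter;
    they can be passed explicitly to test other combinations.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


print("supported" if is_apple_silicon() else "unsupported")
```

Note that a Python build running under Rosetta may report `x86_64` even on an Apple Silicon Mac, so a negative result is worth confirming with a native interpreter.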
Why It Matters
Lets professionals generate synced video and audio locally on Apple Silicon Macs, fully offline, without a separate audio-sync step in post-production.