Phosphene 3.0 lets you train AI characters on Apple Silicon locally

r/StableDiffusion May 22, 2026

⚡Train custom face and voice LoRAs from 30–80 photos — all local and free.

Deep Dive

Phosphene 3.0, launched by developer AIBizarro, is an open-source AI media suite that runs entirely on Apple Silicon. Unlike typical wrappers around LTX-Video, it introduces native character training: users can train a custom face LoRA and an optional voice LoRA from a single dataset of 30–80 photos plus a voice clip. The training process takes roughly three hours on an M4 Max with 64 GB of unified memory, and all auto-captioning is handled locally via Gemma 3 12B. Once trained, characters can be animated with text-to-video+audio, image-to-video+audio, or audio-driven video, and clips can be extended or interpolated between keyframes.

Technically, Phosphene 3.0 bundles LTX-Video 2.3 (by Lightricks) with an MLX port for Apple Silicon, and includes an Image Studio featuring three engines: Qwen-Image-Edit-2511, HiDream-O1 (ported to MLX just five days after its release), and the FLUX.1 family. The HiDream-O1 model delivers photorealistic portraits and multi-subject composition at about 67 seconds per 1024×1024 image on a 64 GB Mac. The suite auto-detects hardware tiers by unified memory: 16 GB and 24 GB Macs get 512 px video, 32 GB Macs get 768 px, and Macs with 64 GB or more unlock 1024×576 video, full HD images, and character training. On that top tier, a 7-second character clip with synced audio renders in ~6 minutes.
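
The tier rule above is simple enough to sketch in a few lines of Python. This is an illustration only, not Phosphene's actual detection code: the names Tier, pick_tier, and detect_memory_gb are hypothetical, and two behaviors are assumptions the article does not confirm, namely that in-between sizes such as 48 GB fall into the tier below them and that Macs under 16 GB are unsupported. The only system call used is the standard macOS command sysctl -n hw.memsize, which reports physical memory in bytes.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    label: str                # tier name as described in the article
    video: str                # maximum video size for the tier
    character_training: bool  # only the top tier unlocks training


def pick_tier(memory_gb: float) -> Tier:
    """Map unified memory in GB to the tiers the article lists."""
    if memory_gb >= 64:
        return Tier("64 GB+", "1024x576", True)
    if memory_gb >= 32:
        # Assumption: sizes between 32 and 64 GB (e.g. 48 GB) land here.
        return Tier("32 GB", "768 px", False)
    if memory_gb >= 16:
        return Tier("16/24 GB", "512 px", False)
    # Assumption: the article describes no tier below 16 GB.
    raise ValueError(f"no tier described for {memory_gb} GB of unified memory")


def detect_memory_gb() -> float:
    """Read physical memory on macOS via `sysctl -n hw.memsize` (bytes)."""
    result = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip()) / 1024**3
```

On a Mac, pick_tier(detect_memory_gb()) would return the tier for the current machine; on other platforms only pick_tier is usable, since detect_memory_gb depends on the macOS sysctl key.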

Installation is one-click via Pinokio, or manual by cloning the repo. The first run downloads ~28 GB of model weights. The developer is upfront about the limitations: the suite runs on Apple Silicon only (no Intel, Windows, or Linux); dialogue audio quality is inconsistent (ambient sound performs better); and character LoRAs are video-only (image LoRAs use a separate stack). The panel itself is MIT-licensed, while the LTX-Video weights follow Lightricks' license and HiDream uses its own license.

Key Points

Character training from 30–80 photos yields face LoRA and optional voice LoRA in ~3 hours on M4 Max 64 GB.
Supports text-to-video+audio, image-to-video+audio, audio-to-video, first-frame-last-frame interpolation, and video extension.
Image Studio includes Qwen-Image-Edit, HiDream-O1 (MLX port in 5 days), and FLUX family for multi-subject compositions.

Why It Matters

Enables professional-grade AI character creation entirely on-device for Apple users, no cloud or subscription required.