Open Source

Reddit user benchmarks 21 GPUs for OmniVoice TTS model, RTX 3090 as baseline

TTS model with about 5 GB peak VRAM tested across mostly consumer GPUs, with speed reported as a multiple of real time.

Deep Dive

A Reddit user rented different GPUs on vast.ai to benchmark OmniVoice, a small TTS model with about 5 GB peak VRAM. Using their own RTX 3090 as a baseline, they measured xRT (times real-time) for voice cloning from a short paragraph, averaging three runs. The results give a rough estimate of how these mostly consumer GPUs compare, offering a practical reference for AI hobbyists choosing cost-effective hardware for local TTS inference.
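The measurement described above can be sketched in a few lines of Python. This is a minimal illustration, not the Reddit user's actual script: it assumes xRT means seconds of audio produced per second of wall-clock time, and the names `measure_xrt`, `average_xrt`, `relative_to_baseline`, and the `synthesize` callable are hypothetical stand-ins for whatever the original benchmark used.

```python
import time
from statistics import mean


def average_xrt(samples):
    """Mean xRT over (audio_seconds, elapsed_seconds) pairs.

    Assumption: xRT = audio duration / wall-clock synthesis time,
    so a value of 5.0 means five times faster than real time.
    """
    return mean(audio / elapsed for audio, elapsed in samples)


def measure_xrt(synthesize, text, runs=3):
    """Time `runs` calls to a hypothetical `synthesize(text)` callable.

    `synthesize` is assumed to return the generated audio length in seconds.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        audio_seconds = synthesize(text)
        elapsed = time.perf_counter() - start
        samples.append((audio_seconds, elapsed))
    return average_xrt(samples)


def relative_to_baseline(xrt, baseline_xrt):
    """Express a GPU's xRT as a multiple of the baseline GPU's xRT."""
    return xrt / baseline_xrt
```

Dividing each GPU's averaged xRT by the baseline's gives a relative speed, which is one way to compare cards against the RTX 3090. With only three runs on a short paragraph, small differences between GPUs are likely within noise.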

Key Points
  • 21 GPUs tested, including the RTX 3090 baseline, running the OmniVoice TTS model at about 5 GB peak VRAM.
  • Performance measured in xRT (times real-time), averaged over three runs of a short voice cloning task.
  • Results offer a rough, practical reference for choosing cost-effective consumer hardware to run small TTS models.

Why It Matters

Real-world benchmarks help AI developers pick cost-effective GPUs for local TTS inference without overspending.