Voice AI may be computing's third shift: from GUI to talking
Why WeChat's walkie-talkie feature reveals voice as the default human interface
A viral Reddit post recounts an observation on a Chinese subway: older passengers hold their phones and talk into them, while younger ones type. The post reads this not as a sign of poor typing skills but as a deep preference for voice, accelerated by WeChat's early embrace of walkie-talkie-style voice messages. It argues that humans have spoken for roughly 100,000 years, while mass literacy and typing are only centuries old. Voice is the default; text is the exception. Products like Wispr Flow already let users dictate text by speaking, and their heavy adoption suggests the input side is already shifting.
But the real frontier is voice for machine interaction. For decades, we have communicated with computers through numbers, text, or code. Siri-era voice assistants could only trigger preset commands. Large language models change this: agents can now parse a vague request and act on it. Owlfy is building this for desktops, and Rabbit's "Large Action Model" pitched the same idea (though execution fell short). If voice agents become reliable, this could be the third major computing shift: from command line to GUI to simply talking. Yet downsides remain: spoken output is hard to skim, listening is slower than reading, and speaking aloud is awkward in public. The open question is whether professionals will reach for voice or the keyboard first.
- WeChat's early voice-note feature drove adoption among older Chinese users, revealing a strong preference for voice communication.
- Humans have spoken for ~100,000 years, versus roughly 5,000 years of writing and only a few centuries of mass literacy and typing; the post's argument is that voice is the biological default.
- LLM-powered agents such as Owlfy's desktop product and Rabbit's "Large Action Model" aim to make voice the primary interface for computers, potentially a third paradigm shift after the command line and the GUI.
Why It Matters
If voice agents prove reliable, voice AI could make computing accessible to people who cannot type and change how humans interact with machines.