ElevenLabs Launches Generative Voice AI Tool for Custom Synthetic Voices
Ted Hisokawa Mar 06, 2026 12:43
ElevenLabs deploys new generative model letting users design entirely new synthetic voices from scratch, targeting audiobooks, games, and content creators.
ElevenLabs has deployed a generative AI model that creates entirely new synthetic voices from scratch, addressing what the company calls a "severely underhyped" segment of the AI market. The Voice Generator tool lets users design custom voices by setting parameters including gender, age, accent, pitch, and speaking style.
The feature, rolling out through the company's Voice Lab, generates unique voices with each use—even when identical base parameters are selected. This solves a practical problem: ElevenLabs found its existing speaker bank too limited for users who needed exclusive voices for their projects.
How It Works
The technical approach grew out of ElevenLabs' existing speech synthesis and voice cloning infrastructure. Both processes rely on speaker embeddings: vector representations that encode a voice's characteristics. By training a dedicated model to sample from the distribution of these embeddings, the company can generate new voices in effectively unlimited variety rather than choosing from a fixed bank.
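ElevenLabs has not published its generator's architecture, so what follows is only a toy illustration of what "sampling from the distribution of embeddings" means. It assumes a tiny bank of invented 4-dimensional speaker embeddings (real ones are far larger) and fits a per-dimension Gaussian to them. This is a much simpler stand-in for a learned generative model. The function names are hypothetical.

```python
import random
import statistics

Embedding = list[float]

def fit_diagonal_gaussian(bank: list[Embedding]) -> tuple[list[float], list[float]]:
    """Estimate a per-dimension mean and standard deviation from existing speaker embeddings."""
    dims = list(zip(*bank))  # transpose: one tuple of values per embedding dimension
    means = [statistics.fmean(d) for d in dims]
    stdevs = [statistics.pstdev(d) for d in dims]
    return means, stdevs

def sample_voice(means: list[float], stdevs: list[float], rng: random.Random) -> Embedding:
    """Draw a new embedding from the fitted distribution: a voice that is not in the bank."""
    return [rng.gauss(m, s) for m, s in zip(means, stdevs)]

# Hypothetical bank of three toy speaker embeddings.
bank = [
    [0.10, 0.80, -0.30, 0.50],
    [0.20, 0.70, -0.10, 0.40],
    [0.00, 0.90, -0.20, 0.60],
]
means, stdevs = fit_diagonal_gaussian(bank)
new_voice = sample_voice(means, stdevs, random.Random(42))
print([round(x, 3) for x in new_voice])
```

Because each draw uses fresh random values, the new vector generally matches no entry in the bank. That is the difference between generating a voice and selecting one.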
A conditioning step adds control on top of that sampling. Users aren't just rolling dice on random outputs; they specify core identity markers, such as gender, age, and accent, that shape the generated voice.
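Conditioning can be sketched in the same toy setting by fitting a separate distribution for each combination of attributes. The `(gender, age, accent)` labels, the embedding values, and the `ConditionalVoiceSampler` class are all hypothetical. A real system would more likely feed the attributes into a learned conditional model than keep one Gaussian per group. The sketch also shows why identical parameters can still yield different voices: each call draws new random values.

```python
import random
import statistics
from collections import defaultdict

Attributes = tuple[str, str, str]  # hypothetical (gender, age, accent) label

class ConditionalVoiceSampler:
    """Toy conditional sampler: one diagonal Gaussian per attribute combination."""

    def __init__(self, labelled: list[tuple[Attributes, list[float]]]):
        groups: dict[Attributes, list[list[float]]] = defaultdict(list)
        for attrs, emb in labelled:
            groups[attrs].append(emb)
        # Per group, store (per-dimension means, per-dimension standard deviations).
        self.params = {}
        for attrs, embs in groups.items():
            dims = list(zip(*embs))
            self.params[attrs] = (
                [statistics.fmean(d) for d in dims],
                [statistics.pstdev(d) for d in dims],
            )

    def sample(self, gender: str, age: str, accent: str, rng: random.Random) -> list[float]:
        key = (gender, age, accent)
        if key not in self.params:
            raise KeyError(f"no training voices for {key}")
        means, stdevs = self.params[key]
        # Fresh random draws on every call: identical parameters still yield a new voice.
        return [rng.gauss(m, s) for m, s in zip(means, stdevs)]

labelled = [
    (("female", "adult", "british"), [0.1, 0.9, -0.2]),
    (("female", "adult", "british"), [0.3, 0.7, -0.4]),
    (("male", "senior", "american"), [-0.5, 0.2, 0.6]),
    (("male", "senior", "american"), [-0.3, 0.4, 0.8]),
]
sampler = ConditionalVoiceSampler(labelled)
rng = random.Random(0)
voice_a = sampler.sample("female", "adult", "british", rng)
voice_b = sampler.sample("female", "adult", "british", rng)  # same parameters, different voice
```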
Target Applications
The company is positioning the tool across several verticals:
Publishing: Book authors can convert text to audio while maintaining artistic control over narration design—potentially expanding the audiobook market to titles that couldn't justify traditional recording costs.
News Media: Publishers experimenting with audio content can create distinctive, exclusive voices for their brands. The exclusivity angle matters here—a voice representing one outlet won't show up elsewhere.
Game Development: Studios can voice NPCs that would otherwise remain silent, with voices unique to their virtual worlds. The cost-efficiency argument is straightforward: more voiced content without proportional budget increases.
Advertising: Creatives can prototype multiple voice styles instantly during early campaign development, before committing resources.
Industry Context
The launch arrives as voice AI advances rapidly across the sector. In 2025, OpenAI's gpt-4o-mini-tts model became available on Azure, while early 2026 brought the open-sourced Qwen3-TTS family, which emphasizes voice design and multilingual streaming. The broader trend points toward orchestrated speech systems that chain speech-to-text, large language models, and text-to-speech, alongside emerging speech-to-speech models that bypass text conversion entirely.
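The difference between the two architectures is easiest to see in the shape of the data flow. In this sketch the stage types and the stub functions (`fake_stt`, `fake_llm`, `fake_tts`) are hypothetical placeholders, not any vendor's API. In practice each stage would wrap a real speech-to-text, language-model, or text-to-speech service.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; audio is modelled as raw bytes.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]
SpeechToSpeech = Callable[[bytes], bytes]

@dataclass
class CascadedVoiceAgent:
    """Orchestrated pipeline: audio -> transcript -> reply text -> audio."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)  # 1. transcribe the user's speech
        reply = self.llm(transcript)     # 2. generate a text reply
        return self.tts(reply)           # 3. synthesize the reply as audio

@dataclass
class DirectVoiceAgent:
    """Speech-to-speech: one model maps input audio to output audio with no text stage."""
    model: SpeechToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        return self.model(audio_in)

# Stub components so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")

def fake_llm(text: str) -> str:
    return f"You said: {text}"

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")

agent = CascadedVoiceAgent(stt=fake_stt, llm=fake_llm, tts=fake_tts)
print(agent.respond(b"hello"))  # b'You said: hello'
```

The cascaded design lets each stage be swapped independently (for example, a different TTS voice). The direct design removes the intermediate transcript at the cost of that modularity.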
ElevenLabs is also telegraphing its next move: combining voice generation with voice cloning to let users enhance their own voices. The pitch involves manipulating cloned voices to sound more natural or varied—targeting anyone who records presentations or audio messages but dislikes how they sound.
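ElevenLabs has not said how the combined feature will work. Cloning and synthesis both rely on speaker embeddings, which suggests cloned and generated voices may share one embedding space. If so, two simple ways to manipulate a clone are to interpolate it toward a generated voice with desired traits, or to perturb it slightly to get nearby variants. Both operations, the toy vectors, and the `blend`/`vary` names are assumptions for illustration only.

```python
import random

def blend(cloned: list[float], reference: list[float], alpha: float) -> list[float]:
    """Linear interpolation: alpha=0 keeps the cloned voice, alpha=1 gives the reference voice."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1 - alpha) * c + alpha * r for c, r in zip(cloned, reference)]

def vary(embedding: list[float], scale: float, rng: random.Random) -> list[float]:
    """Add small Gaussian noise to produce nearby variants of the same voice."""
    return [x + rng.gauss(0.0, scale) for x in embedding]

cloned = [0.2, 0.6, -0.4, 0.1]      # hypothetical embedding of a user's cloned voice
reference = [0.0, 0.8, -0.2, 0.3]   # hypothetical generated voice with the desired traits
mixed = blend(cloned, reference, 0.25)           # mostly the user, nudged toward the reference
variant = vary(mixed, 0.02, random.Random(7))    # a slightly different take on the mix
```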
Safety Measures
The company outlined several safeguards against misuse: terms prohibiting illegal or harmful applications, watermarking to trace generated audio back to the platform, and review processes for reported infringements. On the economic displacement concern, ElevenLabs argues voice actors could license their voices for AI training while participating in more projects simultaneously.
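The company does not describe its watermarking scheme. To show the general idea of tracing audio back to its source, the sketch below uses classic spread-spectrum watermarking. A key-seeded pseudorandom ±1 sequence is added to the samples at low amplitude, and detection correlates the audio with the same keyed sequence. This textbook technique is chosen for illustration and is not ElevenLabs' method. Unlike production schemes, it makes no attempt to survive compression, resampling, or editing.

```python
import math
import random

def watermark_sequence(key: int, n: int) -> list[float]:
    """Pseudorandom +/-1 chip sequence derived from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(samples: list[float], key: int, strength: float = 0.01) -> list[float]:
    """Add the keyed sequence to the audio at low amplitude."""
    wm = watermark_sequence(key, len(samples))
    return [s + strength * w for s, w in zip(samples, wm)]

def detect(samples: list[float], key: int, strength: float = 0.01) -> bool:
    """Correlate with the keyed sequence: marked audio scores near `strength`, unmarked near 0."""
    wm = watermark_sequence(key, len(samples))
    score = sum(s * w for s, w in zip(samples, wm)) / len(samples)
    return score > strength / 2

rate = 48_000
audio = [0.3 * math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]  # 1 s test tone
marked = embed(audio, key=1234)
print(detect(marked, key=1234), detect(audio, key=1234), detect(marked, key=9999))
```

Detection needs the secret key, so only the party that embedded the mark can reliably check for it. This is what lets a provider attribute a clip to its own platform.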
Whether that framing satisfies working voice actors remains an open question as synthetic voice quality continues to approach human parity.