Klatt Voice Synthesis

How to work with GS_Play Klatt voice synthesis — text-to-speech with 3D spatial audio, voice profiles, and inline parameter control.

GS_Play includes a built-in text-to-speech system based on Klatt formant synthesis. The KlattVoiceComponent converts text to speech in real time with configurable voice parameters. Voices are positioned in 3D space and attenuate with distance, making synthesized speech feel like it comes from the character speaking.

For component properties, voice parameter details, and the phoneme mapping system, see the Framework API reference.

Figure: Klatt Voice Profile asset in the O3DE Asset Editor.

How It Works
Voice Configuration
KTT Tags
Phoneme Maps
3D Spatial Audio
Quick Reference
Glossary
See Also

How It Works

1. Configure a voice using a KlattVoiceProfile: set frequency, speed, waveform, formants, pitch variance, and declination.
2. Assign a KlattPhonemeMap, which maps text characters to ARPABET phonemes for pronunciation.
3. Speak text from ScriptCanvas or C++. The system converts the text to phonemes and synthesizes audio in real time.
4. Position the voice in 3D. The voice component uses KlattSpatialConfig to position the audio relative to the entity.

Voice Configuration

Parameter	What It Controls
Frequency	Base voice pitch.
Speed	Speech rate.
Waveform	Voice quality — Saw, Triangle, Sin, Square, Pulse, Noise, Warble.
Formants	Vocal tract resonance characteristics.
Pitch Variance	Random pitch variation for natural-sounding speech.
Declination	Pitch drop over the course of a sentence.
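
To make the interaction between Frequency, Declination, and Pitch Variance concrete, here is a minimal C++ sketch. It is illustrative only: the `VoicePitchSettings` struct, the `PitchAt` function, the Hz unit, and the linear declination formula are all assumptions, not the KlattVoiceProfile layout or the synthesizer's actual math.

```cpp
#include <random>

// Hypothetical settings struct for illustration; not the KlattVoiceProfile layout.
struct VoicePitchSettings
{
    float baseFrequencyHz; // Frequency: base voice pitch (Hz is an assumed unit)
    float declination;     // assumed: fraction of pitch lost by the end of a sentence
    float pitchVariance;   // assumed: maximum random deviation, as a fraction of pitch
};

// Returns the pitch at a point in the sentence, where progress runs
// from 0 (start of sentence) to 1 (end of sentence).
float PitchAt(const VoicePitchSettings& settings, float progress, std::mt19937& rng)
{
    // Declination: lower the pitch linearly as the sentence progresses.
    const float declined =
        settings.baseFrequencyHz * (1.0f - settings.declination * progress);

    if (settings.pitchVariance <= 0.0f)
    {
        return declined; // no random variation requested
    }

    // Pitch variance: scale by a random factor close to 1.
    std::uniform_real_distribution<float> jitter(
        -settings.pitchVariance, settings.pitchVariance);
    return declined * (1.0f + jitter(rng));
}
```

Under these assumptions, a base frequency of 120 with a declination of 0.25 and no variance starts a sentence at 120 and ends it at 90. Raising the variance spreads each sampled pitch around that falling line.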

KTT Tags

KTT (Klatt Text Tags) allow inline parameter changes within speech text for expressive delivery:

"Hello <speed=0.5>world</speed>, how are <pitch=1.2>you</pitch>?"

The KlattCommandParser processes these tags during speech synthesis, enabling mid-sentence changes to speed, pitch, and other voice parameters.
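
The following C++ sketch shows one way inline tags like the ones above could be split into text segments that each carry their own speed and pitch. It is not the KlattCommandParser implementation. `SpeechSegment` and `ParseTaggedText` are hypothetical names, and the sketch assumes that a closing tag resets its parameter to 1.0, which may differ from the documented reset behavior.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type for illustration; not part of the GS_Play API.
struct SpeechSegment
{
    std::string text;   // plain text to synthesize
    float speed = 1.0f; // speed value active for this text
    float pitch = 1.0f; // pitch value active for this text
};

// Splits tagged text into segments, tracking speed and pitch as tags open and close.
// Assumes <name=value> sets a parameter and </name> resets it to 1.0.
// Malformed numeric values make std::stof throw; a real parser would handle that.
std::vector<SpeechSegment> ParseTaggedText(const std::string& input)
{
    std::vector<SpeechSegment> segments;
    SpeechSegment current;

    // Emits the text collected so far, keeping the active parameters.
    auto flush = [&]() {
        if (!current.text.empty())
        {
            segments.push_back(current);
            current.text.clear();
        }
    };

    std::size_t i = 0;
    while (i < input.size())
    {
        const std::size_t close =
            (input[i] == '<') ? input.find('>', i) : std::string::npos;
        if (close == std::string::npos)
        {
            current.text += input[i++]; // ordinary character or unterminated '<'
            continue;
        }

        const std::string tag = input.substr(i + 1, close - i - 1);
        i = close + 1;
        flush(); // text before the tag keeps the old parameters

        const bool closing = !tag.empty() && tag[0] == '/';
        const std::size_t eq = tag.find('=');
        const std::string name = closing ? tag.substr(1) : tag.substr(0, eq);
        float value = 1.0f; // closing tags fall back to the assumed default
        if (!closing && eq != std::string::npos)
        {
            value = std::stof(tag.substr(eq + 1));
        }

        if (name == "speed")
        {
            current.speed = value;
        }
        else if (name == "pitch")
        {
            current.pitch = value;
        }
        // Unknown tags are dropped in this sketch.
    }

    flush();
    return segments;
}
```

Applied to the example string above, this sketch yields five segments: "Hello ", "world" at speed 0.5, ", how are ", "you" at pitch 1.2, and "?".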

For the complete tag reference — all attributes, value ranges, and reset behavior — see the Framework API: KTT Voice Tags.

Phoneme Maps

Two base phoneme maps are available:

Map	Description
SoLoud_Default	Simple default mapping.
CMU_Full	Mapping based on the full CMU Pronouncing Dictionary.

Custom phoneme overrides allow project-specific word pronunciations (character names, fantasy terms) without modifying the base map.
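
The override behavior can be pictured as a lookup that checks project-specific words first and only falls back to the base map when no override exists. The C++ sketch below illustrates that idea under simplifying assumptions: `WordOverrides`, `CharacterMap`, and `ToPhonemes` are hypothetical names, and the real KlattPhonemeMap data layout may differ.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical containers for illustration; not the KlattPhonemeMap layout.
using WordOverrides = std::unordered_map<std::string, std::string>; // word -> ARPABET string
using CharacterMap = std::unordered_map<char, std::string>;         // character -> ARPABET phoneme

// Returns space-separated ARPABET phonemes for a word.
// A matching override wins; otherwise each character goes through the base map.
std::string ToPhonemes(
    const std::string& word,
    const WordOverrides& overrides,
    const CharacterMap& baseMap)
{
    const auto found = overrides.find(word);
    if (found != overrides.end())
    {
        return found->second; // project-specific pronunciation
    }

    std::string phonemes;
    for (char c : word)
    {
        const auto entry = baseMap.find(c);
        if (entry == baseMap.end())
        {
            continue; // characters missing from the base map are skipped
        }
        if (!phonemes.empty())
        {
            phonemes += ' ';
        }
        phonemes += entry->second;
    }
    return phonemes;
}
```

With an override of "kael" mapped to "K EY L", the character name is pronounced as intended while every other word still goes through the unmodified base map.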

3D Spatial Audio

The KlattSpatialConfig controls how synthesized speech is positioned in 3D:

Voices attenuate with distance from the listener.
The KlattVoiceSystemComponent tracks the listener position and updates all active voices.
Multiple characters can speak simultaneously with correct spatial positioning.
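
As a rough illustration of distance attenuation, the C++ sketch below computes a gain from the listener and voice positions using a clamped inverse-distance model, which is a common choice in audio engines. Whether GS_Play uses this particular curve is an assumption. `Vec3`, `AttenuationSettings`, and `AttenuationGain` are hypothetical names, not KlattSpatialConfig fields.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Minimal 3D vector for the sketch; a real project would use the engine's vector type.
struct Vec3
{
    float x;
    float y;
    float z;
};

float Distance(const Vec3& a, const Vec3& b)
{
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Hypothetical settings; not the KlattSpatialConfig fields.
struct AttenuationSettings
{
    float minDistance; // voices closer than this play at full volume
    float maxDistance; // attenuation stops increasing beyond this distance
    float rolloff;     // how quickly volume falls off with distance
};

// Clamped inverse-distance gain in the range (0, 1].
float AttenuationGain(const Vec3& listener, const Vec3& voice, const AttenuationSettings& s)
{
    assert(s.minDistance > 0.0f && s.maxDistance >= s.minDistance);
    const float d =
        std::min(std::max(Distance(listener, voice), s.minDistance), s.maxDistance);
    return s.minDistance / (s.minDistance + s.rolloff * (d - s.minDistance));
}
```

Evaluating the gain once per active voice each time the listener moves mirrors the per-voice update described above. With a minimum distance of 1 and a rolloff of 1, a voice 5 units away plays at one fifth of full volume.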

Quick Reference

Need	Bus	Method
Control a voice	`KlattVoiceRequestBus`	Voice synthesis methods (entity-addressed)
System-level voice control	`KlattVoiceSystemRequestBus`	Listener tracking, engine management

Glossary

Term	Meaning
Klatt Synthesis	A formant-based speech synthesis method that generates voice from frequency parameters
KTT Tags	Inline text tags that modify voice parameters mid-sentence during synthesis
Phoneme Map	A mapping from text characters to ARPABET phonemes for pronunciation
KlattSpatialConfig	Configuration for 3D audio positioning and distance attenuation of synthesized speech

For full definitions, see the Glossary.

Get GS_Audio

GS_Audio — Explore this gem on the product page and add it to your project.