Klatt Voice Synthesis

How to work with GS_Play Klatt voice synthesis — text-to-speech with 3D spatial audio, voice profiles, and inline parameter control.

GS_Play includes a built-in text-to-speech system based on Klatt formant synthesis. The KlattVoiceComponent converts text to speech in real time with configurable voice parameters. Voices are positioned in 3D space and attenuate with distance, making synthesized speech feel like it comes from the character speaking.
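For readers new to formant synthesis: Klatt-style synthesizers shape a source waveform with second-order digital resonators, one per formant. The sketch below implements the resonator difference equation from Dennis Klatt's 1980 cascade/parallel formant synthesizer in plain C++17. The class name KlattResonator is hypothetical. This illustrates the technique only and makes no claim about the code inside KlattVoiceComponent.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration of a Klatt (1980) second-order digital resonator.
// Each formant (F1, F2, F3, ...) is modelled by one resonator; this is not
// GS_Play's internal implementation.
class KlattResonator {
public:
    // frequencyHz: formant centre frequency; bandwidthHz: formant bandwidth.
    KlattResonator(double frequencyHz, double bandwidthHz, double sampleRateHz) {
        assert(sampleRateHz > 0.0);
        const double pi = 3.14159265358979323846;
        const double T = 1.0 / sampleRateHz;  // sample period in seconds
        c_ = -std::exp(-2.0 * pi * bandwidthHz * T);
        b_ = 2.0 * std::exp(-pi * bandwidthHz * T) * std::cos(2.0 * pi * frequencyHz * T);
        a_ = 1.0 - b_ - c_;  // normalises the gain at 0 Hz to exactly 1
    }

    // Difference equation: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]
    double process(double x) {
        const double y = a_ * x + b_ * y1_ + c_ * y2_;
        y2_ = y1_;
        y1_ = y;
        return y;
    }

private:
    double a_ = 0.0, b_ = 0.0, c_ = 0.0;
    double y1_ = 0.0, y2_ = 0.0;  // the two previous output samples
};
```

Feeding the output of an F1 resonator into an F2 resonator, and so on, gives the cascade arrangement Klatt used for voiced sounds. Because A = 1 - B - C, each resonator passes a constant (0 Hz) signal at unit gain, so stacking them boosts energy around the formant frequencies without changing the DC level.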

For component properties, voice parameter details, and the phoneme mapping system, see the Framework API reference.

[Image: Klatt Voice Profile asset in the O3DE Asset Editor]

 


How It Works

  1. Configure a voice using a KlattVoiceProfile — set frequency, speed, waveform, formants, and pitch variance.
  2. Assign a KlattPhonemeMap — maps text characters to ARPABET phonemes for pronunciation.
  3. Speak text from ScriptCanvas or C++ — the system converts text to phonemes and synthesizes audio in real time.
  4. Position in 3D — the voice component uses KlattSpatialConfig for 3D audio positioning relative to the entity.

Voice Configuration

Parameter       | What It Controls
--------------- | ----------------
Frequency       | Base voice pitch.
Speed           | Speech rate.
Waveform        | Voice quality: Saw, Triangle, Sin, Square, Pulse, Noise, Warble.
Formants        | Vocal tract resonance characteristics.
Pitch Variance  | Random pitch variation for natural-sounding speech.
Declination     | Pitch drop over the course of a sentence.
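The interaction between Frequency, Pitch Variance, and Declination can be sketched as a per-syllable pitch contour. The struct and function below are hypothetical: they assume Frequency is in hertz, Pitch Variance is a fractional jitter range, and Declination is the fraction of pitch lost by the end of a sentence. The real KlattVoiceProfile fields, units, and ranges are documented in the Framework API reference and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical mirror of a few KlattVoiceProfile fields.
// Units and meanings here are assumptions for illustration only.
struct VoiceProfileSketch {
    double frequencyHz = 120.0;   // base pitch, assumed to be in Hz
    double pitchVariance = 0.05;  // +/- fraction of random jitter per syllable
    double declination = 0.2;     // fraction of pitch lost by the sentence end
};

// Returns one target pitch per syllable: a linear downward drift
// (declination) multiplied by bounded random variation (pitch variance).
std::vector<double> PitchContour(const VoiceProfileSketch& voice,
                                 std::size_t syllableCount,
                                 unsigned seed) {
    assert(voice.frequencyHz > 0.0);
    std::mt19937 rng(seed);  // fixed seed: the same line is spoken the same way
    std::uniform_real_distribution<double> jitter(-1.0, 1.0);

    std::vector<double> contour;
    contour.reserve(syllableCount);
    for (std::size_t i = 0; i < syllableCount; ++i) {
        // Progress through the sentence, from 0 (first syllable) to 1 (last).
        const double t = syllableCount > 1
            ? static_cast<double>(i) / static_cast<double>(syllableCount - 1)
            : 0.0;
        const double drift = 1.0 - voice.declination * t;
        const double variation = 1.0 + voice.pitchVariance * jitter(rng);
        contour.push_back(voice.frequencyHz * drift * variation);
    }
    return contour;
}
```

With a fixed seed, a given line is delivered identically every time; passing a different seed per line trades repeatability for variety.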

KTT Tags

KTT (Klatt Text Tags) allow inline parameter changes within speech text for expressive delivery:

"Hello <speed=0.5>world</speed>, how are <pitch=1.2>you</pitch>?"

The KlattCommandParser processes these tags during speech synthesis, enabling mid-sentence changes to speed, pitch, and other voice parameters.
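To illustrate how a tag parser can turn KTT markup into parameter changes, here is a minimal C++17 sketch that splits tagged text into segments, each carrying the speed and pitch active for it. ParseKttTags and TaggedSegment are hypothetical names, and the nesting and reset rules are assumptions made for the example; KlattCommandParser's actual behavior is described in the KTT Voice Tags reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// A run of text spoken with one set of parameter values.
struct TaggedSegment {
    std::string text;
    double speed = 1.0;
    double pitch = 1.0;
};

// Hypothetical parser for tags shaped like "<speed=0.5>...</speed>".
// Assumptions (not taken from KlattCommandParser): only speed and pitch are
// recognised, values are absolute overrides, tags may nest, and a closing tag
// restores the value that was active before the matching opening tag.
// Anything that does not parse as a recognised tag is kept as literal text.
std::vector<TaggedSegment> ParseKttTags(const std::string& input) {
    std::map<std::string, std::vector<double>> stacks{{"speed", {1.0}}, {"pitch", {1.0}}};
    std::vector<TaggedSegment> segments;
    std::string pending;

    // Emit the text collected so far with the currently active values.
    auto flush = [&] {
        if (pending.empty()) return;
        TaggedSegment segment;
        segment.text = pending;
        segment.speed = stacks["speed"].back();
        segment.pitch = stacks["pitch"].back();
        segments.push_back(segment);
        pending.clear();
    };

    std::size_t i = 0;
    while (i < input.size()) {
        const std::size_t end = input[i] == '<' ? input.find('>', i) : std::string::npos;
        bool handled = false;
        if (end != std::string::npos) {
            const std::string tag = input.substr(i + 1, end - i - 1);
            const std::size_t eq = tag.find('=');
            if (!tag.empty() && tag[0] == '/') {
                // Closing tag: pop back to the previous value.
                auto it = stacks.find(tag.substr(1));
                if (it != stacks.end() && it->second.size() > 1) {
                    flush();
                    it->second.pop_back();
                    handled = true;
                }
            } else if (eq != std::string::npos) {
                // Opening tag: push the new value if the number parses cleanly.
                auto it = stacks.find(tag.substr(0, eq));
                const std::string valueText = tag.substr(eq + 1);
                char* parsedEnd = nullptr;
                const double value = std::strtod(valueText.c_str(), &parsedEnd);
                if (it != stacks.end() && !valueText.empty() && *parsedEnd == '\0') {
                    flush();
                    it->second.push_back(value);
                    handled = true;
                }
            }
        }
        if (handled) {
            i = end + 1;
        } else {
            pending += input[i++];  // ordinary character, or '<' that starts no valid tag
        }
    }
    flush();
    return segments;
}
```

Run on the example string above, this sketch yields five segments: "Hello " at default values, "world" at speed 0.5, ", how are " at default values, "you" at pitch 1.2, and "?" at default values.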

For the complete tag reference — all attributes, value ranges, and reset behavior — see the Framework API: KTT Voice Tags.


Phoneme Maps

Two base phoneme maps are available:

Map            | Description
-------------- | -----------
SoLoud_Default | Simple default mapping.
CMU_Full       | Full CMU pronunciation dictionary mapping.

Custom phoneme overrides allow project-specific word pronunciations (character names, fantasy terms) without modifying the base map.
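One way overrides can sit on top of a base map without modifying it: the lookup checks a whole-word override table first and falls back to the character map only when no override exists. PhonemeMapSketch and its letter-by-letter fallback are hypothetical simplifications, not the KlattPhonemeMap data format; the CMU pronunciation dictionary itself maps whole words to phonemes rather than single characters.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

using Phonemes = std::vector<std::string>;  // ARPABET symbols, e.g. "AE", "L"

// Hypothetical stand-in for a base phoneme map plus project overrides.
struct PhonemeMapSketch {
    std::map<char, Phonemes> letters;           // base map: character -> phonemes
    std::map<std::string, Phonemes> overrides;  // project-specific whole words

    Phonemes Convert(const std::string& word) const {
        // Lowercase so "Kael" and "kael" hit the same override entry.
        std::string key;
        for (unsigned char c : word) key += static_cast<char>(std::tolower(c));

        // Overrides win, so names and invented terms never reach the base map.
        if (auto it = overrides.find(key); it != overrides.end()) return it->second;

        // Naive fallback: concatenate per-character phonemes, skipping unknowns.
        Phonemes out;
        for (char c : key) {
            if (auto it = letters.find(c); it != letters.end()) {
                out.insert(out.end(), it->second.begin(), it->second.end());
            }
        }
        return out;
    }
};
```

Keeping overrides in a separate table means a project can ship its own pronunciation list and swap the base map (for example between a default and a dictionary-based one) without re-entering those words.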


3D Spatial Audio

The KlattSpatialConfig controls how synthesized speech is positioned in 3D:

  • Voices attenuate with distance from the listener.
  • The KlattVoiceSystemComponent tracks the listener position and updates all active voices.
  • Multiple characters can speak simultaneously with correct spatial positioning.
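Distance attenuation can be sketched with the clamped inverse-distance curve that is common in game audio (it has the same shape as OpenAL's AL_INVERSE_DISTANCE_CLAMPED model). SpatialSketch, its fields, and the choice of curve are assumptions for illustration; KlattSpatialConfig may expose different parameters or use a different falloff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Hypothetical spatial settings; not the real KlattSpatialConfig fields.
struct SpatialSketch {
    double referenceDistance = 1.0;  // full volume at or inside this distance
    double maxDistance = 30.0;       // gain stops decreasing beyond this distance
    double rolloff = 1.0;            // how quickly volume falls off with distance
};

double Distance(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Clamped inverse-distance gain in [0, 1], used purely as an illustration.
double DistanceGain(const SpatialSketch& cfg, const Vec3& listener, const Vec3& voice) {
    assert(cfg.referenceDistance > 0.0 && cfg.maxDistance >= cfg.referenceDistance);
    const double d = std::clamp(Distance(listener, voice), cfg.referenceDistance, cfg.maxDistance);
    return cfg.referenceDistance /
           (cfg.referenceDistance + cfg.rolloff * (d - cfg.referenceDistance));
}
```

In this sketch, the listener tracking described above would amount to recomputing DistanceGain for every active voice whenever the listener or a speaking entity moves; because each voice is evaluated independently, several characters can speak at once.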

Quick Reference

Need                        | Bus                         | Method
--------------------------- | --------------------------- | ------
Control a voice             | KlattVoiceRequestBus        | Voice synthesis methods (entity-addressed)
System-level voice control  | KlattVoiceSystemRequestBus  | Listener tracking, engine management

Glossary

Term               | Meaning
------------------ | -------
Klatt Synthesis    | A formant-based speech synthesis method that generates voice by filtering a source waveform through resonators tuned to formant frequencies
KTT Tags           | Inline text tags that modify voice parameters mid-sentence during synthesis
Phoneme Map        | A mapping from text characters to ARPABET phonemes for pronunciation
KlattSpatialConfig | Configuration for 3D audio positioning and distance attenuation of synthesized speech

For full definitions, see the Glossary.


See Also

For the full API, component properties, and C++ extension guide:

For related systems:


Get GS_Audio

GS_Audio — Explore this gem on the product page and add it to your project.