Klatt Voice Synthesis

Custom text-to-speech via Klatt formant synthesis with 3D spatial audio, phoneme mapping, and voice profiling.

The Klatt Voice Synthesis system provides custom text-to-speech for GS_Play projects using Klatt formant synthesis with full 3D spatial audio. It uses SoLoud internally for speech generation and MiniAudio for spatial positioning.

The system has two layers:

KlattVoiceSystemComponent – A singleton that manages the shared SoLoud engine instance and tracks the 3D audio listener position.
KlattVoiceComponent – A per-entity component that generates speech, queues segments, applies voice profiles, and emits spatialized audio from the entity’s position.

Voice characteristics are defined through KlattVoiceProfile assets containing frequency, speed, waveform, formant, and phoneme mapping configuration. Phoneme maps convert input text to ARPABET phonemes for the Klatt synthesizer, with support for custom pronunciation overrides.

For usage guides and setup examples, see The Basics: GS_Audio.

Figure: Klatt Voice Profile asset in the O3DE Asset Editor.

Components
API Reference
Data Types
Enumerations
KTT Voice Tags
Combined Example
See Also

Components

KlattVoiceSystemComponent

Singleton component that manages the shared SoLoud engine and 3D listener tracking.

Field	Value
TypeId	`{F4A5D6E7-8B9C-4D5E-A1F2-3B4C5D6E7F8A}`
Extends	`AZ::Component`, `AZ::TickBus::Handler`
Bus	`KlattVoiceSystemRequestBus` (Single/Single)

KlattVoiceComponent

Per-entity voice component with spatial audio, phoneme mapping, and segment queue.

Field	Value
TypeId	`{4A8B9C7D-6E5F-4D3C-2B1A-0F9E8D7C6B5A}`
Extends	`AZ::Component`, `AZ::TickBus::Handler`
Request Bus	`KlattVoiceRequestBus` (Single/ById, entity-addressed)
Notification Bus	`KlattVoiceNotificationBus` (Multiple/Multiple)

API Reference

Request Bus: `KlattVoiceSystemRequestBus`

System-level voice management. Singleton bus – Single address, single handler.

Method	Parameters	Returns	Description
`GetSoLoudEngine`	–	`SoLoud::Soloud*`	Returns a pointer to the shared SoLoud engine instance.
`SetListenerPosition`	`const AZ::Vector3& position`	`void`	Updates the 3D audio listener position for spatial voice playback.
`SetListenerOrientation`	`const AZ::Vector3& forward, const AZ::Vector3& up`	`void`	Updates the 3D audio listener orientation.
`GetListenerPosition`	–	`AZ::Vector3`	Returns the current listener position.
`IsEngineReady`	–	`bool`	Returns whether the SoLoud engine has been initialized and is ready.

Request Bus: `KlattVoiceRequestBus`

Per-entity voice synthesis controls. Entity-addressed bus – Single handler per entity ID.

Method	Parameters	Returns	Description
`Speak`	`const AZStd::string& text`	`void`	Converts text to speech and plays it. Uses the component’s configured voice profile.
`SpeakWithParams`	`const AZStd::string& text, const KlattVoiceParams& params`	`void`	Converts text to speech using the specified voice parameters instead of the profile defaults.
`StopSpeaking`	–	`void`	Immediately stops any speech in progress and clears the segment queue.
`IsSpeaking`	–	`bool`	Returns whether this entity’s voice is currently producing speech.
`QueueSegment`	`const AZStd::string& text`	`void`	Adds a speech segment to the queue. Queued segments play in order after the current segment finishes.
`ClearQueue`	–	`void`	Clears all queued speech segments without stopping current playback.
`SetVoiceProfile`	`const AZ::Data::Asset<KlattVoiceProfile>& profile`	`void`	Changes the voice profile used by this component.
`GetVoiceProfile`	–	`AZ::Data::Asset<KlattVoiceProfile>`	Returns the currently assigned voice profile asset.
`SetSpatialConfig`	`const KlattSpatialConfig& config`	`void`	Updates the 3D spatial audio configuration for this voice.
`GetSpatialConfig`	–	`KlattSpatialConfig`	Returns the current spatial audio configuration.
`SetVolume`	`float volume`	`void`	Sets the output volume for this voice (0.0 to 1.0).
`GetVolume`	–	`float`	Returns the current output volume.

Notification Bus: `KlattVoiceNotificationBus`

Events broadcast by voice components. Multiple handler bus – any number of components can subscribe.

Event	Parameters	Description
`OnSpeechStarted`	`const AZ::EntityId& entityId`	Fired when an entity begins speaking.
`OnSpeechFinished`	`const AZ::EntityId& entityId`	Fired when an entity finishes speaking (including all queued segments).
`OnSegmentStarted`	`const AZ::EntityId& entityId, int segmentIndex`	Fired when a new speech segment begins playing.
`OnSegmentFinished`	`const AZ::EntityId& entityId, int segmentIndex`	Fired when a speech segment finishes playing.
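The difference between `ClearQueue` and `StopSpeaking`, and the order in which the notification events fire, can be illustrated with a small standalone model. This is only a sketch and not GS_Audio code: `VoiceQueueModel`, `FinishCurrent`, `Current`, and `Events` are hypothetical names. It also assumes that `Speak` begins a fresh utterance, that `QueueSegment` on an idle voice starts playback immediately, and that segment indices are zero-based. None of these are confirmed by the tables above.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-entity segment queue; not the GS_Audio implementation.
class VoiceQueueModel
{
public:
    // Mirrors Speak: starts a segment immediately (assumed to begin a new utterance).
    void Speak(const std::string& text)
    {
        m_queue.clear();
        m_segmentIndex = 0;
        m_current = text;
        m_speaking = true;
        m_events.push_back("OnSpeechStarted");
        m_events.push_back("OnSegmentStarted " + std::to_string(m_segmentIndex));
    }

    // Mirrors QueueSegment: the segment plays after the current one finishes.
    void QueueSegment(const std::string& text)
    {
        if (!m_speaking) { Speak(text); return; } // assumption: an idle voice starts at once
        m_queue.push_back(text);
    }

    // Mirrors ClearQueue: pending segments are dropped, current playback continues.
    void ClearQueue() { m_queue.clear(); }

    // Mirrors StopSpeaking: playback stops and the queue is emptied.
    void StopSpeaking()
    {
        m_queue.clear();
        m_current.clear();
        m_speaking = false;
    }

    bool IsSpeaking() const { return m_speaking; }

    // Hypothetical hook: call when the synthesizer reports the current segment is done.
    void FinishCurrent()
    {
        if (!m_speaking) return;
        m_events.push_back("OnSegmentFinished " + std::to_string(m_segmentIndex));
        if (m_queue.empty())
        {
            // OnSpeechFinished is documented to fire only after all queued segments.
            m_speaking = false;
            m_current.clear();
            m_events.push_back("OnSpeechFinished");
            return;
        }
        m_current = m_queue.front();
        m_queue.pop_front();
        ++m_segmentIndex;
        m_events.push_back("OnSegmentStarted " + std::to_string(m_segmentIndex));
    }

    const std::string& Current() const { return m_current; }
    const std::vector<std::string>& Events() const { return m_events; }

private:
    std::deque<std::string> m_queue;
    std::vector<std::string> m_events;
    std::string m_current;
    int m_segmentIndex = 0;
    bool m_speaking = false;
};
```

In this model, calling `ClearQueue` while a segment is playing leaves `IsSpeaking()` true until `FinishCurrent` runs, whereas `StopSpeaking` makes it false at once.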

Data Types

KlattVoiceParams

Core voice synthesis parameters controlling the Klatt formant synthesizer output.

Field	Value
TypeId	`{8A9C7F3B-4E2D-4C1A-9B5E-6D8F9A2C1B4E}`

Field	Type	Description
Base Frequency	`float`	Fundamental frequency (F0) in Hz. Controls the base pitch of the voice.
Speed	`float`	Speech rate multiplier. 1.0 is normal speed.
Declination	`float`	Pitch declination rate. Controls how pitch drops over the course of an utterance.
Waveform	`KlattWaveform`	Glottal waveform type used by the synthesizer.
Formant Shift	`float`	Shifts all formant frequencies up or down. Positive values raise the voice's pitch character; negative values lower it.
Pitch Variance	`float`	Amount of random pitch variation applied during speech for natural-sounding intonation.

KlattVoiceProfile

A voice profile asset combining synthesis parameters with a phoneme mapping.

Field	Value
TypeId	`{2CEB777E-DAA7-40B1-BFF4-0F772ADE86CF}`
Reflection	Requires `GS_AssetReflectionIncludes.h` — see Serialization Helpers

Field	Type	Description
Voice Params	`KlattVoiceParams`	The synthesis parameters for this voice profile.
Phoneme Map	`AZ::Data::Asset<KlattPhonemeMap>`	The phoneme mapping asset used for text-to-phoneme conversion.

KlattVoicePreset

A preset configuration for quick voice setup.

Field	Value
TypeId	`{2B8D9E4F-7C6A-4D3B-8E9F-1A2B3C4D5E6F}`

Field	Type	Description
Preset Name	`AZStd::string`	Display name for this preset.
Profile	`KlattVoiceProfile`	The voice profile configuration stored in this preset.

KlattSpatialConfig

3D spatial audio configuration for voice positioning.

Field	Value
TypeId	`{7C9F8E2D-3A4B-5F6C-1E0D-9A8B7C6D5E4F}`

Field	Type	Description
Enable 3D	`bool`	Whether this voice uses 3D spatialization. When false, audio plays as 2D.
Min Distance	`float`	Distance at which attenuation begins. Below this distance the voice plays at full volume.
Max Distance	`float`	Distance at which the voice reaches minimum volume.
Attenuation Model	`int`	The distance attenuation curve type (linear, inverse, exponential).
Doppler Factor	`float`	Intensity of the Doppler effect applied to this voice. 0.0 disables Doppler.
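To give a feel for how Min Distance and Max Distance interact with the three curve types, the sketch below evaluates the common OpenAL-style attenuation formulas. It is an illustration under assumptions, not the gem's implementation: the `AttenuationCurve` enum, the `DistanceGain` function, and the `rolloff` parameter are hypothetical, and the mapping from the `int` Attenuation Model value to a curve is not given in this reference.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical names: the documented Attenuation Model field is a plain int
// and its value-to-curve mapping is not specified here.
enum class AttenuationCurve { Linear, Inverse, Exponential };

// Returns a gain in [0, 1] for a source at `distance` from the listener.
// `rolloff` is an assumed extra parameter (1.0 = standard curve);
// KlattSpatialConfig does not list one.
float DistanceGain(AttenuationCurve curve, float distance,
                   float minDistance, float maxDistance, float rolloff = 1.0f)
{
    // Degenerate configuration: treat as no attenuation.
    if (minDistance <= 0.0f || maxDistance <= minDistance)
    {
        return 1.0f;
    }

    // Inside minDistance the voice plays at full volume; beyond maxDistance
    // the gain stops falling.
    const float d = std::min(maxDistance, std::max(minDistance, distance));

    float gain = 1.0f;
    switch (curve)
    {
    case AttenuationCurve::Linear:
        gain = 1.0f - rolloff * (d - minDistance) / (maxDistance - minDistance);
        break;
    case AttenuationCurve::Inverse:
        gain = minDistance / (minDistance + rolloff * (d - minDistance));
        break;
    case AttenuationCurve::Exponential:
        gain = std::pow(d / minDistance, -rolloff);
        break;
    }
    return std::min(1.0f, std::max(0.0f, gain));
}
```

With these formulas only the linear curve reaches silence at Max Distance. The inverse and exponential curves level off at a small non-zero gain, which matches the "minimum volume" wording in the table.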

KlattPhonemeMap

Phoneme mapping asset for text-to-ARPABET conversion with custom overrides.

Field	Value
TypeId	`{F3E9D7C1-2A4B-5E8F-9C3D-6A1B4E7F2D5C}`
Reflection	Requires `GS_AssetReflectionIncludes.h` — see Serialization Helpers

Field	Type	Description
Base Map	`BasePhonemeMap`	The base phoneme dictionary to use as the foundation for conversion.
Overrides	`AZStd::vector<PhonemeOverride>`	Custom pronunciation overrides for specific words or patterns.

PhonemeOverride

A custom pronunciation rule that overrides the base phoneme map for a specific word or pattern.

Field	Value
TypeId	`{A2B5C8D1-4E7F-3A9C-6B2D-1F5E8A3C7D9B}`

Field	Type	Description
Word	`AZStd::string`	The word or pattern to match.
Phonemes	`AZStd::string`	The ARPABET phoneme sequence to use for this word.
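The override-before-base lookup order can be sketched as follows. This is a minimal standalone model, not the gem's code: `PhonemeOverrideEntry`, `ToLower`, and `LookupPhonemes` are hypothetical names, the base dictionary is represented by a plain map with lower-case keys, and matching is assumed to be a case-insensitive whole-word comparison. The "pattern" form mentioned above is not modelled because its syntax is not described. The sample ARPABET strings used in the usage note are illustrative only.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in mirroring the Word / Phonemes fields of PhonemeOverride.
struct PhonemeOverrideEntry
{
    std::string word;
    std::string phonemes;
};

// Lower-cases ASCII text so lookups ignore case (an assumption of this sketch).
std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Checks the overrides first, then falls back to the base dictionary.
// Returns an empty string when neither source knows the word.
std::string LookupPhonemes(const std::string& word,
                           const std::vector<PhonemeOverrideEntry>& overrides,
                           const std::unordered_map<std::string, std::string>& baseMap)
{
    const std::string key = ToLower(word);
    for (const PhonemeOverrideEntry& entry : overrides)
    {
        if (ToLower(entry.word) == key)
        {
            return entry.phonemes;
        }
    }
    const auto it = baseMap.find(key);
    return it != baseMap.end() ? it->second : std::string();
}
```

For example, with a base entry `tomato` mapped to `T AH0 M EY1 T OW2` and an override for `Tomato` mapped to `T AH0 M AA1 T OW2`, looking up `TOMATO` returns the override sequence, while words without an override fall through to the base map.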

Enumerations

KlattWaveform

Glottal waveform types available for the Klatt synthesizer.

Field	Value
TypeId	`{8ED1DABE-3347-44A5-B43A-C171D36AE780}`

Value	Description
`Saw`	Sawtooth waveform. Bright, buzzy character.
`Triangle`	Triangle waveform. Softer than sawtooth, slightly hollow.
`Sin`	Sine waveform. Pure tone, smooth and clean.
`Square`	Square waveform. Hollow, reed-like character.
`Pulse`	Pulse waveform. Variable duty cycle for varied timbres.
`Noise`	Noise waveform. Breathy, whisper-like quality.
`Warble`	Warble waveform. Modulated tone with vibrato-like character.

BasePhonemeMap

Available base phoneme dictionaries for text-to-ARPABET conversion.

Field	Value
TypeId	`{D8F2A3C5-1B4E-7A9F-6D2C-5E8A1B3F4C7D}`

Value	Description
`SoLoud_Default`	The default phoneme mapping built into SoLoud. Covers standard English pronunciation.
`CMU_Full`	The full CMU Pronouncing Dictionary. Comprehensive English phoneme coverage with over 130,000 entries.

KTT Voice Tags

KTT (Klatt Text Tags) are inline commands embedded in strings passed to `KlattVoiceComponent::SpeakText`. `KlattCommandParser::Parse` extracts the tags and strips them from the text before synthesis begins, so they are never spoken.

Format: <ktt attr1=value1 attr2=value2>

Multiple attributes can be combined in a single tag. Attribute names are case-insensitive. String values may optionally be wrapped in quotes. An empty value (e.g. speed=) resets that parameter to the voice profile default.
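The tag rules above (several attributes per tag, case-insensitive names, optional quotes, empty value meaning reset) can be sketched with a small standalone parser. This is not `KlattCommandParser::Parse`, whose signature and behaviour are not documented here. `KttTag`, `KttParseResult`, `ParseAttributes`, and `ParseKtt` are hypothetical names, the `ktt` keyword itself is matched case-sensitively, and a `>` inside a quoted value is not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct KttTag
{
    std::size_t position;                      // offset into the stripped text where the tag applies
    std::map<std::string, std::string> attrs;  // lower-cased name -> value ("" means reset to default)
};

struct KttParseResult
{
    std::string spokenText;    // input with every <ktt ...> tag removed
    std::vector<KttTag> tags;
};

// Parses the inside of a tag, e.g. ` waveform="square" pitch=1.8`.
void ParseAttributes(const std::string& body, std::map<std::string, std::string>& attrs)
{
    std::size_t i = 0;
    while (i < body.size())
    {
        while (i < body.size() && std::isspace(static_cast<unsigned char>(body[i]))) ++i;

        std::string name;
        while (i < body.size() && body[i] != '=' &&
               !std::isspace(static_cast<unsigned char>(body[i])))
        {
            name += static_cast<char>(std::tolower(static_cast<unsigned char>(body[i++])));
        }
        if (name.empty()) break;

        std::string value;
        if (i < body.size() && body[i] == '=')
        {
            ++i;
            if (i < body.size() && body[i] == '"')
            {
                // Quoted value: take everything up to the closing quote.
                const std::size_t end = body.find('"', i + 1);
                value = body.substr(i + 1, end == std::string::npos ? end : end - (i + 1));
                i = (end == std::string::npos) ? body.size() : end + 1;
            }
            else
            {
                while (i < body.size() && !std::isspace(static_cast<unsigned char>(body[i])))
                {
                    value += body[i++];
                }
            }
        }
        attrs[name] = value;
    }
}

// Strips tags from the input and records where each one applied.
KttParseResult ParseKtt(const std::string& input)
{
    KttParseResult out;
    std::size_t i = 0;
    while (i < input.size())
    {
        const bool startsTag = input.compare(i, 4, "<ktt") == 0 && i + 4 < input.size() &&
                               (input[i + 4] == ' ' || input[i + 4] == '>');
        const std::size_t close = startsTag ? input.find('>', i) : std::string::npos;
        if (close != std::string::npos)
        {
            KttTag tag{out.spokenText.size(), {}};
            ParseAttributes(input.substr(i + 4, close - (i + 4)), tag.attrs);
            out.tags.push_back(tag);
            i = close + 1;
            continue;
        }
        out.spokenText += input[i++];
    }
    return out;
}
```

Recording each tag's offset into the stripped text keeps the "from this point forward" semantics available to whatever applies the parameters later.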

`speed=X`

Override the speech speed multiplier from this point forward.


Range	`0.1` – `5.0`
Default reset	`speed=` (restores profile default)
1.0	Normal speed

Normal speech <ktt speed=2.0> fast bit <ktt speed=> back to default.
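One way to turn the raw `speed=` value into a multiplier is sketched below. `ResolveSpeed` is a hypothetical helper, and two of its behaviours are assumptions rather than documented facts: values outside the documented 0.1 to 5.0 range are clamped, and text that does not parse as a number falls back to the profile default.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>

// Resolves the text after "speed=" to a speech-rate multiplier.
// Assumptions (not confirmed by this reference): out-of-range values are clamped
// to 0.1 - 5.0, and unparsable values fall back to the profile default.
float ResolveSpeed(const std::string& rawValue, float profileDefault)
{
    // An empty value ("speed=") restores the voice profile default.
    if (rawValue.empty())
    {
        return profileDefault;
    }

    char* end = nullptr;
    const float parsed = std::strtof(rawValue.c_str(), &end);
    if (end == rawValue.c_str())
    {
        return profileDefault; // nothing numeric was read
    }

    return std::min(5.0f, std::max(0.1f, parsed));
}
```

Under these assumptions, `ResolveSpeed("2.0", 1.0f)` yields `2.0`, and `ResolveSpeed("", 1.25f)` yields the profile default of `1.25`.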

`decl=X` / `declination=X`

Pitch declination — how much pitch falls over the course of the utterance. Both decl and declination are accepted.


Range	`0.0` – `1.0`
0.0	Steady pitch (no fall)
0.8	Strong downward drift

Rising <ktt decl=0.0> steady <ktt decl=0.8> falling voice.

`waveform="TYPE"`

Change the glottal waveform used by the synthesizer, setting the overall character of the voice.

Value	Character
`saw`	Default, neutral voice
`triangle`	Softer, smoother
`sin` / `sine`	Pure tone, robotic
`square`	Harsh, mechanical
`pulse`	Raspy, textured
`noise`	Whispered, breathy
`warble`	Wobbly, character voice

<ktt waveform="noise"> whispered section <ktt waveform="saw"> normal voice.

`vowel=X`

First formant (F1) frequency multiplier. Shifts the quality of synthesised vowel sounds.


1.0	Normal
> 1.0	More open vowel quality
< 1.0	More closed vowel quality

<ktt vowel=1.4> different vowel colour here.

`accent=X`

Second formant (F2) frequency multiplier. Shifts accent or dialect colouration.


1.0	Normal
< 1.0	Shifted accent colouring

<ktt accent=0.8> shifted accent here.

`pitch=X`

F0 pitch variance amount. Controls how much pitch varies during synthesis.


1.0	Normal variance
> 1.0	More expressive intonation
< 1.0	Flatter, more monotone

<ktt pitch=2.0> very expressive speech <ktt pitch=0.1> flat monotone.

`pause=X`

Insert a pause of X seconds at this position in the voice playback. Value is required — there is no default.

Hello.<ktt pause=0.8> How are you?

Combined Example

Dialogue string using typewriter text commands and KTT voice tags together:

[b]Warning:[/b] [color=#FF0000]do not[/color] proceed.[pause=1]
<ktt waveform="square" pitch=1.8>This is a mechanical override.<ktt pause=0.5><ktt waveform="saw" pitch=1.0>
[speed=3]Resuming normal protocol.[/speed]

Get GS_Audio

GS_Audio — Explore this gem on the product page and add it to your project.
