Text-to-Speech for Reading Your Texts — Natural Voices in English & Czech (2026)


Chapter 07: Cloud API TTS Services Compared

Chapter 8 of 17 · ~17 min read

Overview

This chapter compares the major commercial cloud text-to-speech (TTS) APIs you can call from your own software to read your texts aloud in English and Czech. The reader's hard filter is that Czech (cs-CZ) must sound genuinely natural, and this is exactly where most "best" TTS models fall down: the headline-grabbing systems (Deepgram Aura, PlayHT's flagship and, in practice, Cartesia Sonic) are overwhelmingly English-first.

For each provider we grade five things explicitly:

English naturalness (reputation/benchmarks)
Czech support — does cs-CZ exist, in which voice tier, and how good is it (this is graded honestly and separately)
Voice cloning availability
Streaming / latency
Free tier + 2026 pricing per character or per minute

The bottom line up front: for the free-or-near-free + natural Czech sweet spot, the realistic cloud contenders are Google Cloud TTS (Chirp 3 HD), Microsoft Azure Neural, and ElevenLabs (Multilingual v2 / Turbo v2.5 / Flash v2.5), with Amazon Polly as a budget fallback and OpenAI as a wildcard (great quality, undocumented/inconsistent Czech, no cs-CZ voice picker). Deepgram, Cartesia, and PlayHT's flagship models are essentially disqualified for Czech as of mid-2026.

Pricing and language tables below were verified live against vendor pages on 2026-05-31. TTS pricing and free tiers change often — treat all numbers as "verified at writing time, re-check the linked pricing page before you commit."

Content

How cloud TTS is billed (so the table makes sense)

Almost every cloud TTS API bills by characters of input text (not audio minutes), usually quoted per 1 million characters. Rough conversions to keep in your head:

1 million characters ≈ 160,000–180,000 English words ≈ roughly 18–20 hours of speech, assuming a narration pace of about 150 words per minute.
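The rough conversion above can be sketched as a small calculator. This is an illustrative estimate, not vendor guidance: the defaults `chars_per_word=6.0` (about five letters plus a space, which lands inside the 160,000–180,000 word range quoted above) and `words_per_minute=150.0` are assumptions, and the function names are hypothetical.

```python
def estimate_speech(chars: int,
                    chars_per_word: float = 6.0,      # assumption, not a vendor figure
                    words_per_minute: float = 150.0   # assumption: typical narration pace
                    ) -> tuple[float, float]:
    """Return (approximate word count, approximate hours of audio) for `chars` input characters."""
    words = chars / chars_per_word
    hours = words / words_per_minute / 60
    return words, hours


def cost_usd(chars: int, price_per_million_usd: float) -> float:
    """Cost of synthesizing `chars` characters at a per-1M-character list price."""
    return chars / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    words, hours = estimate_speech(1_000_000)
    print(f"1M characters is about {words:,.0f} words, or {hours:.1f} hours of audio")
    print(f"250k characters at $16 per 1M costs ${cost_usd(250_000, 16.0):.2f}")
```

Czech words and speaking rates differ from English ones, so for Czech text treat the hour figure as a ballpark only; the billing itself depends on character count, not on audio length.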

Big comparison table

Provider / model	English naturalness	Czech (cs-CZ)	Voice cloning	Streaming / latency	Free tier	Price per 1M chars, USD (paid)
ElevenLabs (Multilingual v2 / Turbo / Flash)	Top-tier	✅ Best cloud Czech; v2/Turbo/Flash all cover cs	✅ Instant + Professional	✅ Flash ~75 ms	~10k credits/mo, non-commercial	~$150–300 (credit-based, priciest)
Google Cloud TTS (Chirp 3 HD / Neural2)	Excellent (Chirp 3 HD)	✅ Native, incl. Chirp 3 HD	🟡 Instant Custom Voice (gated)	✅ Streaming, low latency	1M chars/mo (WaveNet/Neural2/Chirp); 4M (Standard)	Std ~$4 · Neural2/WaveNet ~$16 · Chirp 3 HD ~$30 · Studio ~$160
Azure AI Speech (Neural / HD)	Excellent	✅ Native: Antonin, Vlasta	✅ Custom Neural Voice (gated)	✅ Real-time SDK	0.5M chars/mo	Neural ~$15 · HD ~$30
OpenAI (gpt-4o-mini-tts / tts-1-hd)	Excellent, steerable	⚠️ Unofficial, accent-prone, no cs picker	❌	✅ Streaming/Realtime	None (signup credit only)	tts-1 ~$15 · tts-1-hd ~$30 · mini-tts ~$0.015/min
Amazon Polly (Neural / Generative)	Good–excellent (Generative)	🟡 Jitka, standard-engine only	❌	✅	12-mo: 1M neural, 5M std	Std ~$4 · Neural ~$16 · Generative ~$30 · Long-form ~$100
PlayHT (Play 3.0 / PlayDialog)	Excellent (English)	🟡 Legacy voices only, utility-grade	✅	✅ Low latency	Limited trial	~$10–30 (varies)
Deepgram Aura-2	Good (agents)	❌ English-only (+some Spanish)	❌	✅ Very low latency	Trial credits	~$30
Cartesia Sonic-3	Excellent, lowest latency	⚠️ Not confirmed — verify	✅ Instant	✅ Best-in-class latency	Limited free credits	~$15–40
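To see how the free tiers and list prices in the table interact at a given monthly volume, here is a minimal sketch. The plan figures are copied from the table rows above. The plan keys, the function names, and the simple "subtract the free allowance, bill the rest" model are assumptions, and ElevenLabs is left out because its pricing is credit-based rather than a flat per-character rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    free_chars_per_month: int
    usd_per_million_chars: float


# Figures taken from the comparison table; re-check vendor pricing pages before relying on them.
PLANS = {
    "google_chirp3_hd": Plan("Google Cloud TTS (Chirp 3 HD)", 1_000_000, 30.0),
    "azure_neural": Plan("Azure AI Speech (Neural)", 500_000, 15.0),
    # Per the table, Polly's free allowance applies to the first 12 months only.
    "polly_standard": Plan("Amazon Polly (Standard)", 5_000_000, 4.0),
}


def monthly_cost(plan: Plan, chars: int) -> float:
    """Assumed billing model: characters beyond the free allowance are billed at list price."""
    billable = max(0, chars - plan.free_chars_per_month)
    return billable / 1_000_000 * plan.usd_per_million_chars


def cheapest_plan(chars: int) -> Plan:
    """Return the plan with the lowest estimated cost for this monthly volume."""
    return min(PLANS.values(), key=lambda plan: monthly_cost(plan, chars))


if __name__ == "__main__":
    volume = 2_000_000  # characters per month
    for plan in PLANS.values():
        print(f"{plan.name}: ${monthly_cost(plan, volume):.2f}")
    print("Cheapest on price alone:", cheapest_plan(volume).name)
```

Price is only one axis: the Czech column in the table still decides whether the cheapest plan is usable for cs-CZ at all.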

Minimal API snippets for the top 3

# Google Cloud TTS (Chirp 3 HD)
# pip install google-cloud-texttospeech ; set GOOGLE_APPLICATION_CREDENTIALS
from google.cloud import texttospeech as tts

client = tts.TextToSpeechClient()
resp = client.synthesize_speech(
    input=tts.SynthesisInput(text="Dobrý den, toto je test českého hlasu."),
    voice=tts.VoiceSelectionParams(
        language_code="cs-CZ",
        name="cs-CZ-Chirp3-HD-Charon",  # verify exact Chirp 3 HD cs-CZ voice name in docs
    ),
    audio_config=tts.AudioConfig(audio_encoding=tts.AudioEncoding.MP3),
)
# Write the returned MP3 bytes; the context manager closes the file handle.
with open("out.mp3", "wb") as f:
    f.write(resp.audio_content)

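# Azure AI Speech (Neural)
# pip install azure-cognitiveservices-speech
import azure.cognitiveservices.speech as speechsdk

cfg = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="westeurope")
cfg.speech_synthesis_voice_name = "cs-CZ-VlastaNeural"  # or cs-CZ-AntoninNeural
# With no audio_config, the SDK plays the result through the default speaker.
synth = speechsdk.SpeechSynthesizer(speech_config=cfg)
result = synth.speak_text_async("Dobrý den, toto je test českého hlasu.").get()
# Surface failures (bad key, wrong region, unknown voice) instead of failing silently.
if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
    raise RuntimeError(f"Azure synthesis failed: {result.reason}")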

# ElevenLabs (Multilingual v2 / Turbo v2.5 / Flash v2.5)
# pip install elevenlabs
from elevenlabs.client import ElevenLabs
from elevenlabs import save

client = ElevenLabs(api_key="YOUR_KEY")
audio = client.text_to_speech.convert(
    voice_id="VOICE_ID",                 # pick a multilingual voice from your library
    model_id="eleven_multilingual_v2",   # or eleven_turbo_v2_5 / eleven_flash_v2_5 (0.5 credit/char)
    text="Dobrý den, toto je test českého hlasu.",
    output_format="mp3_44100_128",
)
save(audio, "out.mp3")  # writes the returned audio to disk
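The snippets above send one short sentence. When reading longer texts aloud, you will usually need to split the input into several requests, because providers cap how much text one call accepts. The sketch below shows one way to do that at sentence boundaries. The per-request limits are not stated in this chapter, so `max_chars=2000` is a placeholder to replace with each provider's documented limit, and `split_for_tts` is a hypothetical helper, not part of any SDK. Some providers count bytes rather than characters, and Czech diacritics take more than one byte in UTF-8, so choose the value conservatively.

```python
import re


def split_for_tts(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut hard into limit-sized pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "První věta. Druhá věta! Třetí věta?"
    for i, chunk in enumerate(split_for_tts(sample, max_chars=25), start=1):
        print(i, chunk)
```

Each chunk can then be passed to any of the three calls above and the resulting audio files joined afterwards. Splitting at sentence boundaries keeps the prosody of each request intact, which matters more for naturalness than the exact chunk size.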

Text-to-Speech for Reading Your Texts — Natural Voices in English & Czech (2026)

Chapter 07: Cloud API TTS Services Compared

Overview

Content

How cloud TTS is billed (so the table makes sense)

Provider-by-provider

1. ElevenLabs — best naturalness, genuinely good Czech, but the priciest

2. OpenAI TTS (`tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`) — great voices, murky Czech

3. Google Cloud Text-to-Speech — the strongest "natural Czech + near-free" pick

4. Microsoft Azure AI Speech — best breadth of Czech voices + strong free tier

5. Amazon Polly — cheapest, but Czech is older/standard-only

6. PlayHT (play.ht) — flagship is English-first; Czech via legacy voices only

7. Deepgram Aura / Aura-2 — fast and cheap, but English-only (and a bit of Spanish)

8. Cartesia (Sonic / Sonic-2 / Sonic-3) — ultra-low latency, multilingual, but Czech unconfirmed

Notable others (brief)

Big comparison table

Which are actually best for Czech?

Minimal API snippets for the top 3

Key Takeaways

Key References
