Text-to-Speech for Reading Your Texts — Natural Voices in English & Czech (2026)

Chapter 15: Master Scorecard & Decision Tree

Chapter 16 of 17 · ~15 min read

Overview

This is the "just tell me what to use" chapter. It consolidates everything covered earlier in the report (local/offline models, self-hosted projects, cloud APIs, and the browser) into one master scorecard. Each option is graded first on the two hard filters for this project, natural-sounding English and natural-sounding Czech, and then on cost, hosting model, voice cloning, streaming, licensing, and setup effort.

A decision tree then routes you to a concrete pick based on your real constraints: budget, whether Czech is critical, whether you can self-host, whether you need cloning, and how tight your latency budget is. The headline recommendation is then restated for your exact situation: a developer adding TTS to their own software who wants the most natural English and Czech at free or near-free cost.

One recurring finding from the research should guide how you read this scorecard: Czech is the filter that eliminates most of the "best" English models. Many of the highest-MOS open models (Kokoro, Chatterbox, Sesame CSM, Orpheus in its base form) ship with no Czech at all, or with Czech that is unverified or poor. Grade every row on the Czech column first.

Content

How to read the grades

Quality grades are coarse on purpose — published MOS numbers are not comparable across vendors and almost none of them are measured on Czech. The grades below are an evidence-tiered synthesis of the per-option chapters:

A = consistently natural, near-human, low artifacts in normal use
B = clearly good / usable for production, occasional prosody or pronunciation slips
C = intelligible and acceptable but obviously synthetic / robotic or dated
D / — = poor, or not supported at all (— means "no Czech model exists")

"Cost" is the realistic cost for this project (reading your own texts), not a theoretical list price.

Master scorecard — every option, graded

Option	Type	EN quality	CZ quality	Realistic cost	Hosting	Voice clone	Streaming	License	Setup effort	Best for
Microsoft `edge-tts`	Cloud (unofficial)	A−	A−	Free (no key)	Their server, free Python lib	No	Yes (chunked)	Gray area — Edge "Read aloud" backend, no commercial ToS	Very low	The free sweet spot for EN+CZ if you accept the gray-area dependency
Azure AI Speech (Neural)	Cloud API	A	A	Free 500k chars/mo, then ~$15–16/M (Neural)	Microsoft cloud	Custom Neural Voice (gated, paid)	Yes (real-time + SDK)	Clear commercial ToS	Low–med	Production EN+CZ with a license you can stand behind
Google Cloud TTS	Cloud API	A	A−	Free 1M (Std) / lower free tier for premium; Neural2 ~$16/M, Chirp 3 HD priced higher	Google cloud	Instant Custom Voice (gated)	Yes	Clear commercial ToS	Low–med	Generous always-free Czech via Standard/WaveNet; scale up later
OpenAI `gpt-4o-mini-tts`	Cloud API	A	B+ (multilingual, CZ usable but not native-tuned)	~$0.015/min audio (token-based, est.)	OpenAI cloud	No (preset + steerable style)	Yes	Clear commercial ToS	Very low	Easiest API; great EN, decent CZ; steerable tone
ElevenLabs	Cloud API	A	A− (Multilingual v2 / v3 supports Czech)	Free 10k chars/mo (non-commercial); paid tiers from ~$5/mo	ElevenLabs cloud	Yes, best-in-class	Yes (WebSocket)	Free tier non-commercial; paid = commercial	Low	Highest naturalness + cloning if budget allows; CZ is good
Kokoro-82M	Local model	A−	— (no Czech)	Free	Local CPU/GPU	No	Partial (short chunks)	Apache-2.0 (very permissive)	Med	English-only projects; fails the Czech filter
XTTS v2 (Coqui / forks)	Local / self-host	B+	B (Czech is a supported language)	Free	Local GPU (or CPU slow)	Yes (3–6s sample)	Yes	CPML — non-commercial (original weights); check fork terms	Med–high	Local Czech voice cloning for personal/non-commercial use
Piper	Local model	B	B− (Czech voices exist, dated)	Free	Local CPU (tiny)	No	Yes (fast)	MIT for the original rhasspy/piper; the successor OHF-Voice/piper1-gpl is GPL-3.0	Low–med	Fully offline, low-resource, embedded; CZ acceptable, not great
Coqui VITS / other CZ checkpoints	Local model	B−	B−	Free	Local	No	Yes	Varies (often MPL/Apache)	Med–high	Tinkerers wanting fully-open Czech weights
Fish Speech / OpenAudio	Local / self-host	A−	B (multilingual incl. CZ claimed)	Free (self-host)	Local GPU	Yes	Yes	Check current license (has shifted)	High	Self-hosted multilingual cloning if you verify Czech yourself
Orpheus / Sesame CSM / Chatterbox	Local model	A	— (English-first; CZ absent/unverified)	Free	Local GPU	Some (Chatterbox yes)	Yes	Apache/MIT (varies)	High	Best-sounding English locally; fails Czech filter
Web Speech API (`speechSynthesis`)	Browser built-in	C–B (OS-dependent)	C (OS-dependent, often poor)	Free	Client device	No	Yes (native)	N/A (browser API)	Trivial	Zero-cost accessibility fallback; CZ inconsistent
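
The "grade the Czech column first" rule can be sketched in a few lines of Python. The grades are transcribed from the scorecard rows above; the numeric ordering of the letter grades, the SCORECARD structure, and the czech_first helper are illustrative assumptions, not part of the report's method. Where the table gives a range (Web Speech API, C to B in English), the lower bound is used.

```python
# Assumed numeric ordering of the scorecard's letter grades (higher is better).
# "none" stands for the table's "no Czech" marker.
GRADES = {"none": 0, "D": 1, "C": 2, "B-": 3, "B": 4, "B+": 5, "A-": 6, "A": 7}

# (option, EN grade, CZ grade), transcribed from the master scorecard.
SCORECARD = [
    ("edge-tts", "A-", "A-"),
    ("Azure AI Speech", "A", "A"),
    ("Google Cloud TTS", "A", "A-"),
    ("OpenAI gpt-4o-mini-tts", "A", "B+"),
    ("ElevenLabs", "A", "A-"),
    ("Kokoro-82M", "A-", "none"),
    ("XTTS v2", "B+", "B"),
    ("Piper", "B", "B-"),
    ("Coqui VITS", "B-", "B-"),
    ("Fish Speech / OpenAudio", "A-", "B"),
    ("Orpheus / Sesame CSM / Chatterbox", "A", "none"),
    ("Web Speech API", "C", "C"),  # EN is "C-B" in the table; lower bound used
]


def czech_first(rows, min_cz="B"):
    """Drop rows below the Czech threshold, then rank by CZ grade, then EN grade."""
    threshold = GRADES[min_cz]
    kept = [row for row in rows if GRADES[row[2]] >= threshold]
    return sorted(kept, key=lambda row: (GRADES[row[2]], GRADES[row[1]]), reverse=True)


for option, en, cz in czech_first(SCORECARD):
    print(f"{option}: CZ {cz}, EN {en}")
```

Lowering min_cz to "B-" brings Piper and the Coqui VITS checkpoints back in; no threshold brings back Kokoro or the Orpheus group, since they have no Czech grade to rank.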

Free-or-near-free shortlist (the sweet spot)

Rank	Option	Why it wins	The catch
1	`edge-tts`	Truly free, no key, Azure-grade EN+CZ neural voices, trivial setup	Unofficial/gray-area, no SLA, no commercial license
2	Azure AI Speech	Same voices as #1 but licensed; 500k chars/mo free	Needs an Azure account/key; paid beyond free tier
3	Google Cloud TTS	Large always-free tier including Czech; clean ToS	Premium Czech (Chirp/Neural2) costs money
4	Piper (local)	100% offline, free forever, no quotas, CZ voices exist	Czech voices are dated/robotic vs cloud neural
5	XTTS v2 (local)	Free local Czech voice cloning	Non-commercial license; needs a GPU to be fast
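
To make the "free tier, then per-million pricing" figures concrete, here is a rough cost estimate using the Azure numbers from the scorecard (500k free characters per month, about $16 per million Neural characters at the upper end of the quoted range). The function name and the simple "bill only what exceeds the free allotment" model are assumptions for illustration; actual billing rules may differ, so check the vendor's pricing page.

```python
def monthly_cost_usd(chars_per_month, free_chars=500_000, usd_per_million=16.0):
    """Estimate monthly spend: characters beyond the free allotment at a flat rate.

    Defaults mirror the Azure row of the scorecard; both are approximations.
    """
    billable = max(0, chars_per_month - free_chars)
    return billable / 1_000_000 * usd_per_million


# Hypothetical monthly reading volumes, in characters.
for volume in (200_000, 500_000, 700_000, 1_500_000):
    print(f"{volume:>9,} chars -> ${monthly_cost_usd(volume):.2f}")
```

Under these assumptions, anything up to 500k characters a month costs nothing, and each additional million characters adds roughly $16.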

Decision tree

START: I need TTS that reads my texts in natural English AND Czech.
│
├─ Q1: Is hard offline / data-never-leaves-my-box a requirement?
│   ├─ YES ─────────────────────────────────────────────────────────────────┐
│   │   ├─ Need voice CLONING (your own/a custom voice)?                      │
│   │   │   ├─ YES → XTTS v2 (Czech + cloning). NOTE: weights are            │
│   │   │   │        NON-COMMERCIAL (CPML). OK for personal/internal only.   │
│   │   │   │        Needs a GPU for snappy latency.                         │
│   │   │   └─ NO  → Piper (MIT, runs on a CPU/Raspberry-Pi-class box).      │
│   │   │            Czech voices exist but sound dated — acceptable, not    │
│   │   │            cloud-grade. Best fully-free offline pick.              │
│   │   └─ (If English-only after all → Kokoro-82M, Apache-2.0, best        │
│   │        open English — but it has NO Czech.)                            │
│   │                                                                        ┘
│   └─ NO (cloud is fine) ↓
│
├─ Q2: Is this commercial / shipped to real users (needs a clean license)?
│   ├─ YES ↓
│   │   ├─ Q2a: Top-tier naturalness + want cloning, budget OK?
│   │   │   └─ ElevenLabs (best naturalness + cloning; Czech is good;
│   │   │      paid tier = commercial license).
│   │   ├─ Q2b: Want the most defensible, enterprise-clean EN+CZ?
│   │   │   └─ Azure AI Speech (cs-CZ-Vlasta/Antonin neural, clear ToS,
│   │   │      500k chars/mo free). RECOMMENDED commercial default.
│   │   └─ Q2c: Want generous always-free + Google ecosystem?
│   │       └─ Google Cloud TTS (free Czech via Standard/WaveNet;
│   │          premium tiers cost more — verify Czech on the tier).
│   └─ NO (personal project / internal tool / prototype) ↓
│
├─ Q3: Want the absolute easiest free path that sounds great in EN+CZ?
│   └─ YES → edge-tts. Free, no key, Azure-grade Czech voices,
│            ~5 lines of Python. THE sweet-spot pick for this project.
│            (Gray-area/no-SLA — keep Azure as the drop-in fallback.)
│
└─ Q4: Pure browser/client, zero backend, accessibility-grade is enough?
    └─ YES → Web Speech API (speechSynthesis). Free, trivial, but Czech
             quality depends on the user's OS and is often weak. Use as a
             fallback, not the primary voice.
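
The same routing can be written as a small function, which is handy if the choice needs to live in configuration code. This is a sketch of the tree above, not a definitive policy: the function name and every parameter name are hypothetical, the three commercial branches (Q2a to Q2c) are collapsed into flags, and the browser-only check (Q4) is evaluated before falling through to edge-tts (Q3) so that the function always returns a pick.

```python
def pick_tts(offline=False, cloning=False, commercial=False,
             premium_budget=False, prefer_google=False, browser_only=False):
    """Return the option the decision tree routes to for the given constraints."""
    # Q1: hard offline requirement.
    if offline:
        # XTTS v2 weights are non-commercial (CPML); Piper has no cloning.
        return "XTTS v2" if cloning else "Piper"
    # Q2: commercial use needs a clean license.
    if commercial:
        if cloning and premium_budget:
            return "ElevenLabs"          # Q2a
        if prefer_google:
            return "Google Cloud TTS"    # Q2c
        return "Azure AI Speech"         # Q2b, the recommended commercial default
    # Q4: pure client-side, accessibility-grade output is enough.
    if browser_only:
        return "Web Speech API"
    # Q3: easiest free path for EN+CZ.
    return "edge-tts"


print(pick_tts())                          # personal project, cloud is fine
print(pick_tts(commercial=True))           # shipped product
print(pick_tts(offline=True, cloning=True))
```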

Minimal proof-of-concept snippets

# Install the edge-tts command-line tool and Python library:
pip install edge-tts
# List Czech voices:
edge-tts --list-voices | grep cs-CZ
# Synthesize Czech:
edge-tts --voice cs-CZ-VlastaNeural \
  --text "Dobrý den, toto je test české syntézy řeči." \
  --write-media cz.mp3
# Synthesize English:
edge-tts --voice en-US-AriaNeural \
  --text "Hello, this is an English test." \
  --write-media en.mp3

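import asyncio

import edge_tts

async def speak(text, voice, out):
    # Send the text to the Edge "Read aloud" service and save the returned audio.
    await edge_tts.Communicate(text, voice).save(out)

async def main():
    # Run both requests inside a single event loop.
    await speak("Ahoj světe!", "cs-CZ-AntoninNeural", "hello_cs.mp3")
    await speak("Hello world!", "en-GB-RyanNeural", "hello_en.mp3")

asyncio.run(main())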
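
Because this project reads mixed English and Czech texts, the voice usually has to be chosen per text. The following is a minimal sketch under loud assumptions: guess_language is a crude, hypothetical heuristic (any Czech diacritic means Czech), so English text containing words like "café" would be misrouted, and Czech text typed without diacritics would be read by the English voice. The voice names are the ones used in the snippets above; only edge_tts.Communicate(...).save(...) is taken from the library.

```python
# Letters with Czech diacritics; several also occur in other languages.
CZECH_CHARS = set("áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ")

# Voice names taken from the proof-of-concept snippets in this chapter.
VOICES = {"cs": "cs-CZ-VlastaNeural", "en": "en-US-AriaNeural"}


def guess_language(text):
    """Crude heuristic (assumption): any Czech diacritic marks the text as Czech."""
    return "cs" if any(ch in CZECH_CHARS for ch in text) else "en"


def pick_voice(text):
    """Map the guessed language to one of the two configured voices."""
    return VOICES[guess_language(text)]


async def speak_auto(text, out_path):
    """Synthesize text with the voice matching its guessed language."""
    import edge_tts  # imported lazily so the helpers above work without the package

    await edge_tts.Communicate(text, pick_voice(text)).save(out_path)
```

Calling `asyncio.run(speak_auto("Dobrý den!", "auto.mp3"))` should route the text to the Czech voice. For anything beyond a prototype, a proper language-detection library or an explicit language tag on each text would be more reliable than this heuristic.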

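# Download a Czech voice model (.onnx plus its .onnx.json config) from the Piper voices repo.
# The cs_CZ-*.onnx glob must match exactly one downloaded file; otherwise pass the full filename.
echo "Toto je test offline syntézy." | \
  piper --model cs_CZ-*.onnx --output_file cz_offline.wav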
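
To call Piper from your own software rather than a shell, the same command can be wrapped with subprocess. This is a sketch that assumes the piper executable is on PATH and reuses only the two flags shown above (--model and --output_file); the function names and the model filename in the usage note are hypothetical.

```python
import subprocess
from pathlib import Path


def piper_command(model_path, out_path):
    """Build the argument list for the piper CLI invocation shown above."""
    return ["piper", "--model", str(model_path), "--output_file", str(out_path)]


def synthesize_offline(text, model_path, out_path):
    """Pipe text to piper on stdin and write a WAV file to out_path."""
    model = Path(model_path)
    if not model.is_file():
        # Fail early with a clear message instead of a piper stack trace.
        raise FileNotFoundError(f"Piper voice model not found: {model}")
    subprocess.run(
        piper_command(model, out_path),
        input=text.encode("utf-8"),  # piper reads the text from stdin
        check=True,                  # raise if piper exits with an error
    )
```

A call would look like `synthesize_offline("Toto je test offline syntézy.", "your-czech-voice.onnx", "cz_offline.wav")`, where the model filename is a placeholder for whichever Czech voice you downloaded.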

Key References

rany2. edge-tts (GitHub). 2025. Free third-party Python library exposing the Microsoft Edge "Read aloud" neural voices, including Czech cs-CZ-VlastaNeural and cs-CZ-AntoninNeural; documents usage and voice listing. https://github.com/rany2/edge-tts
Microsoft. Azure AI Speech: Text to Speech voices & pricing. 2026. Lists Czech neural voices and the licensed commercial terms and free tier that make Azure the licensed twin of edge-tts. https://learn.microsoft.com/azure/ai-services/speech-service/language-support
Microsoft. Azure AI Speech pricing. 2026. Free monthly character allotment and per-million neural pricing used in the cost table. https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/
Google Cloud. Cloud Text-to-Speech: pricing & supported voices. 2026. Always-free Standard/WaveNet allotment, premium tier pricing, and Czech voice availability. https://cloud.google.com/text-to-speech/pricing
OpenAI. Text to speech: gpt-4o-mini-tts (API docs & pricing). 2025–2026. Steerable multilingual TTS model; audio-per-minute pricing referenced in the cost section. https://platform.openai.com/docs/guides/text-to-speech
ElevenLabs. Pricing & supported languages. 2026. Free 10k-char non-commercial tier, paid commercial tiers, Multilingual v2/v3 Czech support, and instant voice cloning. https://elevenlabs.io/pricing
hexgrad. Kokoro-82M (Hugging Face model card). 2025. Apache-2.0 open model with strong English quality; its language list (no Czech) confirms it fails the Czech filter. https://huggingface.co/hexgrad/Kokoro-82M
Coqui / idiap. XTTS v2 model card & Coqui TTS (GitHub). 2024–2025. Multilingual (incl. Czech) zero-shot voice cloning; Coqui Public Model License (non-commercial) terms. https://huggingface.co/coqui/XTTS-v2
rhasspy / OHF-Voice. Piper TTS (GitHub) & voice samples. 2025. CPU-friendly offline neural TTS with downloadable Czech voices. The original rhasspy/piper repository is MIT-licensed; the linked successor repository is GPL-3.0. https://github.com/OHF-Voice/piper1-gpl
MDN Web Docs. SpeechSynthesis: Web Speech API. 2025. Browser built-in TTS with OS-dependent voice availability and quality (Czech often weak); zero-cost fallback. https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis
