This chapter is the map of the open-source, fully-local TTS landscape as it stands in mid-2026. "Fully local" means the model weights run on your own machine (CPU or GPU) or your own Docker/server — no per-character cloud bill, no data leaving your infrastructure, and no rate limits beyond your hardware.
For the reader's goal — reading their own English and Czech texts aloud, naturally, for free or near-free — the single most important filter is Czech support. Many of the most celebrated 2024–2026 open models (Kokoro, Chatterbox, Orpheus, Parler, StyleTTS2, Bark) are English-first and either do not support Czech at all or only "support" it via low-quality phoneme fallbacks. So this chapter grades Czech explicitly and honestly, separately from English, for every model.
The goal here is breadth and triage: a survey of every serious option plus a master scorecard. Chapters 4a and 4b go deep on the winners (the Kokoro/Piper class of lightweight models, and the XTTS/F5 class of clone-capable models). Read this chapter to understand which tools are worth your time and which are dead ends for a Czech+English use case.
| Benefit | Cost |
|---|---|
| Zero marginal cost ($0 per character forever) | One-time setup + your own hardware |
| No data leaves your machine (privacy) | You own the ops (Docker, updates, drivers) |
| No rate limits, no vendor lock-in | Quality ceiling is lower than the top cloud models for some languages |
| Works offline | Czech quality is uneven across local models |
The trade-off that matters most for this reader: the most natural Czech in 2026 still comes from cloud APIs (covered in later chapters). Several local models are nonetheless good enough for reading texts aloud in Czech, and they are genuinely free. The realistic local shortlist is brief: Piper (cs_CZ/jirka) and XTTS-v2 (native cs) ship Czech out of the box, while F5-TTS and Chatterbox reach Czech only through community fine-tunes. Everything else is English-only or English-first.
lang_codes and voice packs).
kokoro-onnx, FastAPI servers) add chunked streaming.
rhasspy/piper and newer OHF-Voice/piper1-gpl projects; note the GPL rename of the newer repo and check the exact repo you use). Voices are individually licensed (mostly permissive).
cs_CZ voice (jirka, medium quality). It is the main, and effectively only, ready-made Czech voice. Quality is "clear and intelligible" rather than "indistinguishable from human," but it is real, offline, and free.
idiap/coqui-ai-TTS) live on. If your project is non-commercial or personal, you're fine. If it's commercial, the weights license is a problem (the code is MPL/permissive, but the released weights are CPML).
Thomcles/Chatterbox-TTS-Czech on Hugging Face) that adds Czech to the MIT-licensed Chatterbox, making it the only commercial-safe path to local Czech cloning, if its quality holds up. Treat it as promising-but-unvalidated and audition it yourself.
`<laugh>`, `<sigh>` tags), conversational.
The pattern is clear: the open-source frontier in 2025–2026 is racing on English (and Chinese) quality and on commercial-friendly licenses, not on Czech. Czech remains a niche that only a handful of models actually cover.
Ratings run from 1 to 5 (5 = best) and are relative within the open-source local landscape, based on 2025–2026 evidence. "Czech" grades real, usable Czech quality; a low score means weak or absent support. "License (comm.)" grades commercial-friendliness (5 = fully permissive; low = non-commercial only). "Hardware ease" grades how modest the hardware requirement is (5 = runs on a CPU or Raspberry Pi; 1 = needs a beefy GPU). "Clone" marks whether the model supports voice cloning.
| Model | Quality (EN) | Czech | English | License (comm.) | Hardware ease | Ease of use | Clone | Notes |
|---|---|---|---|---|---|---|---|---|
| Kokoro-82M | 5 | 1 (none) | 5 | 5 (Apache-2.0) | 5 | 4 | No | Best tiny EN; no Czech |
| Piper | 3.5 | 3.5 (cs_CZ jirka) | 4 | 4 (MIT/GPL*) | 5 | 5 | No | Best easy local Czech |
| XTTS-v2 | 4.5 | 4 (native cs) | 4.5 | 2 (CPML non-comm) | 3 | 4 | Yes | Best local Czech cloning, non-comm |
| F5-TTS | 5 | 2 (community FT only) | 5 | 2 (CC-BY-NC) | 2.5 | 3 | Yes | Top EN; Czech experimental |
| Chatterbox | 5 | 2.5 (community FT) | 5 | 5 (MIT) | 3 | 4 | Yes | Best MIT clone; Czech via community fine-tune |
| Orpheus-3B | 4.5 | 1 (none) | 4.5 | 5 (Apache-2.0) | 2 | 3.5 | Yes | Emotive EN; no Czech; heavy |
| MeloTTS | 3.5 | 1 (none) | 4 | 5 (MIT) | 5 | 4.5 | No | Light EN/CJK; no Czech |
| StyleTTS 2 | 4.5 | 1 (none) | 4.5 | 5 (MIT) | 3 | 2.5 | Yes | Kokoro's base; EN only |
| Bark | 3 | 1 (none) | 3.5 | 5 (MIT) | 2 | 3 | Semi | Expressive but unstable |
| Tortoise | 4 | 1 (none) | 4 | 5 (Apache-2.0) | 1.5 | 2 | Yes | Great but very slow; obsolete |
| eSpeak-NG | 1.5 | 2 (robotic) | 2 | 4 (GPLv3) | 5 | 5 | No | Baseline floor + G2P helper |
| Mimic3 | 2.5 | 2 (limited) | 3 | 3 (AGPLv3) | 5 | 3 | No | Unmaintained; use Piper |
* Piper: original rhasspy/piper is MIT; the newer OHF-Voice/piper1-gpl repo is GPL — check which you ship.
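The scorecard can also be treated as data and filtered. The sketch below is one way to triage it in Python. The scores are copied from the table; the `Model` record, the `shortlist` helper, and the thresholds (Czech of at least 3, license of at least 4 for commercial use) are illustrative assumptions, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    czech: float          # "Czech" score from the scorecard (1-5)
    license_score: float  # "License (comm.)" score (1-5)
    hardware: float       # "Hardware ease" score (1-5)
    clone: bool           # voice cloning; Bark's "Semi" is counted as False here


# Scores copied from the scorecard table.
SCORECARD = [
    Model("Kokoro-82M", 1, 5, 5, False),
    Model("Piper", 3.5, 4, 5, False),
    Model("XTTS-v2", 4, 2, 3, True),
    Model("F5-TTS", 2, 2, 2.5, True),
    Model("Chatterbox", 2.5, 5, 3, True),
    Model("Orpheus-3B", 1, 5, 2, True),
    Model("MeloTTS", 1, 5, 5, False),
    Model("StyleTTS 2", 1, 5, 3, True),
    Model("Bark", 1, 5, 2, False),
    Model("Tortoise", 1, 5, 1.5, True),
    Model("eSpeak-NG", 2, 4, 5, False),
    Model("Mimic3", 2, 3, 5, False),
]


def shortlist(min_czech=3.0, min_license=1.0, need_clone=False):
    """Return names of models meeting the thresholds, best Czech score first."""
    hits = [
        m for m in SCORECARD
        if m.czech >= min_czech
        and m.license_score >= min_license
        and (m.clone or not need_clone)
    ]
    return [m.name for m in sorted(hits, key=lambda m: m.czech, reverse=True)]


if __name__ == "__main__":
    print(shortlist())               # usable Czech, any license
    print(shortlist(min_license=4))  # usable Czech, commercial-friendly
    print(shortlist(min_czech=2.5, min_license=4, need_clone=True))
```

With these thresholds, the default call keeps only the two models that ship Czech in their base weights, and requiring a commercial-friendly license narrows that to Piper. Lowering the Czech bar to 2.5 while requiring cloning and a permissive license surfaces Chatterbox, which matches the "audition the fine-tune yourself" caveat.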
A short list of local models clears the Czech bar with real, usable quality. Two ship Czech in their base weights; the rest depend on community fine-tunes you must audition:
cs_CZ/jirka (low/medium). Slightly robotic but reliable. Best if you want zero friction and any hardware.
Thomcles/Chatterbox-TTS-Czech fine-tune: the only MIT (commercial-safe) route to local Czech cloning. Community fine-tune, so validate quality first. Best if you need commercial use and Czech.
For English, you are spoiled: Kokoro (tiny, Apache, runs on CPU) or Chatterbox (MIT, cloning, expressive) are both excellent.
A strong hybrid strategy falls out of this table: use a great English engine (Kokoro) for English text and a Czech-capable engine (Piper or XTTS) for Czech, routing by detected language. You get top English quality and real Czech, all free. Chapters 4a and 4b build exactly these pipelines.
A minimal language-router sketch:
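```python
# Sketch: route by detected language, free local engines only.
# piper_say, xtts_say, and kokoro_say are placeholder wrappers you supply,
# not functions provided by any library.
from langdetect import DetectorFactory, detect  # pip install langdetect

DetectorFactory.seed = 0  # make langdetect's results repeatable


def synthesize(text: str, out_path: str) -> None:
    lang = detect(text)  # language code, e.g. 'cs' or 'en'
    if lang == "cs":
        piper_say(text, voice="cs_CZ-jirka-medium", out=out_path)
        # or: xtts_say(text, language="cs", speaker_wav="ref.wav", out=out_path)
    else:
        kokoro_say(text, voice="af_heart", out=out_path)
```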
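Real documents often mix Czech and English, so routing a whole file by one detection result can send some paragraphs to the wrong engine. The sketch below routes paragraph by paragraph and takes the detector as a parameter, so `langdetect.detect` can be passed in. The default `guess_lang` is a deliberately crude stand-in that only looks for diacritics used in Czech, and the engine labels `"piper"` and `"kokoro"` are placeholders rather than real API names.

```python
import re
from typing import Callable, List, Tuple

# Letters with diacritics used in Czech; plain English text normally has none.
# Assumption: this is a rough heuristic, not a real language detector.
CZECH_CHARS = set("ěščřžýáíéúůťďňĚŠČŘŽÝÁÍÉÚŮŤĎŇ")


def guess_lang(text: str) -> str:
    """Return 'cs' if any Czech diacritic appears in the text, else 'en'."""
    return "cs" if any(ch in CZECH_CHARS for ch in text) else "en"


def route_paragraphs(
    text: str, detect: Callable[[str], str] = guess_lang
) -> List[Tuple[str, str]]:
    """Split on blank lines and pair each paragraph with an engine label."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    plan = []
    for para in paragraphs:
        engine = "piper" if detect(para) == "cs" else "kokoro"
        plan.append((engine, para))
    return plan


if __name__ == "__main__":
    doc = "Hello, this is an English paragraph.\n\nDobrý den, toto je český odstavec."
    for engine, para in route_paragraphs(doc):
        print(engine, "->", para)
```

Returning a plan instead of synthesizing immediately keeps the routing decision inspectable: you can print it, correct misrouted paragraphs by hand, and only then call the engines.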
```bash
# Piper Czech, fully offline, no GPU:
echo "Dobrý den, toto je test české syntézy řeči." \
  | piper --model cs_CZ-jirka-medium.onnx --output_file cz.wav
```
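One way to provide the `piper_say` placeholder used in the router is to wrap the same CLI call from Python. This is a sketch under assumptions: it expects a `piper` executable on the PATH and a downloaded voice file named `<voice>.onnx`, the function signatures are invented here for illustration, and it uses only the `--model` and `--output_file` flags shown in the command above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_piper_command(voice: str, out: str, model_dir: str = ".") -> List[str]:
    """Build the argument list for the Piper CLI; voice maps to <voice>.onnx."""
    model_path = Path(model_dir) / f"{voice}.onnx"
    return ["piper", "--model", str(model_path), "--output_file", out]


def piper_say(text: str, voice: str, out: str, model_dir: str = ".") -> None:
    """Send text to Piper on stdin, as the shell example does with echo."""
    cmd = build_piper_command(voice, out, model_dir)
    # check=True raises CalledProcessError if Piper exits with an error.
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)


if __name__ == "__main__":
    # Only prints the command; call piper_say(...) once Piper is installed.
    print(build_piper_command("cs_CZ-jirka-medium", "cz.wav"))
```

Passing the command as a list avoids shell quoting problems with Czech text, and keeping command construction separate from execution lets you check the arguments without Piper installed.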
Thomcles/Chatterbox-TTS-Czech, etc.) that must be auditioned, and SpeechT5-base-cs-tts is a Czech-native base for training your own. Kokoro, Orpheus, MeloTTS, StyleTTS2, Bark, Tortoise, and Parler do not support Czech out of the box.
cs_CZ/jirka): MIT-ish, runs on a Raspberry Pi, slightly robotic but reliable.
jirka medium-quality voice availability and quality tier.
cs_CZ/jirka available in low and medium quality tiers.