Text-to-Speech for Reading Your Texts — Natural Voices in English & Czech (2026)

Chapter 05b: Local Models — Best for Czech

Chapter 6 of 17 · ~15 min read

Overview

This chapter is the practical, Czech-focused companion to the general "local models" survey. The reader's hard filter is that Czech must sound natural — and this is where most of the celebrated 2025–2026 local TTS models fall away. Kokoro, MeloTTS, Dia, Sesame CSM, Chatterbox: all impressive, almost all English-first, none support Czech. So the field of genuinely usable, free, local Czech TTS narrows to a short list, and the quality ranges from "robotic but works" to "good with effort."

We cover, in order of practical usefulness for Czech:

Piper (cs_CZ-jirka) — the lightweight, CPU-friendly default; one real Czech voice, two quality tiers, with known pronunciation quirks.
XTTS-v2 (via the maintained coqui-tts fork) — the most natural local Czech option; it gets there through voice cloning from a Czech sample and realistically wants a GPU.
Meta MMS-TTS (facebook/mms-tts-ces) — a VITS single-speaker Czech model, trivially installable via transformers, mediocre but serviceable.
eSpeak-NG (cs) — the robotic formant baseline; almost never the right answer for "natural," but it serves as the phonemizer underneath Piper and many other neural voices.
What does NOT work for Czech — so you don't waste a weekend (Kokoro, MeloTTS, Fish/OpenAudio, etc.).

For each, we cover installation, how to obtain the Czech voice and point the engine at it, copy-pastable code, an honest Czech-specific quality grade, and hardware needs. We end with a concrete recommended local Czech setup.

Honesty up front: there is no free, local, English-and-Czech, effortless, ElevenLabs-grade option in 2026. The realistic sweet spot for natural Czech locally is XTTS-v2 with a cloned Czech reference voice on a GPU, with Piper cs_CZ-jirka-medium as the lightweight CPU fallback. Everything else is a compromise on naturalness, on Czech specifically, or on the "local" constraint.

Content

Why Czech is the hard part

Czech has features that punish under-trained TTS models:

1. Piper — the lightweight Czech default

The Czech voice: `cs_CZ-jirka`

Voice	Quality tier	Sample rate	File size (approx)	Notes
`cs_CZ-jirka-low`	low	16 kHz	~20 MB	Fastest, noticeably rougher
`cs_CZ-jirka-medium`	medium	22.05 kHz	~60 MB	Use this one — best stock Czech

Install and run (Python)

python -m venv .venv && source .venv/bin/activate
pip install piper-tts

# Download the Czech voice straight from Hugging Face
python -m piper.download_voices cs_CZ-jirka-medium
# (older CLI: python -m piper.download cs_CZ-jirka-medium)
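
If you prefer to fetch the files yourself, the rhasspy/piper-voices repository stores the Czech voice under `cs/cs_CZ/jirka/{low,medium}`, as an `.onnx` model plus an `.onnx.json` config. A minimal sketch that builds the two download URLs, assuming the files stay on the repository's `main` branch and are served through Hugging Face's usual `resolve/` links:

```python
# piper_voice_urls.py: build direct download links for a Piper voice.
# ASSUMPTION: files live at <lang>/<locale>/<speaker>/<quality>/ on the "main"
# branch of rhasspy/piper-voices and are served via Hugging Face "resolve" URLs.
BASE = "https://huggingface.co/rhasspy/piper-voices/resolve/main"


def piper_voice_urls(locale: str, speaker: str, quality: str) -> tuple[str, str]:
    """Return (model_url, config_url), e.g. for ("cs_CZ", "jirka", "medium")."""
    lang = locale.split("_")[0]              # "cs_CZ" -> "cs"
    name = f"{locale}-{speaker}-{quality}"    # "cs_CZ-jirka-medium"
    folder = f"{BASE}/{lang}/{locale}/{speaker}/{quality}"
    return f"{folder}/{name}.onnx", f"{folder}/{name}.onnx.json"


if __name__ == "__main__":
    for url in piper_voice_urls("cs_CZ", "jirka", "medium"):
        print(url)
```

Save both files side by side (for example with `curl -L -O <url>`), since `PiperVoice.load` looks for the config next to the model.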

# piper_cs.py — read Czech text to a WAV file, CPU only
import wave
from piper import PiperVoice

voice = PiperVoice.load("cs_CZ-jirka-medium.onnx")  # picks up the .json automatically

text = "Dobrý den. Toto je ukázka českého hlasu generovaného lokálně na vašem počítači."
with wave.open("out_cs.wav", "wb") as wav:
    voice.synthesize_wav(text, wav)
print("Wrote out_cs.wav")

echo "Dnes je krásný den a chci si přečíst svůj text nahlas." \
  | piper -m cs_CZ-jirka-medium.onnx -f out.wav
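
The pronunciation quirks of `cs_CZ-jirka` come largely from its grapheme-to-phoneme step (the sptsfn/piper-czech-tts project in the references exists to patch them). One cheap, engine-independent mitigation is to normalize text before it reaches Piper, for example by spelling out abbreviations. A minimal sketch; the abbreviation table is a hypothetical starter list, not taken from that project or from Piper:

```python
# cs_normalize.py: expand common Czech abbreviations before sending text to TTS.
# HYPOTHETICAL starter table: not taken from Piper or any existing project.
import re
import sys

ABBREVIATIONS = {
    "např.": "například",
    "apod.": "a podobně",
    "atd.": "a tak dále",
    "tj.": "to jest",
    "mj.": "mimo jiné",
}

# Longest keys first; (?<!\w) stops matches inside a longer word.
_PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + ")",
    re.IGNORECASE,
)


def _expand(match: re.Match) -> str:
    abbr = match.group(0)
    full = ABBREVIATIONS[abbr.lower()]
    if abbr[0].isupper():                      # "Např." -> "Například"
        full = full[0].upper() + full[1:]
    rest = match.string[match.end():]
    nxt = rest.lstrip()[:1]
    # Heuristic: keep the full stop if the abbreviation also ends a sentence
    # (end of text, or whitespace followed by a capital letter). It can
    # misfire before proper nouns, as in "např. Praha".
    if not nxt or (rest[:1].isspace() and nxt.isupper()):
        full += "."
    return full


def normalize_cs(text: str) -> str:
    return _PATTERN.sub(_expand, text)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            sys.stdout.write(normalize_cs(handle.read()))
```

Because the `piper` command above reads text from standard input, the two chain directly: `python cs_normalize.py article.txt | piper -m cs_CZ-jirka-medium.onnx -f out.wav`.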

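Czech quality grade: C+ / B− (intelligible, clearly synthetic)

2. XTTS-v2 — the best natural local Czech (via cloning)

pip install coqui-tts        # the maintained idiap fork
# model auto-downloads on first use (~1.8 GB) into the Coqui model folder
# (~/.local/share/tts on Linux by default), not the Hugging Face cache
# the first download asks you to accept the Coqui Public Model License (CPML);
# export COQUI_TOS_AGREED=1 to accept it non-interactively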

Run XTTS-v2 in Czech with a cloned voice

# xtts_cs.py — natural Czech via voice cloning
import torch
from TTS.api import TTS   # package is coqui-tts, import path is still TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

tts.tts_to_file(
    text="Dobrý den, vítejte. Tohle je text předčítaný přirozeným českým hlasem.",
    file_path="out_cs_xtts.wav",
    speaker_wav="czech_reference.wav",   # 6–20 s of a CZECH speaker
    language="cs",                       # <-- Czech
)
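
Most of XTTS-v2's Czech quality depends on the reference clip. A quick stdlib check of the clip's length against the 6–20 s window used in the example above; treat the window as a rule of thumb. Python's `wave` module only reads integer PCM WAV, so convert MP3 or float WAV files first (for example with ffmpeg).

```python
# check_reference.py: sanity-check a reference clip before XTTS-v2 cloning.
# The 6-20 s window is a rule of thumb taken from the example above.
import sys
import wave


def describe_reference(path: str, min_s: float = 6.0, max_s: float = 20.0) -> dict:
    """Return duration, sample rate, channel count and a list of problems."""
    with wave.open(path, "rb") as wav:          # integer PCM WAV only
        seconds = wav.getnframes() / wav.getframerate()
        info = {"seconds": seconds,
                "rate": wav.getframerate(),
                "channels": wav.getnchannels(),
                "problems": []}
    if seconds < min_s:
        info["problems"].append(f"too short: {seconds:.1f} s < {min_s:g} s")
    elif seconds > max_s:
        info["problems"].append(f"too long: {seconds:.1f} s > {max_s:g} s")
    return info


if __name__ == "__main__" and len(sys.argv) > 1:
    report = describe_reference(sys.argv[1])
    print(report)
    sys.exit(1 if report["problems"] else 0)
```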

Mode	VRAM / RAM	Speed
GPU inference (recommended)	~4 GB min, 6–8 GB comfortable (RTX 3060+)	near real-time or faster; streaming supported
CPU inference	8 GB+ RAM	works but slower than real-time — fine for batch "render the article to MP3," painful for interactive
Fine-tuning	12 GB+ VRAM	hours, plus a dataset
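
For the batch "render the article" workflow, a practical pattern is to synthesize a long text in pieces, one WAV per chunk, and join the pieces at the end, so a crash halfway through costs one chunk rather than the whole run. A minimal stdlib sketch; `MAX_CHARS` is an assumed budget (coqui-tts can also split sentences on its own), and the join step requires every part to share channel count, sample width and sample rate:

```python
# render_long.py: chunk long Czech text and stitch per-chunk WAVs together.
# MAX_CHARS is an ASSUMED per-chunk budget; tune it for your model.
import re
import wave

MAX_CHARS = 200


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in filter(None, sentences):
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_wavs(paths: list[str], out_path: str) -> None:
    """Append the audio frames of several WAV files into one output file."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)          # header frame count is updated automatically
        for path in paths:
            with wave.open(path, "rb") as part:
                if part.getparams()[:3] != params[:3]:  # channels, width, rate
                    raise ValueError(f"{path} does not match {paths[0]}")
                out.writeframes(part.readframes(part.getnframes()))
```

With the XTTS example above, call `tts.tts_to_file(text=chunk, file_path=f"part_{i:03d}.wav", speaker_wav="czech_reference.wav", language="cs")` for each chunk, skip parts that already exist on disk, then pass the part paths in order to `concat_wavs`.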

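Czech quality grade: B / B+ with a good Czech reference; B− without

3. Meta MMS-TTS — the trivially-installable Czech VITS

pip install transformers torch scipy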

# mms_cs.py — Czech TTS via Hugging Face transformers
import torch, scipy.io.wavfile
from transformers import VitsModel, AutoTokenizer

model = VitsModel.from_pretrained("facebook/mms-tts-ces")
tok = AutoTokenizer.from_pretrained("facebook/mms-tts-ces")

inputs = tok("Toto je český hlas z modelu Meta MMS.", return_tensors="pt")
torch.manual_seed(0)  # the duration predictor is stochastic; a fixed seed gives repeatable output
with torch.no_grad():
    wave = model(**inputs).waveform

scipy.io.wavfile.write("out_mms_cs.wav",
                       rate=model.config.sampling_rate,
                       data=wave.squeeze().cpu().numpy())
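
`scipy.io.wavfile.write` stores a float32 array as a 32-bit floating-point WAV. Most players cope, but some tools (including Python's own `wave` module) only handle integer PCM. A minimal stdlib sketch that writes 16-bit PCM instead; keep it in its own module (here a hypothetical `pcm16.py`) so the `wave` module name does not collide with the `wave` tensor in the script above:

```python
# pcm16.py: write mono float samples as a 16-bit PCM WAV (stdlib only).
import array
import wave


def write_pcm16(path: str, samples, rate: int) -> None:
    """Clip samples to [-1.0, 1.0], scale to int16 and write a mono WAV."""
    pcm = array.array(
        "h", (round(max(-1.0, min(1.0, float(s))) * 32767) for s in samples)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)      # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())
```

In the MMS script, add `from pcm16 import write_pcm16` and call `write_pcm16("out_mms_cs_pcm16.wav", wave.squeeze().tolist(), model.config.sampling_rate)`.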

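Czech quality grade: C / C+ (functional, single voice, plain)

4. eSpeak-NG — the robotic Czech baseline (and the phonemizer underneath everything)

# Debian/Ubuntu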
sudo apt-get install espeak-ng
# macOS
brew install espeak-ng

espeak-ng -v cs "Toto je robotický český hlas z programu eSpeak NG." -w out_espeak_cs.wav
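
Because Piper phonemizes text through eSpeak-NG, the `cs` voice doubles as a debugging tool: when Piper mispronounces a word, look at the phonemes eSpeak-NG produces for it. A small wrapper, assuming `espeak-ng` is on your PATH; `-q` suppresses audio and `--ipa` prints the phonemes in IPA. Treat the output as an approximation, since Piper drives eSpeak-NG through its own bindings and may not match the CLI exactly.

```python
# espeak_ipa.py: show eSpeak-NG's IPA phonemes for Czech text.
import shutil
import subprocess
import sys


def espeak_ipa_cmd(text: str, voice: str = "cs") -> list[str]:
    """Build the command line: -q = no audio output, --ipa = print IPA phonemes."""
    return ["espeak-ng", "-v", voice, "-q", "--ipa", text]


def espeak_ipa(text: str, voice: str = "cs") -> str:
    if shutil.which("espeak-ng") is None:
        raise RuntimeError("espeak-ng not found on PATH")
    result = subprocess.run(espeak_ipa_cmd(text, voice),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    print(espeak_ipa(" ".join(sys.argv[1:])))
```

For example, `python espeak_ipa.py "Řekl, že přijde ve čtvrtek."` prints the phoneme string for that sentence.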

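Czech quality grade: D (intelligible, very robotic)

5. What does NOT work for Czech (so you don't waste time)

Model	English quality	Czech support	Verdict for this reader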
Kokoro (82M)	Excellent, very efficient	No Czech (EN, plus some zh/ja/fr/es/hi/it/pt)	Skip for Czech
MeloTTS	Very good	No Czech (EN/ES/FR/ZH/JA/KO)	Skip for Czech
Fish Speech / OpenAudio S1	Excellent, cloning	Czech not officially listed / unconfirmed — focuses on high-resource langs	Don't rely on it for Czech
Dia, Sesame CSM, Chatterbox, Parler	Strong (English)	English-only / no verified Czech	Skip for Czech
Piper `cs_CZ-jirka`	(EN voices excellent)	Yes — real Czech voice	Use (lightweight)
XTTS-v2	Very good	Yes — `cs` + cloning	Use (best natural)
MMS-TTS `ces`	n/a	Yes — single-speaker VITS	Backup/baseline
eSpeak-NG `cs`	n/a	Yes — but robotic	Phonemizer / last resort
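
The verdicts in the table, together with the XTTS-v2 hardware notes, reduce to a small decision rule. A sketch of that rule as code, with hypothetical engine labels; it assumes, as recommended in the overview, that XTTS-v2 is only worth it with a Czech reference clip, and that CPU-only XTTS is acceptable for batch jobs but not for interactive reading:

```python
# pick_engine.py: encode this chapter's local-Czech recommendation.
# Engine labels are HYPOTHETICAL strings, not package or CLI names.
import importlib.util


def cuda_available() -> bool:
    """True if PyTorch is installed and sees a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def pick_czech_engine(has_gpu: bool, has_czech_reference: bool, interactive: bool) -> str:
    # XTTS-v2 with cloning is the most natural option, but it needs a Czech
    # reference clip, and on CPU it runs slower than real time (batch only).
    if has_czech_reference and (has_gpu or not interactive):
        return "xtts-v2 (cloned Czech voice)"
    # Otherwise fall back to the lightweight CPU default.
    return "piper cs_CZ-jirka-medium"


if __name__ == "__main__":
    print(pick_czech_engine(cuda_available(), has_czech_reference=True, interactive=True))
```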

Key References

OHF-Voice. piper1-gpl (GitHub). 2025–2026. https://github.com/OHF-Voice/piper1-gpl — current maintained Piper runtime, install (pip install piper-tts), Python API, and VOICES.md.
Rhasspy. piper-voices (Hugging Face). 2024–2026. https://huggingface.co/rhasspy/piper-voices — official voice library; Czech voice at cs/cs_CZ/jirka/{low,medium} (.onnx + .onnx.json).
Rhasspy. cs_CZ-jirka-medium voice — Discussion #487 (GitHub). 2024. https://github.com/rhasspy/piper/discussions/487 — origin/notes on the community-contributed Czech voice.
sptsfn. piper-czech-tts (GitHub). 2024–2025. https://github.com/sptsfn/piper-czech-tts — project that fixes cs_CZ-jirka mispronunciations and reads PDFs; evidence of the base voice's Czech G2P limits.
Coqui / idiap. coqui-ai-TTS — maintained fork (GitHub). 2024–2026. https://github.com/idiap/coqui-ai-TTS — pip install coqui-tts; XTTS-v2 usage, languages, and training recipes.
Coqui. XTTS-v2 (Hugging Face). 2023–2026. https://huggingface.co/coqui/XTTS-v2 — model card listing 17 languages incl. Czech (cs), ~6s cloning, cross-language synthesis.
Meta AI. facebook/mms-tts-ces (Hugging Face). 2023–2026. https://huggingface.co/facebook/mms-tts-ces — Czech VITS single-speaker model from the MMS project; transformers VitsModel usage.
Hugging Face. MMS / VITS model docs (Transformers). 2024–2026. https://huggingface.co/docs/transformers/model_doc/vits — how to run VITS/MMS TTS in transformers, including the stochastic (non-deterministic) duration predictor.
eSpeak-NG. espeak-ng (GitHub). 2024–2026. https://github.com/espeak-ng/espeak-ng — formant synthesizer with Czech (cs) support; widely used as the G2P/phonemizer backbone for neural TTS incl. Piper.
Hexgrad. Kokoro-82M (Hugging Face). 2025. https://huggingface.co/hexgrad/Kokoro-82M — supported-language list (no Czech), confirming exclusion for this reader's filter.
MyShell. MeloTTS (GitHub). 2024–2025. https://github.com/myshell-ai/MeloTTS — supported languages (EN/ES/FR/ZH/JA/KO), confirming no Czech.
FishAudio. fish-speech / OpenAudio (GitHub). 2025–2026. https://github.com/fishaudio/fish-speech — language coverage; Czech not officially listed/confirmed.

Overview

Content

Why Czech is the hard part

1. Piper — the lightweight Czech default

The Czech voice: `cs_CZ-jirka`

Install and run (Python)

Czech quality grade: C+ / B− (intelligible, clearly synthetic)

2. XTTS-v2 — the best natural local Czech (via cloning)

Run XTTS-v2 in Czech with a cloned voice

How to get a good Czech result — the cloning recipe (this is the important part)

Czech quality grade: B / B+ with a good Czech reference; B− without

3. Meta MMS-TTS — the trivially-installable Czech VITS

Czech quality grade: C / C+ (functional, single voice, plain)

4. eSpeak-NG — the robotic Czech baseline (and the phonemizer underneath everything)

Czech quality grade: D (intelligible, very robotic)

5. What does NOT work for Czech (so you don't waste time)

Recommended local Czech setup

Key Takeaways

Key References
