Text-to-Speech for Reading Your Texts — Natural Voices in English & Czech (2026)

Chapter 04: Fully Local Open-Source TTS — Landscape & Scorecard

Chapter 4 of 17 · ~17 min read

Overview

This chapter is the map of the open-source, fully-local TTS landscape as it stands in mid-2026. "Fully local" means the model weights run on your own machine (CPU or GPU) or your own Docker/server — no per-character cloud bill, no data leaving your infrastructure, and no rate limits beyond your hardware.

For the reader's goal (reading their own English and Czech texts aloud, naturally, for free or near-free), the single most important filter is Czech support. Many of the most celebrated 2024–2026 open models (Kokoro, Chatterbox, Orpheus, Parler, StyleTTS 2, Bark) are English-first: they either lack Czech entirely or "support" it only through low-quality phoneme fallbacks. So this chapter grades Czech explicitly, and separately from English, for every model.

The goal here is breadth and triage: a survey of every serious option plus a master scorecard. Chapters 4a and 4b go deep on the winners (the Kokoro/Piper class of lightweight models, and the XTTS/F5 class of clone-capable models). Read this chapter to understand which tools are worth your time and which are dead ends for a Czech+English use case.

Content

What "fully local" buys you (and what it costs)

Benefit	Cost
Zero marginal cost ($0 per character forever)	One-time setup + your own hardware
No data leaves your machine (privacy)	You own the ops (Docker, updates, drivers)
No rate limits, no vendor lock-in	Quality ceiling is lower than the top cloud models for some languages
Works offline	Czech quality is uneven across local models

The trade-off that matters most for this reader: the very best Czech naturalness in 2026 still lives in cloud APIs (covered in later chapters). But several local models are good enough for reading texts aloud in Czech, and they are genuinely free. The realistic local shortlist for Czech is short: Piper (cs_CZ/jirka), XTTS-v2 (native cs), and F5-TTS via a Czech fine-tune. Everything else is English-only or English-first.

The models, one by one

Model	Quality (EN)	Czech	English	License (commercial use)	Hardware ease	Ease of use	Voice cloning	Notes
Kokoro-82M	5	1 (none)	5	5 (Apache-2.0)	5	4	No	Best tiny EN; no Czech
Piper	3.5	3.5 (cs_CZ jirka)	4	4 (MIT/GPL*)	5	5	No	Best easy local Czech
XTTS-v2	4.5	4 (native cs)	4.5	2 (CPML non-comm)	3	4	Yes	Best local Czech cloning, non-comm
F5-TTS	5	2 (community FT only)	5	2 (CC-BY-NC)	2.5	3	Yes	Top EN; Czech experimental
Chatterbox	5	2.5 (community FT)	5	5 (MIT)	3	4	Yes	Best MIT clone; Czech via community fine-tune
Orpheus-3B	4.5	1 (none)	4.5	5 (Apache-2.0)	2	3.5	Yes	Emotive EN; no Czech; heavy
MeloTTS	3.5	1 (none)	4	5 (MIT)	5	4.5	No	Light EN/CJK; no Czech
StyleTTS 2	4.5	1 (none)	4.5	5 (MIT)	3	2.5	Yes	Kokoro's base; EN only
Bark	3	1 (none)	3.5	5 (MIT)	2	3	Semi	Expressive but unstable
Tortoise	4	1 (none)	4	5 (Apache-2.0)	1.5	2	Yes	Great but very slow; obsolete
eSpeak-NG	1.5	2 (robotic)	2	4 (GPLv3)	5	5	No	Baseline floor + G2P helper
Mimic3	2.5	2 (limited)	3	3 (AGPLv3)	5	3	No	Unmaintained; use Piper
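
One way to read the scorecard for a Czech + English use case is to filter it mechanically. The sketch below copies the Czech, English, license, and hardware scores from the table above and keeps only models at or above a Czech threshold. The threshold of 3.0, the `Model` class, and the `czech_shortlist` helper are illustrative assumptions, not part of any TTS library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    czech: float          # "Czech" column
    english: float        # "English" column
    license_score: float  # "License" column (higher = friendlier to commercial use)
    hardware: float       # "Hardware ease" column


# Scores copied from the scorecard table above.
SCORECARD = [
    Model("Kokoro-82M", 1, 5, 5, 5),
    Model("Piper", 3.5, 4, 4, 5),
    Model("XTTS-v2", 4, 4.5, 2, 3),
    Model("F5-TTS", 2, 5, 2, 2.5),
    Model("Chatterbox", 2.5, 5, 5, 3),
    Model("Orpheus-3B", 1, 4.5, 5, 2),
    Model("MeloTTS", 1, 4, 5, 5),
    Model("StyleTTS 2", 1, 4.5, 5, 3),
    Model("Bark", 1, 3.5, 5, 2),
    Model("Tortoise", 1, 4, 5, 1.5),
    Model("eSpeak-NG", 2, 2, 4, 5),
    Model("Mimic3", 2, 3, 3, 5),
]


def czech_shortlist(models, min_czech=3.0, min_license=0.0):
    """Keep models meeting both thresholds, best Czech score first."""
    keep = [
        m for m in models
        if m.czech >= min_czech and m.license_score >= min_license
    ]
    return sorted(keep, key=lambda m: m.czech, reverse=True)


if __name__ == "__main__":
    for m in czech_shortlist(SCORECARD):
        print(f"{m.name}: Czech {m.czech}, license {m.license_score}")
```

Raising `min_license` to 4 models a commercial-use requirement. Under these scores that leaves only Piper, which matches its "best easy local Czech" note in the table.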

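# Pseudocode: route by detected language, free local engines only.
# piper_say, xtts_say and kokoro_say are placeholder wrappers you write
# around each engine; they are not library functions.
from langdetect import DetectorFactory, detect  # pip install langdetect

DetectorFactory.seed = 0  # langdetect is randomized; a fixed seed makes results repeatable

def synthesize(text: str, out_path: str) -> None:
    lang = detect(text)              # ISO 639-1 code, e.g. 'cs' or 'en'
    if lang == "cs":
        piper_say(text, voice="cs_CZ-jirka-medium", out=out_path)
        # or: xtts_say(text, language="cs", speaker_wav="ref.wav", out=out_path)
    else:
        # Anything not detected as Czech falls through to the English voice.
        kokoro_say(text, voice="af_heart", out=out_path)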
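
Statistical language detectors can be unreliable on very short inputs such as a single heading. As a dependency-free fallback, the sketch below guesses Czech from characters that do not occur in ordinary English text. The character set and the `guess_language` helper are assumptions made for illustration. The same letters also appear in Slovak and other languages, so this only separates Czech from English.

```python
import unicodedata

# Letters with a caron or ring that are common in Czech and absent from English.
# Acute-accented vowels (á, é, í, ...) are left out on purpose, because English
# loanwords such as "café" contain them.
CZECH_HINT_CHARS = set("ěščřžůďťňĚŠČŘŽŮĎŤŇ")


def guess_language(text: str, default: str = "en") -> str:
    """Return 'cs' if the text contains a Czech hint character, else `default`."""
    # Normalize to NFC so that a letter stored as base + combining mark
    # is compared in its single-code-point form.
    normalized = unicodedata.normalize("NFC", text)
    if any(ch in CZECH_HINT_CHARS for ch in normalized):
        return "cs"
    return default


if __name__ == "__main__":
    print(guess_language("Dobrý den, toto je test české syntézy řeči."))
    print(guess_language("Hello, this is a test."))
```

A Czech sentence that happens to contain none of these letters still comes back as the default, so this is best used alongside a detector and not as a replacement for one.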

# Piper Czech, fully offline, no GPU:
echo "Dobrý den, toto je test české syntézy řeči." \
  | piper --model cs_CZ-jirka-medium.onnx --output_file cz.wav
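
The same Piper call can be wrapped in Python, which gives one possible body for the `piper_say` placeholder used in the routing pseudocode. This is a minimal sketch that assumes the `piper` executable is on your PATH and that the voice's `.onnx` file has already been downloaded. It uses only the `--model` and `--output_file` flags shown in the shell example. The `model_dir` parameter and the `build_piper_command` helper are hypothetical names introduced here.

```python
import subprocess
from pathlib import Path


def build_piper_command(model_path: str, out_path: str) -> list[str]:
    """Build the argument list for the Piper CLI call shown above."""
    return ["piper", "--model", model_path, "--output_file", out_path]


def piper_say(text: str, voice: str = "cs_CZ-jirka-medium",
              out: str = "out.wav", model_dir: str = ".") -> None:
    """Synthesize `text` to a WAV file by piping it to Piper on stdin.

    Assumes the model file is stored as <model_dir>/<voice>.onnx.
    """
    model_path = str(Path(model_dir) / f"{voice}.onnx")
    cmd = build_piper_command(model_path, out)
    # Piper reads the text to speak from standard input; check=True raises
    # CalledProcessError if the command exits with a non-zero status.
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)


if __name__ == "__main__":
    piper_say("Dobrý den, toto je test české syntézy řeči.", out="cz.wav")
```

Passing the command as a list instead of a shell string avoids quoting problems when the text or file paths contain spaces or Czech diacritics.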
