Picking a TTS engine that sounds good is only half the job. Before you ship a feature that reads your users' (or your own) texts aloud, you have to answer two legal/operational questions that can quietly sink a project:
1. Licensing: can you legally use this in your product? Many of the best open models are released under permissive licenses (Apache-2.0, MIT) that allow commercial use without payment. But some headline models, notably XTTS-v2 and several voice-cloning tools, carry non-commercial or otherwise restricted terms, and XTTS-v2 is in legal limbo because Coqui, the company that owned it, shut down. Using the wrong one in a commercial product is a real liability.
2. Cloud data handling: what does the provider do with the text and audio you send? For a Czech/EU user this is a GDPR question first and a "do they train on my data" question second. The good news is that the major cloud TTS APIs are far more conservative with API data than their consumer chatbot cousins. The caveat is that the details (default retention windows, training opt-outs, EU data residency) differ enough to matter.
This chapter verifies each model's license individually, walks through voice-cloning consent and the EU AI Act, then builds a provider-by-provider data-handling table and recommends a privacy-safe default for an EU builder.
Disclaimer: This is engineering research, not legal advice. Licenses and privacy terms change; always read the actual LICENSE file and the provider's current DPA before you ship. Dates and figures below were verified in May 2026.
For a product, three buckets matter:
A subtle but critical point: the code license and the model-weights license can differ, and the per-voice license can differ again. Piper's code is one thing; each downloaded voice has its own license. Always check all three layers.
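The three-layer check can be sketched as a small gate you run over every model you consider. This is a minimal illustration, not a legal tool: the license groupings are assumptions based on this chapter (Apache-2.0, MIT, CC0 and CC-BY treated as permissive; GPL as copyleft; CPML and CC-BY-NC as non-commercial), and the names `LicenseStack` and `commercial_verdict` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed groupings for illustration only; extend them after reading the real license texts.
PERMISSIVE = {"Apache-2.0", "MIT", "CC0", "CC-BY"}
COPYLEFT = {"GPL"}
NON_COMMERCIAL = {"CPML", "CC-BY-NC"}


@dataclass(frozen=True)
class LicenseStack:
    code: str
    weights: str
    voice: Optional[str] = None  # None when the model has no separate per-voice license


def commercial_verdict(stack: LicenseStack) -> str:
    """Return 'yes', 'yes-with-obligations', 'no', or 'unknown' for a license stack."""
    layers = [stack.code, stack.weights]
    if stack.voice is not None:
        layers.append(stack.voice)

    # One non-commercial layer is enough to rule the whole stack out.
    if any(layer in NON_COMMERCIAL for layer in layers):
        return "no"
    # Anything we have not classified needs a human to read the license.
    if any(layer not in PERMISSIVE | COPYLEFT for layer in layers):
        return "unknown"
    if any(layer in COPYLEFT for layer in layers):
        return "yes-with-obligations"
    return "yes"
```

The ordering is deliberate: a single restricted layer overrides permissive ones, and an unrecognized license never silently passes.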
The table below reflects licenses verified against each project's repository/model card in May 2026.
| Model | Code license | Weights license | Commercial OK? | Czech quality | Notes |
|---|---|---|---|---|---|
| Kokoro-82M | Apache-2.0 | Apache-2.0 | ✅ Yes, cleanly | ❌ No native Czech | Cleanest commercial story of any open model; no attribution required. Czech not in supported language list. |
| Piper (rhasspy/piper) | MIT (original) | per-voice (mostly permissive CC/MIT) | ✅ Yes (check each voice) | ⚠️ Yes, Czech voices exist; quality modest | The original rhasspy/piper is MIT. |
| Piper (OHF-Voice/piper1-gpl) | GPL | per-voice | ⚠️ Yes, but GPL obligations | ⚠️ same voices | The maintained fork moved code to GPL. Run it as a separate service to avoid copyleft contaminating your app. |
| MeloTTS | MIT | MIT | ✅ Yes | ❌ No Czech | EN/ES/FR/ZH/JA/KO only. |
| Chatterbox (Resemble AI) | MIT | MIT | ✅ Yes | ⚠️ Multilingual variant claims ~23 langs; verify Czech | Embeds Perth watermark by default (see below). MIT is genuine. |
| XTTS-v2 (Coqui) | MPL-2.0 (code) | CPML (non-commercial) | ❌ No (legally murky) | ✅ Strong Czech, voice cloning | Coqui shut down Jan 2024 — no one left to sell a commercial license. Avoid for commercial products. |
| Tortoise-TTS | Apache-2.0 | Apache-2.0 | ✅ Yes | ❌ English-focused, no real Czech | Permissive, but slow and English-centric. |
| Bark (Suno) | MIT | MIT | ✅ Yes | ⚠️ Multilingual incl. some Czech, inconsistent | Originally shipped with NC framing; later clarified to MIT. Quality/stability are the real limits, not the license. |
XTTS-v2 is one of the best open models for Czech voice cloning — which makes it very tempting. But its model weights are under the Coqui Public Model License (CPML), which "allows only non-commercial use of a machine learning model and its outputs." Coqui historically sold a commercial license (a 2023 post referenced ~$365/year for sub-$1M-revenue companies), but Coqui AI shut down in January 2024. That leaves XTTS in a uniquely bad spot:
Practical verdict: XTTS-v2 is excellent for personal projects, internal experiments, and research — all clearly non-commercial. For anything you sell, embed in a paid product, or use to generate audio for a commercial service, treat XTTS as off-limits and reach for an Apache/MIT model or a properly-licensed cloud API. If your "product" is genuinely just you reading your own texts with no commercial distribution, you're inside the CPML's non-commercial scope — but the moment money or third-party users enter, you're outside it.
Piper is the workhorse for free, fully-local TTS, and it does have Czech voices (modest quality, but real). Watch the licensing fork:
- rhasspy/piper (the original) is MIT. Permissive, embed-anywhere.
- OHF-Voice/piper1-gpl (the actively maintained successor) moved the code to GPL.

GPL is not a blocker for a commercial product, but the safe integration pattern is to run Piper as a separate process or container and talk to it over a CLI pipe or local HTTP. That keeps GPL's copyleft from reaching into your proprietary application code (the "mere aggregation"/separate-process boundary). If you statically link or import GPL Piper directly into a closed-source binary you distribute, you've created a derivative work and triggered the copyleft obligation. Also, each voice you download has its own license: most Piper voices are permissive (CC0/CC-BY/MIT), but a few have attribution or non-commercial restrictions, so verify the specific voice's card.
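The separate-process pattern can be sketched as a thin wrapper that pipes text to a Piper executable over stdin instead of importing it. Treat this as an assumption-laden sketch: the `--model` and `--output_file` flags follow the original rhasspy/piper command line and may differ in the piper1-gpl fork, so check the CLI help of the build you actually install. The function names and all paths are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_piper_command(piper_bin: str, model_path: str, out_wav: str) -> List[str]:
    """Assemble the CLI call; flags assumed from the original rhasspy/piper README."""
    return [piper_bin, "--model", model_path, "--output_file", out_wav]


def synthesize(text: str, piper_bin: str, model_path: str, out_wav: str,
               timeout_s: float = 60.0) -> Path:
    """Run Piper as a child process and feed it the text on stdin.

    Your application only exchanges bytes with a separate program here;
    it never imports or links the Piper code.
    """
    cmd = build_piper_command(piper_bin, model_path, out_wav)
    subprocess.run(cmd, input=text.encode("utf-8"), check=True, timeout=timeout_s)
    return Path(out_wav)
```

A call would look like `synthesize("Dobrý den", "piper", "voices/czech-voice.onnx", "out.wav")`, where the voice file name is a placeholder for whichever Czech voice you downloaded and whose license you checked.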
Chatterbox is MIT-licensed and it embeds a Perth ("Perceptual Threshold") neural watermark into every generated clip by default. Resemble open-sourced the Perth watermarker too. The watermark is designed to be imperceptible and to survive MP3 compression and light editing. Because the watermarking runs client-side in your deployment and the code is open, a technically capable user could disable it — but doing so removes a provenance signal and works against the "responsible AI" posture that may soon be a legal expectation (see EU AI Act, below). For most builders the right move is to leave the watermark on: it's a feature, not a tax.
If your feature only reads text in a built-in/default voice, you can largely skip this section. If you let users clone a voice from a sample (XTTS, Chatterbox, F5-TTS, ElevenLabs Voice Clone, etc.), you've entered regulated territory.
Under the GDPR, a natural person's voice can constitute personal data, and a voiceprint used to identify someone can be biometric data (a "special category" under Art. 9 requiring stricter handling). Cloning an identifiable person's voice without a lawful basis (usually explicit consent) is a GDPR problem on top of any IP/publicity problem.
Many jurisdictions protect a person's voice and likeness from unauthorized commercial use:
The EU AI Act, Article 50, imposes transparency obligations on AI-generated/manipulated content ("deepfakes"). Relevant for a TTS feature:
This is why Chatterbox's default watermark and similar provenance features matter: they help you meet the machine-readable-marking expectation almost for free.
If you build voice cloning into your product:
For the reader's stated use case — reading their own texts in standard voices — you avoid almost all of this. The lowest-risk design is to use built-in/default voices and skip arbitrary cloning entirely. If you want a distinctive voice, clone your own voice (you are the consenter) or license a voice you're allowed to use.
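If you do allow cloning, the "own voice or a licensed/consented voice" rule can be enforced as a hard gate in code before any sample reaches the model. The sketch below is only an illustration of that rule: `CloneRequest`, `may_clone`, and the ID fields are hypothetical, and a real system would also need to verify that the referenced consent or license record is genuine and still valid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CloneRequest:
    requester_id: str
    speaker_id: str
    consent_record_id: Optional[str] = None  # hypothetical pointer to stored explicit consent
    voice_license_id: Optional[str] = None   # hypothetical pointer to a voice license


def may_clone(req: CloneRequest) -> bool:
    """Allow cloning only for your own voice or a voice with recorded consent or a license."""
    if req.requester_id == req.speaker_id:
        return True  # you are the consenter
    return bool(req.consent_record_id or req.voice_license_id)
```

Defaulting to refusal keeps the risky path opt-in: a request with no consent or license reference is rejected rather than waved through.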
A recurring myth: "if I send my text to a cloud API, they'll train on it / keep it forever." For the business/API products of the major vendors, that's largely false by default — the aggressive data-use terms you've read about usually apply to consumer products (free chatbots), not the paid API/cloud tiers. Still, the specifics differ, and for an EU user data residency is the deciding factor as much as retention.
| Provider | Default retention of input text/audio | Used to train models? | EU region / residency | GDPR posture | Notes |
|---|---|---|---|---|---|
| Google Cloud TTS | Not stored by default; data logging is opt-in only | No (Cloud DPA; not used to train without permission) | Yes — EU regions + Cloud Data Processing Addendum | Strong; DPA + EU residency | Cleanest default of the big three: text isn't retained unless you opt into the (free-tier-discount) logging program. |
| Microsoft Azure AI Speech | Synchronous TTS: not retained after processing; Custom Neural Voice data stays in your resource region | No customer-data training without consent | Yes — West/North Europe etc.; region = data location | Strong; DPA, broad compliance certs, customer-managed keys | Custom Neural Voice is gated (responsible-AI approval) — a plus for misuse prevention. |
| Amazon Polly | May process content to "provide and improve" services unless you opt out | Yes by default for service improvement — opt out via AWS Organizations AI services opt-out policy | Yes — eu-west-1 (Ireland), eu-central-1 (Frankfurt) | Strong after you set the opt-out; DPA available | The only major where you must actively opt out to stop content being used for service improvement. Set the org policy before going live. |
| OpenAI API (tts/gpt-4o-audio) | Up to 30 days for abuse monitoring, then deleted; ZDR available for approved use | No training on API data by default | Limited EU data-residency options (improving) | DPA available; ZDR for eligible accounts | Note ongoing litigation-driven preservation orders may affect deletion timing — verify current status. |
| ElevenLabs | History stored in account by default; Zero Retention Mode for enterprise | No training on customer content for paid/enterprise by default; free tier differs | EU data residency for enterprise | DPA + GDPR program | Best-in-class quality, but EU residency + zero-retention are enterprise-tier features, not free-tier. |
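For the Amazon Polly row, the opt-out is expressed as an AWS Organizations policy document. The sketch below only builds that document as JSON; it follows the AI services opt-out policy syntax as I understand it (a `default` entry that covers all AI services), so confirm it against the current AWS documentation before attaching it. The function name is hypothetical.

```python
import json


def ai_services_opt_out_policy() -> str:
    """Build an org-wide AI services opt-out policy document as a JSON string.

    Assumption: the "default" key applies the setting to every AWS AI service,
    Polly included. Attaching the policy (type AISERVICES_OPT_OUT_POLICY) is
    done separately through AWS Organizations.
    """
    policy = {
        "services": {
            "default": {
                "opt_out_policy": {"@@assign": "optOut"}
            }
        }
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(ai_services_opt_out_policy())
```

Generating the document from code makes it easy to keep in version control and review alongside the rest of your infrastructure.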
Two recurring gotchas:
Putting licensing and data-handling together for the reader's profile (EN+CZ, free/near-free, technical, EU, reading their own texts):
Tier 1 — Default: run it locally, no cloud at all. Use an Apache-2.0 / MIT model you self-host. For English, Kokoro-82M (Apache-2.0) is the cleanest commercial + privacy story in existence — permissive license, no attribution, no data leaves your box. For Czech, Kokoro has no native support, so fall back to Piper Czech voices (run the GPL fork as a separate Docker service to keep copyleft contained) for an automated default voice. This combination is free, commercially clean (mind the Piper per-voice licenses and the GPL process boundary), and has zero third-party data exposure.
Tier 2 — If you want better Czech and accept a cloud dependency. Use Google Cloud TTS (privacy-friendly defaults, strong Czech neural voices, EU regions, DPA) or Azure AI Speech (equally strong, more enterprise controls). Sign the DPA, pick an EU region, and you have a GDPR-defensible setup at near-free volumes.
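As a rough sketch of the Tier 2 setup, the snippet below builds a Czech request for the Cloud Text-to-Speech REST `text:synthesize` method and points it at an EU endpoint. The regional hostname is an assumption to verify in Google's current documentation, authentication is omitted, and `build_synthesize_request` is a hypothetical helper. No network call is made here.

```python
import json
from typing import Any, Dict, Tuple

# Assumption: EU regional endpoint hostname; confirm it in the current Cloud TTS docs.
EU_ENDPOINT = "https://eu-texttospeech.googleapis.com"


def build_synthesize_request(text: str, language_code: str = "cs-CZ") -> Tuple[str, Dict[str, Any]]:
    """Return the URL and JSON body for a text:synthesize call.

    Only the language code is set, so the service chooses a voice for that
    language; add a specific voice name once you have picked one.
    """
    url = f"{EU_ENDPOINT}/v1/text:synthesize"
    body = {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return url, body


if __name__ == "__main__":
    url, body = build_synthesize_request("Dobrý den")
    print(url)
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Pinning the endpoint in one constant makes the residency choice explicit and easy to audit, rather than leaving it to a client default.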
Avoid for commercial use: XTTS-v2 (CPML non-commercial, orphaned license), and any voice with a non-commercial per-voice license. For cloning, only clone your own voice or a properly-consented/licensed one, disclose synthetic output, and keep watermarks on.
One-line rule of thumb: Local Apache/MIT model = no license fee + no privacy exposure. That's the sweet spot. Reach for a cloud API only when Czech quality demands it, and then prefer Google/Azure with an EU region + DPA.
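The rule of thumb translates into a tiny routing function. This is a sketch of the recommendation in this chapter, not a general-purpose selector: the engine labels are plain strings, and `pick_engine` is a hypothetical name.

```python
def pick_engine(language: str, allow_cloud: bool = False) -> str:
    """Map a language tag to the engine this chapter recommends.

    Local engines are the default; the cloud is used only for Czech and
    only when the caller explicitly allows it.
    """
    lang = language.lower()
    if lang.startswith("en"):
        return "kokoro-82m (local)"
    if lang.startswith("cs"):
        if allow_cloud:
            return "google-cloud-tts (EU region + DPA)"
        return "piper (local, separate service)"
    raise ValueError(f"no recommended engine for language: {language}")
```

Making `allow_cloud` default to `False` means text leaves your machine only when someone deliberately flips the switch.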
piper1-gpl code is GPL: run it as a separate process/container to keep copyleft out of your proprietary code, and check each voice's individual license.