Text-to-Speech for Reading Your Texts — Natural Voices in English & Czech (2026)

Chapter 08: Cost Analysis — Free Tiers, Pricing & Local-vs-Cloud Break-Even

Overview

This is the money chapter. You want the most natural-sounding English-and-Czech TTS that stays free or near-free, and "near-free" only means something once you can compare apples to apples. So this chapter does three things:

Normalizes every commercial price to two common units — $ per 1 million characters and $ per hour of audio — so a per-character cloud API, a per-credit voice service, and a per-minute audio model can sit in the same table.
Builds the real local/self-host cost model — not the marketing "$0," but the honest one that includes hardware amortization, serverless GPU $/hr, and electricity.
Computes break-even points at concrete monthly volumes (100k, 1M, 10M characters/month) and ends with a flat "if your volume is X, do Y" decision guide tuned to a "reading my own texts" workload.

A note on the conversion factor used throughout: spoken audio runs ~150 words/min, English averages ~5 characters/word including the trailing space, so ~750 characters ≈ 1 minute and ~45,000 characters ≈ 1 hour of audio (Voices.com / Speechify converters confirm ~750 chars/min at 150 WPM). Czech words are slightly longer on average but the rate of speech is comparable, so the same ~45k chars/hour rule is a fine planning approximation for both languages. Every "$/hour of audio" figure below is derived as ($/1M chars) × 0.045. (If your TTS reads faster — ~180 WPM ≈ 900 chars/min — costs per audio-hour drop proportionally; I use the conservative 750 figure.)
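The conversion rule above fits in a few helper functions. This is a minimal sketch: the function names are illustrative (not from any vendor SDK), and the 750 characters/minute constant is this chapter's planning assumption, not a measured value.

```python
CHARS_PER_MINUTE = 750                   # planning assumption: ~150 WPM x ~5 chars/word
CHARS_PER_HOUR = CHARS_PER_MINUTE * 60   # 45,000 characters per hour of audio


def audio_hours(chars, chars_per_hour=CHARS_PER_HOUR):
    """Approximate hours of audio produced by `chars` characters of text."""
    return chars / chars_per_hour


def usd_per_audio_hour(usd_per_million_chars, chars_per_hour=CHARS_PER_HOUR):
    """Convert a $/1M-characters price into $/hour of audio."""
    return usd_per_million_chars * chars_per_hour / 1_000_000


def usd_per_million_chars_from_minute_rate(usd_per_minute,
                                           chars_per_minute=CHARS_PER_MINUTE):
    """Convert a per-minute audio price into $/1M characters."""
    return usd_per_minute * 1_000_000 / chars_per_minute


if __name__ == "__main__":
    print(round(usd_per_audio_hour(16), 2))    # a $16 / 1M chars tier
    print(round(audio_hours(1_000_000), 1))    # 1M characters of text
```

For a faster reader (~180 WPM, ~900 chars/min), pass `chars_per_hour=54_000` and the per-hour figures drop accordingly.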

⚠️ Prices change. All figures are list prices verified against vendor pricing pages in May 2026, rounded for readability. Treat them as planning numbers, re-check the live pricing page before committing, and note that several vendors bill in "credits" or "bytes" rather than characters — I flag those.

Content

1. Free tiers at a glance

These are the perpetual-or-trial free allowances. The distinction matters enormously: a perpetual monthly free tier can make a low-volume "read my texts" project genuinely $0 forever, while a 12-month trial just delays the bill.

Provider	Free allowance	Type	Czech?	Catch / notes
Google Cloud TTS	4M chars/mo (Standard & WaveNet); 1M chars/mo (Neural2 / Chirp 3 HD)	Perpetual, monthly	Yes (cs-CZ neural)	Per-product free tiers; needs billing account on file
Azure AI Speech	500k chars/mo (Neural) on Free (F0) tier	Perpetual, monthly	Yes (cs-CZ: Antonín, Vlasta)	F0 has rate limits; Neural HD voices not free
Amazon Polly	5M chars/mo Standard; 1M chars/mo Neural	12-month trial only	Standard only — no Neural Czech (see note)	After 12 months, full price
ElevenLabs	10,000 credits/mo (~10 min audio)	Perpetual, monthly	Yes (multilingual v2)	Non-commercial use only; attribution required
OpenAI	None (pay-as-you-go from char 1)	None	Partial (accent-tinted)	No free tier, but the per-minute rate is low
PlayAI (PlayHT)	~12,500 chars trial	Trial	Limited	Small one-time trial
Deepgram (Aura)	$200 free credit on signup	One-time credit	No Czech (English-only)	Burns down with use
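To check which perpetual free tier covers a given monthly volume, the allowances in the table can be encoded as a small lookup. This is a sketch under stated assumptions: the dictionary keys are made-up labels, the numbers are copied from the table above, and trial-only or credit-based offers are left out on purpose because they eventually run out or do not map cleanly to characters.

```python
# Perpetual monthly free allowances in characters, copied from the table above.
# The keys are illustrative labels, not vendor SKU names.
FREE_CHARS_PER_MONTH = {
    "google_standard_wavenet": 4_000_000,
    "google_neural2_chirp3hd": 1_000_000,
    "azure_neural_f0": 500_000,
}


def free_headroom(tier, monthly_chars):
    """Characters left in the free allowance (negative means you are over)."""
    return FREE_CHARS_PER_MONTH[tier] - monthly_chars


def tiers_that_stay_free(monthly_chars):
    """Sorted names of perpetual free tiers that cover the whole monthly volume."""
    return sorted(
        tier for tier, cap in FREE_CHARS_PER_MONTH.items() if monthly_chars <= cap
    )


if __name__ == "__main__":
    for volume in (100_000, 600_000, 5_000_000):
        print(volume, tiers_that_stay_free(volume))
```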

2. Per-unit pricing, normalized

Service / tier	Native price	$ / 1M chars	$ / hour audio	Czech natural?
Google Standard	$4 / 1M chars	$4	~$0.18	weak (robotic)
Google WaveNet	$4 / 1M chars	$4	~$0.18	mid (improved)
Google Neural2	$16 / 1M chars	$16	~$0.72	Yes
Google Chirp 3 HD	~$30 / 1M chars*	~$30	~$1.35	Yes (newest)
Google Studio	$160 / 1M chars	$160	~$7.20	limited langs
Azure Neural	$16 / 1M chars	$16	~$0.72	Yes
Azure Neural HD	$22 / 1M chars (from Mar 2026)	$22	~$0.99	partial
Amazon Polly Standard	$4 / 1M chars	$4	~$0.18	Czech = robotic
Amazon Polly Neural (NTTS)	$16 / 1M chars	$16	~$0.72	no Czech
Amazon Polly Generative	$30 / 1M chars	$30	~$1.35	no Czech
OpenAI gpt-4o-mini-tts	~$0.015 / min audio	~$20**	~$0.90	accent-tinted
OpenAI tts-1 / tts-1-hd	$15 / $30 / 1M chars	$15 / $30	~$0.68 / ~$1.35	accent-tinted
ElevenLabs Starter	$5/mo → 30k credits	~$167***	~$7.50	Yes
ElevenLabs Creator	$22/mo → ~121k credits	~$182***	~$8.20	Yes
ElevenLabs Pro	$99/mo → ~600k credits	~$165***	~$7.40	Yes
PlayAI (PlayHT) API	~$0.030 / 1k chars	~$30	~$1.35	limited
Deepgram Aura-2	~$0.030 / 1k chars	~$30	~$1.35	no Czech
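The subscription and per-1k rows in the table reduce to $/1M characters with one division each. The sketch below assumes that one credit corresponds to roughly one character and that the full monthly allowance is used; both are simplifications, and the function names are illustrative.

```python
def usd_per_million_units(plan_usd, units_included):
    """Effective $ per 1M units (characters or credits) for a flat monthly plan.

    Assumes the entire allowance is consumed; using less raises the real rate.
    """
    return plan_usd / units_included * 1_000_000


def per_thousand_to_per_million(usd_per_thousand_chars):
    """Convert a $/1k-characters API price into $/1M characters."""
    return usd_per_thousand_chars * 1_000


if __name__ == "__main__":
    # Plan prices and allowances taken from the table above.
    for name, usd, credits in (("Starter", 5, 30_000),
                               ("Creator", 22, 121_000),
                               ("Pro", 99, 600_000)):
        print(name, round(usd_per_million_units(usd, credits)))
```

If you only use part of the allowance, pass the characters actually synthesized as `units_included` to see the rate you are really paying.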

3. The local / self-host cost model

3.3 Serverless GPU (RunPod, Modal, Replicate)

Platform / GPU	Rate	~$ / GPU-hour
RunPod Serverless RTX 4090	~$0.00031/s	~$1.10
RunPod Serverless A100 80GB	~$0.00076/s	~$2.74
RunPod pod RTX 4090 (rented)	—	~$0.34–0.69
Modal A10G	—	~$1.10
Modal A100 40GB	—	~$2.50–3.00
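A GPU-hour is not an audio-hour: what you pay per hour of speech depends on how much faster than real time the model runs. The sketch below makes that explicit. The `realtime_factor` value is a hypothetical input you would measure for your own model and GPU, and cold starts and idle time are ignored.

```python
def usd_per_gpu_hour(usd_per_second):
    """Convert a per-second serverless rate into $/GPU-hour."""
    return usd_per_second * 3600


def usd_per_audio_hour(gpu_usd_per_hour, realtime_factor):
    """Cost of one hour of generated audio.

    realtime_factor = seconds of audio produced per second of GPU time
    (hypothetical; measure it for your own model and hardware).
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return gpu_usd_per_hour / realtime_factor


if __name__ == "__main__":
    gpu_hour = usd_per_gpu_hour(0.00031)         # per-second rate from the table above
    print(round(gpu_hour, 2))
    print(round(usd_per_audio_hour(gpu_hour, realtime_factor=4.0), 2))  # assumed 4x real time
```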

4. Break-even analysis

Monthly volume	≈ Hours of audio	Cloud Neural2 ($16/1M)	Serverless GPU (~$0.30/audio-hr)	Owned GPU box (amortized ~$30/mo + elec)	CPU-local (Piper)
100k chars/mo	~2.2 hr	$1.60 (or $0 in free tier)	~$0.66 + eng. effort	~$30 (silly)	$0
1M chars/mo	~22 hr	$16 (first 1M Neural2 free → $0; or use $4 WaveNet)	~$6.60	~$31	$0
10M chars/mo	~222 hr	$160 (minus 1M free ≈ $144)	~$66	~$33	$0
100M chars/mo	~2,222 hr	~$1,584	~$660	~$40–60 (if box keeps up)	$0 (if CPU keeps up)
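The rows above can be reproduced, and the crossover volumes solved for, with a few lines of arithmetic. This is a sketch using the table's own planning numbers ($16/1M with 1M free, ~$0.30 per audio-hour serverless, ~$30/month amortized box); it leaves out electricity and engineering effort, and the function names are illustrative.

```python
CHARS_PER_AUDIO_HOUR = 45_000
CLOUD_USD_PER_MILLION = 16.0          # Neural2-class list price
CLOUD_FREE_CHARS = 1_000_000          # perpetual monthly free allowance
SERVERLESS_USD_PER_AUDIO_HOUR = 0.30  # planning figure from the table header
OWNED_BOX_USD_PER_MONTH = 30.0        # amortization only, electricity excluded


def cloud_cost(chars):
    """Monthly cloud bill after subtracting the free allowance."""
    billable = max(0, chars - CLOUD_FREE_CHARS)
    return billable * CLOUD_USD_PER_MILLION / 1_000_000


def serverless_cost(chars):
    """Monthly serverless GPU bill at a flat rate per audio-hour."""
    return chars / CHARS_PER_AUDIO_HOUR * SERVERLESS_USD_PER_AUDIO_HOUR


def cloud_vs_owned_break_even():
    """Monthly characters at which the cloud bill equals the box's fixed cost."""
    return CLOUD_FREE_CHARS + OWNED_BOX_USD_PER_MONTH / CLOUD_USD_PER_MILLION * 1_000_000


def serverless_vs_owned_break_even():
    """Monthly characters at which the serverless bill equals the box's fixed cost."""
    return OWNED_BOX_USD_PER_MONTH / SERVERLESS_USD_PER_AUDIO_HOUR * CHARS_PER_AUDIO_HOUR


if __name__ == "__main__":
    for chars in (100_000, 1_000_000, 10_000_000, 100_000_000):
        print(chars, round(cloud_cost(chars), 2), round(serverless_cost(chars), 2))
    print(round(cloud_vs_owned_break_even()), round(serverless_vs_owned_break_even()))
```

Under these assumptions the owned box overtakes cloud Neural2 at roughly 2.9M characters/month and overtakes serverless at roughly 4.5M. Both thresholds move up once electricity and setup time are counted.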

6. Cost-cutting tactics that apply to any option

Cache aggressively. For identical text, voice, and settings you never need to synthesize twice, so store the generated audio keyed by hash(text + voice + settings). For "read my texts," re-reads are common, and every cache hit costs zero billed characters.

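import hashlib
import os

def cache_key(text, voice, settings="", fmt="mp3"):
    # Hash everything that changes the audio: voice, settings, and the text itself.
    payload = f"{voice}|{settings}|{text}".encode("utf-8")
    return f"tts_cache/{hashlib.sha256(payload).hexdigest()}.{fmt}"

# Synthesize only on a cache miss. `text` is your input string and synth()
# is a placeholder for your provider's API call that returns audio bytes.
path = cache_key(text, "cs-CZ-Neural2")
if not os.path.exists(path):
    audio = synth(text, voice="cs-CZ-Neural2")  # billed characters are incurred here
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(audio)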

Route by language and quality need. Use the free natural tier (Google/Azure) for Czech where naturalness matters; consider cheaper OpenAI or CPU-local for English where the bar is lower. A small router saves real money at scale.
Strip non-spoken text before synthesis (markdown syntax, code blocks, URLs, footnote markers). You're billed per character — don't pay to "speak" ** and https://.
Watch byte-billing on Czech. On byte-priced tiers (Google Chirp 3 HD), Czech diacritics inflate the bill 10–20%. Prefer character-billed Neural2/WaveNet for Czech.
Stay inside the free tier deliberately. Track monthly character usage; on Google Neural2 the free ceiling is 1M characters/month (500k on Azure's F0 tier), so set an alert before you reach it.
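Two of the tactics above, stripping non-spoken text and watching byte billing, are easy to prototype. The sketch below is deliberately rough: the regular expressions cover common Markdown patterns only and are not a full parser, and the function names are illustrative. Whether a given tier bills by bytes or by characters is something to confirm on the vendor's pricing page.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this example readable


def strip_non_spoken(text):
    """Remove common Markdown syntax, code, URLs, and footnote markers before synthesis."""
    text = re.sub(re.escape(FENCE) + r".*?" + re.escape(FENCE), " ", text, flags=re.DOTALL)
    text = re.sub(r"`[^`]*`", " ", text)                           # inline code
    text = re.sub(r"\[\^[^\]]+\]", "", text)                       # footnote markers like [^1]
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)           # [label](url) -> label
    text = re.sub(r"https?://\S+", "", text)                       # bare URLs
    text = re.sub(r"^[ \t]*[#>]+[ \t]*", "", text, flags=re.MULTILINE)  # heading/quote markers
    text = re.sub(r"[*_]{1,3}", "", text)                          # emphasis markers
    return re.sub(r"\s+", " ", text).strip()


def billed_units(text):
    """Character count versus UTF-8 byte count for the same text."""
    return {"chars": len(text), "utf8_bytes": len(text.encode("utf-8"))}


if __name__ == "__main__":
    raw = "**Bold** text, see https://example.com now"
    print(strip_non_spoken(raw))
    print(billed_units("žluťoučký kůň"))  # Czech diacritics take two bytes each in UTF-8
```

Run the stripper before computing the cache key so that formatting-only edits to a text do not trigger a new billed synthesis.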

Key References

Google Cloud. Text-to-Speech AI pricing. 2026. https://cloud.google.com/text-to-speech/pricing - Standard $4 / WaveNet $4 / Neural2 $16 / Chirp 3 HD $30 / Studio $160 per 1M chars, and perpetual monthly free tiers (4M Standard+WaveNet, 1M Neural2/Chirp).
Microsoft Azure. Pricing: Azure Speech. 2026. https://azure.microsoft.com/en-us/pricing/details/speech/ - Neural TTS $16/1M chars, Neural HD $22/1M (from Mar 2026), F0 free tier 500k chars/mo.
Voices.com. Words to Time Conversion Calculator. 2026. https://www.voices.com/tools/words_to_time_conversion - Basis for the ~150 WPM / ~750 chars-per-minute / ~45k chars-per-hour normalization factor.
Amazon Web Services. Amazon Polly Pricing. 2026. https://aws.amazon.com/polly/pricing/ - Standard $4/1M, Neural $16/1M, Generative/Long-form tiers, 12-month free-trial allowances; Czech voice availability by engine.
OpenAI. API Pricing. 2026. https://openai.com/api/pricing/ - tts-1 / tts-1-hd per-character rates and gpt-4o-mini-tts per-minute audio pricing.
ElevenLabs. Pricing. 2026. https://elevenlabs.io/pricing - Free (10k credits, non-commercial), Starter $5, Creator $22, Pro $99 plans; credit-to-character mapping and multilingual (incl. Czech) support.
Deepgram. Pricing / Aura Text-to-Speech. 2026. https://deepgram.com/pricing - Aura/Aura-2 per-1k-character rate, $200 signup credit, English-only voice coverage.
PlayAI (PlayHT). Pricing. 2026. https://play.ht/pricing/ - API per-character pricing and trial character allowance.
RunPod. Pricing. 2026. https://www.runpod.io/pricing - Serverless per-second GPU rates (RTX 4090, A100) and rented-pod hourly rates used for the self-host cost model.
Modal. Pricing. 2026. https://modal.com/pricing - Per-second serverless GPU pricing (A10G, A100) for the serverless break-even comparison.
Rhasspy. Piper: fast local neural TTS. 2026. https://github.com/rhasspy/piper - CPU-only / Raspberry-Pi-capable open TTS establishing the $0-marginal-cost local baseline; Czech voice availability.
Coqui / community forks. XTTS-v2 model card & requirements. 2025–2026. https://github.com/coqui-ai/TTS - ~4 GB VRAM, Czech support and voice cloning; basis for the GPU/serverless audio-hour cost figures.
getdeploying. Cloud GPU Pricing Comparison. 2026. https://getdeploying.com/reference/cloud-gpu - Cross-provider GPU $/hr reference cross-checking RunPod/Modal rates.
