This chapter turns the project's goals into a concrete, testable requirements specification. It covers the functional shape of the system (a 1,000+ page corpus mixing static and editorial content, multi-author workflows, i18n-readiness) and — more importantly — the non-functional constraints that will actually decide whether the build succeeds: Core Web Vitals and performance budgets, the dual discipline of SEO and GEO (Generative Engine Optimization), accessibility under the now-enforced European Accessibility Act, security baselines, and a defensible budget. It closes with a build-vs-buy decision framework and a set of "keep-it-honest" guardrails so the architecture is chosen on evidence rather than on the loudest opinion in the room.
Goals ("a fast, AI-friendly, multilingual content site we control") are not requirements. A requirement is a statement that is specific, measurable, and verifiable. The translation discipline below is the spine of this chapter: each goal becomes one or more functional requirements (what the system does) and one or more non-functional requirements (how well it does it, expressed as a number with a measurement method).
The most reliable way to keep a CMS project honest is to write the non-functional requirements as budgets — hard numerical ceilings tied to an automated check in CI — before any vendor or framework is chosen. Budgets are vendor-neutral; they force every candidate stack to be measured against the same bar.
| Goal | Functional requirement (sample) | Non-functional requirement / budget (sample) | How it is verified |
|---|---|---|---|
| Scale to 1,000+ pages | Content model supports ≥3 content types; bulk import/export | Production build (or incremental publish) completes in <10 min for full corpus | CI build timer |
| Fast for users | — | LCP ≤2.5s, INP ≤200ms, CLS ≤0.1 at the 75th percentile (CrUX) | PageSpeed/CrUX, Lighthouse CI |
| Found by search + AI | Sitemap, robots, JSON-LD on every page | 100% of published URLs have valid structured data; 0 crawl errors | Rich Results Test, Search Console |
| Many authors | Roles (author/editor/admin), draft→review→publish | 0 unreviewed pages reach production; full audit trail | Workflow logs |
| Multilingual-ready | Locale field on every entry; fallback chain | hreflang present and reciprocal on 100% of localized URLs | hreflang validator |
| Accessible | Semantic templates, alt text required field | WCAG 2.2 AA, EN 301 549; 0 critical axe violations | axe-core in CI, manual audit |
| Affordable & owned | — | 5-year TCO ≤ budget; no single-vendor lock that blocks export | TCO model, export drill |
The remainder of the chapter expands each row.
A 1,000-page corpus is small for a database but large for a build step, and the mix matters more than the raw count. Three sub-populations behave very differently:
The architectural consequence is that rendering strategy must be chosen per population, not for the whole site. At 1,000–5,000 pages a full static build is still fast and is the safest default; the cliff appears in the 10,000–100,000 range. Benchmarks in 2026 are blunt about this:
Requirement to write down: define a full-corpus publish-time budget (e.g. "<10 minutes for a clean rebuild at 5,000 pages; <60 seconds for an incremental publish of a single edited page"). This single number quietly eliminates whole categories of architecture (a pure full-rebuild SSG fails the incremental sub-budget once the corpus is large), and it is the first place a CMS choice should be stress-tested.
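A minimal sketch of how the publish-time budget could run as a CI gate, assuming the pipeline already records full-rebuild and single-page publish durations in seconds. The names `PublishBudget` and `check_publish_budget` are hypothetical, and the 600 s / 60 s defaults simply mirror the example budget above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishBudget:
    """Hypothetical budget mirroring the example requirement above."""
    full_rebuild_s: float = 600.0        # "<10 minutes for a clean rebuild"
    incremental_publish_s: float = 60.0  # "<60 seconds for one edited page"


def check_publish_budget(full_rebuild_s: float,
                         incremental_publish_s: float,
                         budget: PublishBudget = PublishBudget()) -> list[str]:
    """Return human-readable violations; an empty list means the gate passes.

    Budgets are strict ceilings ("<"), so a duration equal to the budget fails.
    """
    failures = []
    if full_rebuild_s >= budget.full_rebuild_s:
        failures.append(
            f"full rebuild took {full_rebuild_s:.0f}s "
            f"(budget <{budget.full_rebuild_s:.0f}s)")
    if incremental_publish_s >= budget.incremental_publish_s:
        failures.append(
            f"incremental publish took {incremental_publish_s:.0f}s "
            f"(budget <{budget.incremental_publish_s:.0f}s)")
    return failures
```

In CI the job would fail whenever the returned list is non-empty; keeping both sub-budgets in one check makes it visible when an architecture passes the clean-rebuild number but misses the incremental one.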
Core Web Vitals (CWV) are a confirmed Google ranking signal, folded into the page-experience signals since June 2021. The 2026 "good" thresholds, measured at the 75th percentile of real-user (CrUX field) data per URL, are:
| Metric | Measures | "Good" threshold (75th pct) |
|---|---|---|
| LCP — Largest Contentful Paint | Loading | ≤ 2.5 s |
| INP — Interaction to Next Paint | Responsiveness | ≤ 200 ms |
| CLS — Cumulative Layout Shift | Visual stability | ≤ 0.1 |
(Google Search Central; corewebvitals.io)
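To make "75th percentile" concrete, the sketch below classifies raw real-user samples against Google's published boundaries (LCP good ≤2.5 s, poor >4 s; INP good ≤200 ms, poor >500 ms; CLS good ≤0.1, poor >0.25). It assumes you collect your own RUM samples and uses a simple nearest-rank percentile; CrUX computes its percentiles from aggregated histograms, so results can differ slightly. Function names are illustrative.

```python
import math

# (good_max, needs_improvement_max) per metric; LCP/INP in ms, CLS unitless.
THRESHOLDS = {
    "LCP": (2500.0, 4000.0),
    "INP": (200.0, 500.0),
    "CLS": (0.1, 0.25),
}


def p75(samples: list[float]) -> float:
    """75th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def classify(metric: str, samples: list[float]) -> str:
    """Bucket a metric's p75 value as good / needs improvement / poor."""
    good_max, ni_max = THRESHOLDS[metric]
    value = p75(samples)
    if value <= good_max:
        return "good"
    if value <= ni_max:
        return "needs improvement"
    return "poor"
```

A page passes the Core Web Vitals assessment only when all three metrics classify as "good", which is why a single slow-interaction template can fail an otherwise fast site.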
INP replaced First Input Delay as a Core Web Vital in March 2024 and is the metric most sites fail today — 2026 industry analyses report ~43% of sites still miss the 200ms INP bar, making it the most-failed vital. The practical lesson for a CMS build: INP is a JavaScript-discipline problem, which is exactly why "zero-JS-by-default" frameworks (Astro, Eleventy) and islands architectures have an inherent advantage over heavy client-side React apps.
Treat CWV as a budget enforced in CI, not a thing checked once at launch. A workable performance budget for a content site:
CWV are estimated to account for only ~10–15% of ranking weight — a tiebreaker, not the main event — but in 2026 they are table stakes: failing them caps the ceiling for everything else, and a slow site also degrades GEO crawl coverage and human conversion.
In 2026 discoverability is two disciplines that share infrastructure. SEO still drives organic traffic volume; GEO — getting cited inside answers from ChatGPT, Perplexity, Google AI Overviews, Gemini, and Copilot — drives high-intent, brand-shaping visibility (LLMrefs; Boston Institute of Analytics). Both impose requirements on the CMS content model and rendering layer, which is why they belong in this chapter and not as an afterthought.
Shared technical baseline (requirements):
- …Article, Organization, Person (author), and BreadcrumbList schema automatically. Note the moving target: Google removed visible FAQ rich results on 7 May 2026, though FAQ schema still aids AI extraction, a reminder to treat schema as a maintained surface, not a set-and-forget tag.
- …author (Person entity with credentials), reviewer, last-updated date, and citations should be first-class fields, not free text, so they render as machine-readable author/authority signals.
- llms.txt: adopt with eyes open. The proposed /llms.txt (and llms-full.txt) Markdown manifest gives AI systems a clean index of key content; Anthropic and OpenAI are reportedly crawling it. But the evidence is thin: a study of ~300,000 domains found no statistical correlation between having an llms.txt file and being cited (c-sharpcorner; Lets Data Science). Requirement: generate it cheaply from the sitemap as a low-cost hedge, but do not over-invest or claim it as a ranking lever.
- Multilingual SEO (overlaps with §2.5): reciprocal hreflang tags, localized titles/meta, locale-specific canonicals, and per-language sitemaps are required to avoid duplicate-content dilution across locales.
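Generating schema from the content model, rather than hand-writing it per page, is what makes "100% of published URLs have valid structured data" achievable. A minimal sketch, assuming a hypothetical `ArticleEntry` shape (the field names, and `jobTitle` as a stand-in for credentials, are illustrative): it emits schema.org `Article` and `BreadcrumbList` nodes in one JSON-LD `@graph`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Author:
    name: str
    job_title: str  # hypothetical stand-in for the "credentials" field


@dataclass
class ArticleEntry:
    """Hypothetical content-model entry; field names are illustrative."""
    title: str
    url: str
    date_modified: str  # ISO 8601, e.g. "2026-01-15"
    author: Author
    breadcrumbs: list[tuple[str, str]] = field(default_factory=list)  # (name, url)


def jsonld_script(entry: ArticleEntry, org_name: str) -> str:
    article = {
        "@type": "Article",
        "headline": entry.title,
        "mainEntityOfPage": entry.url,
        "dateModified": entry.date_modified,
        "author": {"@type": "Person", "name": entry.author.name,
                   "jobTitle": entry.author.job_title},
        "publisher": {"@type": "Organization", "name": org_name},
    }
    breadcrumb_list = {
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(entry.breadcrumbs, start=1)
        ],
    }
    graph = {"@context": "https://schema.org", "@graph": [article, breadcrumb_list]}
    # Escape "</" so editor-supplied text cannot close the <script> element early;
    # "\/" is a valid JSON escape, so parsers still read "/".
    payload = json.dumps(graph, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```

Because every template calls the same generator, a schema change (such as the FAQ change noted above) becomes a single code edit instead of a corpus-wide content fix.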
The honest framing for the project: build the infrastructure (SSR HTML, schema-from-model, clean crawler access) as hard requirements; treat the more speculative GEO tactics (llms.txt weighting, prompt-style content) as low-cost experiments with measurement, not as guarantees.
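In keeping with treating llms.txt as a cheap hedge, the file can be derived mechanically from the existing sitemap. The llms.txt proposal describes a Markdown file with an H1 site name, an optional blockquote summary, and H2 sections of link lists. This sketch assumes a standard sitemaps.org `sitemap.xml`; because sitemaps carry no page titles, it uses URL paths as link text, and the `max_links` cap is an arbitrary illustrative default.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def llms_txt_from_sitemap(sitemap_xml: str, site_name: str, summary: str,
                          max_links: int = 200) -> str:
    """Build a minimal llms.txt body from a sitemap's <loc> entries."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Pages", ""]
    for url in urls[:max_links]:
        path = urlparse(url).path or "/"
        lines.append(f"- [{path}]({url})")
    return "\n".join(lines) + "\n"
```

Regenerating it in the same build step as the sitemap keeps the cost near zero, which matches the "low-cost experiment" framing.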
"i18n-ready" is cheaper to design in than to retrofit, even if the site launches in one language. The requirement is not "translate everything now" — it is "make the content model and routing locale-aware so adding a language later is configuration, not surgery."
Concrete i18n requirements for the content model and frontend:
- …localizations relation so the frontend can programmatically emit accurate hreflang and canonical links (Strapi 5 i18n guide; Sanity).
- …fallback chain (de-AT → de → en). The fallback policy must be explicit and per-field-aware: some fields fall back, while others, like price or legal text, must not.
- …sub-path (/de/…, recommended default for SEO and simplicity), subdomain, or ccTLD. Sub-paths are the lowest-friction choice for a single-codebase content site.

Even a single-language launch should ship with the locale field present and the fallback chain configured; this is the single highest-leverage "readiness" decision in the schema.
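The per-field fallback policy and reciprocal hreflang can both be expressed in a few lines. A minimal sketch, assuming entries are stored as a mapping of locale to fields; the `FALLBACKS` chain, the `NO_FALLBACK` field set, and both function names are hypothetical. Reciprocity falls out naturally if every localized page emits the same full set of alternate links built from the entry's localizations.

```python
from typing import Optional

# Hypothetical fallback chains, most specific first.
FALLBACKS = {"de-AT": ["de", "en"], "de": ["en"], "en": []}
# Fields that must never fall back to another locale (illustrative names).
NO_FALLBACK = {"price", "legal_text"}


def resolve_field(entry: dict[str, dict[str, object]], locale: str, field: str):
    """Return the field value for `locale`, walking the fallback chain only
    for fields allowed to fall back; None means "do not render"."""
    chain = [locale] if field in NO_FALLBACK else [locale, *FALLBACKS.get(locale, [])]
    for loc in chain:
        value = entry.get(loc, {}).get(field)
        if value is not None:
            return value
    return None


def hreflang_links(localizations: dict[str, str],
                   x_default: Optional[str] = None) -> list[str]:
    """Emit identical alternate links on every localized page, which keeps
    hreflang reciprocal by construction."""
    tags = [f'<link rel="alternate" hreflang="{loc}" href="{url}">'
            for loc, url in sorted(localizations.items())]
    if x_default:
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{x_default}">')
    return tags
```

Returning `None` for a non-fallback field (rather than silently showing the English price) forces the template to handle the gap explicitly.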
A 1,000+ page editorial corpus implies multiple contributors, which turns content into a governance problem. Requirements:
These requirements interact with the build-vs-buy decision: mature editorial workflow is one of the strongest arguments for buying (it is undifferentiated, hard to build well, and expensive to maintain).
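The draft→review→publish workflow from the requirements table is, at its core, a small state machine with role checks and an append-only log. The sketch below is illustrative only (the transition map and role permissions are assumptions, not any CMS's built-in model), but it shows why "0 unreviewed pages reach production" is enforceable: there is simply no `draft → published` edge.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# (from_state, to_state) -> roles allowed to perform it. Hypothetical policy.
TRANSITIONS = {
    ("draft", "review"): {"author", "editor", "admin"},
    ("review", "draft"): {"editor", "admin"},      # send back for changes
    ("review", "published"): {"editor", "admin"},  # only reviewers publish
    ("published", "draft"): {"editor", "admin"},   # unpublish
}


@dataclass
class Page:
    slug: str
    state: str = "draft"
    # Audit records: (utc_timestamp, user, role, from_state, to_state).
    audit: list[tuple[str, str, str, str, str]] = field(default_factory=list)

    def transition(self, to_state: str, user: str, role: str) -> None:
        allowed = TRANSITIONS.get((self.state, to_state))
        if allowed is None:
            raise ValueError(f"illegal transition {self.state} -> {to_state}")
        if role not in allowed:
            raise PermissionError(f"role {role!r} cannot move {self.state} -> {to_state}")
        self.audit.append((datetime.now(timezone.utc).isoformat(), user, role,
                           self.state, to_state))
        self.state = to_state
```

Building this well (concurrent edits, scheduled publishing, notifications, SSO-backed roles) is where the effort balloons, which is the argument above for buying it.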
Accessibility is now a legal requirement, not a nicety. The European Accessibility Act (EAA) enforcement deadline was 28 June 2025 and is now actively enforced across EU member states; it covers e-commerce, banking, transport ticketing, telecoms, and audiovisual media, and applies to non-EU providers selling into the EU (Level Access; OneTrust). The technical standard is EN 301 549, which currently maps to WCAG 2.1 AA and is being updated toward WCAG 2.2 AA. Penalties vary by member state, roughly €5,000–€500,000.
Requirement: target WCAG 2.2 Level AA. Content that conforms to 2.2 AA also conforms to 2.1 AA (2.2 adds criteria and retires only the obsolete 4.1.1 Parsing), so this target future-proofs against the EN 301 549 update. Enforce a subset automatically (axe-core / Pa11y in CI: 0 critical violations as a build gate) and budget for at least one manual audit (keyboard navigation, screen-reader pass, contrast, focus order) before launch, because automated tools catch only ~30–40% of issues. Bake required alt-text and semantic-heading discipline into the content model (§2.6).
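The "0 critical violations" gate can be a few lines over axe-core's results object, whose `violations` array holds entries with an `id`, an `impact` (`minor`, `moderate`, `serious`, or `critical`), and the affected `nodes`. How that JSON is produced (for example via @axe-core/playwright or Pa11y's axe runner) is left open here, and the function name is hypothetical.

```python
# Impacts that fail the build; widen to {"critical", "serious"} to tighten the gate.
BLOCKING_IMPACTS = {"critical"}


def blocking_violations(axe_results: dict,
                        blocking: set[str] = BLOCKING_IMPACTS) -> list[str]:
    """Return 'rule-id (n nodes)' for each violation whose impact blocks the build."""
    out = []
    for violation in axe_results.get("violations", []):
        if violation.get("impact") in blocking:
            out.append(f"{violation['id']} ({len(violation.get('nodes', []))} nodes)")
    return out
```

Starting at "critical" only and tightening later keeps the gate from being disabled in frustration on day one; it complements, and does not replace, the manual audit.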
Security baseline (expanded in later chapters; stated here as requirements):
Budget is a non-functional requirement and the one most often understated, because the sticker price (license or "free" self-host) is a fraction of the real cost. The 2026 consensus framing: five-year TCO is the only honest lens, and hidden integration, training, and operations can add ~150–200% on top of a "buy" license fee over time (Neontri; Oceanscode CFO guide). For enterprise CMS, operational cost (the daily labor of creating/editing/publishing) is typically the largest slice of TCO, not licensing.
Indicative 2026 pricing for the headless options most relevant to an owned, AI-native build:
| Platform | Entry / notable tier (2026) | Model | TCO note |
|---|---|---|---|
| Strapi | Free self-hosted; Cloud from ~$29/mo; $299/mo for SSO/audit/custom roles | Open-source + usage | "Free" self-host ≈ 0.25–0.5 FTE/yr to operate reliably = $50K–$100K/yr in senior eng time (Pooya Golchian; Elmapi) |
| Payload | Self-host open-source; Payload Cloud Standard ~$35/mo | Open-source + usage | Code-first, Next.js-native; ops cost similar to other self-host |
| Sanity | Generous free tier; Growth ~$15–$199/seat-or-base + usage overages | Usage + per-seat | Predictable at small scale; document/API overages grow with corpus |
| Contentful | Limited free tier; Team plan ~$300/mo; enterprise much higher | Per-seat + usage | Workflow/SSO gated behind expensive tiers |
(Pricing from buildmvpfast.com Feb-2026 tracker, Strapi, Cosmic, Pooya Golchian comparisons — verify against vendor pages before committing; these tiers churn.)
The crucial TCO insight: "self-hosted and free" is rarely the cheapest. Self-hosted Strapi has a $0 license, but operating it reliably (upgrades, security patches, uptime, backups, scaling) consumes an estimated 0.25–0.5 FTE/year, or $50K–$100K of senior engineering at 2026 rates, which often exceeds a SaaS subscription. Conversely, SaaS per-seat and usage pricing can balloon as authors and traffic grow. There is no universally cheap option; the cheapest option is the one whose cost shape matches your growth (a predictable author count → per-seat is fine; spiky or unbounded page and traffic growth → watch usage metering).
Requirement: model 5-year TCO across at least three scenarios (build, buy-SaaS, hybrid) including license, implementation, operations (FTE), integration, training, and an exit/migration reserve. Write the budget ceiling as a hard number and re-check it at the 6- and 12-month marks.
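A minimal 5-year TCO model covering the cost components listed above might look like the sketch below. The structure (one-off costs counted once, recurring costs multiplied by years) is the point; the `Scenario` fields are hypothetical, and the figures in any usage are placeholders, not estimates for a real vendor.

```python
from dataclasses import dataclass

YEARS = 5


@dataclass(frozen=True)
class Scenario:
    name: str
    license_per_year: float
    implementation: float      # one-off
    ops_fte: float             # ongoing operations headcount (e.g. 0.25-0.5)
    fte_cost_per_year: float   # fully loaded cost of one FTE
    integration: float         # one-off
    training_per_year: float
    exit_reserve: float        # set aside for a future migration

    def tco(self, years: int = YEARS) -> float:
        recurring = (self.license_per_year
                     + self.ops_fte * self.fte_cost_per_year
                     + self.training_per_year)
        one_off = self.implementation + self.integration + self.exit_reserve
        return one_off + recurring * years


def over_ceiling(scenarios: list[Scenario], ceiling: float) -> list[str]:
    """Names of scenarios whose TCO breaks the hard budget ceiling."""
    return [s.name for s in scenarios if s.tco() > ceiling]
```

For example, a placeholder "build" scenario with no license, 0.5 FTE at $200K, $5K/yr training, and $130K of one-off costs totals $655K over five years, almost entirely from operations, which is the pattern the Strapi row in the table warns about.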
The 2026 consensus principle is clean: buy for commodity, build for differentiation — buy the standard parts of your business, build the parts that make you different (Neontri 3-model framework; McCary Group). For a CMS, the "commodity" parts are editorial workflow, auth, media handling, and CDN delivery; the "differentiating" parts are usually the AI features and the specific content model. A hybrid ("composable") posture — buy a headless content store, build the AI layer and the frontend around it — is the default recommendation for most teams, and is the architecture this report develops in later chapters.
A weighted scoring rubric keeps the choice from becoming a personality contest. Score each candidate (custom build / SaaS / hybrid) 1–5 on each vector, multiply by a weight (1–3), and total:
| Decision vector | Weight (example) | What a high score means |
|---|---|---|
| 5-year TCO | 3 | Lower total cost over 5 years |
| Time to market | 2 | Ships sooner |
| Integration / API openness | 3 | Connects cleanly; no lock-in to export |
| Competitive advantage / control | 3 | More control over the differentiating layer |
| Operational burden / team capacity | 2 | Less ongoing ops load on your team |
| Accessibility & compliance fit | 2 | Meets WCAG 2.2 / EAA out of the box |
| AI extensibility | 3 | Easy to add MCP, RAG, generation hooks |
(Adapted from Neontri and AgileSoftLabs 2026 frameworks.)
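The rubric's arithmetic is simple, but encoding it keeps scores comparable across candidates and rejects incomplete or out-of-range scoring. A sketch using the example weights from the table (the function name is hypothetical); with seven vectors whose weights sum to 18, totals range from 18 to 90.

```python
# Example weights from the table above (1-3).
WEIGHTS = {
    "5-year TCO": 3,
    "Time to market": 2,
    "Integration / API openness": 3,
    "Competitive advantage / control": 3,
    "Operational burden / team capacity": 2,
    "Accessibility & compliance fit": 2,
    "AI extensibility": 3,
}


def weighted_total(scores: dict[str, int], weights: dict[str, int] = WEIGHTS) -> int:
    """Sum of weight x score; every weighted vector must be scored 1-5."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored vectors: {sorted(missing)}")
    for vector, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{vector}: score {score} outside 1-5")
    return sum(weights[v] * scores[v] for v in weights)
```

Refusing to total a candidate with a missing vector prevents the quiet trick of leaving out the dimension a favored option scores poorly on.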
Keep-it-honest guardrails — the practices that prevent a build-vs-buy decision from quietly rotting:
- …llms.txt as a cheap hedge, not a proven lever (no measured citation correlation across ~300K domains).