This chapter is the architectural heart of an AI-native CMS. Before a single line of editor UI or rendering code is written, you decide what content is — its types, fields, relationships, validation rules, and lifecycle states. In 2026 that decision carries more weight than ever, because the same schema must now serve four distinct consumers at once: human editors, AI generation/agents, omnichannel delivery (web/app/email/voice/in-store), and answer engines (ChatGPT, Google AI Overviews, Perplexity, Claude). We cover schema-first design, content types vs. modular blocks, Portable Text, references and taxonomies, validation, slugs/redirects, draft/published state machines, and AI-specific modeling — closing with a concrete, copy-ready example schema.
A content model is the formal definition of how content is organized inside a CMS: the content types, the fields each type owns, the validation that governs those fields, and the relationships between types. Schema-first design means treating that model as the source of truth and the contract that every downstream system reads from — editors, front ends, AI agents, and search/answer engines alike. A well-designed model "prevents duplication, enforces consistency, and keeps delivery teams nimble," while a bad one quietly taxes every future feature (Cosmic, Content Modeling Best Practices, 2025).
The 2026 escalation is this: structured content has become "the primary input layer for AI search," and organizations that store content as queryable structured data hold a material advantage when AI assistants answer questions by citing sources (Storyblok, Structured Content for the AI Era, 2026). Sanity frames it more bluntly: "your CMS is already an AI backend," provided the model is structured (Sanity, 2025). The schema is no longer just an editor convenience; it is an API for machines that reason.
A useful mental model: content as data, not documents. The old CMS world stored pages as HTML blobs ("HTML soup"). The AI-native world stores meaning — discrete, typed, addressable fields — and renders presentation last. Everything in this chapter follows from that inversion.
There are two fundamentally different ways to slice content, and mature models use both deliberately.
| Axis | "Content types as pages" (entity model) | "Modular blocks/components" (composition model) |
|---|---|---|
| Unit of thought | A thing: Article, Product, Author, Event | A reusable piece: Hero, FeatureGrid, Quote, CTA |
| Lifecycle | Has its own URL, slug, publish state | Lives inside a parent document |
| Reuse | Referenced by ID across the system | Assembled into a pageBuilder array |
| Best for | The nouns of your domain | Flexible layouts and landing pages |
The modern best practice is a modular content model: break content into small, reusable types — components assembled into pages — rather than treating each content type as a monolithic page. The payoff is that "the same component can be rendered across web, mobile, regional microsites, or partner portals without duplication" (Eight25Media; Storyblok, 2026). Storyblok calls these Bloks; Contentful calls them components assembled in a Compose-style layout; Sanity models them as objects inside a pageBuilder array of references or inline objects.
A practical rule: model the nouns of your business as entity types (these earn URLs, slugs, and publish states), and model layout and presentation as blocks (these never earn their own URL). Avoid the common anti-pattern of one giant Page type with 80 optional fields: it is hard to query, opaque to AI agents, and a localization nightmare.
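One way to make the rule concrete is to give entities and blocks different shapes in code. The sketch below is a hypothetical TypeScript model: the type names, fields, and the renderBlock function are illustrative assumptions, not part of any CMS SDK. Entities carry a slug and a publish state; blocks are a discriminated union assembled into a pageBuilder array.

```typescript
// Hypothetical types for illustration; not taken from any CMS SDK.

// Entities are the nouns: each has an identity, a slug, and a publish state.
interface Entity {
  _id: string;
  _type: "article" | "product" | "landingPage";
  slug: string;
  status: "draft" | "published";
}

// Blocks are layout pieces: no slug, no publish state, no URL of their own.
type Block =
  | { _type: "hero"; heading: string; imageRef?: string }
  | { _type: "quote"; text: string; attribution: string }
  | { _type: "cta"; label: string; targetRef: string }; // targetRef holds an Entity _id

interface LandingPage extends Entity {
  _type: "landingPage";
  title: string;
  pageBuilder: Block[];
}

// Each channel supplies its own renderer; this one emits plain text.
function renderBlock(block: Block): string {
  switch (block._type) {
    case "hero":
      return block.heading.toUpperCase();
    case "quote":
      return `"${block.text}" (${block.attribution})`;
    case "cta":
      return `[${block.label}] -> ${block.targetRef}`;
  }
}

const page: LandingPage = {
  _id: "page-1",
  _type: "landingPage",
  slug: "spring-launch",
  status: "draft",
  title: "Spring launch",
  pageBuilder: [
    { _type: "hero", heading: "Spring launch" },
    { _type: "quote", text: "Structure wins", attribution: "Editor" },
    { _type: "cta", label: "Shop now", targetRef: "product-42" },
  ],
};

const rendered: string[] = page.pageBuilder.map(renderBlock);
```

Because the union is closed, adding a new block type forces every renderer to handle it, which is the property that keeps a page builder from decaying into a bag of optional fields.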
Rich text is where most content models leak. A WYSIWYG that emits HTML couples your content to one presentation forever and produces exactly the "HTML soup" that breaks on the next channel. The 2026 answer is Portable Text, a published, open JSON specification for block content (github.com/portabletext/portabletext). It stores rich text as an array of blocks; each block holds an array of child spans (children), with style, marks, and markDefs describing structure and meaning rather than visual styling.
```json
[
  {
    "_type": "block",
    "style": "h2",
    "children": [{ "_type": "span", "text": "Why structure wins" }]
  },
  {
    "_type": "block",
    "style": "normal",
    "markDefs": [{ "_key": "a1", "_type": "link", "href": "https://example.com" }],
    "children": [
      { "_type": "span", "text": "Read the " },
      { "_type": "span", "marks": ["a1"], "text": "spec" },
      { "_type": "span", "text": "." }
    ]
  },
  { "_type": "callout", "tone": "warning", "body": "Validate before publishing." }
]
```
Why this matters for an AI-native CMS:
- Custom typed blocks (a callout, a productCard, an imageWithAlt) sit directly in the flow, "without needing to create specific renderers in the editor" (Sanity Docs).

Portable Text is the de facto open standard, but the principle is portable: Contentful's Rich Text and Strapi's Blocks field are also JSON-AST formats. Reject any field that stores raw HTML as a string; it is the single most expensive modeling mistake for AI and omnichannel.
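To illustrate why a JSON block format travels across channels, here is a minimal sketch that serializes Portable Text to plain text, the kind of output a voice channel or an embedding pipeline might want. The interfaces are trimmed to the fields used here and the toPlainText helper is a hypothetical name, not a function from the Portable Text libraries.

```typescript
// Minimal Portable Text shapes, limited to what this sketch needs.
interface Span { _type: "span"; text: string; marks?: string[] }
interface MarkDef { _key: string; _type: string; href?: string }
interface TextBlock { _type: "block"; style?: string; markDefs?: MarkDef[]; children: Span[] }
interface CustomBlock { _type: string; [key: string]: unknown }
type PortableTextNode = TextBlock | CustomBlock;

function isTextBlock(node: PortableTextNode): node is TextBlock {
  return node._type === "block" && "children" in node;
}

// Keep only text blocks, join the spans of each block, and separate blocks
// with a blank line. Custom blocks (callout, productCard, ...) are skipped
// here; a real serializer would register a handler per custom type.
function toPlainText(nodes: PortableTextNode[]): string {
  return nodes
    .filter(isTextBlock)
    .map((block) => block.children.map((span) => span.text).join(""))
    .join("\n\n");
}

const body: PortableTextNode[] = [
  { _type: "block", style: "h2", children: [{ _type: "span", text: "Why structure wins" }] },
  {
    _type: "block",
    style: "normal",
    markDefs: [{ _key: "a1", _type: "link", href: "https://example.com" }],
    children: [
      { _type: "span", text: "Read the " },
      { _type: "span", marks: ["a1"], text: "spec" },
      { _type: "span", text: "." },
    ],
  },
  { _type: "callout", tone: "warning", body: "Validate before publishing." },
];

const plain = toPlainText(body);
```

The same array could be walked by an HTML, Markdown, or email renderer without touching the stored content, which is not possible once the field holds an HTML string.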
References turn isolated documents into a graph — and the graph is precisely what AI systems exploit. An entity graph is "a network of connected entities — people, organizations, products, concepts — and their relationships," and AI systems "use entity graphs to understand context, verify information, and determine citation confidence" (GEO Compass, 2026). Your reference model is your entity graph.
Reference, don't duplicate. A common pitfall is "storing slugs or categories in plain text fields instead of using reference types, which loses all the benefits of validated, queryable relationships" (Cosmic, 2025). Model author, category, and relatedProducts as references to real documents, not strings.
Reference cardinality:
| Pattern | Field shape | Example |
|---|---|---|
| Many-to-one (single reference) | reference | article.author → author |
| One-to-many / many-to-many | array of references | article.categories → [category] |
| Hierarchical (parent/child) | self-reference with filter | category.parent → category |
In Sanity, hierarchical taxonomies are expressed with a self-reference plus a reference filter. For example, options: { filter: '!defined(parent)' } on the parent field restricts the picker to top-level categories, so a child can never be chosen as a parent. The hierarchy is traversed with the GROQ dereference operator -> (Sanity Docs, Creating a Parent/Child Taxonomy). Most platforms offer the equivalent: Contentful uses linked entries, Strapi uses relations, Payload uses relationship fields.
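The parent/child pattern can be sketched outside any CMS as a walk over self-references. The example below assumes a hypothetical in-memory store; the Category shape and the breadcrumb helper are illustrative, and only the _ref key mirrors how Sanity stores a reference.

```typescript
// Hypothetical in-memory shapes; a real project would fetch these via the CMS API.
interface Reference { _ref: string }
interface Category { _id: string; title: string; parent?: Reference }

// Walk parent references from a category up to its root and return the
// titles ordered root-first. The visited set guards against accidental cycles.
function breadcrumb(id: string, byId: Map<string, Category>): string[] {
  const trail: string[] = [];
  const visited = new Set<string>();
  let current = byId.get(id);
  while (current && !visited.has(current._id)) {
    visited.add(current._id);
    trail.unshift(current.title);
    current = current.parent ? byId.get(current.parent._ref) : undefined;
  }
  return trail;
}

const categories: Category[] = [
  { _id: "cat-tech", title: "Technology" },
  { _id: "cat-ai", title: "AI", parent: { _ref: "cat-tech" } },
];
const byId = new Map(categories.map((c): [string, Category] => [c._id, c]));

const trail = breadcrumb("cat-ai", byId);
```

Because the relationship is a validated reference rather than a string, renaming "Technology" updates every breadcrumb without touching the child documents.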
Taxonomy design tips for AI and AEO:
- Store alternate terms in a synonyms array rather than letting editors free-type variants.
- Add sameAs URLs (Wikipedia, Wikidata, official site) to entity types like Author, Organization, and Brand. This is both a schema.org property and an entity-graph anchor for citation confidence.

Validation is where a model stops being a suggestion and becomes a contract. In an AI-native CMS, validation does double duty: it keeps humans honest and it constrains what an AI agent is allowed to write back through the API/MCP layer. An agent that can mutate content is only as safe as the schema's guardrails.
Layered validation strategy:
| Layer | Enforces | Tooling (2026) |
|---|---|---|
| Field-level | required, min/max length, regex, format, enum | Native CMS schema (validation rules) |
| Cross-field | conditional requirements, date ordering | Custom validators / schema functions |
| Referential | reference target type, existence, filter | Reference to/filter options |
| Type-safe consumer | shape of query results in app code | Sanity TypeGen, Zod validators on GROQ output |
| Pre-publish gate | SEO completeness, alt text, word count | Workflow rules / publish-time hooks |
Two patterns worth standardizing: (1) make altText required on every image object — it is an accessibility (WCAG 2.2) and AEO requirement, not an option; (2) validate slugs at the field level (lowercase, kebab-case, unique). End-to-end type safety closes the loop on the code side: Sanity TypeGen generates TypeScript types from your schema, and Zod can validate the actual GROQ response at runtime (Chin, End-to-end type safety for Sanity GROQ queries, 2025).
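The field-level and cross-field layers can be sketched as plain functions that return a list of error messages. This is a hand-rolled illustration under stated assumptions: the EventDoc shape and every function name are hypothetical, and a real project would express the same rules through its CMS validation API or a library such as Zod.

```typescript
// Hypothetical document shape and validators, written without a CMS SDK.
interface ImageValue { assetRef: string; alt?: string }
interface EventDoc {
  slug: string;
  startsAt: string; // ISO 8601
  endsAt: string; // ISO 8601
  heroImage?: ImageValue;
}

const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Field-level: lowercase kebab-case, and not already used by another document.
function validateSlug(slug: string, taken: Set<string>): string[] {
  const errors: string[] = [];
  if (!KEBAB_CASE.test(slug)) errors.push("slug must be lowercase kebab-case");
  if (taken.has(slug)) errors.push("slug must be unique");
  return errors;
}

// Field-level: alt text is required whenever an image is present.
function validateAlt(image: ImageValue | undefined): string[] {
  if (!image) return [];
  return image.alt && image.alt.trim().length > 0 ? [] : ["image alt text is required"];
}

// Cross-field: the end must not precede the start.
function validateDateOrder(doc: EventDoc): string[] {
  return Date.parse(doc.endsAt) >= Date.parse(doc.startsAt)
    ? []
    : ["endsAt must not be earlier than startsAt"];
}

function validateEvent(doc: EventDoc, takenSlugs: Set<string>): string[] {
  return [
    ...validateSlug(doc.slug, takenSlugs),
    ...validateAlt(doc.heroImage),
    ...validateDateOrder(doc),
  ];
}

const problems = validateEvent(
  {
    slug: "Spring Launch",
    startsAt: "2026-03-02T10:00:00Z",
    endsAt: "2026-03-01T10:00:00Z",
    heroImage: { assetRef: "image-1" },
  },
  new Set(["autumn-sale"]),
);
```

Running the same function on human saves and on agent-submitted mutations is what turns the schema into a single contract for both writers.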
Slugs are deceptively load-bearing — they are the public identity of a document and a ranking/citation signal. Best practices:
- Use a dedicated slug field (with a source field for auto-generation and a uniqueness validation), not a free-text string.

A robust pattern is a dedicated redirect content type (from, to, statusCode 301/302/410, isActive) that your edge/middleware reads. This keeps redirects as content (editable, auditable, and exportable to llms.txt-style discovery) instead of buried in code deploys.
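A middleware that reads redirect documents might resolve them along these lines. This is a sketch, assuming the redirect documents have already been loaded into an array; the Resolution type and the resolveRedirect name are hypothetical, and treating a loop as "serve the original path" is a design assumption rather than a standard.

```typescript
// Hypothetical shape mirroring a redirect content type (from, to, statusCode, isActive).
interface Redirect {
  from: string;
  to?: string;
  statusCode: 301 | 302 | 410;
  isActive: boolean;
}

type Resolution =
  | { kind: "redirect"; location: string; statusCode: 301 | 302 }
  | { kind: "gone" }
  | { kind: "pass" };

// Follow active redirects until the path stops changing, so a chain
// /a -> /b -> /c is answered in a single hop.
function resolveRedirect(path: string, rules: Redirect[]): Resolution {
  const active = new Map<string, Redirect>();
  for (const rule of rules) {
    if (rule.isActive) active.set(rule.from, rule);
  }

  const seen = new Set<string>([path]);
  let current = path;
  let statusCode: 301 | 302 = 301;
  for (;;) {
    const rule = active.get(current);
    if (!rule) break;
    if (rule.statusCode === 410) return { kind: "gone" };
    if (!rule.to) break;
    // A chain containing any temporary hop is reported as temporary.
    if (rule.statusCode === 302) statusCode = 302;
    current = rule.to;
    // Loop detected: fall back to serving the original path (assumption).
    if (seen.has(current)) return { kind: "pass" };
    seen.add(current);
  }
  return current === path
    ? { kind: "pass" }
    : { kind: "redirect", location: current, statusCode };
}

const rules: Redirect[] = [
  { from: "/old", to: "/interim", statusCode: 301, isActive: true },
  { from: "/interim", to: "/new", statusCode: 302, isActive: true },
  { from: "/dead", statusCode: 410, isActive: true },
  { from: "/off", to: "/elsewhere", statusCode: 301, isActive: false },
];

const hit = resolveRedirect("/old", rules);
```

Collapsing chains at resolution time means editors can keep adding redirects without producing multi-hop responses for visitors or crawlers.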
Every entity type needs a lifecycle. The minimum is draft vs. published; mature teams model a state machine: Draft → In Review → Approved → Published → (Archived). Modeling it as a state machine "prevents invalid changes — for instance, content can't go from Draft straight to Published without passing Review" (Afteractive, 2025).
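A lifecycle like this can be sketched as a transition table. The forward path follows the states named above; the backward edges (sending a document back to draft or review, reviving an archived one) are assumptions added for illustration, as are the function names.

```typescript
type State = "draft" | "in_review" | "approved" | "published" | "archived";

// Allowed transitions. Draft has no direct edge to published, so review
// cannot be skipped. Backward edges are illustrative assumptions.
const TRANSITIONS: Record<State, State[]> = {
  draft: ["in_review"],
  in_review: ["draft", "approved"],
  approved: ["in_review", "published"],
  published: ["draft", "archived"],
  archived: ["draft"],
};

function canTransition(from: State, to: State): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new state, or throws if the move is not in the table.
function transition(from: State, to: State): State {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid transition: ${from} -> ${to}`);
  }
  return to;
}

// Walk the happy path from draft to published.
const path: State[] = ["in_review", "approved", "published"];
const finalState = path.reduce<State>((state, next) => transition(state, next), "draft");
```

Keeping the table as data means the same rules can be enforced in the editor UI, in API handlers, and in any agent-facing tool without being rewritten.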
Implementation patterns differ by platform philosophy:
| Platform | Draft/publish mechanism |
|---|---|
| Sanity | Two parallel docs: drafts.<id> (mutable) and <id> (published); presence of a draft = unpublished changes |
| Contentful | Per-entry Draft/Changed/Published status + custom Workflows (state, role gating) |
| Strapi / Payload | publishedAt null vs. set; Payload adds drafts + versions |
For an AI-native CMS, three additions are non-negotiable:
- An aiGenerated/reviewStatus field so machine-authored content is flagged and never auto-published without a human gate. Agentic platforms "generate, audit, and publish with minimal human review," which makes the review gate the safety mechanism (CMS Critic, 2026).
- Versioning, so machine-authored changes remain auditable.
- Scheduled publishing modeled as data (a publishAt timestamp) rather than a side-channel cron.

This is the section that distinguishes an AI-native model from a merely headless one. The same schema must satisfy four audiences simultaneously.
1. Humans (editors). Field descriptions, sensible grouping, conditional fields, previews. Good descriptions are now dual-purpose — see consumer 2.
2. AI generation & agents. The breakthrough insight of 2025–26 is that your schema is the prompt. Sanity's Content Agent and Agent Context work by "compressing the Sanity schema" so an agent understands "field descriptions, document relationships, required vs. optional fields, and the semantic meaning of content," then translates natural-language requests into precise mutations (Sanity, Agent Context; CMS Critic, 2026). Practical implications for your model:
- Prefer discrete, semantically named fields: headline, summary, keyPoints[], targetAudience, relatedProducts[]. A structured model "gives AI agents everything they need" to generate and audit (Sanity, 2026).
- … _updatedAt.

3. Omnichannel delivery. Because content is structured and presentation is decoupled (Portable Text + blocks), the same documents render to web, native app, email, voice, and in-store screens. The model rule: no presentation-only fields on entity types (no "background color of the third paragraph"). Layout choices live in blocks; meaning lives in fields.
4. Answer engines (AEO/GEO). This is the newest consumer and reshapes the model in concrete ways:
- Map entity types to schema.org types: Article/BlogPosting, Product/Offer, FAQPage, HowTo, Organization, Person. Pages with FAQPage schema appear in Google AI Overviews ~3.2× more often, and "search engines use schema as a signal, while AI engines use schema as a source" (AirOps; SearchAtlas, 2026). Don't bolt JSON-LD on in templates; generate it deterministically from the model so it can never drift from the content.
- Model Q&A explicitly with a faq[] array (question, answer as Portable Text) and consider a keyTakeaways[] field; AI engines cite content that "ranks for the sub-queries the AI generates" (LLMrefs, 2026).
- Identify speakable candidates. Mark which fields are voice-suitable.
- With structured content, producing machine-readable outputs (for example .md variants) is a serialization, not a rewrite.

Below is a vendor-neutral model for an editorial + product site, expressed in a Sanity-style schema (the concepts map directly to Contentful/Strapi/Payload). It demonstrates entity types, blocks, references, taxonomy, validation, slug+redirect, lifecycle, and AI/AEO fields.
```js
// ---------- Entity type: Article ----------
export const article = {
  name: 'article', type: 'document', title: 'Article',
  groups: [{ name: 'content' }, { name: 'seo' }, { name: 'ai' }],
  fields: [
    { name: 'title', type: 'string', group: 'content',
      validation: r => r.required().max(120) },
    { name: 'slug', type: 'slug', group: 'seo',
      options: { source: 'title', maxLength: 96 },
      // The slug type checks uniqueness by default; the custom rule adds
      // the lowercase kebab-case constraint.
      validation: r => r.required().custom(slug =>
        /^[a-z0-9]+(-[a-z0-9]+)*$/.test(slug?.current ?? '') ||
        'Use lowercase kebab-case') },
    { name: 'summary', type: 'text', rows: 2, group: 'seo',
      description: 'One-sentence summary, max 160 chars. Used as meta description AND the answer-engine snippet.',
      validation: r => r.required().max(160) },
    { name: 'author', type: 'reference', to: [{ type: 'author' }],
      group: 'content', validation: r => r.required() },
    { name: 'categories', type: 'array', group: 'content',
      of: [{ type: 'reference', to: [{ type: 'category' }] }],
      validation: r => r.min(1).max(3) },
    // Portable Text body with custom block types
    { name: 'body', type: 'array', group: 'content',
      of: [
        { type: 'block' },          // standard rich text
        { type: 'imageWithAlt' },   // alt text required (see below)
        { type: 'callout' },
        { type: 'productCard' },    // reference-backed block
      ] },
    // AEO: explicit Q&A → maps to schema.org FAQPage
    { name: 'faq', type: 'array', group: 'ai',
      of: [{ type: 'object', fields: [
        { name: 'question', type: 'string' },
        { name: 'answer', type: 'array', of: [{ type: 'block' }] },
      ] }] },
    { name: 'keyTakeaways', type: 'array', of: [{ type: 'string' }], group: 'ai' },
    // AI governance
    { name: 'aiGenerated', type: 'boolean', initialValue: false, group: 'ai' },
    { name: 'reviewStatus', type: 'string', group: 'ai',
      options: { list: ['draft', 'in_review', 'approved', 'published'] },
      initialValue: 'draft' },
    { name: 'publishAt', type: 'datetime', group: 'seo' },
    // schema.org mapping hint (deterministic JSON-LD generation)
    { name: 'schemaType', type: 'string', group: 'seo',
      options: { list: ['Article', 'BlogPosting', 'NewsArticle'] },
      initialValue: 'BlogPosting' },
  ],
};
```
```js
// ---------- Reusable block: image with required alt ----------
export const imageWithAlt = {
  name: 'imageWithAlt', type: 'image', title: 'Image',
  fields: [
    { name: 'alt', type: 'string', title: 'Alt text',
      description: 'Required for WCAG 2.2 + answer-engine indexing.',
      validation: r => r.required().max(125) },
  ],
};
```
```js
// ---------- Taxonomy: hierarchical Category ----------
export const category = {
  name: 'category', type: 'document', title: 'Category',
  fields: [
    { name: 'title', type: 'string', validation: r => r.required() },
    { name: 'slug', type: 'slug', options: { source: 'title' },
      validation: r => r.required() },
    { name: 'parent', type: 'reference', to: [{ type: 'category' }],
      // Only top-level categories can be picked as a parent, which keeps
      // the hierarchy to two levels and rules out child-of-child chains.
      options: { filter: '!defined(parent)' } },
    { name: 'synonyms', type: 'array', of: [{ type: 'string' }],
      description: 'Alternate terms for AEO entity consistency.' },
  ],
};
```
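```js
// ---------- Redirects as content ----------
export const redirect = {
  name: 'redirect', type: 'document', title: 'Redirect',
  fields: [
    { name: 'from', type: 'string', validation: r => r.required() },
    { name: 'to', type: 'string',
      description: 'Destination path or URL. Not needed for 410 (Gone).',
      // A 410 has no destination, so require `to` only for 301/302.
      validation: r => r.custom((to, context) =>
        context.parent?.statusCode === 410 || Boolean(to) ||
        'Destination is required for 301/302 redirects') },
    { name: 'statusCode', type: 'number', initialValue: 301,
      options: { list: [301, 302, 410] } },
    { name: 'isActive', type: 'boolean', initialValue: true },
  ],
};
```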
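Since the schema doubles as the prompt, it can help to see what an agent-facing summary of definitions like these might look like. The sketch below is a hypothetical describeSchema helper, not Sanity's Agent Context implementation, whose internals are not described here. It works on a simplified schema shape in which required is an explicit flag (an assumption; the schema above expresses it through validation functions) and flattens it into a compact outline of names, types, reference targets, and descriptions.

```typescript
// Simplified, hypothetical schema shape for illustration only.
interface FieldDef {
  name: string;
  type: string;
  required?: boolean;
  description?: string;
  to?: string[]; // reference targets
}
interface TypeDef { name: string; fields: FieldDef[] }

// One line per field: name, type, optional reference target, required flag,
// and the editor-facing description, which doubles as guidance for an agent.
function describeField(field: FieldDef): string {
  const target = field.to && field.to.length > 0 ? ` -> ${field.to.join(" | ")}` : "";
  const flag = field.required ? "required" : "optional";
  const note = field.description ? `: ${field.description}` : "";
  return `- ${field.name} (${field.type}${target}, ${flag})${note}`;
}

function describeSchema(types: TypeDef[]): string {
  return types
    .map((t) => [`type ${t.name}`, ...t.fields.map(describeField)].join("\n"))
    .join("\n\n");
}

const outline = describeSchema([
  {
    name: "article",
    fields: [
      { name: "title", type: "string", required: true },
      { name: "summary", type: "text", required: true,
        description: "One-sentence summary, max 160 chars." },
      { name: "author", type: "reference", required: true, to: ["author"] },
      { name: "keyTakeaways", type: "array" },
    ],
  },
]);
```

An outline of this kind is only as informative as the field descriptions it is built from, which is why descriptions written for editors now serve agents as well.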
The corresponding GROQ query shows how references collapse into a clean, AI-ready payload via the dereference operator:
```groq
*[_type == "article" && slug.current == $slug && reviewStatus == "published"][0]{
  title, summary, body, faq, keyTakeaways, schemaType,
  "author": author->{name, "sameAs": sameAs},
  "categories": categories[]->{title, "slug": slug.current}
}
```
This single document now feeds the website, an email digest, an llms.txt entry, a JSON-LD BlogPosting + FAQPage block, and an MCP-exposed tool an agent can read or update — from one schema, with validation enforced for both humans and machines.
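Deterministic JSON-LD generation from a payload of this kind could look like the following sketch. The ArticlePayload shape is an assumption loosely based on the query projection, FAQ answers are assumed to be already flattened from Portable Text to plain text, and the function names are hypothetical. The schema.org types and properties used (BlogPosting, FAQPage, Question, acceptedAnswer, sameAs) are standard vocabulary.

```typescript
// Hypothetical payload shape, loosely following the query projection.
interface FaqItem { question: string; answer: string }
interface ArticlePayload {
  title: string;
  summary: string;
  schemaType: "Article" | "BlogPosting" | "NewsArticle";
  author: { name: string; sameAs?: string[] };
  faq?: FaqItem[];
}

type JsonLd = Record<string, unknown>;

function articleJsonLd(doc: ArticlePayload, url: string): JsonLd {
  return {
    "@context": "https://schema.org",
    "@type": doc.schemaType,
    headline: doc.title,
    description: doc.summary,
    url,
    author: { "@type": "Person", name: doc.author.name, sameAs: doc.author.sameAs ?? [] },
  };
}

function faqJsonLd(faq: FaqItem[]): JsonLd {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faq.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
}

// One function owns the mapping, so the markup cannot drift from the content.
function buildJsonLd(doc: ArticlePayload, url: string): JsonLd[] {
  const graphs = [articleJsonLd(doc, url)];
  if (doc.faq && doc.faq.length > 0) graphs.push(faqJsonLd(doc.faq));
  return graphs;
}

const jsonLd = buildJsonLd(
  {
    title: "Why structure wins",
    summary: "Structured content serves editors, agents, channels, and answer engines.",
    schemaType: "BlogPosting",
    author: { name: "Example Author", sameAs: ["https://example.com/authors/example"] },
    faq: [{ question: "What is Portable Text?", answer: "An open JSON specification for block content." }],
  },
  "https://example.com/blog/why-structure-wins",
);
```

Each object can be emitted with JSON.stringify inside a script tag of type application/ld+json; the template never hand-writes markup.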
- Anchor entities with sameAs and explicit synonyms.
- Govern AI output with aiGenerated, reviewStatus, versioning, and a human gate before machine-authored content goes live.
- Resolve references at query time with the -> dereference operator.