A custom AI-powered CMS hands you the rendering layer, which means you also own everything search engines and answer engines read: the <head>, the structured data, the sitemaps, the redirect map, and every byte that ships to the browser. This chapter is the technical-SEO and performance contract for a stack-agnostic build. It covers metadata and JSON-LD as data you generate (not plugins you install), sitemap and IndexNow strategy, canonical/hreflang correctness, a survivable 301 redirect plan for migrating off WordPress, Core Web Vitals budgets with concrete image/font tactics, the new GEO/answer-engine surface (llms.txt, schema as machine facts), and a measurement stack that distinguishes lab fiction from field truth.
When you leave WordPress + Yoast/Rank Math, you lose the plugin that silently rendered your <title>, meta description, canonical, Open Graph, Twitter cards, JSON-LD, and robots directives. In a custom CMS, every one of those is a field in your content model that your rendering layer must serialize into HTML. The single most important architectural decision is therefore server-rendered HTML: the metadata, canonical, hreflang, and JSON-LD must be present in the initial HTTP response, not injected by client-side JavaScript after hydration. Google can render JS, but it does so in a deferred second pass with no guarantees, and most answer-engine crawlers (GPTBot, PerplexityBot, ClaudeBot) do not execute JavaScript at all. SSR/SSG/ISR is non-negotiable for discoverability.
Design your content model so SEO is a first-class object on every entry:
| Field | Source of truth | Fallback |
|---|---|---|
| seoTitle | Manual override | title + site name |
| metaDescription | Manual override | First ~155 chars of body / AI-generated summary |
| canonicalUrl | Computed from slug | Self-referential |
| ogImage | Manual or featured image | Auto-generated OG card |
| noindex | Editor toggle | false |
| hreflangGroup | Translation linkage ID | None (mono-lingual) |
| datePublished / dateModified | System timestamps | (none) |
An AI-native CMS has a real edge here: a generation step can draft metaDescription, suggest seoTitle variants, and produce alt text and OG summaries at publish time. Keep these as editable drafts with human approval, not silent auto-writes.
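The fallback column in the table above can be resolved in one place at render time. The following is a minimal sketch rather than a prescribed implementation: the Entry shape, the SITE constant, the " | " title separator, and the /og/<slug>.png card path are all assumptions made for illustration.

```typescript
// Hypothetical content-entry shape; field names follow the table above.
interface Entry {
  title: string;
  slug: string;
  body: string;
  seoTitle?: string;
  metaDescription?: string;
  ogImage?: string;
  featuredImage?: string;
  noindex?: boolean;
}

interface ResolvedSeo {
  title: string;
  description: string;
  canonicalUrl: string;
  ogImage: string;
  noindex: boolean;
}

// Assumed site-level settings (not from the chapter).
const SITE = { name: "Example Co", origin: "https://example.com" };

// Collapse whitespace and cut at a word boundary so descriptions never end mid-word.
function truncateAtWord(text: string, max: number): string {
  const clean = text.replace(/\s+/g, " ").trim();
  if (clean.length <= max) return clean;
  const cut = clean.slice(0, max);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "…";
}

// Manual overrides win; otherwise apply the fallback from the table.
function resolveSeo(entry: Entry): ResolvedSeo {
  return {
    title: entry.seoTitle ?? `${entry.title} | ${SITE.name}`,
    description: entry.metaDescription ?? truncateAtWord(entry.body, 155),
    canonicalUrl: `${SITE.origin}/${entry.slug}`,
    ogImage: entry.ogImage ?? entry.featuredImage ?? `${SITE.origin}/og/${entry.slug}.png`,
    noindex: entry.noindex ?? false,
  };
}
```

Because the resolver is a pure function of the entry, it can run on the server for every request or build and be unit-tested without a browser.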
- <title>: 50–60 characters before truncation; primary keyword early; unique per page. Google still frequently rewrites titles, but a good explicit title wins more often than not.
- Open Graph: og:title, og:description, og:image (1200×630, < 8 MB, absolute URL), og:type, og:url. These also increasingly feed link unfurls in Slack, iMessage, LinkedIn, and AI chat surfaces.
- robots meta + X-Robots-Tag: prefer the HTTP header X-Robots-Tag for non-HTML assets (PDFs, feeds) and the meta tag for pages. noindex must be reachable without JS.

JSON-LD is the structured-data format Google recommends (Microdata and RDFa are supported but not preferred), and it is the right one for a custom build because it is fully decoupled from your markup: you generate it from the same content object that renders the page, eliminating drift between visible content and machine data (Google Search Central; Schema Pilot 2026 guide). Inject it as <script type="application/ld+json"> in the <head> or body.
Build a "foundational trio" plus a content type per page:
| Schema type | Where | Why |
|---|---|---|
| Organization (or WebSite with SearchAction) | Homepage / global | Brand entity and knowledge panel signals (Google retired the sitelinks search box in late 2024, so SearchAction no longer earns that feature) |
| BreadcrumbList | Every nested page | Breadcrumb rich result; must mirror visible breadcrumbs |
| Article / BlogPosting / NewsArticle | Editorial pages | Author/date attribution, Top Stories eligibility |
| Product + Offer + AggregateRating | Commerce | Price, availability, review stars |
| FAQPage / HowTo | Support content | Machine facts for GEO; low SERP payoff since Google heavily curtailed FAQ/HowTo rich results in 2023 |
Critical correctness rules drawn from current Google guidance and 2026 field reporting (Digital Applied; Schema Pilot):
- Completeness is effectively binary: an Article missing author shows no author attribution; an Offer missing price shows no price. Validate every required + recommended field.
- Use stable @id values and absolute URLs so entities can be cross-referenced (e.g., Article.author → a Person node with a stable @id).
- dateModified must be truthful and update only on substantive edits, both for Article schema and sitemap lastmod (they should agree).

A practical pattern: write one server-side function buildJsonLd(entry) that emits a @graph array combining Organization, BreadcrumbList, and the type-specific node, deduplicated by @id. This keeps a single emitter you can unit-test.
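One way the buildJsonLd(entry) pattern described above might look is sketched below. The ArticleEntry shape, the ORIGIN constant, the "#organization" / "#person" / "#article" fragment conventions for @id, and the toScriptTag helper are assumptions for illustration; only the schema.org types and property names are standard.

```typescript
type JsonLdNode = { "@id": string; "@type": string; [key: string]: unknown };

// Hypothetical editorial entry; every URL is expected to be absolute.
interface ArticleEntry {
  url: string;
  headline: string;
  authorName: string;
  authorUrl: string;
  datePublished: string; // ISO 8601
  dateModified: string;  // ISO 8601, updated only on substantive edits
  breadcrumbs: { name: string; url: string }[];
}

const ORIGIN = "https://example.com"; // assumed site origin

function buildJsonLd(entry: ArticleEntry): { "@context": string; "@graph": JsonLdNode[] } {
  const orgId = `${ORIGIN}/#organization`;
  const authorId = `${entry.authorUrl}#person`;
  const nodes: JsonLdNode[] = [
    { "@id": orgId, "@type": "Organization", name: "Example Co", url: ORIGIN },
    { "@id": authorId, "@type": "Person", name: entry.authorName, url: entry.authorUrl },
    {
      "@id": `${entry.url}#breadcrumb`,
      "@type": "BreadcrumbList",
      itemListElement: entry.breadcrumbs.map((crumb, i) => ({
        "@type": "ListItem", position: i + 1, name: crumb.name, item: crumb.url,
      })),
    },
    {
      "@id": `${entry.url}#article`,
      "@type": "Article",
      headline: entry.headline,
      author: { "@id": authorId },   // cross-reference by @id instead of repeating the node
      publisher: { "@id": orgId },
      datePublished: entry.datePublished,
      dateModified: entry.dateModified,
      mainEntityOfPage: entry.url,
    },
  ];
  // Deduplicate by @id (first occurrence wins) so shared entities appear once.
  const byId = new Map<string, JsonLdNode>();
  for (const node of nodes) {
    if (!byId.has(node["@id"])) byId.set(node["@id"], node);
  }
  return { "@context": "https://schema.org", "@graph": Array.from(byId.values()) };
}

// Escape "<" so a value containing "</script>" cannot close the tag early.
function toScriptTag(data: unknown): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Keeping the emitter pure (entry in, plain object out) is what makes the "validate templates in CI" step cheap: a test can assert required fields per type without rendering a page.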
Generate XML sitemaps dynamically from your content store, not as a static cached file. Rules and limits (sitemaps.org; Bing Webmaster Blog 2025):
- Emit <lastmod> in full ISO 8601 with timezone (2026-05-31T13:40:33+00:00). Both Google and Bing now lean on lastmod for crawl scheduling, and Bing explicitly uses precise timestamps to drive AI crawling. Set lastmod to the true content-modification time, never to sitemap-generation time; lying here trains crawlers to ignore it.
- Split by type under a sitemap index (sitemap-posts.xml, sitemap-pages.xml, sitemap-products.xml, sitemap-images.xml) for easier debugging and partial regeneration. Each file is capped at 50,000 URLs and 50 MB uncompressed.
- Reference the sitemap in robots.txt (Sitemap: https://example.com/sitemap.xml).

IndexNow (the Microsoft-led protocol adopted by Bing, Yandex, Naver, and others) lets your CMS push a URL the instant it changes (publish, update, or delete) rather than waiting for a crawl. Wire it into your publish hook: on save, POST the changed URL(s) and your key to https://api.indexnow.org/indexnow. It is cheap to implement and the natural complement to a dynamic sitemap. Google does not participate in IndexNow; for Google, the legacy Indexing API is officially only for JobPosting and BroadcastEvent, so rely on fast sitemaps + internal linking + Search Console's URL Inspection for priority pages.
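The sitemap and IndexNow rules above can be sketched as two small pure functions. This is an illustrative sketch: SitemapEntry, renderSitemap, and buildIndexNowPayload are hypothetical names, and the payload assumes the key file is hosted at the site root as <key>.txt. The JSON field names (host, key, keyLocation, urlList) follow the published IndexNow protocol.

```typescript
interface SitemapEntry {
  loc: string;
  lastModified: Date; // true content-modification time, not generation time
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one <urlset> file; toISOString() yields a full UTC timestamp.
function renderSitemap(entries: SitemapEntry[]): string {
  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  ];
  for (const e of entries) {
    lines.push(
      `  <url><loc>${escapeXml(e.loc)}</loc><lastmod>${e.lastModified.toISOString()}</lastmod></url>`
    );
  }
  lines.push("</urlset>");
  return lines.join("\n");
}

interface IndexNowPayload {
  host: string;
  key: string;
  keyLocation: string;
  urlList: string[];
}

// Build the JSON body for a POST to https://api.indexnow.org/indexnow.
function buildIndexNowPayload(urls: string[], key: string): IndexNowPayload {
  const first = urls[0];
  if (first === undefined) throw new Error("no URLs to submit");
  const host = new URL(first).host;
  // A single submission should only carry URLs from one host.
  const foreign = urls.filter((u) => new URL(u).host !== host);
  if (foreign.length > 0) throw new Error(`URLs from other hosts: ${foreign.join(", ")}`);
  return { host, key, keyLocation: `https://${host}/${key}.txt`, urlList: urls };
}
```

In a publish hook, the payload would be serialized with JSON.stringify and sent with a Content-Type of application/json; that network call is left out here so the sketch stays testable offline.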
Canonical: every page should carry a self-referential <link rel="canonical"> by default. Use cross-canonicals only deliberately — for parameterized URLs (filters, tracking params, pagination duplicates) pointing to the clean version. Common custom-build bugs: trailing-slash inconsistency, http vs https, www vs apex, and uppercase/lowercase slug drift all create accidental duplicates. Normalize URLs at the edge (one canonical form, 301 everything else).
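The "one canonical form, 301 everything else" rule can be expressed as a single normalizer that both the edge redirect and the canonical tag share. The sketch below makes several assumptions that are policy choices rather than requirements: https on the apex host, no trailing slash, lowercase paths, and a hypothetical list of tracking parameters to strip.

```typescript
// Assumed list of parameters that never change page content.
const TRACKING_PARAMS = new Set([
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "gclid", "fbclid",
]);

// Return the single canonical form of a URL under the assumed policy.
function canonicalize(rawUrl: string, canonicalHost: string = "example.com"): string {
  const url = new URL(rawUrl);
  url.protocol = "https:";        // http -> https
  url.hostname = canonicalHost;   // www -> apex
  url.port = "";
  url.hash = "";
  // Lowercasing is only safe if slugs are lowercase by policy.
  let path = url.pathname.toLowerCase().replace(/\/{2,}/g, "/");
  if (path.length > 1 && path.endsWith("/")) path = path.slice(0, -1);
  url.pathname = path;
  for (const key of Array.from(url.searchParams.keys())) {
    if (TRACKING_PARAMS.has(key)) url.searchParams.delete(key);
  }
  url.searchParams.sort();        // stable parameter order avoids duplicate variants
  return url.toString();
}

// null means "serve the page"; a string means "301 to this URL".
function redirectTarget(rawUrl: string): string | null {
  const canonical = canonicalize(rawUrl);
  return canonical === rawUrl ? null : canonical;
}
```

For example, under these assumptions http://WWW.Example.com/Blog/My-Post/?utm_source=x&b=2&a=1#top normalizes to https://example.com/blog/my-post?a=1&b=2. Whether filter parameters should survive (as here) or be canonicalized away is a per-site decision.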
hreflang (multilingual/multiregional): the rule that breaks the most migrations is self-reference + bidirectionality. Each language variant must list itself and every alternate, and the lists must agree across all variants (WPPoland 2026; Google docs). Add x-default for the language selector / default fallback. You can emit hreflang in the HTML head, the HTTP header, or the sitemap; for a custom CMS the sitemap method scales best because it keeps the cross-references in one generated artifact instead of N templates. Use ISO 639-1 language + optional ISO 3166-1 region (en, en-GB, cs, de-AT).
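The self-reference and bidirectionality rules lend themselves to an automated check over the generated alternates. The following sketch assumes a hypothetical Variant shape and an in-memory map from each page URL to the alternates that page declares; it skips the return-link check for x-default entries, which is a simplification.

```typescript
interface Variant {
  lang: string; // e.g. "en", "en-GB", "de-AT", or "x-default"
  url: string;  // absolute URL of that language version
}

// Emit the <link> tags for one page's alternates (HTML-head method).
function hreflangLinks(group: Variant[]): string[] {
  return group.map((v) => `<link rel="alternate" hreflang="${v.lang}" href="${v.url}">`);
}

// pages: page URL -> alternates declared on that page.
function findHreflangErrors(pages: Map<string, Variant[]>): string[] {
  const errors: string[] = [];
  pages.forEach((alternates, pageUrl) => {
    // Rule 1: every variant must list itself.
    if (!alternates.some((a) => a.url === pageUrl)) {
      errors.push(`${pageUrl}: missing self-reference`);
    }
    // Rule 2: every alternate must link back to this page.
    for (const alt of alternates) {
      if (alt.url === pageUrl || alt.lang === "x-default") continue;
      const back = pages.get(alt.url);
      if (!back || !back.some((b) => b.url === pageUrl)) {
        errors.push(`${pageUrl}: ${alt.url} does not link back`);
      }
    }
  });
  return errors;
}
```

Running findHreflangErrors against the full translation graph in CI catches the asymmetric lists that tend to appear when one language version is published before its siblings.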
If you are leaving WordPress, the redirect map is the single highest-risk SEO deliverable. The golden rule: preserve URLs exactly where you can — if /blog/my-post stays /blog/my-post, no redirect is needed and Google barely registers the platform change (FocusReactive; Pagepro). Only when URL structure changes (dropping /2024/03/ date prefixes, renaming categories, restructuring slugs) do you need redirects.
A safe migration playbook:
- Implement redirects at the server or edge layer (return 301 / rewrite, edge middleware, or the host's redirect engine): they are faster and more reliable than app-level or plugin redirects, and they survive framework changes.
- Cover the WordPress-specific URLs: /feed/ RSS endpoints, /wp-sitemap.xml → new sitemap, ?p=123 short-links, attachment pages, and category/tag archives you keep.

A regression test for the redirect map (assert every old URL returns 301 → expected new URL with 200) belongs in CI; migrations silently rot otherwise.
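A minimal sketch of that regression test follows. It separates the pure judgment (is this observation acceptable?) from the network probe, so the logic can be tested offline. RedirectRule, Observation, Probe, judgeRedirect, and checkRedirectMap are hypothetical names, and the sketch assumes the server returns absolute Location headers.

```typescript
interface RedirectRule {
  from: string; // old WordPress URL
  to: string;   // expected new URL
}

interface Observation {
  status: number;
  location: string | null; // Location header, if any
}

// Injected so CI can use real HTTP and unit tests can use a stub.
type Probe = (url: string) => Promise<Observation>;

// Returns a failure message, or null when the rule holds.
function judgeRedirect(rule: RedirectRule, first: Observation, target: Observation): string | null {
  if (first.status !== 301) {
    return `${rule.from}: expected 301, got ${first.status}`;
  }
  if (first.location !== rule.to) {
    return `${rule.from}: redirects to ${first.location}, expected ${rule.to}`;
  }
  // A non-200 target means a redirect chain or a broken destination.
  if (target.status !== 200) {
    return `${rule.to}: expected 200, got ${target.status}`;
  }
  return null;
}

async function checkRedirectMap(rules: RedirectRule[], probe: Probe): Promise<string[]> {
  const failures: string[] = [];
  for (const rule of rules) {
    const first = await probe(rule.from);
    const target = await probe(rule.to);
    const failure = judgeRedirect(rule, first, target);
    if (failure !== null) failures.push(failure);
  }
  return failures;
}
```

In CI the probe would issue a request without following redirects and read the status and Location header; the build fails if checkRedirectMap returns any messages.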
Google evaluates the three Core Web Vitals at the 75th percentile of real-user (field) data; all three must pass for a "good" overall assessment (web.dev; Google Search Central).
| Metric | Measures | Good | Needs work | Poor |
|---|---|---|---|---|
| LCP (Largest Contentful Paint) | Loading — render time of the largest element | ≤ 2.5 s | 2.5–4 s | > 4 s |
| INP (Interaction to Next Paint) | Responsiveness — worst-case interaction latency | ≤ 200 ms | 200–500 ms | > 500 ms |
| CLS (Cumulative Layout Shift) | Visual stability | ≤ 0.1 | 0.1–0.25 | > 0.25 |
INP replaced First Input Delay in March 2024 (re-verified 2026-06-05 against web.dev and Google Search Central: the three thresholds above are unchanged for 2026, and Google has announced no new or replacement Core Web Vital). INP is widely reported as the most commonly failed metric: third-party 2026 field reporting puts ~43% of sites over the 200 ms threshold (CoreWebVitals.io; Digital Applied). That figure comes from SEO-vendor blogs rather than Google's own data, so treat it as indicative, not authoritative. INP is bound by JavaScript execution, which makes heavy hydration frameworks the prime suspect.
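The thresholds in the table above translate directly into a small classifier that can be shared by dashboards and alerts. This is a sketch with hypothetical function names; LCP and INP are taken in milliseconds and CLS as a unitless score, and the inputs are assumed to already be 75th-percentile field values.

```typescript
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

// Boundaries from the table: LCP and INP in milliseconds, CLS unitless.
const THRESHOLDS: Record<Metric, { good: number; poor: number }> = {
  LCP: { good: 2500, poor: 4000 },
  INP: { good: 200, poor: 500 },
  CLS: { good: 0.1, poor: 0.25 },
};

function rate(metric: Metric, p75: number): Rating {
  const t = THRESHOLDS[metric];
  if (p75 <= t.good) return "good";
  return p75 <= t.poor ? "needs-improvement" : "poor";
}

// The overall assessment passes only if all three metrics are "good".
function passesCoreWebVitals(p75: Record<Metric, number>): boolean {
  const metrics = Object.keys(THRESHOLDS) as Metric[];
  return metrics.every((m) => rate(m, p75[m]) === "good");
}
```

A page with LCP 2.1 s, INP 250 ms, and CLS 0.05 therefore fails overall, because INP alone lands in the needs-improvement band.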
Translate these into a performance budget enforced in CI (Lighthouse CI assertions):
| Budget item | Target |
|---|---|
| Total JS (compressed) | ≤ 150–170 KB initial route |
| Total CSS | ≤ 60 KB |
| LCP image | ≤ 150 KB, fetchpriority="high", no lazy-load |
| Total page weight | ≤ 1 MB initial |
| Main-thread long tasks | none > 50 ms on interaction |
| Lighthouse Performance (lab) | ≥ 90 (as a regression gate, not a victory) |
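The budget table can also be encoded as data and checked by a small function, independent of any particular tool's assertion format. The sketch below is an assumption-laden illustration: RouteMeasurement is a hypothetical shape for whatever your CI step collects, the upper end of the JS range (170 KB) is used as the limit, and KB is taken as 1024 bytes.

```typescript
// Hypothetical per-route measurement collected by a CI step.
interface RouteMeasurement {
  jsBytes: number;          // compressed JS on the initial route
  cssBytes: number;
  lcpImageBytes: number;
  totalBytes: number;
  longestTaskMs: number;    // longest main-thread task during interaction
  lighthousePerformance: number; // 0-100 lab score
}

const KB = 1024; // assumption: binary kilobytes

// Upper limits taken from the budget table.
const BUDGET = {
  jsBytes: 170 * KB,
  cssBytes: 60 * KB,
  lcpImageBytes: 150 * KB,
  totalBytes: 1024 * KB,
  longestTaskMs: 50,
};

const MIN_LIGHTHOUSE_SCORE = 90;

// Returns one message per exceeded budget; an empty array means the route passes.
function budgetViolations(m: RouteMeasurement): string[] {
  const violations: string[] = [];
  const keys = Object.keys(BUDGET) as (keyof typeof BUDGET)[];
  for (const key of keys) {
    if (m[key] > BUDGET[key]) {
      violations.push(`${key}: ${m[key]} exceeds budget ${BUDGET[key]}`);
    }
  }
  if (m.lighthousePerformance < MIN_LIGHTHOUSE_SCORE) {
    violations.push(`lighthousePerformance: ${m.lighthousePerformance} below ${MIN_LIGHTHOUSE_SCORE}`);
  }
  return violations;
}
```

A CI job would fail the build when budgetViolations returns a non-empty list, which keeps the budget in version control next to the code it constrains.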
LCP: identify the LCP element (usually a hero image or heading). Preload it, set fetchpriority="high", and never loading="lazy" it. Serve it from a CDN with a small critical-CSS path. Reduce TTFB with edge caching / ISR — a slow origin caps LCP regardless of front-end work.
INP: this is the AI-CMS-specific trap. Rich editors, live preview, personalization, and "smart" client widgets all add main-thread work. Mitigations: ship less JS (prefer islands/partial hydration — Astro, React Server Components, Qwik resumability), break long tasks with scheduler.yield() / requestIdleCallback, debounce expensive handlers, and move non-urgent work off the main thread (Web Workers). Audit third-party scripts ruthlessly; tag managers and chat widgets are common INP killers.
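Breaking a long task into slices might look like the sketch below. It feature-detects scheduler.yield() and falls back to setTimeout where it is unavailable; processInSlices, its 50 ms default slice, and the injectable clock are assumptions made for illustration and testability.

```typescript
type MaybeScheduler = { yield?: () => Promise<void> };

// Give the browser a chance to handle input and paint before continuing.
function yieldToMain(): Promise<void> {
  const scheduler = (globalThis as unknown as { scheduler?: MaybeScheduler }).scheduler;
  if (scheduler && typeof scheduler.yield === "function") {
    return scheduler.yield();
  }
  // Fallback: a zero-delay timer also ends the current task.
  return new Promise<void>((resolve) => {
    setTimeout(resolve, 0);
  });
}

// Process items, yielding whenever the current slice has used up its time budget.
// Resolves with the number of times control was handed back to the main thread.
async function processInSlices<T>(
  items: readonly T[],
  work: (item: T) => void,
  sliceMs: number = 50,
  now: () => number = () => Date.now()
): Promise<number> {
  let yields = 0;
  let sliceStart = now();
  for (const item of items) {
    work(item);
    if (now() - sliceStart >= sliceMs) {
      await yieldToMain();
      yields += 1;
      sliceStart = now();
    }
  }
  return yields;
}
```

The budget is checked after each item, so a single item that itself runs longer than the slice still blocks; this pattern helps with many small units of work, not one monolithic computation.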
CLS: reserve space for everything that loads late. Set explicit width/height (or aspect-ratio) on all images and embeds, reserve ad/widget slots, avoid injecting content above existing content, and fix font-swap reflow (below).
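The LCP and CLS rules for images (high fetch priority and no lazy-loading on the LCP element, explicit dimensions everywhere) can be enforced in one rendering helper so that templates cannot get them wrong. This is a sketch under stated assumptions: ImageSpec and imgTag are hypothetical names, the width ladder is an example, and the ?w= resize parameter stands in for whatever your image CDN or pipeline actually uses.

```typescript
interface ImageSpec {
  src: string;     // base image URL
  alt: string;
  width: number;   // intrinsic width in pixels
  height: number;  // intrinsic height in pixels
  isLcp: boolean;  // true only for the page's LCP image
  sizes?: string;
}

const WIDTH_LADDER = [400, 800, 1200, 1600, 2400]; // assumed ladder

function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

function imgTag(spec: ImageSpec): string {
  // Never offer candidates wider than the source image.
  let widths = WIDTH_LADDER.filter((w) => w <= spec.width);
  if (widths.length === 0) widths = [spec.width];
  const srcset = widths.map((w) => `${spec.src}?w=${w} ${w}w`).join(", ");
  // LCP image: fetch early, never lazy. Everything else: lazy-load.
  const priority = spec.isLcp
    ? 'fetchpriority="high"'
    : 'loading="lazy" decoding="async"';
  return [
    `<img src="${escapeAttr(spec.src)}"`,
    `srcset="${escapeAttr(srcset)}"`,
    `sizes="${escapeAttr(spec.sizes ?? "100vw")}"`,
    // Explicit dimensions let the browser reserve space and avoid layout shift.
    `width="${spec.width}" height="${spec.height}"`,
    `alt="${escapeAttr(spec.alt)}"`,
    `${priority}>`,
  ].join(" ");
}
```

Making isLcp a required field forces each template to decide explicitly which image is the LCP candidate, rather than lazy-loading everything by default.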
Images are usually the largest payload and the most common LCP element. The 2026 baseline (web.dev; DebugBear; Crystallize):
- Serve modern formats via <picture> with type hints or a CDN that content-negotiates via Accept.
- Use srcset + realistic sizes so phones never download desktop-sized images. Generate a width ladder (e.g., 400/800/1200/1600/2400) at build/upload time.
- Set fetchpriority="high" on the LCP image and loading="lazy" on below-the-fold images (never lazy-load the LCP element).
- Drive all of this from a transformation pipeline (e.g., an @unpic/sharp step). An AI-CMS can auto-generate alt text and OG cards at upload via the same pipeline.

Web fonts cause both LCP delay and CLS reflow. The proven 2026 stack (web.dev; DebugBear; font-display guides):
- Self-host fonts as WOFF2.
- Use font-display: swap so text renders immediately in a fallback rather than blocking. (optional is even better for CLS if you can tolerate occasional fallback-only renders.)
- Preload with <link rel="preload" as="font" type="font/woff2" crossorigin> for the fonts used in the LCP text, ideally with fetchpriority="high" so they start with the stylesheet.
- Subset to the glyphs you need (unicode-range); a Latin subset can be a fraction of the full file.
- Tune fallback metrics with size-adjust, ascent-override, descent-override (or framework helpers like next/font's automatic fallback) to make the swap visually seamless and eliminate font-driven CLS, the most-overlooked CLS source.

Search behavior is shifting toward AI answer surfaces (ChatGPT Search, Perplexity, Google AI Overviews/AI Mode, Claude with web). Optimizing to be cited by these engines is Generative Engine Optimization (GEO), a term formalized in late 2023 by Princeton/Georgia Tech/Allen AI researchers (GEO paper, arXiv:2311.09735).
llms.txt — proposed by Jeremy Howard (Answer.AI) in September 2024 — is a Markdown file at your root that gives LLMs a curated, low-noise map of your most important content. It is a convention, not yet an enforced standard, and there is genuine debate about how much major models honor it today; treat it as cheap insurance rather than a guaranteed channel.
# Example Co — AI-Powered CMS Docs
> One-line description of the site and what it offers.
## Docs
- [Getting Started](https://example.com/docs/start.md): Install and first deploy
- [API Reference](https://example.com/docs/api.md): Endpoints and auth
## Policies
- [Pricing](https://example.com/pricing.md)
A common pattern is to pair llms.txt (the index) with llms-full.txt (concatenated full content) and to expose a .md version of each page. An AI-CMS can generate all of this from the same content store automatically.
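Generating llms.txt from the content store can be as simple as the sketch below. DocLink, LlmsSection, and renderLlmsTxt are hypothetical names; the output layout mirrors the example file above (title, blockquote summary, then one link list per section).

```typescript
interface DocLink {
  title: string;
  url: string;   // ideally the .md version of the page
  note?: string; // optional one-line description
}

interface LlmsSection {
  heading: string;
  links: DocLink[];
}

// Render the Markdown index: "# title", "> summary", then "## section" link lists.
function renderLlmsTxt(siteTitle: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${siteTitle}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.title}](${link.url})${note}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

Serving the result at /llms.txt from the same publish hook that regenerates the sitemap keeps both indexes in step with the content store.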
GEO levers that actually move citation rates (GEO research; 2026 practitioner guides):
- Structured data as machine facts: Organization, Article author/date, Product specs, FAQPage give models verifiable brand facts to quote instead of hallucinating.
- An explicit bot policy in robots.txt: explicitly Allow/Disallow GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, etc. Decide deliberately: blocking them protects content but removes you from answer engines.
- Freshness signals: dateModified and lastmod matter for AI crawling, especially Bing-powered surfaces (which feed Copilot and others).

A realistic 2026 stance: keep doing classic technical SEO (it's the foundation both pipelines read), add llms.txt + per-page Markdown + explicit bot policy, and measure referral traffic from AI sources in analytics rather than trusting GEO promises.
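An explicit bot policy is easier to keep deliberate when it lives as data and robots.txt is generated from it. The sketch below assumes a hypothetical policy map from user-agent token to a site-wide allow or disallow decision; real policies may need per-path rules, which this illustration omits.

```typescript
type Decision = "allow" | "disallow";

// Render one User-agent block per bot, then a default block and the sitemap line.
function renderRobotsTxt(policy: Record<string, Decision>, sitemapUrl: string): string {
  const blocks = Object.keys(policy).map((agent) => {
    const rule = policy[agent] === "allow" ? "Allow" : "Disallow";
    return `User-agent: ${agent}\n${rule}: /`;
  });
  blocks.push("User-agent: *\nAllow: /"); // assumed default: everything else may crawl
  return blocks.join("\n\n") + `\n\nSitemap: ${sitemapUrl}\n`;
}

// Example policy; the decisions here are placeholders, not recommendations.
const examplePolicy: Record<string, Decision> = {
  GPTBot: "allow",
  ClaudeBot: "allow",
  PerplexityBot: "allow",
  "Google-Extended": "disallow",
  CCBot: "disallow",
};
```

Calling renderRobotsTxt(examplePolicy, "https://example.com/sitemap.xml") yields a file in which every listed crawler has a visible, reviewable decision, so a change of stance becomes a one-line diff.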
The most expensive mistake is celebrating a green Lighthouse score while CrUX is failing: lab tools simulate one device on one throttled network profile, whereas real users are slower and more varied (RUMvision; Lumar). Use a layered stack:
| Layer | Tool | Data type | Use |
|---|---|---|---|
| Field (rankings) | CrUX / Search Console Core Web Vitals report | Real users, 75th pct, 28-day | The actual ranking signal — source of truth |
| Field (operational) | RUM via web-vitals JS library → your analytics; or SpeedCurve / DebugBear / RUMvision | Real users, real-time, segmentable | Alerting, per-route/per-device breakdown, attribution |
| Lab (regression) | Lighthouse CI in the pipeline | Synthetic, deterministic | Block regressions on PRs against the budget |
| Lab (diagnosis) | PageSpeed Insights (combines Lighthouse + CrUX), WebPageTest | Synthetic + field | Root-cause a specific page |
Concretely: ship Google's web-vitals library (the onLCP/onINP/onCLS "attribution build" gives you the offending element and event), send beacons with context (route, device memory, effective connection type, navigation type) to your own endpoint, and aggregate at the 75th percentile — matching how Google judges you. Add Lighthouse CI with assertions enforcing your budget so a heavy dependency or unoptimized hero image fails the build instead of shipping. Check the Search Console CWV report monthly for site-wide URL-group patterns; it is grouped by template, which tells you which page type regressed.
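On the receiving end, aggregating beacons at the 75th percentile per route might look like the sketch below. The Beacon shape and function names are hypothetical, and the nearest-rank percentile method is an assumption; it is one common definition and not necessarily the exact method CrUX uses.

```typescript
// Hypothetical beacon as received by your own endpoint.
interface Beacon {
  route: string;
  metric: "LCP" | "INP" | "CLS";
  value: number;
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Group by route + metric and reduce each group to its 75th percentile.
function p75ByRouteAndMetric(beacons: Beacon[]): Map<string, number> {
  const buckets = new Map<string, number[]>();
  for (const b of beacons) {
    const key = `${b.route} ${b.metric}`;
    const bucket = buckets.get(key) ?? [];
    bucket.push(b.value);
    buckets.set(key, bucket);
  }
  const result = new Map<string, number>();
  buckets.forEach((values, key) => {
    result.set(key, percentile(values, 75));
  });
  return result;
}
```

Segmenting the same aggregation by device class or connection type is a matter of adding those fields to the grouping key, which is where RUM earns its keep over the site-wide CrUX view.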
Key takeaways:
- Title, meta description, canonical, hreflang, JSON-LD, and robots must be server-rendered into the initial HTML, with each field modeled as first-class content (AI can draft, humans approve).
- Emit JSON-LD as one @graph of Organization + BreadcrumbList + a content type, and treat completeness as binary: partial markup earns zero rich-result lift; validate templates in CI.
- Generate sitemaps dynamically with a truthful lastmod, split by type under a sitemap index, and add IndexNow to push changes to Bing/Copilot instantly on publish (Google ignores IndexNow; rely on fast sitemaps + Search Console there).
- Images: modern formats, srcset/sizes, fetchpriority="high" + no lazy-load on the LCP image, dimensions always set, driven by a transformation pipeline. Fonts: self-hosted WOFF2, font-display: swap, preload, subset, and size-adjust fallback metrics to kill font-driven CLS.
- For GEO, publish llms.txt (+ per-page Markdown), ship clean SSR HTML, expose structured data as machine facts, set an explicit bot policy in robots.txt (GPTBot/ClaudeBot/PerplexityBot/Google-Extended), and measure AI referral traffic rather than trusting the convention blindly.
- Measure with CrUX / Search Console (ranking truth) + web-vitals RUM (operational, 75th pct) + Lighthouse CI (regression gate); a green lab score over failing field data is the most common and costly illusion.