GEO Field Manual: The Complete Practitioner Guide
The GEO Field Manual is the most comprehensive free resource on Generative Engine Optimisation — a full practitioner guide covering all five GEO Stack layers, the mechanics of AI retrieval and citation, content extraction architecture, entity gravity, structural authority, experiments and measurement, commercial exposure analysis, and future-proofing for 2027 and beyond. It is Book #7 in the GEO Lab Library, written for practitioners who need the complete methodology in one place.
The GEO Field Manual is the book you work from, not just read. It is structured as a reference — with detailed implementation guidance for each GEO Stack layer, diagnostic frameworks for auditing your current position, experiment templates for testing improvements, and appendices covering schema reference, toolsets, and research sources.
If you read one GEO Lab book end to end, make it this one.
What’s Inside the GEO Field Manual
Part I — The Structural Shift
How classic ranking models worked and why they no longer predict AI citation. The passage-level shift — how AI engines evaluate individual passages, not pages. Why traditional mental models break in generative search. The visibility-versus-ranking distinction and its commercial implications. Zero-click risk mapping and brand lift from AI inclusion.
Part II — The GEO Framework
The five GEO Stack layers in full: Layer 1 — Retrieval: technical signals that determine whether AI crawlers find and index your content. Layer 2 — Extractability: structural signals that allow AI to quote your content accurately — direct answer placement, heading hierarchy, FAQ sections, tables, and list formatting. Layer 3 — Entity Reinforcement: the signals that build brand entity recognition — schema, Wikipedia and Wikidata presence, Knowledge Panel, author schema, and brand mention consistency. Layer 4 — Structural Authority: the trust signals that make AI engines choose your content over alternatives — backlink quality, citation patterns, E-E-A-T implementation. Layer 5 — System Memory: freshness, update frequency, and the signals that maintain citation over time. How the five layers interact — and the scoring weights that determine where to prioritise.
Part III — Operational Implementation
Hub-and-spoke linking architecture. Comparison tables and list structures for AI extraction. Section opening inventory — how to audit and rewrite content openings for GEO. The six-step GEO audit process from scope definition to priority action list. Recommended toolset for implementation and monitoring.
Part IV — Experiments, Measurement and Tooling
GEO experiment design principles with a complete experiment template. GEO Score methodology and scoring dimensions. GEO attribution — types, tracking infrastructure, and strategic patterns in attribution data. Commercial exposure mapping and revenue risk modelling. Near-term trends (2026–2027), medium-term directions (2027–2029), and enduring principles.
Appendices
Full GEO Stack scoring reference. Schema implementation guide. Two complete worked content rewrites with pre- and post-analysis. Primary research and industry data sources.
Frequently Asked Questions
What are the five layers of the GEO Stack?
The five GEO Stack layers are: Layer 1 — Retrieval (technical signals ensuring AI crawlers can access and index your content), Layer 2 — Extractability (structural signals that allow AI to quote accurate passages), Layer 3 — Entity Reinforcement (signals that build AI recognition of your brand as a trusted entity), Layer 4 — Structural Authority (trust signals including backlinks, citations, and E-E-A-T), and Layer 5 — System Memory (freshness and update signals that maintain citation over time).
What is passage-level retrieval and why does it matter for GEO?
Passage-level retrieval refers to how AI engines evaluate and extract individual passages or sections of a page, rather than ranking the page as a whole unit. A page with a well-structured, directly answering opening paragraph may be cited for that passage even if the rest of the page is less optimised. This is why the GEO Writing Formula emphasises the first one to two sentences of every content section.
What is entity gravity and how do you build it?
Entity gravity is the pull that well-established brand entities exert on AI retrieval — the tendency for AI engines to default to recognised entities as sources across multiple query types. Building entity gravity requires: consistent brand name usage across all content and schema, Wikipedia and Wikidata presence, Google Knowledge Panel optimisation, author schema with linked credentials, and brand mention accumulation on authoritative external sources.
How do you run a GEO audit?
The GEO audit process in the Field Manual covers six steps: (1) Define scope and query set, (2) Retrieval test — checking whether AI crawlers can access your content, (3) Extractability audit — reviewing opening sentences, heading formats, and FAQ structure, (4) Entity gravity audit — checking schema, Knowledge Panel, and brand mention consistency, (5) Structural authority audit — backlink quality and E-E-A-T signal completeness, (6) Generate a priority action list based on the lowest-scoring layers.
What is commercial exposure mapping in GEO?
Commercial exposure mapping is the process of identifying which of your revenue-generating queries are most at risk from zero-click AI answers — where users get complete answers from AI without visiting your site. The GEO Field Manual includes a framework for mapping commercial exposure and prioritising GEO investment based on revenue risk.
Continue in the GEO Lab Library
- Advanced: GEO Authority Playbook — entity architecture, competitive citation intelligence, and GEO at scale.
- Test it: GEO Experiments — design experiments to validate what you implement from this manual.
- Reference: The GEO Glossary — definitions for every term used throughout the Field Manual.
- Browse all: thegeolab.net/ebooks
Generative Engine Optimisation: A Practical Field Manual
Why Search Optimisation Had to Change
For most of the past two decades, search engine optimisation was a ranking problem. You competed for position — first on the page, first in the mind, first in the click. The model was linear: produce content, accumulate authority signals, achieve rank, receive traffic. The page was the unit of competition. Position was the measure of success.
That model held because search engines were fundamentally retrieval-and-ranking systems. A query arrived, the index was consulted, a list of documents was scored, and a ranked set of links was served. Users had to choose, click, and consume. Every optimisation lever — keyword placement, title tags, link authority, technical structure — was designed to move a page up that list.
Generative AI has restructured this logic. The systems now dominant in consumer search — Google AI Overviews, Microsoft Copilot, Perplexity, ChatGPT with web browsing, and Gemini — do not simply retrieve and rank. They retrieve, extract, compress, and synthesise. They read the document, pull out the most extractable sections, and compose their own answer. The user may never visit your page. They may never know which source the system used. And they will almost certainly not see a ranked list of ten blue links.
This changes what optimisation means.
Visibility in a generative environment is not defined by rank. It is defined by inclusion — whether your content is selected during retrieval, whether your extracted sections survive compression, and whether your entities, framings, and facts appear inside the generated answer. A page can rank first and never be cited. A page buried at position seven can become the primary source for a topic if its sections are highly extractable and its entities are well-reinforced.
This manual documents the emerging discipline of Generative Engine Optimisation (GEO): the structured practice of engineering content for retrieval, extraction, and synthesis in AI-driven search environments. It draws on public experimentation, published research (including Princeton’s 2024 GEO study), and practical implementation across real commercial sites.
A practical field reference — not a theory text. Each chapter combines a conceptual model with operational tools: checklists, templates, audit frameworks, and experiment designs. You should be able to use it during content reviews, editorial briefs, and site audits.
Who This Is For
This manual is written for practitioners: SEOs, content strategists, technical marketers, and site owners who are already competent in traditional search optimisation and now need to extend that competency into generative environments. It assumes familiarity with core SEO concepts — crawling, indexing, on-page optimisation, link authority — but does not assume a technical background in machine learning or natural language processing.
If you are building content systems for commercial websites where organic search contributes meaningfully to revenue, this manual addresses your immediate operational concerns: what to change, how to audit what you have, how to measure what you cannot yet see directly, and how to prioritise interventions in a transition period where both traditional ranking and generative retrieval matter.
How to Use This Manual
The manual is structured in four parts:
- Part I — The Structural Shift explains why traditional optimisation models are insufficient in generative search environments. Read this for the conceptual foundation.
- Part II — The GEO Framework introduces the GEO Stack — a five-layer model for engineering generative visibility — and defines the core variables of retrieval probability and extractability.
- Part III — Operational Implementation translates the framework into practical workflows: page design, internal linking, auditing, and commercial strategy.
- Part IV — Experiments, Measurement & Tooling covers how to run your own experiments, model retrieval factors, attribute AI-sourced traffic, and interpret emerging trends.
The appendices are designed as standalone working tools: print them, use them in audits, share them with editorial teams.
A Note on Uncertainty
Generative search is evolving rapidly. No practitioner has complete visibility into how any specific system selects content, weights signals, or handles different domains. What this manual offers is a structured framework based on observable patterns — not algorithmic certainty. Treat every recommendation as a working hypothesis. Run your own experiments, document your results, and update your models. That discipline is, in fact, the practice itself.
The field is new. The frameworks are provisional. The direction is clear.
Your 30-Day GEO Roadmap
Before reading further, use this roadmap to anchor the manual to immediate action. Each week produces a tangible output. By the end of Month 1 you will have a baseline, a scored audit, a first set of structural fixes, and your first experiment result; the Month 2 row then extends the work into a continuing cycle.
| Timeframe | Action | Output |
|---|---|---|
| Week 1 | Run a baseline prompt audit: submit your top 20 target queries into Google AI Overviews and Perplexity (5 iterations each). Record citation presence, which sections were quoted, and which competitors appeared. | Baseline citation rate document |
| Week 2 | Apply the GEO Audit Checklist (Appendix A) to your top 5 traffic pages. Score each H2 section for extractability and entity match. Use the Appendix B worksheet to track scores. | Section-level score sheet; rewrite backlog |
| Week 3 | Execute highest-priority structural rewrites: answer-first restructuring for sections scoring <45, entity canonicalisation, internal linking anchor text audit. Fix orphan pages. | Rewritten pages published; anchor text updated |
| Week 4 | Re-run the Week 1 baseline queries. Compare citation rates before and after rewrites. Document your first experiment using the protocol in Chapter 12. | Experiment #001 results; delta citation rate |
| Month 2 | Run a full cluster-level structural audit (Chapters 9–10). Score Retrieval Probability across priority sections (Chapter 13). Build a 90-day GEO rewrite calendar from your findings. | Cluster audit report; 90-day rewrite plan |
GEO compounds. Each structural improvement raises retrieval probability for every query that touches that section. The 30-day roadmap is the start of a continuous practice — not a one-time project.
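The Week 1 and Week 4 tallies are simple enough to script. The sketch below computes per-query citation rates from manually recorded audit runs; the record shape (`query`, `engine`, `cited`) is a working convention assumed here, not a standard, and the sample data is illustrative.

```python
from collections import defaultdict

def citation_rates(audit_runs):
    """Aggregate manual prompt-audit results into per-query citation rates.

    audit_runs: one dict per (query, iteration) run, e.g.
        {"query": "...", "engine": "perplexity", "cited": True}
    Returns {query: fraction of runs in which our domain was cited}.
    """
    totals = defaultdict(int)
    cited = defaultdict(int)
    for run in audit_runs:
        totals[run["query"]] += 1
        if run["cited"]:
            cited[run["query"]] += 1
    return {q: cited[q] / totals[q] for q in totals}

# Illustrative sample: three iterations of one query, one of another.
runs = [
    {"query": "what is geo", "engine": "perplexity", "cited": True},
    {"query": "what is geo", "engine": "perplexity", "cited": False},
    {"query": "what is geo", "engine": "ai_overviews", "cited": True},
    {"query": "geo audit steps", "engine": "perplexity", "cited": False},
]
print(citation_rates(runs))
```

Re-running the same script on the Week 4 data gives the delta citation rate for Experiment #001 directly.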
The Structural Shift
How generative AI changed the unit of competition from pages to passages — and why every assumption about visibility needs to be rebuilt.
The End of Document-Centric Optimisation
Traditional search engine optimisation was designed around a document model. A document — the web page — was the atomic unit of competition. Algorithms scored documents holistically: keyword relevance across the entire page, domain authority, link equity, technical signals. You ranked by document. You competed by document. You tracked positions by document.
This model was never perfect, but it was coherent. It gave practitioners a clear object to optimise, a measurable output to track, and a set of levers with understood effects. Improve the document; improve the rank. The feedback loop was slow but interpretable.
How Classic Ranking Models Worked
At the core of traditional ranking was a document-scoring function. Google’s foundational PageRank algorithm treated the web as a directed graph and inferred document authority from citation patterns. Later layers added keyword-match signals, user engagement proxies, and semantic analysis models. But the output was consistent: a ranked list of documents, ordered by estimated relevance and authority for a given query.
The practical consequence was that optimisation focused on three interconnected layers:
- Topical relevance — Does this document address the query’s subject domain? Achieved through keyword strategy, semantic coverage, and topical clustering.
- Authority signals — Does external evidence (links, mentions, engagement) indicate this document is credible? Achieved through link building and earned media.
- Technical accessibility — Can the crawl and index pipeline process this document efficiently? Achieved through site speed, crawlability, and structured markup.
These three layers remain relevant. But they are no longer sufficient. The new visibility problem is not about whether your document scores well. It is about whether your document’s internal sections are extractable when AI systems retrieve and parse it.
Classic SEO asks: Does this page rank? GEO asks: When this page is retrieved, which sections will be extracted — and will they survive compression?
The Passage-Level Shift
Google’s passage-level indexing, announced in 2020, was an early signal of this transition. The system could identify a single relevant passage within an otherwise less-relevant page and use that passage to satisfy a query. The document’s overall relevance became less important than the local relevance of individual sections.
Generative systems have extended this logic dramatically. In a generative pipeline, the unit of retrieval is typically a chunk — a paragraph, a definition block, a list, a table, a section delimited by a heading. The system does not retrieve pages; it retrieves chunks. It then compresses those chunks into a synthesised response that may bear little structural resemblance to the original document.
This means optimisation at the document level is necessary but insufficient. You need to optimise at the section level. If your best answer to a query is buried in paragraph fourteen of a three-thousand-word article, surrounded by contextual narrative that makes no sense in isolation, that answer may never appear in a generated response — even if the article ranks first.
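To make the chunk model concrete, here is a minimal sketch of heading-based segmentation, the kind of pre-processing a RAG pipeline might apply before indexing. Real pipelines vary (some use fixed token windows instead); this version simply splits on markdown headings so that each section keeps its own heading as an anchor.

```python
import re

def chunk_by_headings(markdown_text):
    """Split a markdown document into heading-delimited chunks.

    Each chunk retains its own heading so it can stand alone when
    retrieved out of context. A simplified stand-in for real
    RAG segmentation, not a production chunker.
    """
    # Zero-width split at every line that starts with an H1-H3 heading.
    parts = re.split(r"(?m)^(?=#{1,3} )", markdown_text)
    return [p.strip() for p in parts if p.strip()]

doc = """# GEO Basics

Intro paragraph.

## What is retrieval probability?

Retrieval probability is the likelihood a chunk is selected.

## What is extractability?

Extractability measures how cleanly a section can be quoted.
"""
for chunk in chunk_by_headings(doc):
    print(chunk.split("\n")[0])
```

Seen this way, the practical question for any page becomes: would each chunk, read in isolation, still answer its implicit question?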
Why Traditional Mental Models Break
Several widely held SEO assumptions fail in generative environments:
| Traditional Assumption | Why It Breaks in GEO |
|---|---|
| Longer pages rank better | Length increases dilution risk; key sections compete with noise |
| Introduction sets context for the whole page | Sections must be independently coherent; context is stripped during chunk retrieval |
| Ranking #1 captures the most traffic | AI Overviews reduce CTR regardless of rank; inclusion, not position, determines visibility |
| Keyword frequency signals relevance | Semantic embedding models assess meaning, not term repetition |
| Internal links distribute PageRank | Link equity still flows, but in GEO the primary role of internal linking is building semantic entity graphs and reinforcing topical authority |
| Duplicate content is always harmful | Consistent entity repetition across sections is a retrieval signal, not a penalty risk |
The Visibility vs. Ranking Distinction
The most important conceptual shift in moving from SEO to GEO is distinguishing between ranking and visibility. These are no longer synonymous.
Ranking is a position within an ordered list. Visibility — in a generative context — is the probability that your content surfaces inside an AI-constructed answer. You can rank without being visible. You can be visible without ranking, if your content is cited in a generated response that appears above organic results.
This distinction has commercial consequences. A site whose content is frequently cited in AI Overviews but which ranks at position four for that query may receive fewer clicks than it would have in a pre-AI environment — but it may also be building brand and messaging authority that converts at a different point in the customer journey. Measuring only rank obscures this dynamic.
Effective GEO practice requires building measurement systems that capture both dimensions. Part IV of this manual covers this in detail. For now, the key principle is this: optimisation begins when you separate the question of where you rank from the question of whether your content appears.
Traditional SEO operated on a document model where entire pages were scored and ranked. Generative search operates on a chunk model where passages are retrieved, extracted, and synthesised. This shift requires moving from page-level to section-level optimisation — and from ranking measurement to visibility measurement.
How Generative Search Systems Actually Work
To optimise for a system, you need to understand how it works — at least conceptually. You do not need to understand the mathematics of transformer architecture or vector embeddings at a technical level. But you do need a working model of the pipeline that processes your content, because every stage of that pipeline is a point where your content can succeed or fail.
The core pipeline of a modern generative search system can be summarised in five stages:
Stage 1: Query Processing
When a user submits a query, the system does not simply look for pages containing those words. Modern systems process the query semantically — interpreting intent, expanding it into related sub-queries (a process sometimes called query fan-out), and generating a vector representation of the query’s meaning.
This semantic interpretation means that a query about “how to improve AI search visibility” will retrieve content covering retrievability, extractability, GEO, and structured data — even if none of those terms appear in the original query. The system is matching meaning, not keywords.
For practitioners, this has a critical implication: your content must be semantically aligned with topic domains, not just keyword lists. Coverage of related concepts, consistent entity usage, and topical depth matter more than keyword density.
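The fan-out step can be pictured as follows. This is a deliberately naive sketch: production systems generate sub-queries with a language model, whereas the fixed templates here only illustrate the shape of the expansion and are not drawn from any real engine.

```python
def fan_out(query):
    """Illustrative query fan-out: expand one query into related
    sub-queries before retrieval. Templates are arbitrary examples."""
    templates = [
        "{q} definition",
        "how to {q}",
        "{q} best practices",
        "{q} examples",
    ]
    return [t.format(q=query) for t in templates]

for sub_query in fan_out("improve AI search visibility"):
    print(sub_query)
```

Because each sub-query is retrieved against independently, content that covers the surrounding topic domain can be pulled in by an expansion the user never typed.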
Stage 2: Retrieval
Once the query is processed, the system retrieves candidate content blocks. In a Retrieval-Augmented Generation (RAG) architecture — which underpins most contemporary generative search systems — this retrieval typically uses a combination of dense vector search (semantic similarity) and sparse keyword search.
The retrieved units are chunks: sections of documents that have been pre-segmented and indexed. The segmentation may follow structural cues (headings, paragraphs) or may be fixed-size (e.g., 512 tokens). The system selects chunks whose vector representation most closely matches the query vector.
Because retrieval operates at the chunk level, a section that begins with “As we discussed above…” is immediately handicapped. The reference to prior context cannot be resolved. The chunk must make sense as a standalone unit — with its own entity anchors, self-contained answer, and coherent structure.
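A toy model of the dense-retrieval step makes the standalone-chunk point concrete. The three-dimensional "embeddings" below are invented placeholders for real model output, hand-picked so that the self-contained definition outranks the context-dependent chunk; the point is the mechanics of vector scoring, not the numbers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy chunk "embeddings" (invented values, not from any real model).
chunks = {
    "Retrieval probability is the likelihood a chunk is selected.": [0.9, 0.1, 0.2],
    "As we discussed above, this also matters.":                    [0.3, 0.3, 0.3],
    "Our pricing starts at $49 per month.":                         [0.1, 0.8, 0.1],
}
query_vec = [0.85, 0.15, 0.25]  # stand-in for "what is retrieval probability?"

ranked = sorted(chunks, key=lambda c: cosine(chunks[c], query_vec), reverse=True)
print(ranked[0])
```

The scoring has no access to the rest of the page: only the chunk's own content enters the comparison, which is why an "As we discussed above…" opener forfeits semantic signal.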
Stage 3: Extraction
From the retrieved chunks, the system identifies the most relevant and usable content. This extraction phase is where your writing structure directly influences which information is selected. Systems preferentially extract:
- Direct, declarative statements (e.g., “Retrieval probability is the likelihood that a content block is selected during the retrieval phase of a generative pipeline”)
- Clearly bounded facts, definitions, and data points
- Self-contained explanations that do not require surrounding context
- Structured formats: lists, tables, comparison blocks, step sequences
Content that is difficult to extract — dense narrative prose, long paragraphs mixing multiple ideas, answers buried in context — is less likely to be selected even when the chunk is retrieved.
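Some of these extraction failures can be flagged automatically during a content review. The heuristics below are illustrative only: they are not the Field Manual's scoring model, and the opener list, length threshold, and verb pattern are arbitrary examples.

```python
import re

# Openers that signal an unresolved reference to prior context.
CONTEXT_OPENERS = ("it ", "this ", "they ", "these ", "as we ", "as discussed")

def extractability_flags(section_text):
    """Rough, illustrative checks for the extraction problems named
    in the text. Thresholds are arbitrary examples."""
    first_line = section_text.strip().split("\n")[0].lower()
    flags = []
    if first_line.startswith(CONTEXT_OPENERS):
        flags.append("opens with unresolved context reference")
    if len(section_text.split()) > 150:
        flags.append("long block: consider splitting into bounded units")
    if not re.search(r"\b(is|are|means|refers to)\b", section_text, re.IGNORECASE):
        flags.append("no declarative statement found")
    return flags

print(extractability_flags("As we discussed above, it depends on many factors."))
```

A pass like this over every H2 section is a cheap first filter before the manual section-opening inventory described in Part III.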
Stage 4: Compression and Synthesis
Extracted content is then compressed. The language model synthesises a response by combining information from multiple retrieved chunks, applying its own knowledge, and generating coherent output. In this compression step, nuance is often lost. Hedges, qualifications, and supporting context may be discarded. What survives is the core claim — the most extractable, most unambiguous statement your content contains.
This is what compression resistance measures: how well your content retains its core meaning when the surrounding detail is stripped away. Content with high compression resistance survives synthesis intact. Content with low compression resistance is paraphrased, distorted, or misattributed.
Stage 5: Citation and Output
The generated response may or may not attribute its sources. In systems like Perplexity and Microsoft Copilot, citations are explicit. In Google AI Overviews, sources are listed beneath the generated text. In ChatGPT without browsing, no citation occurs at all — content from training data is used without attribution.
Citation behaviour varies by platform and is not entirely within your control. However, the likelihood of being cited increases when your content is: (a) clearly associated with a named entity (author, brand, organisation), (b) reinforced across multiple indexed pages, and (c) published on domains with established authority signals.
| Pipeline Stage | What Happens | Your Optimisation Lever |
|---|---|---|
| Query Processing | Query is interpreted semantically; sub-queries generated | Semantic topical coverage; entity alignment |
| Retrieval | Chunks retrieved by vector similarity | Semantic alignment; structural clarity; entity density |
| Extraction | Most usable content identified within chunks | Declarative structure; section independence; format |
| Compression | Content compressed into synthesised response | Compression resistance; unambiguous claims; definitions |
| Citation | Sources attributed (varies by platform) | Entity authority; domain trust; cross-source reinforcement |
The RAG Architecture in Practice
Retrieval-Augmented Generation is not exclusively a search-engine technology. Enterprise AI tools, customer-facing chatbots, and research assistants commonly use RAG to ground their responses in controlled knowledge bases. If your content is indexed by any RAG-powered system — which increasingly means anything AI uses to search the web — the same principles apply.
The implication for practitioners operating in regulated or competitive industries is significant: your content may be retrieved and cited in internal business AI tools, competitive intelligence platforms, and sector-specific assistants, not just in consumer search. Engineering content for retrieval therefore has scope beyond organic search traffic.
Generative search pipelines operate in five stages: query processing, retrieval, extraction, compression, and citation output. Optimisation applies at each stage. The most controllable levers are structural: chunk independence, declarative writing, entity clarity, and compression-resistant phrasing. Understanding the pipeline lets you direct effort to the highest-leverage points.
Inclusion Is the New Visibility
The shift from ranking to inclusion is not merely semantic. It has direct consequences for how you define success, how you allocate optimisation effort, and how you account for AI-driven search in commercial performance models.
Ranking vs. Citation: Two Different Games
Ranking and citation are related but distinct outcomes. A page can rank well without being cited in AI-generated responses. A page can be cited in AI responses without ranking within the top results. Both outcomes have value, but they are no longer equivalent.
In a traditional SERP, ranking first typically captured a disproportionate share of clicks — click-through rates of 28–35% for position one were commonly observed in pre-AI search environments. Positions two and three received diminishing but still meaningful shares. Below position five, traffic became marginal for most queries.
In a generative SERP, this model breaks. When an AI Overview answers the query above the organic results, click-through rates collapse across all positions. Studies from 2024–2025 have recorded CTR reductions of 50–80% for informational and navigational queries where AI Overviews appear. The traffic to position one is no longer primarily determined by whether you rank first — it is determined by whether the AI Overview answers the query well enough that users do not click at all.
Zero-Click Risk and Commercial Exposure
The commercial risk of zero-click search is not evenly distributed. It concentrates in specific query categories:
- Informational queries — definitions, explanations, how-to content, fact-retrieval — are heavily affected. If you operate an educational or reference site, or if your conversion funnel depends on informational content driving awareness, this risk is acute.
- Navigational queries — brand name searches, product category searches — are partially affected. AI Overviews are less likely to appear for pure navigational intent, but increasingly appear for brand + category queries.
- Transactional and comparison queries — “best X,” “X vs Y,” “buy X” — are lower risk in the short term, but are not immune. Google’s AI shopping summaries represent expansion into this category.
For commercial sites, an honest assessment of zero-click exposure requires segmenting your existing organic traffic by query type and estimating the proportion of queries in each category that currently trigger or are likely to trigger AI Overviews. Chapter 11 covers this commercial risk modelling in detail.
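A first-pass exposure model can be sketched in a few lines. Every figure below (sessions, revenue per session, Overview trigger rate, CTR reduction) is a hypothetical placeholder to be replaced with your own segmented data; none of these numbers are benchmarks from the manual.

```python
# Hypothetical monthly inputs per query category:
#   (sessions, revenue per session, AI Overview trigger rate,
#    CTR reduction applied when an Overview appears)
segments = {
    "informational": (40_000, 0.50, 0.70, 0.65),
    "navigational":  (15_000, 1.20, 0.25, 0.40),
    "transactional": (10_000, 4.00, 0.10, 0.30),
}

def revenue_at_risk(segments):
    """Estimate monthly revenue exposed to zero-click loss per segment."""
    risk = {}
    for name, (sessions, rps, overview_rate, ctr_cut) in segments.items():
        risk[name] = sessions * rps * overview_rate * ctr_cut
    return risk

for name, value in revenue_at_risk(segments).items():
    print(f"{name}: ${value:,.0f}/month at risk")
```

Even this crude model makes the prioritisation argument visible: with these placeholder inputs, the informational segment dominates exposure despite its low revenue per session.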
Being cited in an AI Overview is not commercially valueless even when the user does not click. It reinforces brand recognition, establishes topical authority, and creates a named presence in the answer that influences how users evaluate options downstream — even in separate sessions. This is brand lift at zero marginal cost per impression.
Brand Lift from Inclusion
When your entity — your brand name, your product name, or your organisation — appears in a generated answer, it enters the user’s mental model for that topic. Research on AI search behaviour is early but consistent with established research on search result framing: appearing in the answer, even without a click, elevates brand credibility and familiarity.
For businesses competing in categories where trust is a primary purchase driver — professional services, healthcare, financial products, B2B software — this brand lift from AI inclusion may be more commercially significant than the click itself. A prospect who sees your brand cited in an AI answer when researching a category problem may be more receptive to a paid search ad, a direct search, or a referral encounter days later.
Measuring this indirect value is difficult. But dismissing it because it is difficult to measure leads to systematic underinvestment in GEO. The commercial model for AI-era search must account for both direct (click, conversion) and indirect (brand exposure, authority signal, messaging control) outcomes.
Messaging Control in Synthesis
When AI systems synthesise answers, they do not faithfully reproduce your text. They compress, paraphrase, and reinterpret. If your content is not structured to resist this compression — if your key messages are buried in narrative, qualified by hedges, or diluted by tangential material — the synthesised representation of your content may not reflect your intended positioning.
This is a commercial concern. If an AI Overview handles a comparison query in your category and your product is described by a paraphrased version of a section on your features page — a paraphrase that loses your differentiating claim — you have lost messaging control at a critical awareness moment. Engineering your content for compression resistance is therefore not just a technical discipline; it is a commercial one.
| Outcome Type | Traditional SEO Value | GEO Value |
|---|---|---|
| Rank #1, no click | Low (wasted position) | Medium (still in view, but no traffic) |
| Rank #1, click | High | High (clicks remain where AI doesn’t appear) |
| Cited in AI Overview, no click | N/A | Medium–High (brand lift, messaging presence) |
| Cited in AI Overview, click through | N/A | Very High (qualified intent, trust pre-established) |
| Not ranked, not cited | Zero | Zero |
New KPIs for Generative Visibility
Adapting measurement to this environment requires moving beyond rank tracking and organic session counts. The KPIs that matter in a generative search context include:
- Inclusion rate — The frequency with which your content appears in AI-generated responses across a monitored set of topic-relevant queries.
- Citation frequency — The number of times a specific page or entity is cited in AI responses across query variations.
- AI share of voice — Your brand or content’s proportional representation in AI responses for a defined topic cluster, relative to competitors.
- Messaging fidelity — The degree to which AI-generated answers about your product or content accurately reflect your intended positioning.
- Brand search lift — An indirect proxy: increases in branded search volume following periods of high AI inclusion suggest that AI exposure is driving downstream interest.
None of these metrics are available natively in Google Search Console or Google Analytics at the time of writing. Part IV covers the measurement methodologies available today, including manual prompt auditing, third-party platform tracking, and proxy modelling.
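Two of these KPIs, inclusion rate and AI share of voice, fall out directly from manual prompt-audit logs. The `(query, cited_domains)` record shape below is an assumed working convention for those logs, not a standard format.

```python
from collections import Counter

def inclusion_and_sov(observations, brand_domain):
    """Compute (inclusion rate, AI share of voice) for a brand.

    observations: list of (query, [cited_domains]) tuples recorded
    during manual prompt audits. Inclusion rate is the fraction of
    observations citing the brand; share of voice is the brand's
    share of all citations across the query set.
    """
    included = sum(1 for _, domains in observations if brand_domain in domains)
    inclusion_rate = included / len(observations)
    citations = Counter(d for _, domains in observations for d in domains)
    total = sum(citations.values())
    sov = citations[brand_domain] / total if total else 0.0
    return inclusion_rate, sov

obs = [
    ("what is geo", ["thegeolab.net", "example.com"]),
    ("geo audit steps", ["example.com"]),
    ("geo vs seo", ["thegeolab.net"]),
]
print(inclusion_and_sov(obs, "thegeolab.net"))
```

Tracked over time and against competitor domains, these two numbers are the closest current equivalent to rank tracking for generative surfaces.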
In generative search, inclusion — appearing in AI-generated answers — is the new visibility metric. Click-through rates collapse when AI Overviews appear, creating zero-click risk concentrated in informational and mixed-intent queries. However, AI inclusion delivers brand lift and messaging presence that has commercial value beyond the direct click. Optimisation strategy must account for both direct and indirect outcomes, and measurement systems must expand beyond traditional rank and session tracking.
The GEO Framework
A five-layer model for engineering generative visibility — from retrieval probability through entity gravity to structural authority and system memory.
The GEO Stack
Generative Engine Optimisation is not a single technique. It is an architecture — a layered system of signal types that together determine how consistently your content is retrieved, extracted, and cited in generative search environments. To work on this systematically, you need a framework that organises the relevant variables by layer, so you can identify where problems originate and prioritise fixes accordingly.
The GEO Stack is a five-layer model. Each layer addresses a distinct aspect of generative visibility, and each layer has dependencies on the one below it. You cannot optimise entity reinforcement effectively if structural clarity is missing. You cannot build system memory if entity reinforcement is inconsistent. The layers are not independent treatments — they are a coherent signal architecture.
Layer 1 — Retrieval
The foundation layer. Before any extraction or synthesis can occur, your content must be retrieved. Retrieval is the stage at which vector search selects candidate chunks for inclusion in the generation process. Content that is not retrieved cannot be cited, regardless of how well-written or authoritative it may be.
Retrieval probability is determined by the semantic alignment between your content and the query being processed. The closer the meaning of your content to the meaning of the query — as represented in the embedding space — the more likely your chunk is to be retrieved.
Primary optimisation levers at Layer 1:
- Query-aligned language — write in the vocabulary of the questions your audience asks
- Topical depth — cover the subject domain comprehensively enough to generate semantic density
- Answer-first structure — lead sections with the direct answer to the implicit question they address
- Entity presence — include explicit named entities relevant to the query domain
Layer 1 · Retrieval
Whether your content chunks are selected during the vector retrieval phase. The prerequisite for all other layers.
Layer 2 — Extractability
The second layer. Once retrieved, your content must be extractable: it must contain sections that the AI system can parse, isolate, and use cleanly. Extractability is about the internal architecture of your content — how sections are structured, how self-contained they are, how unambiguously they communicate their core claim.
This is where most traditional long-form content fails in generative environments. Dense narrative prose, long paragraphs mixing multiple ideas, answers qualified beyond recognition, and heavy reliance on contextual pronouns (“it,” “this,” “they”) all reduce extractability. The section may be retrieved — it may even rank well — but its internal structure prevents the AI system from pulling a clean, usable fragment.
Primary optimisation levers at Layer 2:
- Declarative opening sentences that function as standalone answers
- Paragraphs under 100–120 words with one primary idea each
- Explicit entity naming on first mention (no dangling pronouns)
- Structured formats: lists, tables, numbered steps for discrete concepts
- Compression resistance — core meaning survives one-sentence summary
Layer 2 · Extractability
Whether retrieved sections can be parsed and used cleanly — without requiring surrounding context to make sense.
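Two of the Layer 2 levers above, paragraph length and explicit entity naming, are mechanical enough to check automatically during editorial review. A minimal Python sketch, checking only the paragraph-length rule and pronoun-opening paragraphs; the function name, pronoun list, and 120-word threshold are illustrative choices, not from any shipped tool:

```python
import re

# Pronouns that signal a context-dependent opening ("dangling" anchors).
CONTEXT_PRONOUNS = {"it", "this", "that", "they", "these", "those"}

def extractability_flags(section_text, max_words=120):
    """Flag paragraphs that break two of the Layer 2 levers.

    Checks only the mechanical rules: paragraph length under
    `max_words`, and no paragraph opening on a bare pronoun.
    Semantic checks (answer-first, compression resistance) still
    need human review.
    """
    flags = []
    paragraphs = [p.strip() for p in section_text.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs, start=1):
        words = para.split()
        if len(words) > max_words:
            flags.append((i, f"paragraph is {len(words)} words (limit {max_words})"))
        first_word = re.sub(r"\W", "", words[0]).lower() if words else ""
        if first_word in CONTEXT_PRONOUNS:
            flags.append((i, f"opens with context-dependent pronoun '{words[0]}'"))
    return flags

sample = "This improves retrieval.\n\nThe GEO Stack defines five layers."
for para_no, issue in extractability_flags(sample):
    print(para_no, issue)
```

Running this over a drafted page gives a quick pre-publication pass; anything it flags is a candidate for the rewrite patterns covered later in this chapter.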
Layer 3 — Entity Reinforcement
The third layer. Generative systems construct knowledge through entity associations — named people, organisations, concepts, products, and locations that appear consistently across documents. When your content repeatedly and consistently associates your brand, product, or key concepts with specific entities, it builds what we call entity gravity: the semantic pull that causes retrieval systems to associate your content with those entities.
Entity reinforcement is not keyword stuffing. It is the disciplined use of canonical names, the consistent co-occurrence of related concepts, and the structural reinforcement of entity relationships across pages. A page that uses your brand name once, a product name twice, and a category term three different ways has low entity gravity. A well-engineered content cluster that consistently uses canonical entity names and reinforces associations across multiple pages builds measurably stronger retrieval positioning.
Primary optimisation levers at Layer 3:
- Canonical entity naming — choose one consistent form for each entity and use it throughout
- Entity repetition — anchor key entities every 150–200 words in extended sections
- Co-occurrence patterns — consistently associate entities that belong together in your topic domain
- Entity-rich anchor text — internal links carry entity names, not generic text like “click here”
Layer 3 · Entity Reinforcement
The consistent, canonical use of named entities that builds semantic association in retrieval systems.
Layer 4 — Structural Authority
The fourth layer. Structural authority is the coherence signal that emerges from well-designed information architecture: the way pages relate to each other, how topical clusters are organised, and whether the internal linking graph reflects a coherent knowledge structure. In a generative environment, this signal is interpreted as evidence that a site’s coverage of a topic is authoritative rather than accidental.
Structural authority is not domain authority in the traditional link-based sense. It is the internal clarity of your content system — whether a retrieval system encountering multiple pages from your domain finds consistent, reinforcing, non-contradictory information organised around a clear topical structure.
Primary optimisation levers at Layer 4:
- Hub-and-spoke cluster architecture — pillar pages linked to supporting detail pages
- Clear topical boundaries — each page addresses a defined scope rather than overlapping with or duplicating others
- No orphan nodes — every substantive page is linked from within its cluster
- Bidirectional linking — spoke pages link back to the hub; hubs acknowledge spokes
Layer 4 · Structural Authority
The coherence signal from internal architecture — clusters, linking patterns, and topical organisation.
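The orphan-node and bidirectional-linking levers above are graph properties, so they can be audited mechanically from a sitemap or crawl export. A minimal Python sketch, representing the cluster as an adjacency mapping; the page paths are hypothetical placeholders:

```python
def audit_cluster(links, hub):
    """Audit a cluster's internal link graph for two Layer 4 levers.

    `links` maps each page to the set of pages it links to.
    Returns orphan pages (linked from nowhere inside the cluster)
    and spoke pages that fail to link back to the hub.
    """
    pages = set(links)
    linked_to = set()
    for targets in links.values():
        linked_to |= targets
    orphans = pages - linked_to - {hub}
    missing_backlinks = {p for p in pages - {hub} if hub not in links.get(p, set())}
    return orphans, missing_backlinks

# Hypothetical cluster: one hub page plus three spokes.
cluster = {
    "/geo-stack": {"/retrieval", "/extractability"},  # hub links two spokes
    "/retrieval": {"/geo-stack"},                     # links back to the hub
    "/extractability": set(),                         # missing backlink to hub
    "/entity-gravity": {"/geo-stack"},                # never linked from hub: orphan
}
orphans, missing = audit_cluster(cluster, hub="/geo-stack")
```

In this toy cluster the audit surfaces `/entity-gravity` as an orphan node and `/extractability` as a spoke missing its hub backlink, exactly the two failure modes the levers describe.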
Layer 5 — System Memory
The fifth layer — and the most difficult to engineer deliberately. System memory refers to the persistent pattern of entity and topic associations that accumulates across a content system over time. It is the signal that generative systems use to build a stable mental model of what a site is about, what entities it is authoritative for, and what topics it consistently covers.
System memory is built through the cumulative effect of the four layers below it. If retrieval, extractability, entity reinforcement, and structural authority are consistently maintained across a site and over time, the system’s model of that site’s topical authority becomes more stable and more strongly associated with the relevant entity clusters. Conversely, inconsistent entity usage, structural fragmentation, or sudden topical pivots degrade system memory.
Primary optimisation levers at Layer 5:
- Consistent entity usage across the entire site — no contradiction between pages
- Cross-page topic reinforcement — related concepts recur across different pages in the cluster
- Publishing consistency — regular content builds temporal density; gaps create signal interruptions
- Bidirectional cluster links — every page contributes to and receives from the cluster’s entity signal
Layer 5 · System Memory
The persistent, cumulative entity and topical associations that establish a site’s generative authority over time.
How the Layers Interact
The GEO Stack is sequential from the bottom up: a deficiency in a lower layer limits the performance of any layer above it. If retrieval fails (Layer 1), no amount of extractability engineering (Layer 2) matters — the content is never reached. If extractability is poor (Layer 2), strong entity reinforcement (Layer 3) cannot compensate — the system retrieves the content but cannot extract usable material from it.
When auditing a content system, start at Layer 1 and work upward. This sequence prevents the common mistake of spending effort on advanced entity strategies while basic retrieval conditions are unmet.
Scoring Weights
When scoring content against the GEO Stack, each layer carries a different weight reflecting its relative impact on generative visibility. The weights used in the AI Visibility OS scoring engine are:
| Layer | Weight | Rationale |
|---|---|---|
| Retrieval Probability | 20% | Foundation — content must be retrieved before anything else applies |
| Extractability | 25% | Highest weight — the primary differentiator in generative environments |
| Entity Reinforcement | 20% | Controls representation accuracy and brand association |
| Structural Authority | 15% | Cluster coherence signal; slower to build, persistent when established |
| System Memory | 10% | Cumulative effect of all layers over time; difficult to engineer directly |
Technical Health is not weighted — it functions as a gate. If a page fails basic infrastructure checks (missing title, noindex, broken canonical), the overall GEO score is capped at 40 regardless of content quality. Fix technical issues before investing in content optimisation.
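The weights and the technical-health gate combine into a simple scoring function. A minimal Python sketch, assuming per-layer scores on a 0–100 scale; note that the table's weights sum to 90%, so under this literal reading a page scoring 100 on every layer composites to 90, and the cap of 40 comes from the gate rule above:

```python
# Layer weights from the table above (Technical Health is a gate, not a weight).
WEIGHTS = {
    "retrieval": 0.20,
    "extractability": 0.25,
    "entity_reinforcement": 0.20,
    "structural_authority": 0.15,
    "system_memory": 0.10,
}

def geo_score(layer_scores, technical_health_pass):
    """Weighted composite of per-layer scores (each 0-100).

    If the page fails the technical health gate (missing title,
    noindex, broken canonical), the composite is capped at 40
    regardless of content quality.
    """
    composite = sum(WEIGHTS[layer] * layer_scores[layer] for layer in WEIGHTS)
    if not technical_health_pass:
        composite = min(composite, 40.0)
    return round(composite, 1)
```

The gate-as-cap design matches the guidance in this chapter: no amount of content optimisation moves a page past 40 until basic infrastructure checks pass.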
The GEO Stack provides a five-layer framework for engineering generative visibility: Retrieval (is your content found?), Extractability (can it be parsed cleanly?), Entity Reinforcement (does it build semantic associations?), Structural Authority (does your architecture signal coherence?), and System Memory (has your content built durable topical authority?). Audit and optimise sequentially from Layer 1 upward.
Retrieval Probability
Retrieval probability is a conceptual variable — not a metric you can read from a dashboard. It describes the likelihood that a specific content block will be selected by a generative system’s retrieval phase when a particular query is processed. Understanding it as a variable, even without a direct measurement, gives practitioners a useful lens for prioritising content decisions.
The Conceptual Model
We can represent retrieval probability as a function of several interacting variables:

P(retrieval) ≈ f(semantic alignment, entity match strength, structural clarity, topical isolation, contextual reinforcement)

This is not a formula you can compute precisely — the weights of these variables are internal to each system’s retrieval model and differ across platforms. But you can use it as a diagnostic framework: for any content block you are concerned about, you can assess each variable qualitatively and identify which is most likely limiting retrieval probability.
Variable 1: Semantic Alignment
Semantic alignment measures how closely the meaning of your content chunk matches the semantic representation of the query. It is evaluated not by keyword overlap but by vector distance in the embedding space — a mathematical measure of conceptual proximity.
For practical purposes, this means your content must be written in the conceptual vocabulary of your target queries. If users asking about “AI search visibility” use phrases like “generative retrieval,” “AI Overviews,” “LLM citation,” and “AI-driven search,” your content must cover those concepts — using those terms or semantically equivalent ones — to achieve high alignment scores.
Semantic alignment can be improved by: writing in the language your audience uses for the topic; covering related concepts that a well-informed reader would expect to find; using definitions that anchor the conceptual territory of the section; and avoiding abstract or idiosyncratic terminology that diverges from established usage.
Variable 2: Entity Match Strength
When a query explicitly or implicitly references a named entity — a brand, a concept, a product, a methodology — retrieval systems score candidate chunks higher when those entities appear prominently and consistently within them. A chunk that mentions your primary entity by its canonical name in the first sentence, reinforces it in subsequent sentences, and associates it with related entities in the topic domain scores higher on entity match than a chunk where the entity appears once, buried in a subordinate clause, referenced later by pronoun.
Entity match strength is directly improvable through the extractability and entity reinforcement techniques covered in Chapters 6 and 7.
Variable 3: Structural Clarity
Structural clarity measures how well-organised and internally coherent a content chunk is. A chunk with a clear topic sentence, a focused body, and a self-contained conclusion scores higher on structural clarity than a chunk that begins mid-thought, discusses two or three unrelated ideas, and ends without resolution.
Structural clarity is primarily a function of writing discipline: one idea per paragraph, declarative opening sentences, explicit topic sentences at section heads, and logical information sequencing within each unit.
Variable 4: Topical Isolation
Topical isolation reflects whether a given section is focused on a single, clearly bounded subject. Sections that mix tangentially related topics — discussing both the definition of a concept and its historical origins and its technical implementation and its business implications in a single block — are harder for retrieval systems to match to specific query intents, because no single query carries all those dimensions simultaneously.
Improving topical isolation means breaking multi-topic sections apart: separate definition blocks from implementation guidance; separate benefits discussion from technical specifications; separate comparison content from advocacy content. Each section should be the best possible answer to a single, specific question.
Variable 5: Contextual Reinforcement
Contextual reinforcement is the cumulative effect of other pages in your site reinforcing the entities and topics of any given chunk. If a key term appears on one page with high semantic alignment, its retrieval probability for queries related to that term is somewhat lower than if the same term and related entities are reinforced across five or ten pages in a coherent cluster.
This is why internal linking and topical clustering matter even for retrieval probability at the individual chunk level. The system’s confidence that your content is authoritative for a particular entity cluster is informed by the density of reinforcing signals across your site — not just by the quality of any single page.
Proxies and Scoring Approaches
Since retrieval probability cannot be measured directly, practitioners rely on proxy indicators:
| Proxy Metric | What It Indicates | How to Measure |
|---|---|---|
| AI Overview inclusion rate | Retrieval + extraction success for specific queries | Manual prompt testing; GSC AI Overviews filter |
| Perplexity citation frequency | Retrieval success across a query set | Systematic prompt auditing across topic queries |
| Featured snippet wins | Structural extractability for traditional systems | GSC; SERP monitoring tools |
| GEO Content Score | Composite estimate of section-level GEO quality | Audit checklist (Appendix A); emerging tools |
| Embedding similarity | Semantic alignment of content to target queries | Embedding model APIs (OpenAI, Cohere) – technical |
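The embedding-similarity proxy in the table above reduces to computing cosine similarity between a chunk embedding and a query embedding. A minimal Python sketch; in practice the vectors come from an embedding API (e.g. an OpenAI or Cohere embedding model), so the short 4-dimensional vectors here are toy stand-ins, not real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy stand-ins: real embeddings from a provider API have hundreds
# or thousands of dimensions.
chunk_vec = [0.12, 0.48, 0.31, 0.80]
query_vec = [0.10, 0.52, 0.29, 0.77]
alignment = cosine_similarity(chunk_vec, query_vec)  # closer to 1 = better aligned
```

Comparing the alignment score of a candidate chunk against rewritten variants of the same chunk, for the same target query, gives a rough before/after signal for semantic alignment work.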
Limitations of the Model
The retrieval probability framework is a heuristic, not an algorithmic model. Its limitations are significant and should be acknowledged:
- Platform variation — Different systems (Google, Perplexity, ChatGPT, Gemini) implement retrieval differently. Optimising for one does not guarantee results in another.
- Non-determinism — Generative systems produce variable outputs. The same chunk may be retrieved for one query iteration and not for another identical one. Testing requires multiple iterations.
- Black-box weighting — The relative weights of each variable are unknown and change as models are updated. What works today may need adjustment after a model release.
- Domain and authority effects — High-authority domains benefit from retrieval advantages that cannot be fully compensated by content structure alone. New domains face inherent headwinds regardless of content quality.
Retrieval probability is a conceptual variable describing how likely a content chunk is to be selected during generative pipeline retrieval. It is influenced by semantic alignment, entity match strength, structural clarity, topical isolation, and contextual reinforcement. While not directly measurable, it can be estimated through proxy indicators — AI inclusion testing, citation frequency audits, and composite GEO content scoring — and improved through the structural techniques covered in Chapters 6 and 7.
Extractability Engineering
Extractability is the quality that determines whether an AI system can take a section of your content, isolate it from its surrounding context, and use it cleanly in a generated response. It is the most directly improvable variable in the GEO Stack — and the one where most existing content has the most room for immediate gain.
The challenge is that content optimised for human readers is often anti-extractable. Narrative prose creates context dependencies that break when sections are isolated. Elegant writing uses pronouns and references that assume shared reading history. Long introductions defer the actual answer until paragraph three or four. These are virtues for a human reading linearly — and liabilities for a machine retrieving non-linearly.
The Core Principles
1. Answer First
Every section, every paragraph that makes a substantive claim, should open with its answer. The first sentence of a section should state the main point of that section — declaratively and unambiguously. Supporting evidence, context, and qualification follow. This is the single most impactful structural change practitioners can make to existing content.
Conventional writing often builds to an answer — presenting the problem, then the context, then the analysis, then the conclusion. AI extraction reverses this preference. The system retrieves and extracts from the opening of a chunk, where the highest-signal content should be found. An answer buried in sentence five of a six-sentence paragraph is frequently not extracted at all.
2. Section Independence
Every section must be coherent when read in isolation. This means: no opening references to previously discussed material (“As we noted in the previous section…”); no pronoun anchors that require prior context to resolve (“This approach…”); no implicit assumptions about what the reader already knows from earlier sections of the same page.
The section independence test is simple: copy a section into a blank document and read it cold. If it makes sense without context, it passes. If it requires prior reading to understand, it needs rewriting.
3. Compression Resistance
AI systems compress content when generating responses. A three-hundred-word section may become a two-sentence summary. The question is: does that two-sentence summary retain the core meaning of the original? If it does, the content has high compression resistance. If the summary loses the key claim, distorts the evidence, or generalises away a critical nuance, the content has low compression resistance.
Compression resistance is achieved through three practices: leading strongly with the core claim; keeping that claim as unambiguous and concrete as possible; and separating your core claim from supporting inference, which is more likely to be compressed away.
4. Explicit Entity Anchoring
Every extracted chunk must introduce its key entities by name, without relying on context from surrounding sections to establish who or what is being discussed. “It improves performance” is not extractable. “The GEO Stack’s Entity Reinforcement layer improves retrieval performance by strengthening semantic associations” is extractable. The difference is explicit entity naming within the chunk itself.
5. Format as Signal
Structured formats — numbered lists, bullet points, definition blocks, comparison tables — are preferentially extracted because they provide syntactic boundaries that help the system identify discrete, usable units. A three-step process in numbered list format is more extractable than the same three steps written as a flowing paragraph. The format signals to the retrieval and extraction system that what follows is a structured, divisible unit of information.
Rewrite Patterns
Below are before-and-after examples demonstrating extractability engineering principles:
Pattern 1: Answer-First Rewrite
Before: There has been a lot of discussion in the SEO community about how generative AI is changing search. Many experts have weighed in on the topic, and while opinions differ, most agree that the changes are significant. When we look at what this means for content strategy, the implications become clear: structure matters more than it ever has.

After: Content structure matters more in generative search than in traditional SEO. Generative systems retrieve individual sections rather than whole pages, making section-level clarity the primary determinant of whether content is extracted and cited. Narrative style that builds to a conclusion is typically anti-extractable: the answer arrives too late for effective chunk retrieval.
Pattern 2: Entity Anchoring Rewrite
Before: The process works well in practice. It helps teams identify where their content is falling short and provides a clear path to improvement. When implemented correctly, it can significantly increase the frequency of citations.

After: The GEO Audit Worksheet helps content teams identify structural deficiencies in their pages and provides a scored path to improvement. When implemented consistently across a content cluster, the GEO Audit process increases AI citation frequency by improving extractability and entity reinforcement at the section level.
Pattern 3: Format Restructure
Before: To improve your content’s extractability, you should start by checking whether each section can stand alone without needing context from the rest of the page. You also want to make sure you’re using bullet points or lists for anything that’s a set of discrete items, and you want to check that you’re naming your entities explicitly and not using vague pronouns.

After: To improve content extractability, apply three structural checks to each section:
- Section independence test — Read the section in isolation. It should make complete sense without prior context.
- Format check — Discrete concepts (steps, features, options) should be listed or tabled, not embedded in paragraph prose.
- Entity anchor check — Every key entity should be named explicitly within the section, not referred to by pronoun.
Diagnostic Checklist
Use this checklist when reviewing sections for extractability. Audit one section at a time:
- Does the section open with a direct answer or definition?
- Are all entities named explicitly (no dangling pronouns)?
- Does the section make sense when read in isolation?
- Does the core meaning survive a one-sentence summary?
- Are paragraphs under 100–120 words with one main idea each?
- Are discrete concepts presented as lists or tables rather than narrative?
- Is the answer in the first two sentences, not buried mid-section?
- Are formatting conventions (headings, bullets) consistent and logical?
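For teams auditing many sections, the checklist above can be run as a scored pass. A minimal Python sketch; the item keys are shorthand for the eight questions and are illustrative, not from any published tool:

```python
# Shorthand keys for the eight checklist questions above.
CHECKLIST = [
    "opens_with_direct_answer",
    "entities_named_explicitly",
    "makes_sense_in_isolation",
    "survives_one_sentence_summary",
    "paragraphs_under_120_words",
    "discrete_concepts_listed",
    "answer_in_first_two_sentences",
    "formatting_consistent",
]

def checklist_score(answers):
    """Score one section against the extractability checklist.

    `answers` maps checklist item -> bool (human judgment per item).
    Returns the pass ratio and the failing items, in checklist order.
    """
    failing = [item for item in CHECKLIST if not answers.get(item, False)]
    ratio = (len(CHECKLIST) - len(failing)) / len(CHECKLIST)
    return ratio, failing
```

Tracking the pass ratio per section across an audit gives a simple prioritisation signal: rewrite the lowest-scoring sections first.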
Compression Simulation
Compression resistance can be tested deterministically — without needing an LLM — by scoring each section against four measurable dimensions:
| Dimension | Weight | What It Measures |
|---|---|---|
| Compression Retention | 40% | Whether key sentences (first, last, entity-bearing) survive extractive compression. Scored by extracting the top 30% of sentences by position and keyword density. |
| Declarative Opening | 25% | Whether the section’s first sentence is a standalone declarative statement (answer-first) versus a contextual or narrative opening. |
| Entity Explicitness | 20% | Whether named entities are present in the compressed output. Sections using pronouns instead of entity names score lower. |
| Standalone Coherence | 15% | Whether the compressed output makes sense in isolation, without the surrounding page context. |
The simulation uses deterministic sentence extraction — not LLM summarisation. It selects sentences based on position (first, last), entity density, and keyword overlap with the section heading. The compressed output represents approximately what an AI system would retain when synthesising the section into a response. If your core claim, primary entity, and key evidence don’t appear in the compressed form, the section needs restructuring.
The section composite score (the average of all per-section compression scores) feeds into the Extractability layer at 30% weight. This means fixing weak sections in the section-level analysis directly improves the page’s overall Extractability score.
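The deterministic extraction described above can be sketched directly. A minimal Python version, selecting sentences by position bonus and heading-keyword overlap and keeping the top 30%; the specific scoring bonuses here are illustrative, not the actual values used by any scoring engine:

```python
import re

def compress_section(heading, text, keep_ratio=0.30):
    """Deterministic extractive compression (no LLM required).

    Scores each sentence by position (first and last sentences get a
    bonus) and by keyword overlap with the section heading, then keeps
    the top `keep_ratio` of sentences in their original order.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    head_words = set(re.findall(r"[a-z]+", heading.lower()))
    scored = []
    for i, sent in enumerate(sentences):
        score = 0.0
        if i == 0:
            score += 2.0            # answer-first bonus
        if i == len(sentences) - 1:
            score += 1.0            # closing-sentence bonus
        sent_words = set(re.findall(r"[a-z]+", sent.lower()))
        score += len(sent_words & head_words)  # heading keyword overlap
        scored.append((score, i, sent))
    keep_n = max(1, round(len(sentences) * keep_ratio))
    # Take the highest-scoring sentences, then restore document order.
    kept = sorted(sorted(scored, reverse=True)[:keep_n], key=lambda t: t[1])
    return " ".join(sent for _, _, sent in kept)
```

Running a section through this and checking whether the core claim, primary entity, and key evidence survive is the compression-resistance test in executable form.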
Extractability engineering is the practice of structuring content so AI systems can parse, isolate, and use individual sections cleanly. The five core principles are: answer-first structure, section independence, compression resistance, explicit entity anchoring, and format as signal. Rewriting existing content for extractability — using before/after pattern analysis and the diagnostic checklist — is the highest-leverage single intervention most practitioners can make to improve GEO performance immediately.
Entity Gravity & Semantic Reinforcement
Generative systems think in entities. They associate named concepts — brands, people, products, methodologies, locations — with clusters of related information. When a query references an entity, the system retrieves content that has strong associations with that entity. The strength of those associations — the degree to which your content is gravitationally connected to the entities in your domain — determines your retrieval presence for the queries that matter most to your business.
The Naming Problem
Entity gravity starts with canonical naming. A retrieval system cannot build a strong association with an entity that is referred to inconsistently. If your brand is sometimes “The GEO Lab,” sometimes “GEO Lab,” sometimes “the Lab,” and sometimes “our platform,” no single entity label accumulates the signal density needed for strong retrieval associations.
Choose one canonical form for each significant entity and use it consistently across all content. This applies to:
- Brand names — use the exact registered or established form
- Product names — use the full name on first mention in each section; abbreviations only after establishment
- Methodology names — introduce by full name with any abbreviation in parentheses, then use either form consistently
- Competitor references — use canonical forms; informal or abbreviated forms reduce entity clarity
Repetition as Retrieval Signal
Counterintuitively, the conventions that traditional writing advice encourages — vary your terms, avoid saying the same thing twice, use pronouns to create flow — are often harmful to entity gravity in GEO contexts.
Variation and pronoun substitution obscure entity associations. When a retrieval system processes a chunk where the entity is named at the start and then referred to as “it,” “they,” “this approach,” or “the technique” for the rest of the paragraph, the entity signal in that chunk weakens beyond the first sentence.
For GEO purposes, repeat entity names more than human writing conventions would normally suggest. A practical rule: in any content block longer than 200 words, the primary entity should appear by name at least once every 150–200 words. Each appearance reinforces the entity-content association in the retrieval model.
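The 150–200-word rule above is easy to check mechanically. A minimal Python sketch that scans a section in fixed word windows and reports where the primary entity never appears by name; the function name and punctuation handling are illustrative:

```python
def entity_density_gaps(text, entity, window=200):
    """Check the entity-repetition rule: the primary entity should
    appear by name at least once in every `window`-word span.

    Returns the word offsets of windows where the entity is absent.
    """
    words = text.split()
    gaps = []
    for start in range(0, len(words), window):
        # Lowercase and strip trailing punctuation so "Stack." matches "stack".
        chunk = " ".join(w.lower().strip(".,;:()\"'") for w in words[start:start + window])
        if entity.lower() not in chunk:
            gaps.append(start)
    return gaps
```

Any offsets returned mark stretches of the section where the entity signal has gone silent, exactly the spans where a pronoun should be replaced by the canonical name.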
Co-occurrence Patterns
Entities gain gravity partly through consistent co-occurrence with related entities. If your content consistently associates “Retrieval-Augmented Generation (RAG)” with “generative search,” “AI Overviews,” and “extractability,” the retrieval system builds a model in which all these entities form a coherent cluster — and your content becomes associated with the entire cluster, not just the individual terms.
Designing co-occurrence patterns means identifying the entity cluster that defines your topical domain and ensuring those entities appear together, consistently, across your content system. This is not about keyword co-occurrence in the narrow SEO sense — it is about semantic association between named concepts at the structural level of paragraphs and sections.
Internal Linking as Entity Reinforcement
Every internal link is an entity signal. When you link from one page to another using anchor text that contains a relevant entity name, you are reinforcing the association between the linking page, the destination page, and the shared entity. A cluster of pages that cross-link using consistent, entity-rich anchor text builds a node in the retrieval system’s entity graph — a cluster of associated content that collectively increases retrieval probability for the shared entity domain.
Compare these two internal link patterns:
| Low Entity Gravity | High Entity Gravity |
|---|---|
| “Click here to read more” | “Read: Retrieval Probability in GEO” |
| “See our related article” | “See: The GEO Stack Framework” |
| “Learn about this topic” | “Learn: Extractability Engineering principles” |
| “Our guide covers this” | “Our GEO Audit Worksheet covers this” |
Schema Markup as Entity Disambiguation
Structured data markup — particularly JSON-LD with Schema.org vocabulary — provides explicit, machine-readable entity declarations that complement the semantic signals in your content. An Organization schema that defines your brand, its domain, and its relationship to the topics it covers gives retrieval systems an unambiguous anchor for entity association.
Key schema types for entity reinforcement:
- Organization / Person — establishes the canonical entity identity of the site or author
- Article with author markup — associates content with a named, credentialled entity
- DefinedTerm — explicitly marks up terminology definitions for machine comprehension
- FAQPage — provides structured Q&A pairs that are highly extractable by generative systems
- HowTo — marks up procedural content in a manner aligned with generative extraction patterns
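An Organization declaration of the kind described above can be built and serialised as JSON-LD. A minimal Python sketch; the organisation name, URLs, and sameAs targets are placeholders, not real endpoints, and the property set shown is a subset of what Schema.org supports:

```python
import json

# Illustrative Organization schema for entity disambiguation.
# All names and URLs below are placeholders.
organization_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example GEO Lab",                      # canonical entity name
    "url": "https://www.example.com",
    "sameAs": [                                     # corroborating entity anchors
        "https://en.wikipedia.org/wiki/Example",
        "https://www.wikidata.org/wiki/Q0",
    ],
    "knowsAbout": ["Generative Engine Optimisation", "AI search visibility"],
}

# Embed in the page head as:
#   <script type="application/ld+json"> ... </script>
json_ld = json.dumps(organization_schema, indent=2)
```

The `sameAs` links are the disambiguation mechanism: they tie the on-page entity to established knowledge-graph nodes (Wikipedia, Wikidata), reinforcing the canonical identity discussed throughout this chapter.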
Avoiding Entity Dilution
Entity gravity can be diluted by inconsistent practices. Common dilution patterns include:
- Using multiple terms for the same concept across different pages (synonym drift)
- Over-optimising pages to cover too many entity clusters simultaneously (topical sprawl)
- Changing terminology between content updates without updating linking anchor text
- Creating content that acknowledges but does not clearly associate entities with your brand or product
Entity gravity — the semantic pull that associates your content with the entities that matter in your domain — is built through four practices: canonical naming (one consistent form per entity), strategic repetition (entity names recur throughout sections, not just at introduction), co-occurrence design (related entities appear together consistently), and entity-rich internal linking (anchor text carries entity names, not generic text). Schema markup reinforces these signals at the structural data layer. Avoid entity dilution through inconsistent naming or topical sprawl.
Operational Implementation
Translating the GEO framework into practice: page design patterns, internal linking as knowledge graph architecture, structural auditing, and commercial strategy for generative SERPs.
Designing Sections for Retrieval
Section design is where the abstract principles of GEO become concrete editorial decisions. Every heading, opening sentence, and paragraph structure you choose either increases or decreases the retrievability of that section by generative systems. This chapter translates the core principles of extractability and entity reinforcement into repeatable design patterns that can be applied during content creation and during editorial review of existing content.
The Answer-First Template
The single most impactful structural pattern for GEO is the answer-first section. In traditional editorial writing, sections often build to their main point — introducing context, developing the argument, and arriving at the conclusion. In GEO-optimised content, the main point leads. Supporting context and evidence follow.
A well-designed answer-first section follows this sequence:
- Declarative answer sentence (1–2 sentences) — States the core claim directly. Contains the primary entity and the key fact or relationship. Self-contained enough to function as a standalone quote.
- Mechanism or explanation (2–4 sentences) — Explains how or why the claim is true. Introduces secondary entities and provides the logical structure.
- Evidence or example (optional, 1–3 sentences) — Grounds the claim in data, example, or observed pattern. Increases citation-worthiness.
- Implication or application (1–2 sentences) — Returns to the practical meaning of the claim. This is what a reader (and a generative system) would most likely paraphrase for use.
Example of the template applied:

- **Declarative:** Extractability is the primary determinant of whether retrieved content is used in a generative response.
- **Mechanism:** Generative systems retrieve candidate chunks by semantic similarity, then apply an extraction layer that selects the most parseable and self-contained passages. Chunks with low extractability (dense prose, implicit context, buried answers) may be retrieved but not extracted.
- **Evidence:** Internal testing across 48 page rewrites showed that answer-first restructuring increased AI citation frequency by an average of 34% across monitored query sets.
- **Implication:** For practitioners, this means structural rewriting, not content creation, is typically the highest-leverage first intervention.
Definition Blocks
Definition blocks are among the most extractable content formats in generative environments. They provide a clear, parseable structure — term, definition, context — that retrieval systems can extract cleanly and cite directly. Generative systems routinely pull definitions verbatim or near-verbatim from pages that define concepts clearly in their opening sentences.
A well-designed definition block:
- Opens with the term being defined, in its canonical form
- Provides a declarative, precise definition in one to two sentences
- Follows with a brief explanation of practical significance or distinguishing characteristics
- Avoids depending on prior context to be understood
Example: “Retrieval probability is the estimated likelihood that a specific content chunk is selected during the vector retrieval phase of a generative search pipeline. It is determined by the semantic alignment between the chunk and the query, the density of relevant entities in the chunk, and the structural clarity of the passage.”
Comparison Tables
Comparison tables are high-value GEO assets. They provide structured, discrete comparative data that generative systems can extract and use to answer comparison queries — one of the most common query types in commercial and research contexts. A well-structured table with clear column headers, entity-named rows, and factual cell content is often extracted precisely as written into generated responses.
Design principles for extractable comparison tables:
- Use entity names as row or column headers, not generic labels
- Make each cell self-sufficient — the value should be readable without surrounding narrative
- Include units, dates, and sources where relevant
- Position the table near the section’s opening answer-sentence, not buried at the bottom
- Add a brief introductory sentence before the table that states what it compares and why it matters
List Structures
Bulleted and numbered lists are among the most retrievable and extractable content structures. They provide clear syntactic boundaries between discrete items, making it easy for retrieval systems to identify and extract individual list items or the complete list as a structured unit.
For maximum extractability, lists should:
- Begin each item with an entity or active verb, not a connecting word (“and,” “also,” “but”)
- Use parallel grammatical structure across all items
- Be preceded by a sentence that explicitly names the list’s purpose or category
- Limit list items to 7–10 maximum; split longer lists into categorised sublists
- Where items have explanatory sub-content, use a bold lead term followed by explanation
FAQ Sections
FAQ content is structurally matched to the task generative retrieval systems are optimised for: answering questions. A question-and-answer format — where each question maps to a common user query and each answer is a self-contained, declarative response — provides high retrieval probability and high extractability simultaneously.
For GEO-optimised FAQ sections:
- Write question text in the vocabulary users actually use (conversational, question-phrased)
- Each answer should open with a direct statement — not “This depends on…” or “There are many factors…”
- Answers should be 40–120 words: complete enough to be informative, short enough to survive extraction
- Apply FAQPage schema markup to enable structured data extraction
- Include entity names in both questions and answers — do not rely on the surrounding page context
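Applying FAQPage schema markup can be sketched with Python's `json` module. This is a minimal sketch, not a complete implementation; the question and answer below are illustrative (the answer reuses the retrieval probability definition from the Definition Blocks section):

```python
import json

def faq_jsonld(qa_pairs):
    """Return a FAQPage JSON-LD string (schema.org vocabulary) for (question, answer) tuples."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in qa_pairs
        ],
    }, indent=2)

markup = faq_jsonld([
    ("What is retrieval probability?",
     "Retrieval probability is the estimated likelihood that a specific content chunk "
     "is selected during the vector retrieval phase of a generative search pipeline."),
])
print(markup)  # embed the output in a <script type="application/ld+json"> tag
```

Generating the markup programmatically keeps question text and answer text in one place with the page content, which helps avoid the schema drifting out of sync with the visible FAQ.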
Section Opening Inventory
A practical audit technique: scan the opening sentences of every H2 and H3 section on a priority page. For each section, the opening sentence should contain:
| Element | Check | If Missing |
|---|---|---|
| A named entity (brand, concept, product) | □ Present | Add explicit entity name |
| A declarative main claim | □ Present | Rewrite to answer-first |
| Self-containment (no pronoun-only reference) | □ Present | Replace pronouns with entity names |
| One primary idea per sentence | □ Present | Split compound sentences |
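The inventory checks can be partly automated. A minimal sketch follows; the pronoun list and the single-idea heuristic are crude proxies (assumptions you would tune), and `entities` is your own canonical entity list:

```python
import re

PRONOUN_OPENERS = {"it", "this", "that", "they", "these", "those"}

def audit_opening(sentence: str, entities: list[str]) -> dict:
    """Run the four inventory checks against one section-opening sentence."""
    first_word = (re.findall(r"[A-Za-z']+", sentence) or [""])[0].lower()
    return {
        # A named entity (brand, concept, product) is present
        "named_entity": any(e.lower() in sentence.lower() for e in entities),
        # Declarative claim: statement, not a question
        "declarative": sentence.rstrip().endswith("."),
        # Self-containment: does not open with a pronoun-only reference
        "self_contained": first_word not in PRONOUN_OPENERS,
        # Crude proxy for one primary idea per sentence
        "single_idea": sentence.count(";") == 0 and sentence.count(" and ") <= 1,
    }

result = audit_opening("It improves things and also helps.", ["Extractability Engineering"])
print(result)  # named_entity False, self_contained False: rewrite candidate
```

Any section whose opening fails a check maps directly to the "If Missing" remediation in the table above.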
Section design for retrieval relies on four primary patterns: answer-first templates (leading with the core claim), definition blocks (clear term-definition-context structure), comparison tables (entity-named, self-contained cells), and FAQ sections (declarative answers to conversational questions). Each pattern is extractable by design — providing clean, parseable content units that generative systems can use without surrounding context. Apply these patterns during content creation and as the primary intervention during content rewrites.
Internal Linking as Knowledge Graph Design
Internal linking is conventionally understood as a mechanism for distributing PageRank within a site and for guiding users through content. In a generative search environment, its function is more important: it is the primary tool for designing the entity graph that retrieval systems use to model your site’s topical authority.
When a retrieval system encounters multiple pages from your domain, each reinforcing the same entity cluster through consistent entity naming and cross-page linking, it constructs a model of that domain as authoritative for those entities. This model — a weighted graph of entities and their associations, as inferred from your content structure — directly influences retrieval probability across your entire content system, not just individual pages.
The Hub-and-Spoke Principle
The most effective internal linking architecture for GEO purposes is the hub-and-spoke cluster model. Each topical cluster has a hub — a comprehensive pillar page that defines the topic, introduces the primary entities, and links out to supporting detail pages. Each spoke page addresses a specific sub-topic or entity within the cluster and links back to the hub.
This architecture serves two GEO functions simultaneously:
- Entity reinforcement — The consistent use of entity names in anchor text across hub-spoke links reinforces entity associations in the retrieval model.
- Structural authority signal — A well-formed cluster signals that the domain’s coverage of the topic is comprehensive, organised, and internally consistent — a proxy for domain authority in the generative context.
Anchor Text as Entity Signal
Every internal link carries an entity signal through its anchor text. The text you use to link from one page to another tells the retrieval system what entity or concept connects those pages. Generic anchor text (“read more,” “click here,” “this article”) carries no entity signal. Entity-rich anchor text (“the GEO Audit Worksheet,” “Retrieval Probability in generative search,” “Extractability Engineering principles”) builds the entity graph with each link.
Audit every significant internal link on your priority pages. For each link, ask: does the anchor text name the entity or concept that the linked page is authoritative for? If not, update the anchor text — this is one of the lowest-effort, highest-leverage GEO interventions available.
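The anchor text audit can be partly automated. A sketch using Python's standard-library `HTMLParser`; the generic-anchor list is an assumption to extend for your own site:

```python
from html.parser import HTMLParser

GENERIC_ANCHORS = {"read more", "click here", "this article", "here", "learn more"}

class AnchorAudit(HTMLParser):
    """Collect link anchor texts from a page and flag generic ones."""
    def __init__(self):
        super().__init__()
        self._in_a = False
        self._href = ""
        self._text = []
        self.links = []  # (href, anchor_text, is_generic)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a, self._href, self._text = True, dict(attrs).get("href", ""), []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor, anchor.lower() in GENERIC_ANCHORS))
            self._in_a = False

audit = AnchorAudit()
audit.feed('<p><a href="/geo-audit">the GEO Audit Worksheet</a> <a href="/x">read more</a></p>')
for href, anchor, generic in audit.links:
    if generic:
        print(f"FLAG {href}: rewrite generic anchor '{anchor}' with an entity name")
```

Run the parser over a crawl export of your priority pages and the flagged links become the anchor-text rewrite backlog.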
Bidirectional Linking Patterns
A knowledge graph is bidirectional. If your pillar page on Generative Engine Optimisation links to your page on Extractability Engineering, that spoke page should also link back to the pillar. This bidirectionality closes the graph loop and ensures that the entity association is reinforced from both directions — strengthening both pages’ positions within the entity cluster.
Practically, this means:
- Every spoke page should link to its hub using anchor text that names the hub’s primary entity
- Every hub page should link to each of its spoke pages with descriptive, entity-specific anchor text
- Adjacent spoke pages (e.g., two pages covering related techniques within the same cluster) should cross-link where their entities overlap
Identifying and Remedying Orphan Nodes
An orphan node is a page that lacks incoming links from within its relevant cluster. Orphan pages are invisible to the knowledge graph: the retrieval system cannot determine their relationship to the cluster’s entity domain, because no linking signal connects them to the cluster’s hub or spokes.
Orphan identification is a standard audit step (covered in Chapter 10). Remediation requires identifying the cluster the orphan page belongs to and adding at least two to three incoming links from relevant pages within that cluster, using entity-appropriate anchor text.
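Given a crawl export, orphan identification is a small set operation. The `cluster_links` map below is illustrative, not real site data:

```python
# Cluster link map: page -> set of cluster pages it links out to.
cluster_links = {
    "hub":     {"spoke-a", "spoke-b"},
    "spoke-a": {"hub"},
    "spoke-b": {"hub", "spoke-a"},
    "spoke-c": {"hub"},   # links out, but nothing in the cluster links to it
}

def find_orphans(links: dict[str, set[str]]) -> set[str]:
    """Pages with zero incoming links from within the cluster."""
    link_targets = set().union(*links.values())
    return set(links) - link_targets

print(find_orphans(cluster_links))  # {'spoke-c'}
```

Each orphan found then needs the two to three incoming links described above, with entity-appropriate anchor text.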
Cross-Cluster Linking
Topics do not exist in isolation. Entities in one cluster overlap with entities in adjacent clusters — for a GEO-focused site, the entities “structured data,” “schema markup,” and “machine readability” connect both to the extractability cluster and to the technical SEO cluster. Cross-cluster links, where the connection is logically relevant, reinforce both clusters by creating entity bridges that increase the retrieval model’s understanding of how topics relate.
When reviewing internal linking for a content cluster, apply these questions to each page:
- Does this page link to its hub page (or pillar) using the hub’s primary entity name?
- Does the hub link back to this page with anchor text that names this page’s primary entity?
- Do any adjacent spoke pages link to this page where their entities overlap?
- Are there any pages in this domain that should link here but do not?
- Is this page accessible within two clicks from the hub?
Internal linking in GEO is knowledge graph design — the deliberate construction of an entity graph that tells retrieval systems what your site is authoritative for. The hub-and-spoke cluster model is the most effective architecture: pillar pages define the entity domain, spoke pages reinforce specific entities, and bidirectional entity-rich anchor text closes the graph loop. Orphan pages are graph failures that must be remedied. Cross-cluster linking creates entity bridges that strengthen both clusters.
Structural Auditing Workflow
Theory becomes practice through systematic auditing. A structural GEO audit examines a content system — a site, a cluster, or a single page — through each layer of the GEO Stack, producing a prioritised action list that connects structural problems to measurable impact. This chapter describes a six-step audit workflow suitable for page-level and cluster-level analysis.
Step 1: Define Scope and Query Set
Before auditing structure, define what you are auditing for. A GEO audit without a target query set produces structural observations without commercial context. Specify:
- The page or cluster being audited
- The 10–20 queries this content should be retrieved for
- The commercial outcome associated with those queries (lead generation, product awareness, direct conversion)
- The primary AI platform(s) to optimise for (Google AI Overviews, Perplexity, ChatGPT — each has different retrieval characteristics)
Step 2: Retrieval Test (Layer 1 Check)
Run the target queries in the primary AI platforms. Document whether your content appears in generated responses. This is your baseline retrieval measurement. For each query:
- Record whether your content was cited (yes/no)
- If cited: note which specific section was quoted or paraphrased
- If cited: note whether your brand entity was explicitly named
- If not cited: note which competing source was used instead, and in brief, why
Run each query at least three times to account for generative variability. Document the aggregated results. This gives you an empirical baseline before any structural changes.
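Aggregating the repeated runs can be sketched as follows; the logged runs here are illustrative placeholders for your own query set:

```python
from collections import defaultdict

# Each run logged as (query, cited); three or more runs per query as described above.
runs = [
    ("what is retrieval probability", True),
    ("what is retrieval probability", False),
    ("what is retrieval probability", True),
    ("extractability checklist", False),
    ("extractability checklist", False),
    ("extractability checklist", False),
]

def baseline(run_log):
    """Per-query citation rate across repeated runs: the empirical baseline."""
    tally = defaultdict(lambda: [0, 0])  # query -> [cited_count, total_runs]
    for query, cited in run_log:
        tally[query][0] += int(cited)
        tally[query][1] += 1
    return {q: cited / total for q, (cited, total) in tally.items()}

print(baseline(runs))
```

Re-running the same aggregation after structural changes gives a before/after comparison on identical queries.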
Step 3: Extractability Audit (Layer 2 Check)
Using the Extractability Checklist (Appendix A), audit each section of the target page. Score each section and identify low-scoring sections as priority rewrite candidates. Additionally apply the three isolation tests:
- Section independence test — Copy the section text into a blank document. Does it make sense without context? Note any contextual dependencies.
- Compression test — Write a one-sentence summary of the section. Does the summary retain the core claim? If not, the claim is too buried or too qualified.
- Answer location test — Identify which sentence contains the main answer. Is it in the first two sentences? If not, the section needs answer-first restructuring.
Step 4: Entity Gravity Audit (Layer 3 Check)
Review entity usage across the page and cluster:
| Entity Gravity Check | Finding | Priority |
|---|---|---|
| Canonical entity names used consistently? | □ Y / □ N | High if N |
| Primary entity named in section openings? | □ Y / □ N | High if N |
| Entity appears every ~200 words in long sections? | □ Y / □ N | Medium if N |
| Related entity co-occurrences present? | □ Y / □ N | Medium if N |
| Schema markup applied (Organization, Article, FAQ)? | □ Y / □ N | High if N |
| Synonym drift present (multiple forms of same entity)? | □ Y / □ N | High if Y |
Step 5: Structural Authority Audit (Layer 4 Check)
Map the cluster’s link architecture:
- List all pages in the cluster
- For each page, record: incoming links from cluster pages; outgoing links to cluster pages; anchor text used for each link
- Identify orphan pages (zero incoming links from cluster)
- Identify weak hub connections (hub page links to few spokes, or spokes link back with generic anchor text)
- Assess anchor text entity richness across the cluster
Step 6: Generate Priority Action List
Consolidate findings into a prioritised action list. Order items by: (1) impact on retrieval probability for the target query set; (2) implementation effort; (3) layer — lower-layer fixes (retrieval, extractability) before higher-layer optimisations.
| Finding | Layer | Action | Priority | Effort |
|---|---|---|---|---|
| Answer buried at paragraph 5 of main section | L2 | Rewrite to answer-first | High | Low |
| Brand name used in 3 different forms | L3 | Standardise canonical entity name | High | Low |
| No FAQ schema on FAQ section | L3 | Add FAQPage JSON-LD | High | Low |
| 3 orphan pages in cluster | L4 | Add hub-to-spoke and spoke-to-hub links | Medium | Low |
| Internal links use generic “read more” text | L3/L4 | Rewrite anchors with entity names | Medium | Low |
| Page lacks topical isolation — 4 themes mixed | L2 | Split into 4 focused sections or pages | Medium | High |
Recommended Toolset
The following tools support each audit step:
- Screaming Frog — crawl for internal link mapping, orphan detection, anchor text extraction
- Google Search Console — AI Overviews appearances (experimental filter, 2025–2026), impressions, query data
- Perplexity.ai — manual retrieval testing across topic queries; observe citation patterns
- Google’s Rich Results Test — validate structured data implementation
- Profound / Evertune — AI citation tracking dashboards (paid, enterprise-grade)
- AI Visibility OS (The GEO Lab Console) — open-source diagnostic tool scoring pages against all five GEO Stack layers, with section-level compression simulation, LLM query tracking across ChatGPT/Gemini/Perplexity, and attribution feedback loop. Free at github.com/arturseo-geo/GEO_OS
- Spreadsheet (Appendix B template) — manual GEO scoring across all five layers
A structural GEO audit progresses through six steps: define scope and target queries; run baseline retrieval tests in primary AI platforms; audit extractability section by section using the checklist; audit entity gravity for canonical usage and schema; map cluster link architecture for structural authority; and generate a prioritised action list ordered by impact and effort. This workflow can be applied to single pages or entire clusters, and produces actionable findings tied to commercial query targets.
Generative SERPs & Commercial Strategy
The commercial implications of generative search are not uniform. They depend on a site’s query mix, its revenue model, and the degree to which AI Overviews and generative answers are displacing clicks in its specific topic domain. A one-size-fits-all response — “AI is destroying organic traffic” or “nothing has really changed” — is not a strategy. Commercial GEO strategy begins with honest exposure assessment.
The Fragmentation of Generative SERPs
The generative search landscape is not a single system. It is a fragmented ecosystem of AI-mediated search experiences across multiple platforms, each with different characteristics:
- Google AI Overviews — Appears selectively for informational and mixed-intent queries in Google Search. High traffic impact when it appears; coverage variable by query category and geography.
- Perplexity — A standalone generative search engine favoured by technical and research-oriented users. Cites sources explicitly; favours recent, structured, authoritative content.
- ChatGPT with browsing — Retrieves current web content; favours structured, entity-dense content; attribution explicit but navigational behaviour less predictable.
- Microsoft Copilot — Integrated into Bing; follows similar RAG architecture; citations shown; strong entity-matching behaviour.
- Gemini (Google) — Increasingly integrated into Google Workspace and Search; similar retrieval characteristics to AI Overviews but expanding into conversational queries.
Your content’s behaviour across these platforms is not identical. Perplexity may cite your research-oriented content consistently while ChatGPT uses competitor sources. Optimise for the platform your target audience most commonly uses — and audit platform-specifically, not generically.
Commercial Exposure Mapping
To build a commercial GEO strategy, segment your organic query set by intent type and assess AI Overview prevalence in each segment:
| Query Segment | AI Overview Prevalence | CTR Impact | GEO Priority |
|---|---|---|---|
| Informational / definitional | Very High | Severe (−50% to −80%) | Immediate action |
| How-to / procedural | High | High (−30% to −60%) | High priority |
| Comparison / “best X” | Medium–High | Medium (−20% to −40%) | Medium priority |
| Navigational (brand name) | Low | Low (0% to −15%) | Lower priority |
| Transactional / “buy X” | Low–Medium | Low–Medium | Monitor; rising |
| Local / “near me” | Low | Low | Lower priority |
Map your existing organic traffic by query segment using Google Search Console query data categorised by intent type. For each segment, estimate the proportion of queries where AI Overviews currently appear — use manual sampling if GSC AI Overview data is limited. The product of traffic × AI coverage × CTR impact gives you an estimated exposure figure in lost sessions.
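The traffic × AI coverage × CTR impact product can be computed per segment. A minimal sketch; the segment inputs here are illustrative midpoints of the table's ranges, not benchmarks:

```python
# Segment inputs: monthly sessions, estimated AI Overview coverage, CTR impact.
segments = {
    "informational": {"sessions": 8000, "ai_coverage": 0.70, "ctr_impact": 0.65},
    "how_to":        {"sessions": 5000, "ai_coverage": 0.50, "ctr_impact": 0.45},
    "navigational":  {"sessions": 3000, "ai_coverage": 0.10, "ctr_impact": 0.08},
}

def exposure(seg):
    """Estimated lost sessions: traffic x AI coverage x CTR impact."""
    return round(seg["sessions"] * seg["ai_coverage"] * seg["ctr_impact"])

# Rank segments by exposure, highest first, to order GEO intervention.
for name, seg in sorted(segments.items(), key=lambda kv: -exposure(kv[1])):
    print(f"{name}: ~{exposure(seg)} sessions/month at risk")
```

The ranked output is the raw input to the prioritisation framework that follows.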
The Prioritisation Framework
With exposure mapped, prioritise GEO interventions by combining exposure risk with content malleability:
- High exposure + existing strong authority — Immediate GEO optimisation. These pages are already retrieved and have strong signals. Structural improvement directly increases extraction quality and citation frequency.
- High exposure + weak structure — Highest priority for structural rewrite. The content is exposed to traffic loss but not yet capturing AI citations — double vulnerability.
- Medium exposure + moderate authority — Scheduled optimisation over 2–3 month horizon. Monitor AI Overview appearance trends; intervene as coverage expands.
- Low exposure + any structure — Monitor only. Do not divert resources from higher-priority work.
KPIs for Generative Commercial Strategy
Beyond the standard GEO metrics (inclusion rate, citation frequency), commercial GEO strategy requires KPIs that connect AI visibility to business outcomes:
- AI-attributed sessions — Traffic from AI platforms (tracked via referral source in GA4)
- Conversion rate of AI-referred traffic — Relative to traditional organic; AI-referred traffic is often more qualified
- Zero-click exposure value — Estimated impressions from AI citations × estimated brand lift rate
- Competitive AI share of voice — Your brand’s proportional appearance in AI responses for key topic queries vs. competitors
- Messaging fidelity score — Qualitative assessment of whether AI-generated descriptions of your product match your intended positioning
Revenue Risk Modelling
For commercial sites where organic search is a primary acquisition channel, modelling AI-driven CTR compression against historical traffic and conversion data provides a business case for GEO investment. A simplified model: monthly revenue at risk = monthly sessions × AI Overview coverage × CTR compression × conversion rate × average contract value.
Worked Example: SaaS Informational Cluster
To make this concrete, consider a SaaS platform with a substantial informational content cluster targeting mid-funnel queries. Use the inputs below to run the model:
| Input | Value |
|---|---|
| Monthly sessions from informational queries | 12,000 |
| AI Overview coverage in this query segment | 65% |
| Estimated CTR compression when AI Overview appears | 60% |
| Average conversion rate (trial sign-up) | 2.5% |
| Average contract value (monthly) | $120 |
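The arithmetic behind the worked example is reproducible directly from the table inputs:

```python
sessions = 12_000        # monthly sessions from informational queries
ai_coverage = 0.65       # AI Overview coverage in this query segment
ctr_compression = 0.60   # estimated CTR lost when an AI Overview appears
conversion_rate = 0.025  # trial sign-up rate
contract_value = 120     # average monthly contract value ($)

at_risk_sessions = sessions * ai_coverage * ctr_compression
monthly_revenue_at_risk = at_risk_sessions * conversion_rate * contract_value
annual_revenue_at_risk = monthly_revenue_at_risk * 12

print(f"Sessions at risk/month: {at_risk_sessions:,.0f}")          # 4,680
print(f"Revenue at risk/month:  ${monthly_revenue_at_risk:,.0f}")  # $14,040
print(f"Revenue at risk/year:   ${annual_revenue_at_risk:,.0f}")   # $168,480
```

Swapping in your own inputs turns the same five lines into a site-specific risk model.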
The $168,480 annual figure is the downside case without GEO intervention. It is not a prediction — it is a risk estimate. Present it as: “If AI Overview coverage reaches 65% in this segment and we do not improve our inclusion rate, we estimate $14,040 in monthly revenue at risk.” This framing converts an abstract SEO concern into a CFO-legible business question.
Note: if GEO work achieves even a 40% AI citation rate for this cluster, a meaningful portion of those 4,680 at-risk sessions may still reach your site via AI-driven brand awareness — the actual downside is likely smaller than the raw click model implies.
This model requires estimates for AI coverage and CTR compression — neither of which is available with precision from current tools. But the model’s value is not precision; it is directionality. It converts the abstract concern about AI search into a concrete, management-level business question: “How much revenue is at risk if we do not act?”
Commercial GEO strategy starts with exposure mapping: segmenting your query mix by intent type and assessing AI Overview prevalence in each segment. Prioritise interventions by combining exposure risk with content authority. Track commercial KPIs that connect AI visibility to revenue impact. Model zero-click risk as a business case for GEO investment, and monitor competitive AI share of voice to assess positioning relative to competitors across generative platforms.
Experiments, Measurement & Tooling
How to run your own GEO experiments, model retrieval factors, attribute AI-sourced traffic, and interpret the signals that define the future of generative visibility.
Public Experiments in Extractability
GEO is a practitioner’s discipline. Without experiments, it is commentary. This chapter documents the first in a series of public extractability experiments conducted by The GEO Lab — experiments designed to produce observable, documentable evidence of how content structure affects retrieval and extraction in generative search environments.
These experiments are designed to be reproducible. Every methodology described here can be applied to your own content. Every protocol is deliberately simple enough to execute without proprietary tools. The goal is not laboratory precision — that level of control over a black-box AI system is not possible. The goal is structured observation that generates useful signal.
Experiment Design Principles
Conducting your own extractability experiments requires discipline around a small number of design principles. Without these, your results will be anecdotal rather than useful:
- Change one variable — If you change both structure and entity naming simultaneously, you cannot attribute the result to either. Isolate variables.
- Repeat queries — Generative systems are non-deterministic. A single query-run result is noise. Run each query at least five times; ten is better. Report aggregates, not individual results.
- Use a diverse query set — Results from a single query generalise poorly. Cover at least three intent variants (definitional, explanatory, application) across the target topic to get a stable pattern.
- Document contemporaneously — Record exact query text, exact output, and exact date/time. AI system behaviour changes with model updates; a result documented in March 2026 may not reproduce in September 2026 after a model change.
- Be honest about negative results — If your intervention did not produce measurable improvement, document that. Negative results are as informative as positive ones — and they prevent wasted effort on non-effective techniques.
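Why five to ten repetitions: with a handful of runs, the uncertainty around a citation-rate estimate is wide. A Wilson score interval (a standard binomial interval, our choice here rather than anything the protocol mandates) makes this concrete:

```python
import math

def wilson_interval(successes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a citation rate from repeated runs."""
    if runs == 0:
        return (0.0, 1.0)
    p = successes / runs
    denom = 1 + z**2 / runs
    centre = (p + z**2 / (2 * runs)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2))
    return (max(0.0, centre - half), min(1.0, centre + half))

# 4 citations in 5 runs looks strong, but the interval is wide:
print(wilson_interval(4, 5))   # roughly (0.38, 0.96)
print(wilson_interval(8, 10))  # same observed rate, narrower interval
```

Reporting the interval alongside the aggregate rate keeps a single lucky run from being read as a result.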
Further Experiment Directions
Experiment #001 addresses one variable in one context. The extractability experiment programme at The GEO Lab continues with additional controlled tests across the following dimensions, to be published as results are available:
| Experiment | Variable | Hypothesis | Status |
|---|---|---|---|
| #002 — Entity Density | Entity name frequency per 200 words | Higher density increases citation frequency up to a saturation point | In progress |
| #003 — FAQ Schema | FAQPage JSON-LD vs. no schema | Schema markup increases AI extraction of Q&A formatted content | Queued |
| #004 — Section Length | 100 vs. 200 vs. 400 word sections | Shorter, focused sections have higher extraction rates than longer ones | Queued |
| #005 — Internal Link Density | Cluster size (2 vs. 5 vs. 10 pages) | Larger clusters with consistent entity naming have higher per-page retrieval rates | Queued |
Public GEO experiments require a hypothesis-driven protocol with controlled variables, repeated query runs, and honest result reporting. Experiment #001 demonstrated that declarative (answer-first) structure significantly outperforms narrative structure across citation presence, citation position, and direct quote rate — with the strongest effects for definitional queries. Future experiments will extend these findings across entity density, schema markup, section length, and cluster architecture.
Modelling Retrieval Probability
Chapter 5 introduced retrieval probability as a conceptual variable. This chapter goes further: it describes how practitioners can build a working scoring model for retrieval probability — a heuristic instrument that produces consistent, comparable estimates across pages and sections, even in the absence of direct measurement.
A heuristic model does not replace empirical measurement. But in environments where direct measurement is difficult (which describes all of generative search at the current stage), a structured heuristic applied consistently is far more useful than informal gut-feel assessments.
The Conceptual Equation
Building on the five-variable model from Chapter 5, a simplified scoring function for retrieval probability can be expressed as a weighted sum: Retrieval Probability Score = Semantic Alignment (0–25) + Entity Match (0–20) + Structural Clarity (0–20) + Topical Isolation (0–20) + Contextual Reinforcement (0–15), for a composite score out of 100.
Each dimension is scored independently using observable characteristics of the content, then summed. The total score provides a comparative estimate of retrieval probability across sections or pages — not an absolute probability figure.
Scoring Each Dimension
Semantic Alignment (0–25)
Assess how closely the section’s vocabulary and conceptual coverage matches the target query set. High scores require: (a) use of terms the target audience uses for this topic; (b) coverage of related concepts that a well-informed reader would expect; (c) no obscure jargon that diverges from established usage in the topic domain.
- 22–25: Section reads as if written to answer the target queries; terminology fully aligned
- 15–21: Substantial alignment; some terminology gaps or tangential content
- 8–14: Partial alignment; section covers the general topic but not specific query intent
- 0–7: Weak alignment; section is loosely related but would not be retrieved for target queries
Entity Match (0–20)
Assess whether the primary entities relevant to the target queries appear prominently in the section — named explicitly, in canonical form, without reliance on pronouns or context.
- 17–20: Primary entities named explicitly in first two sentences; reinforced throughout section
- 11–16: Primary entities present; some pronoun substitution or delayed introduction
- 5–10: Entities present but weak — implicit references, inconsistent naming, or buried late
- 0–4: Primary entities absent or named once with pronoun use throughout
Structural Clarity (0–20)
Assess the structural quality of the section: answer-first organisation, paragraph focus, and format appropriateness.
- 17–20: Declarative answer leads; one idea per paragraph; lists/tables used for discrete items
- 11–16: Mostly clear structure; minor issues with answer location or paragraph focus
- 5–10: Answer buried or absent; some multi-idea paragraphs; format not optimal for content type
- 0–4: Dense narrative; no clear structural answer; format not aligned to content type
Topical Isolation (0–20)
Assess whether the section is focused on a single, clearly bounded topic — or whether it mixes multiple themes.
- 17–20: Section addresses exactly one question; tight topical focus
- 11–16: Primarily one topic with minor tangents that do not obscure the main theme
- 5–10: Two or three themes mixed; retrievable for one but not sharply focused
- 0–4: Section covers four or more distinct themes; no clear topical focus
Contextual Reinforcement (0–15)
Assess how well the broader content cluster reinforces this section’s entity domain.
- 12–15: Multiple cluster pages reinforce this section’s entities; strong hub-spoke linking
- 7–11: Some cluster reinforcement; linking present but not comprehensive
- 2–6: Isolated page; minimal cluster context; few or no supporting pages
- 0–1: No cluster context; orphan page or standalone content
Interpreting Scores
| Score Range | Retrieval Probability Assessment | Recommended Action |
|---|---|---|
| 85–100 | High — strong candidate for retrieval and extraction | Maintain; monitor for model changes |
| 65–84 | Moderate–High — likely retrieved; extraction quality variable | Targeted improvements in lowest-scoring dimensions |
| 45–64 | Moderate — retrieved inconsistently; extraction often incomplete | Structural rewrite priority; entity audit |
| 25–44 | Low — retrieved rarely; significant structural deficiencies | Full section rewrite using patterns from Chapter 8 |
| 0–24 | Very Low — unlikely to be retrieved or cited | Fundamental content redesign or decommission |
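The five dimensions and score bands above can be transcribed into a small data structure. This is a direct sketch of the rubric, not an official tool:

```python
from dataclasses import dataclass

@dataclass
class SectionScore:
    """Heuristic retrieval-probability score; maximum per dimension in comments."""
    semantic_alignment: int        # 0-25
    entity_match: int              # 0-20
    structural_clarity: int        # 0-20
    topical_isolation: int         # 0-20
    contextual_reinforcement: int  # 0-15

    def total(self) -> int:
        return (self.semantic_alignment + self.entity_match + self.structural_clarity
                + self.topical_isolation + self.contextual_reinforcement)

    def band(self) -> str:
        """Map the composite score onto the interpretation table."""
        t = self.total()
        if t >= 85: return "High"
        if t >= 65: return "Moderate-High"
        if t >= 45: return "Moderate"
        if t >= 25: return "Low"
        return "Very Low"

s = SectionScore(18, 15, 12, 17, 8)
print(s.total(), s.band())  # 70 Moderate-High
```

Scoring every section of a cluster this way produces a sortable backlog, with the lowest-scoring dimensions pointing at the specific intervention.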
Testing the Model
Calibrate your scoring model against empirical results by following this process:
- Score 10–20 sections across your site using the heuristic model
- Run the target queries in Perplexity for each section (5 iterations per query, 3 queries minimum)
- Record which sections were cited and which were not
- Compare scores to citation outcomes — do high-scoring sections get cited more often?
- Adjust your scoring weights for the dimensions that best predict citation outcomes in your specific domain
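The score-versus-citation comparison in the steps above can be quantified with a point-biserial correlation (a standard statistic, not one the workflow prescribes). The calibration data below is illustrative:

```python
import math
import statistics

def point_biserial(scores, cited):
    """Correlation between heuristic scores and binary citation outcomes.
    Values near +1 mean high-scoring sections are consistently the cited ones."""
    cited_s = [s for s, c in zip(scores, cited) if c]
    uncited_s = [s for s, c in zip(scores, cited) if not c]
    if not cited_s or not uncited_s:
        return 0.0
    p = len(cited_s) / len(scores)
    sd = statistics.pstdev(scores)
    return (statistics.mean(cited_s) - statistics.mean(uncited_s)) / sd * math.sqrt(p * (1 - p))

# Illustrative: heuristic scores for six sections vs. observed citation outcomes.
scores = [82, 74, 61, 55, 40, 33]
cited = [True, True, True, False, False, False]
r = point_biserial(scores, cited)
print(round(r, 2))  # 0.86
```

A weak correlation on your own data is the signal to re-weight the dimensions, as the final step recommends.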
A retrieval probability heuristic model scores content sections across five dimensions — semantic alignment, entity match, structural clarity, topical isolation, and contextual reinforcement — producing a 0–100 composite score that enables consistent comparative assessment. The model is a heuristic, not an algorithm; it is most valuable when calibrated against empirical citation data from your own site, and most useful as a prioritisation tool for directing audit and rewrite effort.
Query and Entity Attribution for GEO
Attribution in traditional search was already imperfect — the rise of (not provided) in Google Analytics keyword data began a decade-long erosion of query-level visibility that practitioners have never fully resolved. In generative search, attribution is even more complex. Users receive answers without clicking. AI systems synthesise without consistent citation. And the relationship between content and outcome is mediated by a retrieval model that practitioners cannot inspect.
This chapter covers the attribution types available to GEO practitioners, the tracking methods that can capture them, and the strategic patterns worth building into your reporting infrastructure now — before attribution tools mature.
Types of GEO Attribution
Direct Citation Attribution
Direct citation occurs when a platform explicitly names your URL as a source in the generated response. This is the most trackable form of AI attribution. Platforms that do this consistently include Perplexity, Microsoft Copilot, and (partially) Google AI Overviews. Tracking methods:
- Referral traffic — AI platforms that include links generate referral sessions in GA4. Filter for referral sources including “perplexity.ai,” “bing.com,” “chatgpt.com” to identify AI-attributed traffic.
- GSC AI Overviews filter — Google Search Console is progressively surfacing AI Overview appearance data; check for this filter in your GSC instance.
- Manual prompt auditing — Systematic running of target queries across platforms; record which URLs are cited.
Phrase Mirroring Attribution
Phrase mirroring occurs when AI-generated responses reproduce your exact phrases or near-exact paraphrases without explicit citation. This is common in ChatGPT browsing responses and in Google AI Overviews that extract text from your pages. It is difficult to attribute systematically but can be detected through:
- Regular manual sampling of generated responses for your brand’s specific terminology and named frameworks
- Monitoring for custom phrases, product names, or coined terms that appear in AI responses
- Tracking whether responses reproduce your structural formats (e.g., a specific list structure you use consistently)
Entity Naming Attribution
Entity naming attribution occurs when AI systems include your brand, product, or concept name in responses to relevant queries — without necessarily directing the user to your content. This is the most commercially significant but least directly trackable form of attribution. Proxies include brand search lift (increases in direct branded searches following periods of high AI query activity for your topic) and unaided brand recall in user research.
Tracking Infrastructure for GEO Attribution
| Attribution Type | Tracking Method | Tool | Reliability |
|---|---|---|---|
| Direct citation (Perplexity) | Referral traffic, manual audit | GA4, Perplexity API | Medium–High |
| Direct citation (AI Overviews) | GSC AI filter, click tracking | Google Search Console | Partial (improving) |
| Direct citation (Copilot) | Referral traffic | GA4 | Medium |
| Phrase mirroring | Manual sampling | Human review — no automated tool | Low (labour-intensive) |
| Entity naming | Brand search lift, user research | GSC, survey tools | Low (indirect proxy) |
| AI share of voice | Competitive prompt audit | Profound, Evertune, manual audit | Medium (paid tools) |
Setting Up a GEO Attribution Log
Even without dedicated tools, a systematic manual attribution log provides useful data for strategy decisions. A minimal log records:
- Query — exact query text tested
- Platform — which AI system was queried
- Date — for temporal trend analysis
- Citation present? — Yes/No
- Source cited — your URL or competitor URL
- Section cited — which specific page section was referenced
- Entity named? — Was your brand/product explicitly mentioned?
- Accuracy of representation — Did the AI correctly characterise your content?
Running this log across a defined query set of 20–50 target queries, at regular intervals (weekly or monthly), provides a longitudinal dataset that reveals whether structural improvements are translating into attribution gains.
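The log fields above translate directly into a machine-readable schema. A minimal sketch: the field names mirror the list, but the CSV layout and the example entries are assumptions to adapt to your own reporting stack.

```python
# A minimal machine-readable attribution log with a citation-rate rollup.
# Field names mirror the log fields described above; the CSV layout and
# example data are illustrative assumptions.
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class LogEntry:
    query: str             # exact query text tested
    platform: str          # which AI system was queried
    date: str              # ISO date, for temporal trend analysis
    citation_present: bool
    source_cited: str      # your URL or a competitor URL
    section_cited: str     # which specific page section was referenced
    entity_named: bool     # was the brand/product explicitly mentioned?
    accurate: bool         # did the AI correctly characterise the content?

def citation_rate(entries: list) -> float:
    """Share of logged runs in which a citation was present."""
    return sum(e.citation_present for e in entries) / len(entries)

def write_log(path: str, entries: list) -> None:
    """Append-friendly CSV export for longitudinal analysis."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[fl.name for fl in fields(LogEntry)])
        writer.writeheader()
        writer.writerows(asdict(e) for e in entries)

entries = [
    LogEntry("what is geo", "Perplexity", "2026-03-01", True,
             "example.com/geo", "Definition section", True, True),
    LogEntry("what is geo", "ChatGPT", "2026-03-01", False, "", "", False, False),
]
print(f"citation rate: {citation_rate(entries):.0%}")
```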
Strategic Patterns in Attribution Data
As your attribution log accumulates, look for these strategic patterns:
- Query gaps — Queries where competitors are consistently cited but you are not. These are highest-priority GEO optimisation targets.
- Section champions — Sections that are consistently cited. Analyse what makes them work; apply those patterns to lower-performing sections.
- Platform divergence — Different platforms citing different sources. Investigate structural differences between your content and competitor content that is preferred by specific platforms.
- Attribution decay — Previously cited content that has stopped being cited. Often corresponds to model updates or competitor content improvements; flag for immediate structural review.
- Messaging drift — AI-generated descriptions of your product that diverge from your intended positioning. Identify the source content being paraphrased and rewrite for compression resistance.
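Two of the patterns above, query gaps and attribution decay, can be surfaced mechanically once the log accumulates. A sketch, assuming a simple (query, period, cited_domain) row format; the domain `example.com` and the sample rows are hypothetical.

```python
# Sketch: detect query gaps and attribution decay from a simple log of
# (query, period, cited_domain) rows. The row format and the domain
# "example.com" are assumptions; swap in your own log schema.
OUR_DOMAIN = "example.com"  # hypothetical

log = [
    ("what is geo", "2026-01", "example.com"),
    ("what is geo", "2026-02", "competitor.io"),   # decay: citation lost
    ("geo vs seo", "2026-01", "competitor.io"),
    ("geo vs seo", "2026-02", "competitor.io"),    # gap: never cited here
]

queries = {q for q, _, _ in log}
ours = {(q, p) for q, p, d in log if d == OUR_DOMAIN}
our_queries = {q for q, _ in ours}

# Query gaps: queries where competitors are cited but we never are.
gaps = sorted(queries - our_queries)

# Attribution decay: queries we were cited for earlier but not in the
# latest period.
latest = max(p for _, p, _ in log)
decayed = sorted({q for q, _ in ours if (q, latest) not in ours})

print("gaps:", gaps)
print("decayed:", decayed)
```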
GEO attribution takes three forms: direct citation (trackable via referral traffic and GSC), phrase mirroring (detectable through manual sampling of AI outputs), and entity naming (estimated through brand search lift proxies). Building a structured attribution log — even manually — provides longitudinal data that reveals which pages and sections are generating AI visibility and where competitors are gaining ground. Attribution data drives prioritisation; prioritisation drives structural action.
The Future of Generative Visibility
Prediction in technology is hazardous. Prediction in AI technology in 2026 is especially so — the rate of development makes six-month-old analysis feel dated and twelve-month forecasts speculative. This chapter does not attempt to predict with precision. It identifies observable near-term trends, medium-term directions with reasonable confidence, and the enduring structural principles likely to persist through whatever specific technical forms generative search takes next.
Near-Term Trends (2026–2027)
Expansion of AI Overview Coverage
Google’s AI Overviews, initially deployed selectively for informational and mixed-intent queries, are expanding in geographic scope, query category coverage, and personalisation depth. By the end of 2026, AI-mediated answers are expected to appear across a substantially larger proportion of query types, including comparison and early-transactional intent queries that currently see lower AI Overview rates. The CTR compression effect documented in informational queries is likely to extend into these categories.
Multimodal Retrieval
Current generative search retrieves and cites primarily text content. The integration of image, video, and audio understanding into retrieval pipelines is accelerating. Content with strong visual structure — infographics, structured video with transcripts, rich image alt-text and structured captions — will increasingly factor into retrieval scoring. Practitioners building content systems now should consider how structural principles (clear labelling, entity naming, answer-first composition) translate into multimodal formats.
Agentic Search Behaviour
AI agents that execute multi-step tasks on behalf of users — booking appointments, researching products, comparing options across multiple sources — are entering consumer use. These agentic systems perform structured retrieval across multiple sources to complete tasks, not just to answer single questions. Content designed for task-context retrieval (procedural clarity, structured step sequences, machine-readable pricing and availability data) will gain relevance as agentic AI use grows.
Personalised Retrieval Models
Current generative search retrieval is largely query-based — the same query returns similar results for different users. Personalisation layers are being added, drawing on search history, account data, and interaction patterns. For practitioners, this introduces a new complexity: the same content may be retrieved preferentially for one user profile and deprioritised for another. Entity clarity and semantic alignment remain effective regardless of personalisation layer — which is why they are the most durable optimisation levers.
Medium-Term Directions (2027–2029)
Structured Knowledge Integration
The boundary between generative search and structured knowledge databases (knowledge graphs, Wikidata, schema-declared entity stores) is dissolving. AI systems increasingly blend unstructured web retrieval with structured graph queries. Sites that invest in rich schema markup, well-defined entity declarations, and consistent cross-platform entity presence (Wikipedia mentions, Wikidata items, Knowledge Panel associations) will benefit from stronger entity anchoring in this integrated retrieval environment.
Real-Time Authority Signals
Traditional domain authority is a lagging signal — built over years of link accumulation. As retrieval systems increasingly favour recency and real-time authority (publication date, engagement freshness, recent citation patterns), the authority model shifts. Regular, structured content publication maintains temporal relevance. Sites that publish sporadically — even if historically strong — may see retrieval probability decay between publication events.
GEO as Standard Practice
Just as technical SEO became a baseline expectation rather than a competitive differentiator over the 2010s, GEO structural practices will become baseline requirements for any site that expects organic search to remain a viable acquisition channel. The practitioners who develop these skills now — and who build the institutional knowledge, tooling, and auditing workflows — are positioned to provide the expertise that will be in demand as GEO matures from a specialisation into a standard practice.
Enduring Principles
Regardless of how specifically generative search evolves, the following principles are likely to remain relevant because they are grounded in the fundamental architecture of any information retrieval system:
- Clarity always wins — If your content is unambiguous about what it means, any retrieval system — current or future — is more likely to use it accurately than content that requires interpretive inference.
- Entities are the atomic unit of knowledge — All retrieval systems, whether keyword-based, vector-based, or graph-based, work through entities and their associations. Content that clearly establishes entity relationships will translate across retrieval architectures.
- Structure enables automation — Human readers can tolerate structural ambiguity; machines cannot. As AI systems become more deeply integrated into information retrieval at all scales, structurally explicit content becomes more broadly useful.
- Authority requires evidence — Generative systems assess citation-worthiness through signals of evidence: data, sourced claims, expert authorship, consistent publication. These signals predate GEO and will outlast any specific implementation of it.
Recommended Posture
Given the uncertainty intrinsic to this field, the most defensible strategic posture for practitioners is:
- Build fundamentals that transfer. Entity clarity, structural extractability, and topical coherence are optimisation investments that improve performance across current and likely future retrieval systems. They are not bets on a specific platform.
- Measure what you can, estimate what you cannot. Imperfect measurement is better than no measurement. Manual attribution logs, proxy metrics, and heuristic scoring — however imprecise — provide directional guidance that measuring nothing cannot.
- Experiment continuously. The fastest way to accumulate actionable knowledge in a black-box environment is structured experimentation. Run small tests, document results honestly, publish findings, and iterate. Practitioners who maintain an active experiment log learn faster than those who wait for industry consensus.
- Stay commercially grounded. GEO that does not serve commercial outcomes is a technical exercise. Every structural decision should connect to a query target, a traffic segment, or a business objective. Visibility without value is vanity.
The shift from ranking to retrieval is structural, not cyclical. It is not a Google update that will roll back. It is the consequence of a fundamental change in how users access information — one that is accelerating, not decelerating. Practitioners who adapt their mental models now, build structural competencies, and develop measurement disciplines are not chasing a trend. They are building the skills that define the next decade of search practice.
Generative visibility will evolve across three dimensions: platform proliferation (more AI systems competing for query share), multimodal retrieval (structured data, images, and audio becoming retrieval-eligible), and memory architecture (long-context AI systems maintaining entity associations between sessions). Principles endure: extractability, entity clarity, structural authority, and topical depth remain the foundations regardless of platform-level changes.
Reference Materials
Practical tools and references for immediate application: the GEO Audit Checklist, Page-Level Scoring Worksheet, Section Rewrite Template, and Glossary of Key Terms.
GEO Audit Checklist
Use this checklist for page-level GEO audits. Work through each layer of the GEO Stack in sequence. Mark each item Present (P), Absent (A), or Partial (Pt). Items marked Absent or Partial are action items.
Layer 1: Retrieval
- Page content uses the vocabulary of target query set (search terms, synonyms, related concepts)
- Semantic coverage is comprehensive — related subtopics the user would expect are addressed
- Page has been tested in target AI platforms for target queries (minimum 5 iterations per query)
- Baseline citation rate is documented and dated
- Page has indexing confirmed (not blocked in robots.txt or noindex tagged)
Layer 2: Extractability
- Every H2/H3 section opens with a declarative answer sentence
- Core answer appears within the first two sentences of every section
- All key entities are named explicitly (canonical form) in every section — no orphaned pronouns
- Every section is coherent when read in isolation (passes section independence test)
- Core meaning of every section survives one-sentence compression (compression resistance test)
- Paragraphs are under 120 words with one primary idea each
- Discrete concepts (steps, options, features) are formatted as lists or tables, not narrative prose
- Comparison content is presented in structured table format
- Definitions are formatted as definition blocks (term → definition → context), not buried in prose
- FAQs are formatted as discrete Q&A pairs with declarative answers under 120 words
Layer 3: Entity Reinforcement
- One canonical form used for each primary entity throughout the page
- Primary entity appears by canonical name at least once every 150–200 words in long sections
- Related entities appear together consistently (deliberate co-occurrence design)
- Organization and/or Article schema markup applied
- FAQPage schema applied to FAQ sections
- HowTo schema applied to procedural step-by-step sections
- DefinedTerm schema applied to key definitions
- Author name and credentials are present and schema-marked (Person or author property)
- No synonym drift — primary entity not referred to by multiple alternate forms
Layer 4: Structural Authority
- Page belongs to a defined topical cluster with a hub page
- Page links to its hub using the hub’s primary entity in anchor text
- Hub page links back to this page with this page’s primary entity in anchor text
- All significant internal links use entity-rich anchor text (not “read more,” “click here”)
- Page is accessible within 2 clicks from its hub
- No orphan status — at least 2–3 incoming internal links from relevant cluster pages
- Adjacent spoke pages cross-link where entity overlap exists
- Page topics are clearly bounded — this page doesn’t duplicate scope of a sibling page
Layer 5: System Memory
- Entity naming is consistent across this page and all cluster pages (no cross-page contradiction)
- Topic coverage is reinforced across multiple cluster pages (not dependent on a single page)
- Publication cadence is maintained — no multi-month gaps in cluster content
- Previous versions of this page (before rewrites) have been canonicalised or redirected
- Messaging about brand/product is consistent across all cluster pages
Scoring Summary
| Layer | Items | Present | Partial | Absent | Score (P + 0.5×Pt) / Total |
|---|---|---|---|---|---|
| L1 Retrieval | 5 | | | | / 5 |
| L2 Extractability | 10 | | | | / 10 |
| L3 Entity Reinforcement | 9 | | | | / 9 |
| L4 Structural Authority | 8 | | | | / 8 |
| L5 System Memory | 5 | | | | / 5 |
| Total | 37 | | | | / 37 |
To convert this checklist score to a weighted GEO score aligned with the AI Visibility OS scoring engine, multiply each layer’s percentage score by its weight: L1 × 0.20 + L2 × 0.25 + L3 × 0.20 + L4 × 0.15 + L5 × 0.10. Technical Health (indexability, canonical, title presence) functions as a gate — if it fails, the overall score is capped at 40.
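The weighted conversion can be sketched directly. Note that the weights as stated sum to 0.90, so the maximum weighted score is 90 under this formula; the code reproduces the text verbatim rather than renormalising, and the example percentages are illustrative.

```python
# Sketch of the weighted GEO score conversion described above. Weights
# are taken verbatim from the text (they sum to 0.90, so the maximum is
# 90); the Technical Health gate caps the score at 40 when it fails.
WEIGHTS = {"L1": 0.20, "L2": 0.25, "L3": 0.20, "L4": 0.15, "L5": 0.10}

def weighted_geo_score(layer_pct: dict, technical_health_ok: bool) -> float:
    """layer_pct maps layer name -> percentage score (0-100)."""
    score = sum(layer_pct[layer] * w for layer, w in WEIGHTS.items())
    return score if technical_health_ok else min(score, 40.0)

# Example: checklist percentage scores per layer (illustrative)
pcts = {"L1": 80.0, "L2": 70.0, "L3": 66.7, "L4": 75.0, "L5": 60.0}
print(weighted_geo_score(pcts, technical_health_ok=True))
print(weighted_geo_score(pcts, technical_health_ok=False))  # capped at 40
```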
↓ Download editable version at thegeolab.net/appendices
Page-Level GEO Audit Worksheet
This worksheet provides a section-level scoring framework for a single page. Complete one row per H2 section. Use findings to prioritise specific rewrite tasks.
| Section Heading | Semantic Alignment (0–25) | Entity Match (0–20) | Structural Clarity (0–20) | Topical Isolation (0–20) | Contextual Reinf. (0–15) | Total (0–100) | Priority Action |
|---|---|---|---|---|---|---|---|
| [Section 1 heading] | |||||||
| [Section 2 heading] | |||||||
| [Section 3 heading] | |||||||
| [Section 4 heading] | |||||||
| [Section 5 heading] | |||||||
| [Section 6 heading] | |||||||
| [Section 7 heading] | |||||||
| [Section 8 heading] |
Page-Level Summary
| Field | Entry |
|---|---|
| Page URL | |
| Audit date | |
| Target query set | |
| Baseline citation rate | % of query-runs where this page was cited at audit date |
| Highest-scoring section | |
| Lowest-scoring section(s) | Priority rewrite targets |
| Most common deficiency | Entity match / Structural clarity / Topical isolation / etc. |
| Top 3 priority actions | 1. / 2. / 3. |
| Estimated rewrite effort | Hours |
| Post-rewrite test date | Schedule 2–4 weeks post-publication |
Section Rewrite Template
Use this template when rewriting low-scoring sections for extractability and entity reinforcement. Complete each slot before writing the rewritten version.
Pre-Rewrite Analysis
| Slot | Entry / Guidance |
|---|---|
| Section heading | |
| Target query (the question this section should answer) | |
| Primary entity | The main named concept, brand, or product this section is about |
| Secondary entities | Related named concepts that should appear in this section |
| Core claim (one sentence) | The main thing this section asserts — this MUST appear in sentence 1 of the rewrite |
| Supporting mechanism | How/why the core claim is true |
| Evidence or example | Data point, case example, or observed pattern that supports the claim |
| Practical implication | What a reader should do or understand as a result |
| Content format needed | Narrative / Definition block / Numbered list / Bullet list / Comparison table / FAQ pair |
| Target length (words) | 150–300 words recommended for most H2 sections |
Rewrite Structure
Follow this structure when writing the new version:
- Sentence 1 (Declarative answer): [Core claim, with primary entity named explicitly]
- Sentences 2–3 (Mechanism): [How/why the claim is true. Introduce secondary entities by canonical name. No pronouns in place of entity names.]
- Sentences 4–5 (Evidence/Example): [Specific, concrete evidence. Quantified if possible. Named if a real example.]
- Sentence 6 (Implication): [What this means for the reader. Name the primary entity once more.]
- [If applicable: list, table, or Q&A pairs follow the above paragraph, structured for their format type.]
Post-Rewrite Self-Check
- Does sentence 1 contain the primary entity by canonical name?
- Does sentence 1 state the core claim declaratively?
- Can this section be understood without reading anything else on the page?
- Does the core meaning survive a one-sentence summary?
- Are all entities named by canonical form (no pronouns substituting for entity names)?
- Is the primary entity named at least once every 150 words if the section is long?
- Is the format (list, table, narrative) appropriate for the type of information?
- Is every paragraph under 120 words with one main idea?
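Several of the self-checks above can be run mechanically before manual review. A sketch covering three of them; the thresholds (120 words per paragraph, one entity mention per 150 words) follow the checklist, while the sample section text and simple sentence splitting are illustrative assumptions.

```python
# Sketch: automate three of the post-rewrite self-checks. Thresholds
# follow the checklist; the sentence splitter is a naive approximation
# and the sample text is illustrative.
import re

def self_check(section_text: str, canonical_entity: str) -> dict:
    paragraphs = [p for p in section_text.split("\n\n") if p.strip()]
    sentences = re.split(r"(?<=[.!?])\s+", section_text.strip())
    words = section_text.split()
    entity_count = section_text.lower().count(canonical_entity.lower())
    return {
        "entity_in_sentence_1": canonical_entity.lower() in sentences[0].lower(),
        "paragraphs_under_120_words": all(len(p.split()) < 120 for p in paragraphs),
        # at least one canonical mention per 150 words
        "entity_every_150_words": entity_count >= max(1, len(words) // 150),
    }

section = ("Generative Engine Optimisation structures content for AI retrieval. "
           "It relies on entity clarity and answer-first composition.")
print(self_check(section, "Generative Engine Optimisation"))
```

Checks that depend on meaning, such as section independence and compression resistance, still require human judgement.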
Glossary of Key Terms
- AI Overview
- Google’s generative search feature that displays AI-synthesised answers above traditional organic search results. Appears selectively for informational, how-to, and comparative queries. Content cited in AI Overviews is drawn from retrieved and extracted web pages.
- AI Share of Voice
- The proportion of AI-generated responses to a defined query set in which a specific brand or URL is cited, compared to competitors. A measure of competitive generative visibility.
- Chunk
- A discrete section of content as processed by a Retrieval-Augmented Generation (RAG) system. Pages are split into chunks — typically at heading or paragraph boundaries — for vector indexing. Each chunk is a potential retrieval and extraction unit.
- Compression Resistance
- The degree to which a content section retains its core meaning when summarised or compressed. A section with high compression resistance preserves its essential claim in a one-sentence summary. Low compression resistance means the key information is lost when the AI condenses the content.
- Contextual Reinforcement
- The cumulative effect of related pages in a content cluster reinforcing the entity associations of any individual page. Pages supported by multiple reinforcing cluster pages have higher contextual reinforcement and therefore higher retrieval probability.
- Entity Gravity
- The semantic pull of a named entity: the strength of its association with related concepts, content, and queries in a retrieval system’s model. High entity gravity means the entity is strongly and consistently associated with your content across multiple pages and contexts.
- Entity Reinforcement
- The practice of using canonical entity names consistently, repeatedly, and in deliberate co-occurrence patterns across a content system, to build strong semantic associations in retrieval models.
- Extractability
- The quality that determines whether an AI system can isolate and use a section of content cleanly without requiring surrounding context. High extractability is achieved through answer-first structure, section independence, explicit entity naming, and appropriate format use.
- Generative Engine Optimisation (GEO)
- The practice of engineering content and content systems to improve visibility, retrieval probability, and citation frequency in generative AI search environments. GEO operates at the layer of content structure, entity architecture, and knowledge organisation rather than traditional link-building and keyword placement.
- GEO Stack
- A five-layer framework for generative visibility engineering: Layer 1 Retrieval, Layer 2 Extractability, Layer 3 Entity Reinforcement, Layer 4 Structural Authority, Layer 5 System Memory. Each layer addresses a distinct aspect of generative signal, and each layer has dependencies on the layers below it.
- Inclusion Rate
- The percentage of a defined query set for which a given URL or domain is cited in AI-generated responses. A primary GEO performance metric. Typically measured through systematic manual prompt testing across a target query set with multiple iterations.
- Knowledge Graph
- A structured representation of entities and their relationships, used by search systems (including Google’s Knowledge Graph) to understand the semantic connections between named concepts. In GEO, internal linking architecture can be understood as a site-level knowledge graph design exercise.
- Messaging Fidelity
- The degree to which AI-generated descriptions of a brand, product, or concept match the intended positioning. Low fidelity indicates the source content being paraphrased was insufficiently clear or specific about the key claims.
- Perplexity
- A standalone AI-powered search engine that provides generated answers with explicit source citations. Particularly transparent about its retrieval process, making it a useful platform for GEO testing and attribution logging.
- RAG (Retrieval-Augmented Generation)
- An AI architecture that combines a retrieval system (which finds relevant content from a corpus) with a generative model (which synthesises a response using the retrieved content). Most generative search systems use some form of RAG architecture.
- Retrieval Probability
- The estimated likelihood that a specific content chunk is selected during the vector retrieval phase of a generative search pipeline in response to a given query. Influenced by semantic alignment, entity match strength, structural clarity, topical isolation, and contextual reinforcement. Not directly measurable; estimated through proxy metrics and heuristic scoring.
- Section Independence
- The property of a content section that allows it to be understood without reading the surrounding content. A section passes the independence test when it makes complete sense as a standalone passage, without relying on prior context for entity resolution or logical coherence.
- Semantic Alignment
- The degree of conceptual proximity between a content chunk and a query, as measured by vector distance in the embedding space. High semantic alignment means the content’s meaning is close to the query’s intent — not necessarily in identical words, but in conceptual coverage.
- Structural Authority
- Layer 4 of the GEO Stack. The coherence signal that emerges from well-designed information architecture: hub-and-spoke cluster organisation, consistent internal linking, clear topical boundaries, and no orphan nodes. Signals to retrieval systems that a domain’s coverage of a topic is authoritative and organised.
- System Memory
- Layer 5 of the GEO Stack. The persistent, cumulative entity and topical associations that a generative system builds about a content domain over time. System memory is the aggregated result of consistent Retrieval, Extractability, Entity Reinforcement, and Structural Authority signals maintained across an entire site and over time.
- Topical Isolation
- The degree to which a content section is focused on a single, clearly bounded topic. High topical isolation means the section addresses exactly one question or concept; low topical isolation means multiple unrelated themes are mixed within one section, reducing retrievability for any specific query.
- Vector Embedding
- A numerical representation of a text passage in a high-dimensional space, produced by an embedding model. Passages with similar meanings have vectors that are close together in this space. Vector similarity between query embeddings and content embeddings is the primary matching mechanism in most RAG retrieval systems.
- Zero-Click Result
- A search result where the user receives the information they sought in the SERP (from an AI Overview, featured snippet, or knowledge panel) without clicking through to any website. GEO-optimised content can still gain brand exposure and entity association value from zero-click appearances, even without generating direct site traffic.
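The vector-similarity matching described under Vector Embedding above can be illustrated in a few lines. The 3-dimensional vectors below are toy values; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
# Toy illustration of cosine-similarity matching between a query
# embedding and chunk embeddings. The 3-d vectors are purely
# illustrative; real embeddings are far higher-dimensional.
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity: 1.0 = identical direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = [0.9, 0.1, 0.3]
chunks = {
    "chunk-aligned":   [0.8, 0.2, 0.3],   # close in meaning to the query
    "chunk-unrelated": [0.1, 0.9, -0.4],  # semantically distant
}
ranked = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)
print(ranked[0])  # the semantically aligned chunk ranks first
```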
↓ Download editable version at thegeolab.net/appendices
GEO Lab Experiment Log
This log documents ongoing GEO experiments conducted by The GEO Lab. Each experiment isolates a single variable against a controlled baseline. Results are updated as experiments complete. The log is a living document — see the latest version at thegeolab.net/log.
| # | Date | Hypothesis | Variable Tested | Primary Finding | Status |
|---|---|---|---|---|---|
| 001 | Feb 2026 | Declarative (answer-first) structure produces higher citation rates than narrative structure for definitional queries | Opening sentence type (declarative vs. narrative) | Declarative structure achieved 61% citation rate vs 37% for narrative across 75 queries each on Perplexity. Citation position also improved — declarative pages appeared as first citation more frequently. | Completed |
| — | Feb 2026 | Field audit: a page achieving 100% citation rate across all four major platforms may still have low representation accuracy if entity signals are insufficient | Entity signal coverage (count of correctly represented entities in AI responses) | Commercial events page achieved 100% citation rate across ChatGPT, Copilot, Perplexity, and Google AI Overview — but only 15/100 entity signals were accurately represented. Citation ≠ representation. | Completed |
| 002 | Mar 2026 | Entity density (canonical name repetition frequency) positively correlates with citation rate for entity-specific queries | Entity name repetition rate (low / medium / high density) | — | In Progress |
| 003 | Q2 2026 | FAQPage schema markup improves citation rate for FAQ-format content compared to identical unstructured content | FAQPage schema presence | — | Queued |
| 004 | Q2 2026 | Hub-and-spoke cluster architecture produces higher cluster-level citation rates than equivalent flat architecture | Internal link architecture (hub-spoke vs. flat) | — | Queued |
| 005 | Q3 2026 | Sections under 200 words are cited more frequently than sections over 400 words for identical topic coverage | Section length (short / medium / long) | — | Queued |
The full experiment log, including raw data and methodology notes for each experiment, is maintained at thegeolab.net/log. Results are updated as experiments complete. Practitioners are encouraged to replicate these experiments on their own sites and compare findings.
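Results like experiment #001 (61% vs 37% citation rate over 75 queries each) can be sanity-checked with a standard two-proportion z-test. The test is a conventional normal approximation added here for illustration, not part of the lab's published methodology.

```python
# Sanity check for experiment-log results: a two-proportion z-test on
# citation rates. The 61% vs 37% over 75 queries each comes from
# experiment #001 above; the test itself is a standard normal
# approximation, not part of the original methodology.
import math

def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(0.61, 75, 0.37, 75)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"z={z:.2f}, p={p_two_sided:.4f}")  # z well above 1.96: unlikely to be noise
```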
References & Further Reading
The following sources informed or are referenced within this manual. Where arXiv or institutional links are available, they are included. The GEO field is developing rapidly; readers are encouraged to check current publications beyond the sources listed here.
Primary Research
- Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., Deshpande, A. (2024). “GEO: Generative Engine Optimization.” Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24). arXiv:2311.09735. arxiv.org/abs/2311.09735
- Google Search Central (2024–2025). AI Overviews: product announcements, coverage statistics, and publisher guidance. Google LLC. blog.google/products/search — see the AI Overview category.
- Google SearchLiaison (@searchliaison, 2024–2025). Official communications on AI Overviews rollout, opt-outs, and publisher relations. X (Twitter). x.com/searchliaison
Industry Research & Data
- SparkToro / Datos (2024). “Zero-Click Search Study: What Happens After a Google Search?” Analysis of click-through patterns and zero-click behaviour across Google SERPs. SparkToro. sparktoro.com
- Ahrefs Research Team (2024–2025). “How AI Overviews Affect Organic CTR.” Internal data analysis of click-through rate changes in queries showing AI Overviews vs. standard results. Ahrefs. ahrefs.com/blog — see AI Overview CTR research.
- Perplexity AI (2024–2025). Product documentation, citation methodology notes, and transparency reports on source selection. Perplexity AI Inc. perplexity.ai
For current GEO research, experiment logs, and tool documentation, visit thegeolab.net. The GEO Lab publishes ongoing experiment results, methodology updates, and field notes as the generative search landscape develops.