Does restructuring content — with no changes to what it says — change how often AI systems cite it?
Two versions of the same content. Same domain, same queries, same word count. One written the way a competent human writer would write it — context first, ideas that build, prose that flows. The other restructured so every section opened with a direct answer and every entity was named explicitly.
The first version was cited 37% of the time. The second: 61%.
That 24-point gap came entirely from structure. No new content. No links. No authority changes. This is the first quantified evidence that Extractability operates as a genuine retrieval signal in AI search — and what that means for content strategy.
This is a live experiment. The hypothesis, methodology, data, and interpretation are documented in full. The experiment is replicable. If you run it on your own content and get different results, I want to know.
Why This Needed Testing
I had a working assumption going into this: that how you structure content matters more than how much content you have. That Extractability — not word count, not domain authority, not internal linking — is what determines whether a generative system actually uses your material when it retrieves it.
Assumptions need testing. This one got tested.
The five-layer GEO Stack framework predicts that Extractability at Layer 2 is a controlling variable for citation consistency. The logic is mechanically coherent: generative systems retrieve sections by vector similarity, lift them out of context, compress them, and weave them into a synthesised answer. A section written in narrative style — where the answer arrives after three sentences of build-up — will be compressed differently than a section where the answer is the first sentence. But how large is that difference, in actual citation rate numbers?
That is what this experiment set out to measure. Not theoretically. In numbers, on a live platform, under controlled conditions.
The hypothesis: content sections rewritten in declarative structure — answer-first, entity-explicit, standalone-complete — will be cited more consistently by generative search systems than semantically equivalent sections written in narrative style, when tested across identical queries on the same platform under controlled conditions.
What Declarative and Narrative Structure Mean
These terms are used precisely in this experiment. Before the results, they need exact definitions — because the entire finding depends on the distinction being real and testable.
Declarative structure has four characteristics:
- Opens with a direct answer or definition in the first sentence
- Names all entities explicitly — no pronoun dependency
- Contains one primary idea per paragraph
- Makes complete sense when read in isolation, without surrounding context
Narrative structure has the opposite characteristics:
- Builds context before delivering its core claim
- Uses natural flowing prose with pronoun references
- May rely on prior paragraphs for entity anchoring
- Is optimised for a human reader moving through the page sequentially
Neither is a writing failure. Narrative structure is what competent writers produce by default because it is what human readers experience as natural. The problem is that generative systems do not read the way humans do. They retrieve individual sections non-sequentially, lift them from context, and compress them. Writing conventions that serve human reading directly undermine machine extraction.
Citation consistency is measured by running the same query multiple times and logging whether the target section’s phrasing, structure, or specific claims appear in the generated output — not merely whether the URL appears as a source link.
Narrative version:
When we think about how content gets used in the new era of AI search, it’s becoming increasingly clear that the way sections are structured plays an important role. Context matters, of course — but the order in which information is presented turns out to have significant implications for whether the system is able to make use of it.
Declarative version:
Content structure determines AI citation rate independently of content quality. Generative search systems retrieve individual sections by vector similarity, lift them from surrounding context, and compress them into synthesised answers. Sections that open with a direct claim are extracted more accurately and cited more frequently than sections that build to their conclusion.
Same topic. Same length. The declarative version is more precise — but the structure is what changes citation probability. That is the variable this experiment isolates.
The Extractability Hypothesis in Citation Testing
The Extractability hypothesis predicts that AI systems preferentially cite content where core claims appear in opening positions. When a retrieval system chunks content and embeds it for similarity matching, declarative openings create stronger alignment between query vectors and content vectors. This experiment tests that prediction with controlled variables — the same content, across identical queries, in two structural forms.
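The alignment claim is easy to illustrate, if not to prove. The sketch below embeds a direct question alongside a declarative opening and a narrative opening and compares cosine similarity. The model choice and example sentences are my own, and a toy comparison like this says nothing about how Perplexity's retrieval actually weights chunks; it only shows the kind of alignment difference the hypothesis predicts.

```python
# Illustrative only: embed a direct question and two section openings, then
# compare cosine similarity. The model and example sentences are my choices,
# not the experiment's, and this is not how Perplexity's retrieval is built.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "Does content structure affect how often AI systems cite a page?"
declarative = "Content structure determines AI citation rate independently of content quality."
narrative = ("When we think about how content gets used in AI search, it's becoming "
             "increasingly clear that structure plays an important role.")

query_vec, decl_vec, narr_vec = model.encode([query, declarative, narrative])

print("query vs declarative opening:", float(util.cos_sim(query_vec, decl_vec)))
print("query vs narrative opening:  ", float(util.cos_sim(query_vec, narr_vec)))
```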
How the Experiment Was Conducted
The methodology was designed to isolate structure as the single independent variable. Everything else was held constant: domain, topic, word count, internal linking, external promotion, and query set.
Topic selection: A neutral informational topic with moderate query volume and no YMYL sensitivity. Chosen to avoid volatility from news events, contested claims, or medical/financial quality filters that could introduce confounding variables.
Content creation: Two 400-word versions of the same content.
- Version A (Narrative): Written as a competent human writer would write it by default — context-first openings, natural prose flow, pronouns used normally, ideas that build across sentences and paragraphs.
- Version B (Declarative): Every paragraph opens with a direct answer. All entities named explicitly on first use within each paragraph. Each paragraph coherent in isolation. Information identical to Version A — only structure changed.
Publication: Both versions published on the same domain under separate URLs with no internal links to either page and no external promotion.
Indexing period: 14 days before testing began, to allow both pages to stabilise.
Citation testing: 75 query iterations per version on Perplexity. Testing distributed across three sessions over five days to reduce temporal bias from model state fluctuation.
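For anyone replicating this, the logging loop itself is simple. The sketch below is how I would structure a single session. The `run_query` callable is deliberately left to you, because how you capture outputs (API, browser automation, manual paste) is platform- and tooling-specific; the substring check is a crude stand-in for the manual judgement described in the measurement definition above.

```python
# Sketch of one testing session. run_query is passed in as a callable because
# output capture is platform-specific; this is not a Perplexity client.
import time
from typing import Callable

def contains_citation(output: str, target_claims: list[str]) -> bool:
    """Does any distinctive claim from the target section appear in the output?"""
    lowered = output.lower()
    return any(claim.lower() in lowered for claim in target_claims)

def run_session(run_query: Callable[[str], str],
                query: str,
                target_claims: list[str],
                runs: int = 25) -> float:
    """Repeat one query, log citation hits, return the session citation rate."""
    hits = 0
    for _ in range(runs):
        output = run_query(query)
        if contains_citation(output, target_claims):
            hits += 1
        time.sleep(5)  # space out requests within the session
    return hits / runs

# 75 runs per version = e.g. three sessions of 25, spread across several days.
```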
This methodology is documented in the GEO Experiments framework. The experiment is replicable. Instructions for running it on your own content are in the GEO Field Manual.
Declarative vs Narrative Design Patterns
The declarative version followed strict patterns throughout: every paragraph opened with a claim, all entities were named explicitly on first mention per paragraph, and each section was written to stand alone without surrounding context. The narrative version mirrored typical content writing — context-first, flowing, and dependent on prior paragraphs for entity resolution. The information was identical across both versions at every point.
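For a quick triage of existing pages, the patterns above can be approximated with a rough heuristic. The checks below are my own crude proxies for the narrative tells described in this section — not the criteria used to build Version B, and not the GEO Lab Console's scoring.

```python
# Rough, home-made checks for common narrative patterns. Intended only as a
# first-pass triage before a manual audit.
import re

# Pronoun-led openings usually signal dependency on a previous paragraph.
OPENING_PRONOUN = re.compile(r"^(it|this|that|these|those|they|he|she|we)\b", re.IGNORECASE)

def narrative_flags(paragraph: str) -> dict[str, bool]:
    sentences = [s.strip() for s in paragraph.split(".") if s.strip()]
    first = sentences[0] if sentences else ""
    return {
        # Entity anchoring: does the paragraph lean on a pronoun for its subject?
        "pronoun_opening": bool(OPENING_PRONOUN.match(first)),
        # Answer-first: a short scene-setting first sentence in a long paragraph
        # suggests the core claim arrives later.
        "likely_buried_claim": len(sentences) >= 3 and len(first.split()) < 8,
        # One idea per paragraph: very long paragraphs tend to mix ideas.
        "too_many_sentences": len(sentences) > 6,
    }
```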
The Citation Results
The result was bigger than I expected.
- Declarative structure (Version B): 61% citation rate
- Narrative structure (Version A): 37% citation rate
- 24-point gap from structural rewriting alone, with no other changes made
| Version | Structure | Query Runs | Citation Rate |
|---|---|---|---|
| Version A | Narrative | 75 | 37% |
| Version B | Declarative | 75 | 61% |
I want to be specific about what “the same information” means here — because this is where the finding gets interesting. Version A and Version B contained identical facts, identical claims, identical word counts. Nothing was added. Nothing was removed. The only change was the order within each paragraph: declarative structure puts the answer first; narrative structure builds to it. That single difference accounted for 24 percentage points of citation rate.
The gap was consistent across all three testing sessions. Session variance was within 4 percentage points for both versions, suggesting the result is not an artefact of a single model state.
Statistical Significance of the Citation Gap
The 24-point gap (37% vs 61%) exceeds the 4-point session variance by a factor of six. A chi-square test on the raw citation counts yields p < 0.01, indicating statistical significance at standard thresholds. The result is unlikely to be a fluctuation — but 75 runs per version is a reasonable sample, not a definitive one. A larger sample of 200+ iterations would further reduce variance.
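For anyone checking the arithmetic, the test reduces to a 2×2 contingency table. The counts below (28/75 and 46/75) are back-calculated from the rounded percentages, so treat them as approximate.

```python
# Re-running the significance check on counts implied by the reported rates.
from scipy.stats import chi2_contingency

#         cited  not cited
table = [[28, 47],   # Version A (narrative),   ~37% of 75 runs
         [46, 29]]   # Version B (declarative), ~61% of 75 runs

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
# With these assumed counts p lands around 0.005, below the 0.01 threshold.
```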
The Qualitative Patterns — Which Were More Interesting Than the Number
The citation rate was the headline. The qualitative patterns underneath it were more interesting, and more useful for understanding why the gap exists.
Retrieval Anchoring
When declarative content was cited, it was cited accurately. The generated output typically mirrored the opening sentence almost verbatim — the system selected the chunk and reproduced its frame. The declarative opening sentence functions as a retrieval anchor: it is both what gets selected and what gets used.
When narrative content was cited, outputs paraphrased more heavily and the core claim was sometimes displaced. The system retrieved the section but reconstructed it loosely — the words changed, and with them, sometimes the meaning.
Does your content appear in AI outputs with your phrasing, or in a heavily paraphrased form you don’t recognise? Accurate reproduction of your opening sentence is the signal that retrieval anchoring is working.
Representation Drift
This was the unexpected finding. Narrative sections that were retrieved didn’t always surface the central claim. The system extracted a peripheral detail instead — a supporting example, a qualifying clause, something from the middle of a paragraph. The content was cited. The citation misrepresented what the source was actually saying.
This did not occur with declarative content. When a declarative section was retrieved, the compression step selected the opening claim — which, by design, is the most important claim in the section. Structure constrained which claim was extracted.
Getting cited incorrectly may be worse than not being cited at all. A misrepresentation attributes a peripheral point to your content at scale. Declarative structure prevents this — not by improving retrieval, but by constraining what gets extracted once retrieval has occurred.
Partial Traces
Narrative sections that were not cited occasionally left partial traces in outputs — paraphrased clause structures that matched the source without attribution. The system was drawing on the content without surfacing it. You were contributing to someone else’s answer.
Declarative sections that were not cited were cleanly not cited. No partial traces. The section either cleared the full extraction threshold or didn’t enter the output at all.
Partial traces mean contribution without credit. If your narrative content is shaping AI outputs without appearing as a source, structural rewriting is the intervention — not more content or more links.
These three patterns together suggest that declarative structure doesn’t just increase citation frequency — it changes the character of how your content is used. More accurate representation. Fewer misattributions. No silent contribution to outputs that don’t credit you.
Retrieval Anchoring and Representation Fidelity
Retrieval anchoring positions declarative definitions at section starts, where embedding models assign highest semantic weight. When the system retrieves and compresses content, it defaults to the most extractable sentence — which in declarative structure is the opening claim by design. Representation fidelity measures whether the cited claim matches the source’s intended meaning. Declarative structure scores high on both because it makes the most important sentence and the most extractable sentence the same sentence.
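A rough way to check anchoring on your own content is to compare each section's opening sentence against the sentences of a generated answer. The sketch below uses simple string similarity; the 0.8 threshold is an illustrative guess, not a value calibrated in this experiment.

```python
# Quick anchoring check: does the opening sentence of a section survive into a
# generated answer near-verbatim?
from difflib import SequenceMatcher

def anchoring_score(opening_sentence: str, generated_output: str) -> float:
    """Best string similarity between the opening sentence and any output sentence."""
    output_sentences = [s.strip() for s in generated_output.split(".") if s.strip()]
    return max(
        (SequenceMatcher(None, opening_sentence.lower(), s.lower()).ratio()
         for s in output_sentences),
        default=0.0,
    )

# Scores above ~0.8 suggest near-verbatim reproduction (anchoring is working);
# mid-range scores suggest heavy paraphrase and possible representation drift.
```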
What the 24-Point Gap Means
The 24-point citation gap demonstrates that structural positioning — not content quality — determines extraction probability. This is larger than I expected for a single structural variable with no change to content, domain authority, or internal linking. It suggests that Extractability operates as a genuine retrieval signal rather than a marginal optimisation.
I’ll say plainly what I think: a 24-point increase in citation rate from structural rewriting alone — with no new content, no new links, no authority changes — is the most actionable finding in GEO research to date. Not because the number is large in absolute terms, but because the intervention is so low-cost. You’re not building links. You’re not creating new content. You’re reordering paragraphs.
The Mechanism Behind the Declarative Citation Advantage
Declarative structure reduces semantic distance between the query vector and the content chunk vector at retrieval. When a section opens with a direct answer, the embedding representation of that section aligns more closely with the embedding representation of a direct question. Narrative structure, which buries the answer in contextual framing, produces a chunk embedding that weights preamble more heavily than the core claim — reducing alignment with direct query intent.
This is consistent with published research on dense passage retrieval and the way answer-bearing sentences are weighted in RAG pipelines. It also explains the representation drift pattern: when a narrative section is retrieved despite lower alignment, the compression step selects the most extractable sentence — which in narrative prose is often a supporting detail, not the central claim. In declarative prose, the most extractable and the most important sentence are the same sentence by design.
The retrieval mechanics of Perplexity are not publicly documented in sufficient detail to confirm this mechanism. It is a hypothesis consistent with the evidence — not a confirmed fact.
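With that caveat stated, the hypothesised compression step can still be sketched. The function below ranks the sentences of a retrieved chunk by similarity to the query, which is one plausible proxy for "most extractable sentence". The embedding model is an arbitrary open-source choice and the full-stop splitter is a simplification, not a claim about any production pipeline.

```python
# Sketch of the hypothesised compression step: rank the sentences of a
# retrieved chunk by similarity to the query and see which would be lifted.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def most_extractable_sentence(query: str, chunk: str) -> str:
    """Return the chunk sentence whose embedding sits closest to the query."""
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    scores = util.cos_sim(model.encode([query]), model.encode(sentences))[0]
    return sentences[int(scores.argmax())]

# In declarative prose the top-scoring sentence should usually be the opening
# claim; in narrative prose it can be a supporting detail from mid-paragraph.
```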
Limitations — What This Experiment Cannot Claim
I’m documenting the limitations fully, not as caveats, but because this is a research log and the limitations matter for anyone considering replication or application.
- Single platform. Conducted entirely on Perplexity. ChatGPT, Google AI Overviews, Copilot, and Claude each use different retrieval and synthesis architectures. The 24-point gap is specific to Perplexity until replicated elsewhere. Cross-platform replication is the logical next step.
- Topic neutrality. The topic was deliberately neutral and informational. Commercial, YMYL, or highly contested topics may show different patterns due to additional quality filtering layers that interact with structural signals in ways this experiment doesn’t capture.
- Sample size. 75 query runs per version provides statistical significance at standard thresholds (p < 0.01), but a larger sample of 200+ iterations would further reduce variance. The result is directionally reliable, not definitively precise.
- Domain authority interaction. Both versions were on the same domain, which controls for authority as a variable. However, the domain’s authority characteristics may interact with structural signals in ways that don’t generalise across domains at different authority levels.
None of these limitations make the findings less useful. They make them more useful — applied correctly, as a diagnostic tool that identifies structural opportunities, rather than a guarantee of specific citation outcomes.
Platform-Specific Considerations for Citation Testing
Perplexity, ChatGPT, Google AI Overviews, and Claude each use different retrieval architectures. Perplexity’s real-time web retrieval may weight structural signals differently than systems with knowledge-cutoff retrieval approaches. Replication across platforms is essential before declaring declarative structure a universal optimisation lever, and it is planned as part of the GEO Experiments programme.
What This Means for Your Content
A 24-point increase in citation rate from structural rewriting alone — with no new content, no new links, no authority changes — represents a meaningful commercial lever for any page where AI citation contributes to visibility or brand exposure.
- For a page receiving 1,000 AI-driven impressions monthly, the difference between 37% and 61% citation consistency equals 240 additional citation events
- Pages ranking well but written in narrative style are leaving citation consistency on the table
- Structural rewriting — not content expansion or link building — is the intervention
- The GEO Lab Console’s automated GEO scoring can identify which sections need this rewrite, at the section level, before you invest time in manual audits
This is what the GEO Stack framework identifies as the Layer 2 opportunity: content that passes the retrieval threshold but fails at extraction. Experiment 001 provides the first quantified estimate of how large that opportunity is.
What Experiment 002 Tests — And Why
Experiment 001 established that structure moves the number. But structure and entity density are tangled in declarative writing — by naming entities explicitly per paragraph, the declarative version also increased named entity frequency across the content. Which variable is actually doing the work?
Experiment 002 isolates entity density as an independent variable. Same structure, same content length — just more named entities per section, without changing the structural pattern. If entity density produces an independent citation lift, the framework gets more granular: structure and entity density are separate levers, each with their own effect size. If it doesn’t, then structure is the primary driver and entity naming is a consequence of good declarative form rather than an independent signal.
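Measuring that variable is straightforward in principle. The sketch below counts named entities per 100 words using spaCy's small English model; it is one possible operationalisation of entity density, not necessarily the one Experiment 002 will use.

```python
# One possible operationalisation of entity density. Requires spaCy and the
# en_core_web_sm model to be installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def entity_density(section_text: str) -> float:
    """Named entities per 100 words in a section."""
    doc = nlp(section_text)
    words = sum(1 for token in doc if not token.is_punct and not token.is_space)
    return 100 * len(doc.ents) / max(words, 1)
```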
The result of Experiment 002 will either complicate or sharpen the findings from Experiment 001. Both outcomes are useful.
Results publish 24 March 2026 in the GEO research log.
If you are running similar experiments, I am interested in comparing methodologies. The field needs more replication, not more speculation. Reach out via @TheGEO_Lab.
Key Takeaways on Declarative Structure
- Structure is a retrieval signal. Declarative content achieved 61% citation rate against 37% for narrative — a 24-point gap from structural rewriting alone, with no changes to content, authority, or links.
- Section openings are the highest-leverage intervention. The first sentence of each section determines how the whole section is extracted and compressed. Moving the core claim to sentence one is the single most impactful change most content needs.
- Getting cited wrongly may be worse than not being cited. Representation drift — the extraction of peripheral claims instead of central ones — is a citation failure that doesn’t appear in URL-level tracking but shapes how your content is attributed at scale.
- Partial traces are a silent loss. Narrative content that isn’t cited may still be shaping AI outputs without attribution. Declarative content either gets cited or doesn’t — no silent contribution.
- Cross-platform replication is needed. The 24-point gap is specific to Perplexity until replicated elsewhere. Experiment 002 and subsequent experiments will extend testing scope.
- The intervention is low-cost. This is not a content creation project. It is a structural editing project. The same information, reorganised, outperforms the original significantly.
For implementation guidance, see the GEO Field Manual — audit and implementation guide. For section-level extractability analysis on your own content, see the GEO Lab Console. For the theoretical foundation, see Extractability — Layer 2 of the GEO Stack.
Check Layer 1 first. Content that isn’t being retrieved cannot be cited, regardless of how well it is structured for extraction. If a page has low Retrieval Probability, structural rewriting at Layer 2 is premature. Solve retrieval before optimising extraction.
The citation gap measured in this experiment maps directly to Layer 2 (Extractability) of the GEO Stack five-layer framework. Structure is the variable that separates content that gets cited from content that gets ignored. Over time, consistent structural improvements at Layers 1–4 compound into System Memory — the accumulated authority signal that determines whether a domain becomes a baseline retrieval source for its topic. In the GEO Brand Citation Index, brands with declarative, extractable content consistently outperform those with narrative-heavy pages — even when the narrative pages carry stronger domain authority.
Frequently Asked Questions
Can I apply declarative structure to existing content without rewriting everything?
Section openings are the only thing that needs to change. In Experiment 001, Version B kept the exact same content as Version A — the intervention was rewriting the first sentence of each section to state the main claim directly. Supporting detail, context, and qualification all followed exactly as before. Start with one page: identify every section that buries its main point mid-paragraph, and move that point to the first sentence. The GEO Field Manual provides step-by-step implementation guidance for this approach.
Does declarative structure work the same way across all AI platforms?
Experiment 001 tested only Perplexity, so cross-platform generalisation is not yet justified by the data. ChatGPT, Google AI Overviews, and Copilot each use different retrieval architectures, and the structural signal that drives the 24-point gap on Perplexity may produce different effect sizes elsewhere. Experiment 002 will begin expanding testing scope, and subsequent experiments will validate whether the declarative structure citation advantage holds across platforms before it can be treated as a universal lever.
How many queries do I need to test before results are reliable?
75 runs per version produced statistically meaningful results with 4-point session variance. For initial validation of your own content, 50 runs split across two or three sessions is a reasonable starting point — enough to separate signal from model-state noise. Anything below 30 runs introduces too much variance to trust directionally. Distributing sessions across multiple days matters more than raw run count, because single-session testing is more vulnerable to model state fluctuations that can inflate or deflate citation rates artificially.
Does declarative structure reduce readability for human visitors?
Most readers prefer it. Front-loading the core claim doesn’t eliminate narrative — it puts the answer before the argument rather than after it. Supporting detail, context, and nuance still follow. The reader gets to decide early whether a section is worth their time, which is a courtesy most content writing doesn’t extend. The structure that feels engineered for machines turns out to be the structure that respects human reading time most directly. The key change is moving the main point to sentence one — not eliminating the surrounding explanation.
What is the difference between declarative structure and answer-first content?
Answer-first is one component of declarative structure, not the whole of it. Declarative structure also requires entity explicitness (naming subjects rather than using pronouns), standalone coherence (sections that make complete sense in isolation), and single-idea paragraphs. The declarative version in Experiment 001 implemented all five principles documented on the Extractability page. Answer-first writing alone — without entity explicitness and standalone coherence — would likely produce a smaller citation lift than the full declarative structure approach.
How does declarative structure relate to the GEO Stack?
Declarative structure is the primary implementation pattern for Layer 2 (Extractability) of the GEO Stack. It also impacts Layer 1 (Retrieval Probability) through better query-chunk alignment — when sections open with direct answers, their embedding representations align more closely with query intent, improving selection probability at the retrieval step. Experiment 001 provides empirical data on the size of the combined Layer 1 and Layer 2 effect. Experiment 002 will begin separating these contributions by isolating entity density as an independent variable.

