Retrieval Mechanism: Zero-Variance Binding in Perplexity | The GEO Lab

Zero-Variance Mechanism Capture: Retrieval-Layer vs Synthesis-Layer Determinism

By Artur Ferreira · The GEO Lab · Published: 28 May 2026 · Version 1.0

Experiment E043 · Research note · Platform: Perplexity · 9 observations across 3 days

E027 confirmed that Perplexity cited the same pages for 14 consecutive days on proprietary-concept queries. E043 asks where in the retrieval mechanism that binding actually lives: at the index lookup stage, or at the synthesis selection stage after retrieval.

TL;DR

The zero-variance binding is synthesis-layer, not retrieval-layer. On the core binding query (Q02), the retrieval set varied across all three days (mean Jaccard similarity 0.458: well above the platform average of 0.157, but far from the 1.0 an identical set would give), yet thegeolab.net was cited every time. The synthesis model has a stable preference for the page regardless of what else it retrieves. Breaking the binding requires displacing the page from the synthesis model’s selection logic, not just from the retrieval set.

Two other queries showed a different pattern: retrieved, but not always extracted into the answer. That boundary, retrieval set presence versus synthesis selection, is the measurement target for the rest of the portfolio.

The Mechanism Question E027 Left Open

E027 established that Perplexity returns identical citations for 14 consecutive days on T1 proprietary-concept queries. The paper frames this as “deterministic citation-identity,” but it cannot specify which layer of the retrieval mechanism the determinism operates at, because E027 was measured via API: citation outcome only, no retrieval internals.

Two mechanistically distinct explanations are consistent with that data:

Explanation A: Retrieval-layer

Perplexity issues the same search query and retrieves the same source set on every run. The citation is stable because the retrieval input is stable. The binding is at the index lookup stage.

Explanation B: Synthesis-layer

The retrieval set varies between runs, but the synthesis model consistently selects thegeolab.net for citation regardless of what else is in the set. The binding is at the ranking and selection stage, post-retrieval.

The two explanations carry different implications for how fragile the binding is. If A: a competitor entering the top of the retrieval set can break it without displacing the page from the index. If B: competitors need to dislodge the page from the synthesis model’s preference, not just from the retrieval set. That is a higher bar.

E043 distinguishes these directly, using a three-day Chrome DevTools capture on the same queries used in E027.
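
The decision rule that separates the two explanations can be sketched in a few lines. This is a minimal illustration, not the E043 analysis code: the function name `classify_binding` and the `identical_threshold` value are assumptions introduced here, since the source does not define a numeric cut-off for "the same source set on every run".

```python
def classify_binding(mean_jaccard: float, cited_every_run: bool,
                     identical_threshold: float = 0.95) -> str:
    """Map retrieval overlap and citation record to Explanation A or B.

    identical_threshold is a hypothetical cut-off for "same source set
    on every run"; E043 does not specify one.
    """
    if not 0.0 <= mean_jaccard <= 1.0:
        raise ValueError("mean_jaccard must be between 0 and 1")
    if not cited_every_run:
        # Neither explanation applies: the citation itself is not stable.
        return "no stable binding"
    if mean_jaccard >= identical_threshold:
        # Stable inputs and stable citation: A cannot be ruled out.
        return "consistent with A (retrieval-layer)"
    # Inputs vary but the citation holds: the stability sits after retrieval.
    return "consistent with B (synthesis-layer)"
```

Note that a near-identical retrieval set does not prove Explanation A; it only fails to rule it out, which is why the return values are worded as "consistent with".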

Method

Three queries from E027’s confirmed binding set. Three consecutive days. Chrome DevTools open on Perplexity’s /rest/thread/{thread_id} endpoint, the same capture method used in E030 and E042.
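
For readers replicating the capture, one possible way to isolate the thread requests is sketched below. It assumes the DevTools Network log was exported as a HAR file, which the source does not state, and it makes no assumption about the JSON schema of the thread response. The helper names are illustrative.

```python
import json
from urllib.parse import urlparse


def load_har(path: str) -> dict:
    """Read a HAR export (a JSON document) from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def thread_entries(har: dict) -> list:
    """Return HAR entries whose request path contains /rest/thread/."""
    entries = har.get("log", {}).get("entries", [])
    return [
        e for e in entries
        if "/rest/thread/" in urlparse(e.get("request", {}).get("url", "")).path
    ]


def response_body(entry: dict) -> str:
    """Return the saved response text, or "" if the export has no body.

    HAR bodies may be base64-encoded (content.encoding == "base64");
    decoding is left to the caller.
    """
    content = entry.get("response", {}).get("content", {})
    return content.get("text", "") or ""
```

A typical use would be `thread_entries(load_har("day1.har"))`, where the file name is a placeholder.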

Query set

Q02: retrieval probability in GEO Stack model
Q05: what is the GEO Stack framework
Q06: what is extractability in AI search

Session structure

3 queries × 3 days = 9 observations. Same account (Klub 24-7 Pro) throughout. Advanced Preview (mode=2) confirmed before each session. DevTools open, thread trace active.

Observations per query

Internal search string, full source domain list with ranks, URL_NAVIGATE events, snippet text extracted from thegeolab.net, inline citation outcome (0/1).

Determinism metric

Jaccard similarity of source domain sets between sessions: J(A,B) = |A ∩ B| / |A ∪ B|. Computed per query across all three session pairs. Benchmark: Perplexity platform average J=0.157 (Anthony Lee, 2026).
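
The metric above can be computed as follows. This is a minimal sketch: the function names are illustrative, and treating two empty sets as identical is a convention chosen here, not something the source specifies.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B| for two sets of source domains."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


def mean_pairwise_jaccard(sessions: list) -> float:
    """Average Jaccard over every pair of sessions for one query."""
    pairs = list(combinations(sessions, 2))
    if not pairs:
        raise ValueError("need at least two sessions to form a pair")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

With three sessions there are three pairs (Day 1 vs 2, Day 1 vs 3, Day 2 vs 3); with two valid sessions the mean reduces to a single pair.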

Search-mode protocol note: Day 1 hit Perplexity’s free-tier preview limit mid-session, silently downgrading from mode=2 to mode=1 without a UI warning. Q06 on Day 1 ran in degraded mode and produced an invalid observation: thegeolab.net absent, binding apparently broken. The Day 2 Pro re-run confirmed this as an artefact. Mode degradation is now a required observation column in all E043-derived protocols.
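
One way to hold a single observation, including the search-mode column required by the note above, is sketched here. The field names are hypothetical and are not taken from the columns of the replication CSV.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One query on one day. Field names are illustrative assumptions."""
    query_id: str
    day: int
    search_mode: int           # 2 = Advanced Preview, 1 = degraded basic search
    search_string: str         # internal search string seen in the thread trace
    source_domains: list = field(default_factory=list)       # rank 1 first
    url_navigate_events: list = field(default_factory=list)
    snippet: str = ""          # text extracted from thegeolab.net, if any
    cited: int = 0             # inline citation outcome (0/1)

    @property
    def valid(self) -> bool:
        """An observation counts only if it ran in mode=2."""
        return self.search_mode == 2

    def ranks_of(self, domain: str) -> list:
        """All 1-based ranks a domain holds; a domain can fill several slots."""
        return [i for i, d in enumerate(self.source_domains, start=1) if d == domain]
```

Recording the mode on every row lets invalid runs be filtered out before any Jaccard or citation-rate calculation, rather than being discovered afterwards.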

This is an exploratory mechanistic observation, not a controlled experiment with pre-registered falsification criteria. The findings extend E027 by adding a retrieval-layer view that the API-only methodology could not provide. Replication package: github.com/arturseo-geo/geo-lab-experiments/tree/main/e043.

Results: Nine Observations Across Three Days

Query	Day 1	Day 2	Day 3	Mean Jaccard	Classification
Q02: retrieval probability in GEO Stack model	✓ cited	✓ cited	✓ cited	0.458	Synthesis-layer binding
Q05: what is the GEO Stack framework	✓ cited	✗ dropped	✓ cited	0.815	Synthesis displacement
Q06: what is extractability in AI search	✗ mode artefact	✗ dropped	✓ cited	0.590	Contested, partial recovery

9 observations total. The Day 1 Q06 observation was run in degraded mode=1 (free-tier limit hit) and is excluded from determinism classification. Jaccard is computed across valid sessions only, so the Q06 figure rests on a single session pair (Day 2 vs Day 3). Platform average Jaccard benchmark: 0.157 (Anthony Lee, 2026).
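
The citation column of the table can be tallied per query over valid sessions only. The sketch below transcribes the outcomes from the table; the `RESULTS` layout and the function name are illustrative, and `None` marks the excluded Day 1 Q06 run.

```python
# Citation outcomes per day, transcribed from the results table.
# None = observation excluded (Day 1 Q06 ran in degraded mode=1).
RESULTS = {
    "Q02": [True, True, True],
    "Q05": [True, False, True],
    "Q06": [None, False, True],
}


def citation_tally(outcomes: list) -> tuple:
    """Return (cited_count, valid_session_count), skipping excluded runs."""
    valid = [o for o in outcomes if o is not None]
    if not valid:
        raise ValueError("no valid sessions for this query")
    return sum(valid), len(valid)


for query_id, outcomes in RESULTS.items():
    cited, total = citation_tally(outcomes)
    print(f"{query_id}: cited in {cited} of {total} valid sessions")
```

Keeping the excluded run as `None` rather than `False` avoids counting a mode artefact as a genuine citation loss.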

Q02 is the decisive result. Its retrieval set varied substantially across three days. A mean Jaccard of 0.458 means that, on average, the domains two sessions had in common made up fewer than half of their combined domain set, yet thegeolab.net was cited on all three days. That pattern fits Explanation B and rules out Explanation A as the primary mechanism for this query.
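
Because Jaccard divides by the union of both sessions, the share of each individual session's list that recurs is higher than the Jaccard value itself. The sketch below converts one into the other under a simplifying assumption that is not stated in the source: both sessions return the same number of domains. If each session has n domains and k are shared, J = k / (2n - k), so k / n = 2J / (1 + J).

```python
def shared_fraction_equal_sizes(j: float) -> float:
    """Fraction of each session's domains that recur in the other session.

    Assumes both sessions have the same number of domains (an assumption
    made for illustration; real set sizes may differ).
    """
    if not 0.0 <= j <= 1.0:
        raise ValueError("Jaccard must be between 0 and 1")
    return 2 * j / (1 + j)
```

Under that assumption, a Jaccard of 0.458 corresponds to roughly 63% of each session's domains recurring in the other, so a substantial minority of the set still rotates between runs.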

Q05 and Q06 tell a different story: retrieved but not reliably extracted. The competitive and content dynamics behind that distinction are covered in the per-query findings below.

Per-Query Findings

Q02
retrieval probability in GEO Stack model
Synthesis-layer determinism (Explanation B confirmed)

Retrieval set varied day to day, with different GIS noise entries and different supporting academic sources, but thegeolab.net held three to four retrieval slots every run and won synthesis selection every day. Day 3 ranks: 1, 2, 3, 8 in the retrieval set; cited inline at [1][2][3]. ThatWare absent all three days. everything-pr.com absent all three days.

Jaccard mean 0.458: nearly three times the platform average of 0.157, yet far below the 1.0 that an identical retrieval set would produce, which confirms that the retrieval set varies. This quantifies what “variable retrieval set, stable citation” means in practice: the source set rotates, but the synthesis model’s preference does not. The /retrieval-probability/ page has achieved a stable synthesis preference on its core concept query.

Implication for E027: The “deterministic citation-identity” framing in the Zenodo paper should be updated to specify synthesis-layer determinism. The zero-variance is a synthesis model preference, not an index artefact. That distinction matters: synthesis preferences are more robust to competitive retrieval pressure than index-level rankings.

Q05
what is the GEO Stack framework
Synthesis displacement by everything-pr.com

Retrieval set was highly stable (mean Jaccard 0.815), thegeolab.net present at rank 4–6 across all three days. everything-pr.com present at rank 1 all three days. But synthesis selection was not stable: thegeolab.net cited on Day 1 and Day 3, dropped on Day 2. everything-pr.com was the synthesis layer owner on Day 2.

This is the fragile binding state. The page is in the retrieval set but loses the synthesis extraction competition on days when everything-pr.com’s chunk density better matches the query intent. The retrieval set is stable; the synthesis preference is not.

everything-pr.com deploys a four-layer Schema/Entity/Citation/Authority framework under the “GEO Stack” name without attribution to thegeolab.net. Its rank 1 retrieval position on this query is the active sedimentation risk flagged in competitive monitoring. A differentiation sentence was deployed to /geo-stack/ on 26 May 2026. The effect of that deployment is measured in the post-publication session below: everything-pr.com was absent from the retrieval set two days later, and thegeolab.net held rank 1 synthesis ownership on the same query.

Q06
what is extractability in AI search
Contested: no stable binding

No stable binding across the three days. Day 1 invalid (mode=1 artefact, thegeolab.net absent). Day 2 on Pro: thegeolab.net at rank 7, not cited. lumar.io owned the synthesis layer. Day 3 on Pro: thegeolab.net at rank 6, cited inline with a clean definitional snippet, the first post-re-crawl reading, following PerplexityBot re-crawl of /extractability/ on 20 May 2026.

The Day 3 recovery is confounded. The /extractability/ page was re-crawled between Day 2 and Day 3. A content fix was also deployed during the E043 window. Neither variable was controlled. Whether the recovery reflects the content fix, the re-crawl, or normal retrieval variance cannot be separated from this dataset.

Jaccard mean 0.590 across valid sessions, indicating moderate retrieval set variance. The /extractability/ binding is not stable in either direction: not consistently cited, not consistently absent. It is the most sensitive of the three queries to retrieval set composition changes.

The Boundary Between Retrieval and Synthesis

E043’s three queries map onto two distinct binding states that practitioners optimising for AI citation need to distinguish:

Synthesis-layer binding (Q02): The synthesis model has a stable preference for the page on this query, regardless of what else is in the retrieval set. Topical authority is sufficient: the page enters the retrieval set and wins synthesis selection reliably. The binding is robust to moderate competitive retrieval pressure.

Synthesis-layer fragility (Q05, Q06): The page enters the retrieval set but loses the synthesis extraction competition when a competitor with denser relevant chunks is also present. Retrieval and citation are decoupled. The page is visible in the source list but absent from the answer. This is the fragile state, and it is invisible to any measurement method that only tracks citation outcomes without retrieval internals.

Topical authority determines retrieval set membership. Query-level chunk density determines synthesis selection. These are different levers, and the distinction has direct consequences for how to respond to a citation rate problem: retrieval-layer interventions (entity reinforcement, freshness, crawl access) and synthesis-layer interventions (definition density, answer-first section structure, compression resistance) require different content work.

The practical test: If your page appears in Perplexity’s source list but is not cited inline, the problem is synthesis-layer. If your page does not appear in the source list at all, the problem is retrieval-layer. E043’s Chrome DevTools methodology makes this distinction observable. The API alone cannot.
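
The practical test reduces to a two-input lookup, sketched below together with the intervention families named in the preceding paragraph. The function and dictionary names are illustrative, not part of the E043 tooling.

```python
# Intervention families as listed in the text for each layer.
INTERVENTIONS = {
    "retrieval-layer": ["entity reinforcement", "freshness", "crawl access"],
    "synthesis-layer": [
        "definition density",
        "answer-first section structure",
        "compression resistance",
    ],
}


def diagnose(in_source_list: bool, cited_inline: bool) -> str:
    """Apply the practical test to one observation."""
    if cited_inline:
        return "cited"  # no citation problem to diagnose
    if in_source_list:
        return "synthesis-layer"  # retrieved, but lost synthesis selection
    return "retrieval-layer"      # never entered the source list


def suggested_work(in_source_list: bool, cited_inline: bool) -> list:
    """Return the intervention family for the diagnosed layer, if any."""
    return INTERVENTIONS.get(diagnose(in_source_list, cited_inline), [])
```

For example, Q06 on Day 2 (rank 7 in the source list, not cited) gives `diagnose(True, False)`, which returns "synthesis-layer".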

Post-Publication Measurement: Effect of Deployed Interventions

Two interventions were deployed to target pages after the E043 three-day window closed:

/geo-stack/ disambiguation sentence, deployed 26 May 2026. Adds “measurement-first research framework” language to differentiate from everything-pr.com’s four-layer Schema/Entity/Citation/Authority framing. Target: reduce everything-pr.com’s synthesis layer ownership on Q05.
/extractability/ definition section, deployed 26 May 2026. Adds a dense, answer-first definition block targeting the extractability concept query. Target: improve synthesis selection on Q06 against lumar.io.

The E044 Week 4 session (28 May 2026) provides the first post-deployment reads on all three E043 queries. Results below.

Query	E043 last read (Day 3, 21 May)	E044 Week 4 (28 May)	thegeolab rank	Synthesis owner	Delta
Q02: retrieval probability in GEO Stack model	✓ cited [1][2][3]	✓ cited	Rank 1	thegeolab.net	Binding strengthened: rank 1 synthesis owner vs ranks 1/2/3 previously. everything-pr and ThatWare absent.
Q05: what is the GEO Stack framework	✓ cited (rank 5, everything-pr rank 1)	✓ cited	Rank 1	thegeolab.net	Disambiguation sentence effective; everything-pr displaced, thegeolab now full synthesis owner within 2 days of deployment.
Q06: what is extractability in AI search	✓ cited (rank 6, post re-crawl)	✗ not cited	Not ranked (nrlc rank 1)	nrlc.ai	Regression: definition section deployment (26 May) has not yet shifted synthesis layer. nrlc.ai now owns the answer on this query.

E044 Week 4 session run 28 May 2026, Perplexity Pro mode=2, Chrome DevTools. Advanced Preview confirmed active on all three queries.

Two of three results moved in the expected direction. The /geo-stack/ disambiguation sentence deployed on 26 May displaced everything-pr.com from synthesis layer ownership on Q05 within two days. thegeolab.net now holds rank 1 on both Q02 and Q05. The /extractability/ definition section deployment has not produced the same outcome on Q06: nrlc.ai now owns the extractability synthesis layer, a regression from the Day 3 partial recovery. Whether this reflects insufficient content depth, a crawl cycle lag, or retrieval set composition changes requires further monitoring before any content intervention. The next E044 session (Week 5) will determine whether nrlc.ai’s ownership is stable or whether the definition section deployment takes effect after a re-crawl.

Implications for the GEO Lab Portfolio

E027 framing update: The E027 Zenodo paper describes Q02 binding as “deterministic citation-identity.” E043 specifies this as synthesis-layer determinism. The citation outcome is identical, 14 days of zero variance, but the mechanism is a synthesis preference, not an index artefact. This is a stronger finding than retrieval-layer determinism would imply, because synthesis preferences are more robust to retrieval set fluctuation.

E030 citation rate gradient explained: E030 found a citation rate drop of roughly 48 percentage points, from 61% on 2–4 word queries to 13% on 10–12 word queries. E043’s mechanism data suggests a retrieval-layer explanation: long queries at the 10–12 word tier trigger query rewrites that expand the retrieval set into generic GEO territory, where competitive density is high and the synthesis model cannot establish a stable preference. Short queries stay in proprietary concept namespaces where the synthesis preference for thegeolab.net is intact. The two experiments together describe the same phenomenon from different angles.

Competitive monitoring consequence: E043’s Q02 synthesis-layer finding means ThatWare’s rank 3 escalation on that query (E044 T1 trigger) is less immediately threatening than it would be under retrieval-layer determinism. ThatWare entering the retrieval set does not break the binding unless it also displaces thegeolab.net from the synthesis model’s selection. That requires higher content density at the chunk level, not just retrieval rank. The synthesis preference provides a temporary buffer. Monitor for further rank displacement.

Measurement methodology consequence: Any experiment that measures only citation outcomes (API-only, without retrieval internals) cannot distinguish retrieval failure from synthesis failure. A zero citation result could mean the page is absent from the retrieval set, or it could mean the page is in the retrieval set but losing synthesis selection. These require different interventions. The Chrome DevTools methodology used in E043 is the minimum requirement for diagnosing citation rate problems at the mechanism level.

Frequently Asked Questions

What is the difference between retrieval-layer and synthesis-layer determinism in Perplexity?

Retrieval-layer determinism means Perplexity issues the same search queries and retrieves the same source set every time. The citation is stable because the inputs are stable. Synthesis-layer determinism means the retrieval set varies between runs, but the synthesis model consistently selects the same source for citation regardless of what else is in the set. E043 confirms synthesis-layer determinism for Q02: the retrieval set varies day to day, but thegeolab.net is selected for citation every time.

Why does the distinction between retrieval-layer and synthesis-layer determinism in the retrieval mechanism matter?

The two mechanisms have different implications for competitive vulnerability. If binding is retrieval-layer, a competitor entering the top of the retrieval set can break it, even if your page stays in the index. If binding is synthesis-layer, competitors need to displace your page from the synthesis model’s preference, not just from the retrieval set. E043 finds synthesis-layer binding for Q02, which means the zero-variance result from E027 is more robust than retrieval-layer determinism would imply.

What is Jaccard similarity used for in E043?

Jaccard similarity measures how much the source domain set overlaps between two sessions. A score of 1.0 means the retrieval set was identical; 0.0 means no overlap at all. In E043, Q02 had a mean Jaccard of 0.458 across three days, nearly three times the Perplexity platform average of 0.157, while citation outcome was 3/3. This quantifies the synthesis-layer finding: the retrieval set varies substantially, but the citation is stable. Q05 had a mean Jaccard of 0.815, indicating a much more stable retrieval set but with synthesis displacement by everything-pr.com on Day 2.

What caused the Q06 binding break on Day 1 of E043?

Day 1’s Q06 break was a search-mode artefact. The session hit Perplexity’s free-tier preview limit mid-session, silently downgrading from Advanced Preview (mode=2) to basic search (mode=1). In basic mode, the /extractability/ page was absent from the retrieval set. With mode=2 confirmed on Pro, the page was back in the retrieval set on Day 2 (rank 7, not cited) and was cited on Day 3 (rank 6). The Day 3 citation is confounded: PerplexityBot re-crawled /extractability/ on 20 May and a content fix was deployed during the E043 window, so the recovery cannot be attributed to any single cause.

Key Takeaway

Perplexity’s zero-variance citation binding on proprietary-concept queries operates at the synthesis layer of the retrieval mechanism, not the retrieval layer. The synthesis model maintains a stable selection preference regardless of retrieval set variation. Competitive displacement requires targeting the correct layer of the retrieval mechanism: specifically, dislodging the synthesis preference, not just the retrieval rank.

Data and Replication

Raw data: github.com/arturseo-geo/geo-lab-experiments/tree/main/e043
Files: e043_9_observations_filled.csv, e043_jaccard_scores.csv
Parent experiment: E027 Zenodo preprint, doi:10.5281/zenodo.20245814
E043 is a mechanistic observation. No pre-registration. Published as a research note.

Version History

Version 1.0, 28 May 2026: Initial publication. Three-day Chrome capture on E027 binding queries. Synthesis-layer determinism confirmed for Q02. Post-publication measurement from E044 Week 4 session included.

About the author: The GEO Lab founder Artur Ferreira has 20+ years of experience in SEO and organic growth strategy. He developed the GEO Stack framework and leads research into Generative Engine Optimisation methodologies.
