The Noise Floor: Why Most GEO Experiment Results Are Uninterpretable | The GEO Lab

Before You Trust Any GEO Experiment Result, Ask About the Noise Floor

By Artur Ferreira · The GEO Lab · Published: 18 May 2026 · Version 1.0

Citation rate on Perplexity varies by ±4–6 percentage points day to day with no content changes. Most published GEO “experiments” don’t mention this. Here’s what that omission means for every result they report.

TL;DR

The noise floor is the baseline citation rate variance that exists independently of any content change. On Perplexity, it runs at ±4–6 percentage points across consecutive days on the same query set with no site modifications. Any experiment reporting a delta smaller than the noise floor is reporting noise, not signal.

The GEO Lab ran E016 — five consecutive days, same 10 queries, zero content changes — specifically to establish this baseline before interpreting any other experiment result. This post explains why that had to happen first.

The Problem with Most Published GEO Experiments

Scroll LinkedIn on any given week and you will find at least one post claiming to have “tested” something about AI citations. Changed a heading format — citations went up 12%. Added schema — citations increased. Rewrote the intro — Perplexity started citing the page.

Almost none of them mention how many days the measurement ran, a gap that Search Engine Land has also flagged in its GEO coverage. Almost none specify the query sample size. Almost none establish what citation rate looked like before the change on multiple days. And almost none acknowledge the fundamental problem: citation rate is not a stable number even when nothing changes.

That last point is the one that makes most published GEO experiments uninterpretable. If your baseline is a single-day measurement and your post-change measurement is also a single-day reading, and the natural day-to-day variance in citation rate is larger than the delta you’re reporting, you have measured nothing. You’ve observed noise and called it a result.

This is not a minor methodological quibble. It is the difference between a finding and a story that sounds like a finding.

What a Noise Floor Is — and Why GEO Needs One

In signal processing, the noise floor is the level of background noise in a system that limits how small a signal you can meaningfully detect. Below the noise floor, you cannot distinguish signal from noise — any reading could be either.

In GEO measurement, the noise floor is the citation rate variance that exists independently of any content change. It is the range within which citation rate will fluctuate on any given platform, on any given day, for any given query set, even when the site, the content, and the queries are all held constant.

Every measurement system has a noise floor. A thermometer has one: you cannot meaningfully distinguish a 0.01°C difference with a standard household thermometer. A bathroom scale has one: your weight fluctuates by 1–2 kg across a single day without any change in actual body mass. The same holds for search metrics: as SparkToro’s zero-click study demonstrated for traditional search, single-point measurements are unreliable when the underlying system has inherent variance. The noise floor does not mean measurement is impossible. It means you need to know what it is before you can interpret what you’re measuring.

GEO research published without a noise floor is in the same position as a clinical trial with no control group. The treatment may have worked. It may also have been the natural variance of the outcome measure. Without knowing the variance, you cannot tell the difference. The experiment is not wrong — it is uninterpretable.

The GEO Lab established the noise floor as the first dependency for all other experiment interpretation. Every subsequent experiment — E002, E003, E006, the entity density test, the citation decay study — is only interpretable because E016 exists to tell us what baseline variance looks like.

E016: What the GEO Lab Measured

E016 ran for five consecutive days in April 2026. The protocol was deliberately minimal: the same 10 queries, run once per day on Perplexity sonar-pro, with no changes to the site, the content, the schema, or the query set across the five days. The single variable was time.

The question was not “does X affect citation rate?” It was: “what does citation rate do when nothing changes?”

Day	Citation rate	Change vs. previous day
Day 1	22%	baseline
Day 2	27%	+5pp
Day 3	19%	−8pp
Day 4	24%	+5pp
Day 5	23%	−1pp

E016 indicative daily citation rates — Perplexity sonar-pro, 10-query set, no content changes. Day-to-day range: 19%–27%. Peak-to-trough swing: 8 percentage points. Full data: thegeolab.net/e016-noise-floor-measurement/

E016: Citation rate across five consecutive days with zero content changes. The shaded band represents the ±4–6pp noise floor range. An 8pp swing (Day 2 → Day 3) with no intervention.

The range across five days was 19%–27%, an 8 percentage point swing with zero content changes. Every reading sits within ±4pp of the five-day mean of 23%, and the GEO Lab puts the natural variance of citation rate, absent any intervention, at approximately ±4–6 percentage points on a 10-query sample. This is the noise floor for the GEO Lab measurement system on Perplexity at the current domain authority level.
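The summary figures can be recomputed from the five indicative readings. The sketch below is a minimal illustration, not the GEO Lab's analysis code; the variable names are made up for this example, and the sample standard deviation is an extra statistic the article does not report.

```python
from statistics import mean, stdev

# Indicative E016 daily citation rates (percent), Day 1 to Day 5, from the article.
e016_rates = [22, 27, 19, 24, 23]

avg = mean(e016_rates)                            # five-day mean
swing = max(e016_rates) - min(e016_rates)         # peak-to-trough range
max_dev = max(abs(r - avg) for r in e016_rates)   # largest distance from the mean
daily_sd = stdev(e016_rates)                      # sample standard deviation (not reported in the article)

# Day-over-day changes: each day minus the previous day.
deltas = [b - a for a, b in zip(e016_rates, e016_rates[1:])]

print(f"mean={avg}%  range={swing}pp  max deviation=±{max_dev}pp")
print(f"sample sd={daily_sd:.1f}pp  day-over-day deltas={deltas}")
```

With these inputs the mean is 23%, the range is 8pp, and no reading is more than 4pp from the mean. Five data points are a small sample, so the standard deviation is only a rough indication.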

Two immediate implications follow from this number. First, any single-day delta smaller than 8 percentage points cannot be confidently attributed to a content change — it falls within the observed natural variance range. Second, the noise floor is domain- and query-specific: a higher-authority site with a larger query sample will have a different noise floor. E016 establishes the GEO Lab’s noise floor, not a universal constant.

Why Citation Rate Variance Exists on AI Platforms

The variance documented in E016 is not unexplained: it has identifiable sources. Understanding those sources explains both why the variance is unavoidable and why it is larger on some platforms than others.

Web search index freshness. Perplexity retrieves citations via live web search. The results of that search vary by the hour as the index updates, pages are recrawled, and competing content is added or removed. A page that ranks position 3 in Perplexity’s web search on Monday may rank position 7 on Tuesday after a competitor publishes relevant content — not because of anything the original page did.

Model temperature and sampling. AI language models use probabilistic sampling when generating responses. Even with identical retrieval results, the synthesis step introduces variation in which retrieved sources make it into the final response. Two identical queries run seconds apart can produce responses that cite different subsets of the same retrieval pool.

Query routing variation. Perplexity routes queries to different retrieval strategies depending on its assessment of query intent. A query phrased identically on two different days may be classified differently if the model’s assessment of intent shifts, sending it to a different retrieval path and producing a different citation set.

Index state at query time. Perplexity’s index is not a static snapshot — it is continuously updating. The same query run at 9am and 9pm may hit different index states, with different pages at the top of the retrieval pool.

None of these sources of variance are controllable from the publisher side. You cannot prevent Perplexity’s index from updating or prevent model sampling variation. This is why the noise floor is an empirical measurement, not a calculation — you have to run the same queries across multiple days and observe what happens.

Minimum Detectable Effect: The Number Every GEO Experiment Needs

The minimum detectable effect (MDE) is the smallest real change in citation rate that your measurement system can reliably distinguish from noise. It is a function of your noise floor, your sample size, and how many days you run the measurement.

For the GEO Lab measurement system — 10 queries per day, Perplexity sonar-pro, current domain authority — the E016 data establishes the following:

Measurement duration	Queries per day	Min. detectable effect	Practical meaning
1 day	10	~8–10pp	Only very large effects are detectable. Most GEO interventions produce smaller effects than this.
3 days (each condition)	10	~5–6pp	Medium effects become detectable. Minimum viable for most A/B style tests.
5 days (each condition)	10	~4pp	GEO Lab standard. Sufficient for detecting effects of the size typically produced by structural content changes.
5 days (each condition)	30	~2–3pp	Higher confidence. Required for testing subtle variables like schema format or heading style.

MDE estimates derived from E016 variance data. Values are approximate — actual MDE depends on the specific query set and domain. Higher domain authority typically reduces noise floor and MDE.
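The article does not give the formula behind the table's estimates. As a rough cross-check, a textbook two-sample approximation shows how the MDE shrinks as measurement days increase. This is a generic statistical sketch under stated assumptions (independent, roughly normal daily readings with equal spread in both conditions), not the GEO Lab's method. The function name `approx_mde`, the 5% significance level, and the 80% power target are all choices made for this example.

```python
from math import sqrt
from statistics import NormalDist


def approx_mde(daily_sd_pp, days_per_condition, alpha=0.05, power=0.80):
    """Textbook two-sample MDE for the difference between two means of daily rates.

    Assumes independent, roughly normal daily readings with the same standard
    deviation in both conditions (an assumption, not an E016 finding).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    return (z_alpha + z_power) * daily_sd_pp * sqrt(2 / days_per_condition)


# 2.9pp is the sample standard deviation of the five indicative E016 readings.
for days in (1, 3, 5):
    print(f"{days} day(s) per condition: MDE ≈ {approx_mde(2.9, days):.1f}pp")
```

Under these assumptions the approximation lands in the same general region as the table, somewhat higher, and it will not reproduce the table's values exactly. Queries per day enter only through `daily_sd_pp`, so a 30-query set would need its own measured standard deviation.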

The practical consequence: if you run a GEO experiment for one day on a 10-query sample and report a +5pp citation rate increase, that number is inside the noise floor. You cannot say whether the intervention worked or whether the measurement simply landed on a good day for citation rate.

This is not a theoretical concern. The E016 data shows a +5pp swing between Day 1 and Day 2 with zero intervention. A +5pp “result” from a one-day test is indistinguishable from that movement.

What the Noise Floor Invalidates

Applying the noise floor standard to published GEO research produces an uncomfortable list. The following experiment types are structurally unable to produce interpretable results without multi-day measurement:

Experiment type	Common format	Problem	Valid version
Before/after schema change	“Added FAQ schema, citations went up 15%”	Before = 1 day, after = 1 day. Delta may be noise.	5-day baseline, change, 5-day post-measurement
Content rewrite test	“Rewrote intro declaratively, Perplexity cited us”	Single citation event on single day. No baseline variance established.	Multi-day multi-query protocol before and after rewrite
Platform comparison	“Perplexity cites us 3× more than ChatGPT”	If measured on a single day, both readings carry ±4–6pp uncertainty.	Average across ≥5 days per platform
Tool-reported AI visibility score	Dashboard showing “AI visibility: 34%”	Point-in-time measurement presented as a stable metric. Varies ±4–6pp by day.	Rolling 5-day average, shown with variance band

Common GEO experiment formats and their noise floor problems. This is not an exhaustive list — the issue applies to any measurement that does not account for daily citation rate variance.
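The "rolling 5-day average, shown with variance band" in the last row can be sketched in a few lines. This is one possible reading of that phrase, assuming the band is the minimum and maximum reading inside each window. The function name `rolling_band` is hypothetical and does not come from any GEO Lab tool.

```python
from statistics import mean


def rolling_band(daily_rates, window=5):
    """Return (mean, low, high) for each full trailing window of daily readings."""
    bands = []
    for end in range(window, len(daily_rates) + 1):
        chunk = daily_rates[end - window:end]
        bands.append((mean(chunk), min(chunk), max(chunk)))
    return bands


# Indicative E016 readings (percent). A 3-day window is used here only because
# five days of data contain a single full 5-day window.
e016_rates = [22, 27, 19, 24, 23]
for avg, low, high in rolling_band(e016_rates, window=3):
    print(f"{avg:.1f}% (range {low}% to {high}%)")
```

Reporting the band next to the average makes it visible when a dashboard figure is one point inside a much wider spread.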

The FAQ schema experiment (E002) is one of the few published GEO experiments that holds up under noise floor scrutiny. The −1.7% overall delta reported in E002 would normally be questionable — it falls within the noise floor range. But the null result is interpretable because the experiment ran 480 queries across three platforms, used 5 iterations per query, and produced a result consistent with zero effect across every platform and every query type. The consistency pattern is informative even when the aggregate delta is small.

A positive result of the same size (+1.7%) would not be interpretable the same way. Null results at small deltas can be informative. Positive results at small deltas cannot, unless multi-day measurement rules out variance.

How to Run a GEO Experiment That Produces Interpretable Results

The methodology changes required to produce valid GEO experiments are not complicated. They require more time than a single-day test, but they do not require specialist tools or large budgets.

The five requirements for a valid GEO experiment, derived from the E016 findings:

1. Establish a noise floor before starting. Run your query set for at least three days with no changes and record the citation rate each day. Calculate the day-to-day variance range. This is your noise floor. Any result from the experiment must exceed this range to be interpretable as a real effect.

2. Use a fixed query set of at least 10 queries. Smaller sets produce citation rate readings that are too coarse: even on a 10-query set, a single citation added or removed shifts the rate by 10 percentage points. More queries reduce the impact of any individual retrieval event.

3. Run each condition for at least 5 days. Five days gives you enough data points to see whether the post-change citation rate is consistently different from the pre-change baseline, or whether it is bouncing within the noise floor range.

4. Make one change at a time. Changing schema, heading format, and content structure simultaneously makes it impossible to attribute any delta to a specific variable. One variable, one experiment.

5. Report the variance, not just the average. A result reported as “citation rate increased from 22% to 28%” obscures whether the post-change readings were consistently 28% or ranged from 19% to 37%. The variance tells you whether the effect is stable. The average alone does not.
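The five requirements above can be sketched as a small check. This is a minimal illustration, not the GEO Lab's tooling: the function names are hypothetical, the post-change readings are invented for the example, and the noise floor is taken to be the 8pp range observed in E016.

```python
from statistics import mean


def summarize(rates):
    """Mean plus min/max range for one condition's daily citation rates (percent)."""
    return {"mean": mean(rates), "low": min(rates), "high": max(rates)}


def interpret(baseline, post, noise_floor_pp):
    """Compare two conditions and flag whether the delta clears the noise floor."""
    if len(baseline) < 5 or len(post) < 5:
        raise ValueError("run each condition for at least 5 days")
    delta = mean(post) - mean(baseline)
    return {
        "baseline": summarize(baseline),
        "post": summarize(post),
        "delta_pp": delta,
        "exceeds_noise_floor": abs(delta) > noise_floor_pp,
    }


def rate_step_pp(n_queries):
    """Smallest possible change in the rate when each query counts once, in pp."""
    return 100 / n_queries


baseline = [22, 27, 19, 24, 23]  # indicative E016 readings from the article
post = [29, 25, 31, 27, 28]      # hypothetical post-change readings, invented for illustration
result = interpret(baseline, post, noise_floor_pp=8)
print(result["delta_pp"], result["exceeds_noise_floor"])
print(rate_step_pp(10), rate_step_pp(30))
```

With these made-up post-change readings the average moves from 23% to 28%, a 5pp delta that stays inside the 8pp range, so the check does not flag it as a real effect. Returning the low and high values alongside the mean follows requirement 5.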

The GEO Lab applies this protocol to every experiment published on the site. The 30-check citation protocol specifies query count, platform coverage, iteration count, and recording format. The E016 post provides the variance baseline against which all GEO Lab results are interpreted.

Key Takeaway

Citation rate on Perplexity varies by ±4–6 percentage points day to day with no content changes. This is the noise floor — the baseline variance that must be established before any experiment result can be interpreted.

Any delta smaller than the noise floor cannot be attributed to a content change
Single-day measurements are structurally uninterpretable for most GEO variables
The minimum viable experiment runs each condition for at least 5 days on at least 10 queries
Null results at small deltas (like E002’s −1.7%) can still be informative — positive results at small deltas cannot
Every noise floor is domain- and platform-specific. E016 establishes the GEO Lab’s — it is not a universal number

Frequently Asked Questions

What is a noise floor in GEO measurement?

The noise floor is the baseline citation rate variance that exists independently of any content change — the range within which citation rate fluctuates on a given platform across consecutive days when the site, content, and queries are all held constant. For thegeolab.net on Perplexity with a 10-query sample, the E016 experiment established a noise floor of approximately ±4–6 percentage points, with a peak-to-trough range of 8 percentage points across five days. Any experiment reporting a delta smaller than the noise floor is reporting noise, not signal from a content change.

Why does citation rate vary if nothing on the site changes?

Four sources produce natural citation rate variance: web search index freshness (Perplexity’s retrieval pool updates continuously, changing which pages appear at the top of results), model temperature and sampling (probabilistic generation means identical queries can produce different citation sets), query routing variation (the same query may be classified differently on different days, sending it to different retrieval strategies), and index state at query time (the index is a moving target, not a static snapshot). None of these are controllable from the publisher side, which is why the noise floor must be measured empirically rather than assumed to be zero.

How do you establish a noise floor for your own site?

Run a fixed set of at least 10 queries on your target platform for at least three consecutive days with no changes to your site, content, schema, or the query set. Record the citation rate each day. The range from lowest to highest reading is your observed variance — this is your noise floor. Any subsequent experiment result must produce a delta that consistently exceeds this range across multiple measurement days to be interpretable as a real effect rather than natural variance.

Does the noise floor apply to all AI platforms equally?

No — the noise floor is platform-specific. Perplexity’s live web search architecture produces more day-to-day variance than platforms that draw from a more stable knowledge base. Gemini’s grounding mechanism and ChatGPT’s web search each have different variance characteristics. The E016 experiment measured the noise floor on Perplexity specifically. Running the same protocol on Gemini or ChatGPT would produce different variance figures. The noise floor is also domain-specific: a higher-authority site that consistently ranks at the top of retrieval results will show lower variance than a newer site whose position in the retrieval pool is less stable.

What is the minimum experiment duration for a valid GEO test?

Based on E016 variance data, the minimum viable duration is five days of measurement per condition on a 10-query sample — meaning five days before the change (baseline) and five days after (post-change). This produces a minimum detectable effect of approximately 4 percentage points. For testing subtle variables like schema format or heading style, where real effects may be smaller, 30 queries per day over five days reduces the MDE to approximately 2–3 percentage points. Single-day measurements on 10-query samples have a minimum detectable effect of approximately 8–10 percentage points — larger than most GEO interventions are expected to produce.

Version History

Version 1.0 — 18 May 2026: Initial publication. Noise floor concept, E016 variance data, minimum detectable effect framework, experiment validity standards.

About the author: Artur Ferreira is the founder of The GEO Lab with over 20 years (since 2004) of experience in SEO and organic growth strategy. He developed the GEO Stack framework and leads research into Generative Engine Optimisation methodologies.
