Does query length affect citation rate on Perplexity? E030 pre-registers a controlled test across 3 fan-out length tiers, 3 stably-cited pages, and 45 frozen queries.
I’m about to spend 5 days asking Perplexity 45 versions of the same three questions, just to find out whether typing more words makes a citation more likely.
This is, on paper, a silly thing to do. The interesting thing is that nobody has actually tested it.
The gap I keep tripping over
Every controlled experiment I’ve run at The GEO Lab so far has varied the page and held the queries constant. E001 changed declarative vs narrative structure. EDX changed entity density. E025 (active) changes external entity reinforcement. The query set sits there as a fixed baseline.
There’s an obvious axis that none of my published or active experiments test: the queries themselves. Same page, same content, same authority, same everything, but ask the question differently and see if the citation rate moves.
I was confused for a while about why this hadn’t been done. Then I realised the reason: the design space for “ways to ask the same question” is enormous and most of it is uncontrolled. Take “what is extractability in AI search” and “how does extractability work for Perplexity citation”. They share an intent, but they vary on at least 4 dimensions: length (6 vs 7 words), intent type (definitional vs mechanistic), platform anchoring (none vs Perplexity), and complexity. Pick any two and you’ve got an interaction. Pick all of them and you’ve got a paper, not an experiment.

Then in late April, Dr. Pete Meyers gave a talk at BrightonSEO 2026 called “The Infinite Tail,” and slide 64 carved off one specific axis as a clean variable: query length.
His proposal: AI fan-out generates queries at three approximate length tiers — 2-4 words (root), 6-8 words (mid), 10-12 words (deep). Whether you accept those exact bounds or not, the tiered framing isolates length as a single dimension and lets you hold everything else constant.
That’s an experiment.
What I’m pre-registering: length vs citation rate
E030 — Fan-out Length × Citation Rate. Frozen 2026-05-10. Measurement window 2026-05-19 to 2026-05-23.
The hypothesis:
On the same page set, same intent, same fan-out category, citation rate on Perplexity goes up as query length goes up. Specifically, the longest tier (10-12 words) clears the shortest tier (2-4 words) by at least 22 percentage points, which is the noise floor I measured in E016 last month.
The pre-registered null:
No pair of length tiers is separated by 22 percentage points or more. Length doesn’t matter — Perplexity matches by intent regardless of phrasing length.
I am, on balance, predicting H1. But the entire reason for pre-registering is that I lock the prediction in writing before the data arrives. If the null holds, the post that goes live on 2026-05-29 will say so. That’s the deal.
How the experiment is built
3 pages, all stably cited in E014 M2 (April 2026):
I started with all 5 GEO Stack layer pages, but E014 measurement data through April 2026 showed that only 3 of them have any observed Perplexity citation history. Including pages with a 0% citation baseline would have diluted any length effect on the cited pages: they would fail to be cited regardless of query length, conflating “did length matter?” with “is this page in the cited set at all?” So the experiment uses the 3-page subset only. The generalisability claim is “length affects citation rate among already-citable T1 pages”, not “length moves uncited pages onto the cited list.” That’s a different experiment.
45 queries, 5 per length tier per page:
Each query is anchored to the same root noun phrase across all three length tiers. For example, on /extractability/:
- 2-4 words: “extractability AI search”
- 6-8 words: “what is extractability in AI search”
- 10-12 words: “how does extractability affect citation rate in Perplexity AI search”
Length variation comes from added qualifiers (verbs, modifiers, context), not from synonym substitution. All queries are in Pete Meyers’ Semantic fan-out category — same intent, same semantic core, length-only variation. No brand mentions. No category mixing. No edits during the measurement window.
The full query set is on GitHub at e030_query_set.csv. 45 rows, frozen at pre-registration time, programmatically verified for word-count compliance.
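A minimal sketch of what a word-count compliance check could look like, run here on the three example queries above rather than on the CSV itself. The names `TIER_BOUNDS`, `word_count` and `in_tier` are illustrative, and counting whitespace-separated tokens is an assumption about how “words” are defined; the actual verification script and the column layout of e030_query_set.csv may differ.

```python
# Tier bounds as stated in the post: root 2-4, mid 6-8, deep 10-12 words.
TIER_BOUNDS = {"2-4": (2, 4), "6-8": (6, 8), "10-12": (10, 12)}


def word_count(query: str) -> int:
    """Count whitespace-separated tokens (assumption: a hyphenated term is one word)."""
    return len(query.split())


def in_tier(query: str, tier: str) -> bool:
    """True if the query's word count falls inside the tier's inclusive bounds."""
    low, high = TIER_BOUNDS[tier]
    return low <= word_count(query) <= high


# The three /extractability/ examples quoted above.
examples = [
    ("2-4", "extractability AI search"),
    ("6-8", "what is extractability in AI search"),
    ("10-12", "how does extractability affect citation rate in Perplexity AI search"),
]

# Any (tier, query) pair that breaks its tier's bounds ends up here.
violations = [(tier, query) for tier, query in examples if not in_tier(query, tier)]
print(violations)  # prints [] for these three examples (3, 6 and 10 words)
```

Running the same check over all 45 rows would only need the list of examples swapped for the rows read from the frozen file.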
5-day measurement window:
Same protocol as E016 (the noise floor experiment) and E027 (the Perplexity zero-variance replication). One iteration per query per day. 45 × 5 = 225 measurements per platform. Perplexity is the primary platform; ChatGPT runs in parallel as observation data only because, per E016, ChatGPT cites 0-2 pages per day on a 10-query set — too small for a controlled comparison at this sample size.
Each measurement is binary: cited or not cited. URL match against the 3 target pages. No partial credit. No soft-match. This is the same binary protocol used across all GEO Brand Citation Index runs.
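The binary scoring can be sketched roughly as below. This is an illustration under assumptions, not the Lab's actual matcher: the `normalise` step (dropping scheme, `www.`, trailing slash, query string and fragment) is my guess at what still counts as “the same URL”, and only the /extractability/ page is listed because it is the only target named in this post.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> tuple[str, str]:
    """Reduce a URL to (host, path). Assumption: scheme, 'www.', trailing
    slash, query string and fragment do not change page identity."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    return host, path


# Only one of the 3 target pages is named in the post; the other two
# would be added here in the same way.
TARGET_PAGES = {normalise("https://thegeolab.net/extractability/")}


def is_cited(cited_urls, targets=TARGET_PAGES) -> int:
    """Return 1 if any cited URL is exactly a target page, else 0.
    No partial credit: a different path on the same site scores 0."""
    return int(any(normalise(url) in targets for url in cited_urls))


print(is_cited(["https://www.thegeolab.net/extractability?utm=x"]))  # 1
print(is_cited(["https://thegeolab.net/extractability-guide/"]))     # 0
```

Comparing the whole normalised path, rather than checking for a substring, is what keeps near-miss URLs on the same domain from counting as a citation.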
On the 3-page stably-cited subset, citation identity is deterministic per E027. Citation rate is therefore the variable axis where length effects, if they exist, would manifest. A null result on E030 would rule out length as a Layer 1 driver at effect sizes of 22 percentage points or more; a confirmed result would extend Pete Meyers’ fan-out framework with a quantified citation-rate effect size.
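To make the arithmetic concrete, here is a small sketch of how the binary measurements could be rolled up into a per-tier citation rate. The function name `citation_rates` and the `(tier, cited)` record shape are assumptions for illustration, and the `toy` records are made-up placeholders, not E030 data.

```python
from collections import defaultdict


def citation_rates(records):
    """records: iterable of (tier, cited) pairs with cited in {0, 1}.
    Returns the citation rate per tier as a percentage."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for tier, cited in records:
        totals[tier] += 1
        hits[tier] += cited
    return {tier: 100.0 * hits[tier] / totals[tier] for tier in totals}


# Design constants taken from the post.
PAGES = 3
TIERS = ("2-4", "6-8", "10-12")
QUERIES_PER_CELL = 5  # queries per length tier per page
DAYS = 5

per_tier = PAGES * QUERIES_PER_CELL * DAYS  # 75 measurements per tier
total = per_tier * len(TIERS)               # 225 measurements per platform

# Placeholder records only, to show the shape of the calculation.
toy = [("2-4", 1), ("2-4", 0), ("10-12", 1), ("10-12", 1)]
print(citation_rates(toy))  # {'2-4': 50.0, '10-12': 100.0}
```

With 75 binary measurements per tier, a single observation moves that tier's rate by 100 / 75, or about 1.3 percentage points.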

The four outcomes I’m pre-committing to
1. Monotonic confirmation: 10-12 word > 6-8 word > 2-4 word, largest gap clears 22pp. H1 holds. Length matters.
2. Inverted result: 2-4 word > 10-12 word, gap clears 22pp. H1 falsified. Perplexity favours short root queries — possibly because retrieval rewards keyword density over compositional matching.
3. Plateau: 6-8 word and 10-12 word equivalent, both above 2-4 word, gap clears 22pp. Length matters but as a threshold, not a gradient.
4. All-null: No pair clears 22pp. Length isn’t a Layer 1 driver. Reported as null, full stop.
Three of these four outcomes are uncomfortable to write. One of them happens.
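The four outcomes can be written down as a decision rule, sketched below. Two things here are assumptions rather than part of the pre-registration: the post does not define when the 6-8 and 10-12 tiers count as “equivalent”, so the hypothetical `equiv_pp` tolerance (default 0, meaning exactly equal) stands in for that, and any pattern that fits none of the four listed outcomes is returned as "unclassified" rather than forced into one.

```python
NOISE_FLOOR_PP = 22.0  # E016 noise floor, in percentage points


def classify(short, mid, deep, floor=NOISE_FLOOR_PP, equiv_pp=0.0):
    """Map three per-tier citation rates (in %) onto the pre-committed outcomes.

    short = 2-4 words, mid = 6-8 words, deep = 10-12 words.
    A gap 'clears' the floor when it is >= floor, mirroring the null
    ('no pair separated by 22 percentage points or more').
    """
    gaps = [abs(deep - short), abs(deep - mid), abs(mid - short)]
    if max(gaps) < floor:
        return "all-null"
    if short - deep >= floor:
        return "inverted"
    # Plateau is checked before monotonic so that a mid/deep difference
    # within the tolerance is not reported as a gradient.
    if abs(deep - mid) <= equiv_pp and min(mid, deep) - short >= floor:
        return "plateau"
    if deep > mid > short and deep - short >= floor:
        return "monotonic"
    return "unclassified"


# Illustrative inputs only, not measured rates.
print(classify(10, 30, 50))  # monotonic
print(classify(50, 30, 10))  # inverted
print(classify(10, 40, 40))  # plateau
print(classify(10, 20, 30))  # all-null
```

Writing the rule out before the data arrives also fixes the order in which the outcomes are tested, which matters once a tolerance above zero makes plateau and monotonic overlap.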
The confounds I haven’t solved
A 10-12 word query isn’t a pure length increase. It carries more disambiguation. Add the word “Perplexity” to a query about extractability and you’ve also added a platform anchor that wasn’t there before. The “same root noun phrase” rule controls for synonym drift but not for the inevitable narrowing that comes with extra qualifiers.
I’ve documented this as the primary confound in the pre-registration. The control is to clamp every query to the Semantic fan-out category, where added words act as qualifiers around a fixed semantic core rather than introducing new entities. If Pete Meyers’ taxonomy is wrong about Semantic being a clean category — or if my classification of these 45 queries into Semantic is wrong — that’s the failure mode that hits hardest.
The page-set selection bias is also explicit. Choosing only stably-cited pages means I can’t say anything about whether length variation could promote previously-uncited pages onto the cited list. That’s a real limitation. It’s also a deliberate scope choice — I want a clean test of the length variable on pages where I know the floor isn’t already 0%.
Why the pre-registration
If I ran this experiment the normal SEO-blog way — measure first, write second — I’d be writing a story, not an experiment. Whatever pattern showed up in the data, I’d find a way to make the hypothesis fit. The pre-registration locks the hypothesis in writing before the data exists. If the null holds, the result post says so. If H1 is inverted, the result post says so. The methodology survives whichever way the data lands.
This is the standard that E025 (entity reinforcement, also active) and E016 (noise floor, published last month) are held to. It costs more to do it this way. The trade-off is that the result, whichever direction it goes, is interpretable.
Credit where it’s due
The 2-4 / 6-8 / 10-12 word fan-out length pyramid is from Dr. Pete Meyers, “The Infinite Tail: Keyword Research for AI”, BrightonSEO April 2026 (slide 64). His deck is the cleanest operationalisation of AI query expansion I’ve read this year. The framework would have taken me considerably longer to derive on my own.
The GEO Lab contribution here is per-platform per-page citation-rate measurement against the length pyramid, with pre-registered falsification criteria, noise-floor calibration from E016, isolation from category effects, and the page-set selection from E014 M2 history.
Timeline
- Pre-registration: 2026-05-12 (this post)
- Measurement window: 2026-05-19 to 2026-05-23 (5 consecutive days)
- Data analysis: 2026-05-24 to 2026-05-26
- Result post: 2026-05-29
Win, lose, or null — the result post lands on 2026-05-29.
If H1 holds, longer queries get cited more, and the implication is that section-level extractability optimised for compound questions has a structural advantage over short-generic-section optimisation. If H0 holds, length doesn’t matter and we’ve just ruled out one variable. Both are findings.
The dataset will be deposited on Zenodo at result-post time, matching the E027 publication protocol. The 45-query frozen set is already on GitHub.
I’ll update this post with the result link on 2026-05-29.
Key Takeaway
E030 pre-registers the hypothesis that Perplexity citation rate increases with query length. Frozen 45-query set across 3 length tiers, 3 stably-cited pages. Measurement window 2026-05-19 to 2026-05-23 — result post 2026-05-29 regardless of outcome.
Frozen query set: https://github.com/arturseo-geo/geo-citation-index/blob/main/data/e030/e030_query_set.csv
Noise floor reference (E016): https://thegeolab.net/e016-noise-floor-measurement/
Related ongoing experiment (E025): https://thegeolab.net/e025-entity-reinforcement-intervention/
Version History
- Version 1.0 — 12 May 2026: Pre-registration. Hypothesis, null prediction, methodology, four pre-committed outcomes, and known confounds recorded ahead of the measurement window opening 2026-05-19. Frozen 45-query set deposited at github.com/arturseo-geo/geo-citation-index. Result post scheduled 2026-05-29 regardless of outcome.