1. Overview

The GEO Brand Citation Index runs a fixed panel of 18 queries per monthly cycle, sending each query to ChatGPT, Perplexity, and Gemini for 54 query runs in total. Every brand mention in every AI response is extracted and scored by citation position. Raw scores are normalised to 0–100 within each vertical per platform. The Perplexity minus ChatGPT delta is the primary analytical output: it separates brands cited from AI training data from brands cited from the current live web.

The index was designed to answer a question existing brand tracking tools cannot: is your brand cited by AI systems because of current web presence, or because of historical training data? By comparing a retrieval-augmented platform (Perplexity, which searches the live web before answering) against a fixed-training-data platform (ChatGPT, which does not), the delta surfaces the gap between AI memory and AI retrieval.

This document is the complete public record of all scoring decisions. Any change to the methodology is versioned and recorded in the changelog. Scores from different methodology versions are not directly comparable without explicit adjustment notes in the monthly report.

2. Query panel — all 54 query runs

The index runs 6 queries per vertical per platform, sent identically to ChatGPT, Perplexity, and Gemini. The panel is fixed month-to-month. 6 queries × 3 verticals × 3 platforms = 54 total query runs per monthly cycle. Queries are phrased as natural evaluation questions reflecting real user intent when comparing tools.

Query selection criteria

A query is included if it meets all three of the following:

  1. Real evaluation intent. The query must reflect how a user actually compares tools — open-ended, category-level questions that prompt multi-brand responses. Single-brand navigational queries (“What is Semrush?”) are excluded.
  2. Produces multiple brand mentions. Queries that consistently return single-brand answers yield no comparative data. Queries that generate list-format or comparison responses are preferred because they allow position-scoring across multiple brands per response.
  3. Stable month-to-month. Queries referencing time-specific events or rapidly shifting contexts are excluded. The panel tests the same intent every month. Any modification requires a versioned methodology update.

The “best alternative to X” query pattern is an intentional exception to criterion 1’s exclusion of queries that name a specific brand. It is a high-volume, real user query that generates rich multi-brand responses, and it specifically tests whether competing brands surface when a user is actively considering leaving a market leader.

SEO & Marketing Tools — 6 queries
  1. What are the best SEO tools in 2026?
  2. Which SEO platform should I use for keyword research?
  3. What is the best tool for backlink analysis?
  4. What are the top tools for technical SEO audits?
  5. Which SEO software is best for content optimisation?
  6. What is the best alternative to Semrush?
CRM & Sales — 6 queries
  1. What is the best CRM for small businesses in 2026?
  2. Which CRM platform should I use for sales teams?
  3. What are the top customer relationship management tools?
  4. What CRM integrates best with email marketing?
  5. Which CRM is best for managing sales pipelines?
  6. What is the best alternative to Salesforce?
AI & LLM Tools — 6 queries
  1. What are the best AI writing tools in 2026?
  2. Which AI assistant is most useful for content creation?
  3. What is the best AI tool for productivity?
  4. Which large language model is best for business use?
  5. What are the top AI tools for marketing teams?
  6. What is the best alternative to ChatGPT?

Each query is sent identically to all three platforms in the same monthly session window with no paraphrasing or platform-specific modifications. The total per platform is 18 queries. Why fix the panel? If queries change between months, score movements become uninterpretable — any change could reflect a query shift, not a brand shift. Month-over-month comparability requires holding the query panel constant.
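
The fixed panel translates directly into a small data structure. Below is a minimal Python sketch: the query text is verbatim from the lists above, while the names QUERY_PANEL and PLATFORMS and the dict layout are illustrative, not taken from any published index codebase.

```python
# Fixed monthly query panel: 6 queries x 3 verticals, sent to 3 platforms.
PLATFORMS = ["ChatGPT", "Perplexity", "Gemini"]

QUERY_PANEL = {
    "SEO & Marketing Tools": [
        "What are the best SEO tools in 2026?",
        "Which SEO platform should I use for keyword research?",
        "What is the best tool for backlink analysis?",
        "What are the top tools for technical SEO audits?",
        "Which SEO software is best for content optimisation?",
        "What is the best alternative to Semrush?",
    ],
    "CRM & Sales": [
        "What is the best CRM for small businesses in 2026?",
        "Which CRM platform should I use for sales teams?",
        "What are the top customer relationship management tools?",
        "What CRM integrates best with email marketing?",
        "Which CRM is best for managing sales pipelines?",
        "What is the best alternative to Salesforce?",
    ],
    "AI & LLM Tools": [
        "What are the best AI writing tools in 2026?",
        "Which AI assistant is most useful for content creation?",
        "What is the best AI tool for productivity?",
        "Which large language model is best for business use?",
        "What are the top AI tools for marketing teams?",
        "What is the best alternative to ChatGPT?",
    ],
}

# 6 queries x 3 verticals x 3 platforms = 54 query runs per monthly cycle.
runs_per_cycle = sum(len(qs) for qs in QUERY_PANEL.values()) * len(PLATFORMS)
assert runs_per_cycle == 54
```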

3. Brand panel — all 28 brands

March 2026 tracked 28 brands across 3 verticals. Brands are selected for category relevance and market presence — not paid inclusion. A brand receives 0.0 if it is not mentioned in any response in its vertical on a given platform. Zero is a valid and meaningful score, not a data absence.

SEO & Marketing Tools (12 brands): Semrush, Ahrefs, Google Search Console, Moz, Surfer SEO, Ubersuggest, Screaming Frog, Yoast SEO, Majestic, SE Ranking, Mangools, Rank Math

CRM & Sales (8 brands): Salesforce, HubSpot, Zoho CRM, Pipedrive, Freshsales, Insightly, SugarCRM, Keap

AI & LLM Tools (8 brands): ChatGPT, Copilot, Jasper, Claude, Notion AI, Writesonic, Copy.ai, Grammarly

Brand mentions are matched by name including common variants (e.g. “Ahrefs” and “ahrefs.com” both score for Ahrefs). URL citations within a response are tracked separately and contribute the +2 bonus to whichever brand’s domain is cited.
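
As a sketch of how variant matching and mention ordering could work: the variant map below is illustrative (the index’s full variant list is not published here), and word boundaries are used so that a short variant such as "moz" does not match inside an unrelated word.

```python
import re

# Illustrative variant map; "Ahrefs" and "ahrefs.com" both score for Ahrefs.
BRAND_VARIANTS = {
    "Semrush": ["semrush", "semrush.com"],
    "Ahrefs": ["ahrefs", "ahrefs.com"],
    "Moz": ["moz", "moz.com"],
}

def find_mentions(response_text: str) -> list[str]:
    """Return brands in order of first mention within one AI response."""
    lowered = response_text.lower()
    first_pos = {}
    for brand, variants in BRAND_VARIANTS.items():
        hits = [
            m.start()
            for v in variants
            # \b keeps "moz" from matching inside words like "mozzarella"
            for m in re.finditer(rf"\b{re.escape(v)}\b", lowered)
        ]
        if hits:
            first_pos[brand] = min(hits)
    return sorted(first_pos, key=first_pos.get)
```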

4. Scoring formula

Each brand mention is scored by its ordinal position within the AI response. First mention carries the most weight. A cited URL adds a bonus. Scores are summed across all 6 queries per vertical per platform, then normalised.

Position weights

| Mention | Points |
|---|---|
| 1st mention | 5 points |
| 2nd mention | 3 points |
| 3rd mention | 2 points |
| 4th+ mention | 1 point each |
| URL cited | +2 bonus |

Rationale for position weighting

Position-weighted scoring reflects a well-established finding in information retrieval research: items presented first receive disproportionately more attention and are more likely to influence decisions. This effect — documented as primacy bias in cognitive psychology and position bias in information retrieval — has been replicated across web search (Joachims et al., 2005; Craswell et al., 2008) and shown to apply to long-context LLM outputs (Liu et al., 2023). A brand mentioned first in an AI response has meaningfully different visibility than the same brand mentioned fifth.

The weight values (5, 3, 2, 1) reflect steep initial advantage for first mention, a meaningful but smaller reward for second, diminishing returns for subsequent mentions, and a floor of 1 for any mention beyond third. The URL citation bonus reflects an additional quality signal: a cited domain URL indicates the platform actively retrieved content from that domain — a higher-fidelity signal than a training-data reference alone.

Worked example — scoring one query response

Example query: “What are the best SEO tools in 2026?” — one platform response scored

Assume a platform returns:

“Semrush is generally the most comprehensive SEO platform, covering keyword research, backlink analysis, and site audits in one place. Ahrefs is a strong alternative particularly for backlink data. Surfer SEO has become popular for content optimisation. Other tools worth considering include Screaming Frog for technical audits and Moz for beginners. You can compare options at semrush.com/compare.”

| Brand | Position | Position pts | URL cited | URL pts | Raw pts (this query) |
|---|---|---|---|---|---|
| Semrush | 1st | 5 | Yes (semrush.com/compare) | +2 | 7 |
| Ahrefs | 2nd | 3 | No | 0 | 3 |
| Surfer SEO | 3rd | 2 | No | 0 | 2 |
| Screaming Frog | 4th | 1 | No | 0 | 1 |
| Moz | 5th | 1 | No | 0 | 1 |
| All others | Not mentioned | 0 | No | 0 | 0 |

This process runs for all 6 queries in the vertical on that platform. Raw points are summed across all 6 responses per brand. The brand with the highest total raw points becomes the normalisation reference (100). All others are scaled proportionally per Section 5.
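
A minimal sketch of the per-response scoring step, reproducing the worked example above. The function and parameter names are illustrative, not the index’s actual implementation, and domain extraction from cited URLs is assumed to happen upstream.

```python
POSITION_POINTS = {1: 5, 2: 3, 3: 2}  # any mention beyond 3rd scores 1 point
URL_BONUS = 2

def score_response(mention_order, cited_domains, brand_domains):
    """Score one AI response: position points plus URL citation bonus."""
    scores = {}
    for rank, brand in enumerate(mention_order, start=1):
        pts = POSITION_POINTS.get(rank, 1)
        if brand_domains.get(brand) in cited_domains:
            pts += URL_BONUS  # the response cited this brand's domain
        scores[brand] = pts
    return scores

# Reproduces the worked example: Semrush 7, Ahrefs 3, Surfer SEO 2, etc.
order = ["Semrush", "Ahrefs", "Surfer SEO", "Screaming Frog", "Moz"]
scores = score_response(
    order,
    cited_domains={"semrush.com"},  # extracted from semrush.com/compare
    brand_domains={"Semrush": "semrush.com", "Ahrefs": "ahrefs.com"},
)
assert scores == {
    "Semrush": 7, "Ahrefs": 3, "Surfer SEO": 2, "Screaming Frog": 1, "Moz": 1,
}
```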

5. Normalisation

Raw point totals are normalised to 0–100 within each vertical and platform separately. The brand with the highest raw total in a vertical on a given platform receives 100. All others scale proportionally. Scores are comparable within a vertical across monthly runs — not across different verticals.

Normalised score = (brand raw total ÷ highest raw total in vertical on that platform) × 100

Normalisation worked example — SEO vertical, Perplexity, March 2026

Semrush accumulates the highest raw point total across all 6 SEO queries on Perplexity. It receives 100.0.

Ahrefs accumulates approximately 51.9% of Semrush’s raw total across the same 6 queries. It receives 51.9.

Moz accumulates approximately 3.7% of Semrush’s raw total — mentioned rarely, at low positions. It receives 3.7.

Majestic was not mentioned in any of the 6 Perplexity SEO responses. It receives 0.0.

Scores are published to one decimal place. A score of 0.0 is not a missing data point — it means the brand was not cited in any response in that vertical on that platform.
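
The formula above is a one-liner in code. The raw totals below are hypothetical values chosen only to reproduce the March 2026 SEO/Perplexity figures quoted in this worked example; actual raw counts are not published.

```python
def normalise(raw_totals):
    """Scale raw point totals to 0-100 within one vertical on one platform."""
    top = max(raw_totals.values())  # assumes at least one brand was mentioned
    return {brand: round(raw / top * 100, 1) for brand, raw in raw_totals.items()}

# Hypothetical raw totals: 14/27 = 51.9%, 1/27 = 3.7%, matching the example.
raw = {"Semrush": 27, "Ahrefs": 14, "Moz": 1, "Majestic": 0}
print(normalise(raw))
# {'Semrush': 100.0, 'Ahrefs': 51.9, 'Moz': 3.7, 'Majestic': 0.0}
```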

Normalisation is performed within verticals rather than globally because different verticals have structurally different response patterns. CRM queries produce shorter brand lists dominated by two or three names. AI tools queries produce longer, more varied lists. Global normalisation would systematically disadvantage CRM brands based on their vertical’s response structure, not their actual citation frequency within category.

6. Delta calculation

The delta is the Perplexity normalised score minus the ChatGPT normalised score. A negative delta means the brand is cited more from training data than from the live web; a positive delta means the brand is winning on the current web before training data has caught up.

Delta (Δ) = Perplexity normalised score − ChatGPT normalised score

Perplexity is the live web benchmark because it is a retrieval-augmented generation (RAG) system that searches the current web before generating each response (Lewis et al., 2020). ChatGPT (GPT-4o mini, browsing disabled) is the training data baseline — it generates responses from a fixed corpus with a knowledge cutoff and no live retrieval. The two platforms represent opposite ends of the training-data versus live-retrieval spectrum. Gemini is tracked as a third data point but is not used in the primary delta; it occupies a hybrid position and is most useful as a triangulation signal.

Delta interpretation reference

| Delta range | Signal | What it typically means |
|---|---|---|
| +20 or above | Strong live web outperformance | Brand is generating substantial current content. Training data has not absorbed the growth. Advantage is temporary: it narrows when models next retrain. |
| +5 to +19 | Moderate live web lead | Brand growing live web presence faster than training data reflects. Monitor for gap narrowing in subsequent runs. |
| −4 to +4 | Platform parity | Consistently cited across training data and live web. Low AI visibility risk. |
| −5 to −19 | Moderate AI Memory gap | Training data reputation ahead of live web presence. Requires monitoring and active GEO investment to defend position. |
| −20 or lower | Significant AI Memory gap | Heavily cited from training data, rarely from live retrieval. Structural risk: advantage erodes as models retrain and training data reflects a web that no longer prominently features the brand. |
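
The delta and its interpretation bands reduce to a few lines of code. A minimal sketch; how fractional deltas that fall between the published band edges (e.g. −4.5) should be binned is not specified above, so the boundary handling here is an assumption.

```python
def delta(perplexity: float, chatgpt: float) -> float:
    """Delta = Perplexity normalised score minus ChatGPT normalised score."""
    return round(perplexity - chatgpt, 1)

def delta_signal(d: float) -> str:
    """Map a delta to the interpretation bands in the reference table above."""
    if d >= 20:
        return "Strong live web outperformance"
    if d >= 5:
        return "Moderate live web lead"
    if d > -5:  # covers the published -4 to +4 parity band
        return "Platform parity"
    if d > -20:
        return "Moderate AI Memory gap"
    return "Significant AI Memory gap"

assert delta_signal(delta(51.9, 100.0)) == "Significant AI Memory gap"  # Ahrefs
```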

7. Archetype assignment

Archetypes are assigned based on cross-platform score patterns — the shape of scores across all three platforms, not the absolute level. Assignment is reviewed manually each month. Thresholds below are guidelines; edge cases are assessed in context. Brands that do not meet clear criteria are published without an archetype label.

| Archetype | Assignment criteria | March 2026 examples (ChatGPT / Perplexity / Gemini / Δ) |
|---|---|---|
| 👑 Dominant Brand | High scores (60+) on all three platforms. Delta within ±15. No significant platform gap. | ChatGPT (100/100/100/0.0), Salesforce (100/100/100/0.0), Semrush (90.5/100/100/+9.5), HubSpot (74.1/70.8/80.0/−3.2) |
| 🧠 AI Memory Brand | ChatGPT significantly higher than Perplexity (delta −20 or lower). Perplexity score not at zero: the brand retains some live presence but not matching training data prominence. | Ahrefs (100/51.9/60.7/−48.1), Google Search Console (61.9/33.3/35.7/−28.6) |
| 🔍 Live Search Brand | Perplexity significantly higher than ChatGPT (delta +15 or above). Winning on current web before training data has incorporated the growth. | Claude (13.0/63.2/43.5/+50.1) |
| 🩽 Fading Brand | Low absolute scores across all platforms AND large negative delta. Losing ground on both training data and live web simultaneously. | Moz (47.6/3.7/32.1/−43.9), SugarCRM (14.8/0.0/0.0/−14.8) |
| ⭐ GEO Outlier | Consistent scores across all platforms above what domain authority or market size alone would predict. Suggests effective GEO or content strategy. | Zoho CRM (40.7/33.3/36.0/−7.4), Copilot (43.5/36.8/47.8/−6.6) |
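
The guideline thresholds can be sketched as a first-pass classifier. The numeric cut-offs for "low scores" (below 50) and "large negative delta" (at or below −10) are illustrative assumptions, not published thresholds, and because assignment is manually reviewed, a brand that meets a numeric criterion (e.g. Surfer SEO at +15.3) may still be published without a label.

```python
def suggest_archetype(chatgpt: float, perplexity: float, gemini: float):
    """First-pass heuristic for the guidelines above; manual review decides."""
    d = perplexity - chatgpt
    scores = (chatgpt, perplexity, gemini)
    if min(scores) >= 60 and abs(d) <= 15:
        return "Dominant Brand"
    # Check Fading before AI Memory: Moz matches both numeric patterns.
    if max(scores) < 50 and d <= -10:  # "low" and "large negative" assumed
        return "Fading Brand"
    if d <= -20 and perplexity > 0:
        return "AI Memory Brand"
    if d >= 15:
        return "Live Search Brand"
    # GEO Outlier needs external context (domain authority, market size).
    return None

assert suggest_archetype(100.0, 51.9, 60.7) == "AI Memory Brand"   # Ahrefs
assert suggest_archetype(47.6, 3.7, 32.1) == "Fading Brand"        # Moz
assert suggest_archetype(13.0, 63.2, 43.5) == "Live Search Brand"  # Claude
```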

8. Platform definitions

Three platforms are used because they represent three meaningfully distinct AI citation mechanisms. Two form the delta endpoints; one provides triangulation.

| Platform | Version used | Mechanism | Role in index |
|---|---|---|---|
| ChatGPT | GPT-4o mini (browsing off) | Fixed training corpus. Knowledge cutoff applies. No live web access during response generation. Brands cited based on historical frequency in training data. | Training data baseline. High ChatGPT scores reflect historical brand authority. Subtracted in the delta calculation. |
| Perplexity | Default mode (live search on) | Retrieval-augmented generation (RAG). Searches the live web before generating each response. Citations grounded in currently indexed content. Domain URLs cited in responses are tracked for the +2 bonus. | Live web benchmark. Perplexity scores reflect current brand prominence on the indexed web. Added in the delta calculation. |
| Gemini | Gemini 2.5 Flash (default grounding) | Hybrid: large training corpus plus optional grounding. Middle position between pure training inference and full live retrieval. | Triangulation. Tracked and published but not used in the primary delta. When Gemini and Perplexity agree against ChatGPT, the AI Memory signal is confirmed. When Gemini and ChatGPT agree against Perplexity, the Perplexity result warrants scrutiny. |
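
For reproducibility, the run configuration in the table above can be pinned as data. The field names here are illustrative; only the values come from the table, and Gemini's boolean flag simplifies its hybrid, optionally grounded behaviour.

```python
# Platform configuration for Run #1 (March 2026), per the table above.
PLATFORM_CONFIG = {
    "ChatGPT": {
        "model": "GPT-4o mini",
        "live_retrieval": False,  # browsing off: training data baseline
        "delta_role": "subtracted",
    },
    "Perplexity": {
        "model": "default mode",
        "live_retrieval": True,   # live search on: live web benchmark
        "delta_role": "added",
    },
    "Gemini": {
        "model": "Gemini 2.5 Flash",
        "live_retrieval": True,   # default grounding: hybrid, triangulation only
        "delta_role": "not used",
    },
}
```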

9. March 2026 — complete results

All normalised scores for all 28 brands across all 3 platforms. Run #1, March 2026. Scores are normalised 0–100 within each vertical per platform. Delta = Perplexity minus ChatGPT, calculated from unrounded scores before display rounding, so a published delta can differ by 0.1 from the subtraction of the one-decimal scores shown. Brands with 0.0 were not mentioned in any response in that vertical on that platform.

SEO & Marketing 12 brands · 6 queries per platform · 18 total query runs

| Brand | ChatGPT | Perplexity | Gemini | Δ P−GPT | Archetype |
|---|---|---|---|---|---|
| Semrush | 90.5 | 100.0 | 100.0 | +9.5 | 👑 Dominant Brand |
| Ahrefs | 100.0 | 51.9 | 60.7 | −48.1 | 🧠 AI Memory Brand |
| Google Search Console | 61.9 | 33.3 | 35.7 | −28.6 | 🧠 AI Memory Brand |
| Moz | 47.6 | 3.7 | 32.1 | −43.9 | 🩽 Fading Brand |
| Surfer SEO | 14.3 | 29.6 | 14.3 | +15.3 | |
| Ubersuggest | 28.6 | 11.1 | 7.1 | −17.5 | |
| Screaming Frog | 23.8 | 7.4 | 17.9 | −16.4 | |
| Yoast SEO | 19.1 | 3.7 | 3.6 | −15.3 | |
| Majestic | 14.3 | 0.0 | 3.6 | −14.3 | |
| SE Ranking | 4.8 | 11.1 | 7.1 | +6.3 | |
| Mangools | 4.8 | 11.1 | 10.7 | +6.3 | |
| Rank Math | 9.5 | 3.7 | 3.6 | −5.8 | |

CRM & Sales 8 brands · 6 queries per platform · 18 total query runs

| Brand | ChatGPT | Perplexity | Gemini | Δ P−GPT | Archetype |
|---|---|---|---|---|---|
| Salesforce | 100.0 | 100.0 | 100.0 | 0.0 | 👑 Dominant Brand |
| HubSpot | 74.1 | 70.8 | 80.0 | −3.2 | 👑 Dominant Brand |
| Zoho CRM | 40.7 | 33.3 | 36.0 | −7.4 | ⭐ GEO Outlier |
| Pipedrive | 18.5 | 33.3 | 20.0 | +14.8 | |
| Freshsales | 18.5 | 8.3 | 12.0 | −10.2 | |
| Insightly | 14.8 | 4.2 | 0.0 | −10.6 | |
| SugarCRM | 14.8 | 0.0 | 0.0 | −14.8 | 🩽 Fading Brand |
| Keap | 7.4 | 0.0 | 0.0 | −7.4 | |

AI & LLM Tools 8 brands · 6 queries per platform · 18 total query runs

| Brand | ChatGPT | Perplexity | Gemini | Δ P−GPT | Archetype |
|---|---|---|---|---|---|
| ChatGPT | 100.0 | 100.0 | 100.0 | 0.0 | 👑 Dominant Brand |
| Copilot | 43.5 | 36.8 | 47.8 | −6.6 | ⭐ GEO Outlier |
| Jasper | 39.1 | 52.6 | 8.7 | +13.5 | |
| Claude | 13.0 | 63.2 | 43.5 | +50.1 | 🔍 Live Search Brand |
| Notion AI | 4.3 | 21.1 | 0.0 | +16.7 | |
| Writesonic | 8.7 | 15.8 | 8.7 | +7.1 | |
| Copy.ai | 13.0 | 10.5 | 8.7 | −2.5 | |
| Grammarly | 13.0 | 10.5 | 4.3 | −2.5 | |

10. Known limitations

Scores represent relative citation frequency within this specific query panel, not absolute mention volume, domain authority, market share, or commercial success. The following limitations are acknowledged and published transparently.

Limitation 1 — Query panel coverage

The index runs 6 queries per vertical. This is a purposive sample, not an exhaustive survey. Different query choices would produce different scores. A brand could score poorly on these 6 queries and perform well on other valid queries not included. The panel is fixed for comparability, not claimed to be comprehensive. The query selection rationale in Section 2 explains why these specific queries were chosen.

Limitation 2 — Platform response variability

AI platform responses are not fully deterministic — the same query sent twice may produce different responses. The index mitigates this by running the fixed panel in a single session window per platform per month, and by normalising scores so that relative positions are more stable than raw counts. Individual scores should be treated as estimates of citation frequency. Month-over-month trends are more reliable than single-month absolute scores.

Limitation 3 — Platform model updates

AI platforms update continuously. A score change between months may reflect a genuine shift in brand citations or a platform model update that changed response generation behaviour. Where a major known platform update occurred between runs, this is noted in the monthly report. Platform versions used in each run are documented in Section 8.

Limitation 4 — ChatGPT self-reference bias

ChatGPT has a documented tendency to recommend its own products and underrepresent competitors in the AI tools vertical. The ChatGPT score for Claude (13.0) reflects this limitation. It is why the +50.1 delta for Claude is interpreted as a live web signal rather than a training data anomaly — the Perplexity score of 63.2 is the more credible measurement of current web prominence for a direct competitor.

Limitation 5 — Scores are relative, not absolute

A score of 50 means a brand was cited at half the rate of the top brand in its vertical on that platform — not that it was mentioned 50 times. Absolute citation counts are not published. Scores cannot be compared across verticals. A score of 50 in SEO and a score of 50 in CRM are not equivalent — each is relative to its own vertical’s top performer. Cross-vertical comparisons should use archetypes and delta patterns, not raw scores.

11. References

The methodology draws on established research in information retrieval, position bias, retrieval-augmented generation, and temporal bias in language models.

  • Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33, 9459–9474. arxiv.org/abs/2005.11401 — Foundational paper on RAG systems. Informs the distinction between Perplexity (retrieval-based) and ChatGPT (fixed training data) as opposite ends of the citation spectrum.
  • Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. (2005). Accurately Interpreting Clickthrough Data as Implicit Feedback. Proceedings of SIGIR ’05, 154–161. — Seminal evidence for position bias in search results. Justifies the position-weighted scoring model: items presented first receive disproportionately more user attention and influence.
  • Craswell, N., Zoeter, O., Taylor, M., & Ramsey, B. (2008). An Experimental Comparison of Click Position-Bias Models. Proceedings of WSDM ’08, 87–94. — Empirical validation of position bias across different result presentation formats. Supports the 5/3/2/1 weight decay applied in this index.
  • Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2023). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. arxiv.org/abs/2307.03172 — Documents primacy and recency bias in LLM output generation. Supports elevated weighting for 1st and 2nd position citations in AI-generated lists.
  • Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., … Wang, H. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv preprint arXiv:2312.10997. arxiv.org/abs/2312.10997 — Comprehensive survey of RAG system architectures. Provides technical context for how Perplexity’s live retrieval produces fundamentally different brand citation behaviour from ChatGPT’s static inference.
  • Metzler, D., Tay, Y., Bahri, D., & Najork, M. (2021). Rethinking Search: Making Domain Experts out of Dilettantes. ACM SIGIR Forum, 55(1). arxiv.org/abs/2105.02274 — Early framework for AI-native search and how LLMs change information retrieval. Contextualises why different AI platforms produce structurally different brand citation patterns.
  • Nakano, R., Hilton, J., Balwit, A., Wu, J., Ouyang, L., Kim, C., … Schulman, J. (2021). WebGPT: Browser-assisted Question-Answering with Human Feedback. arXiv preprint arXiv:2112.09332. arxiv.org/abs/2112.09332 — OpenAI’s foundational work on web-grounded LM responses. Background on the technical distinction between training-data citation and retrieval-based citation.
  • Lazaridou, A., Gribovskaya, E., Stokowiec, W., & Grigorev, N. (2022). Internet-Augmented Language Models Through Few-Shot Prompting for Open-Domain Question Answering. arXiv preprint arXiv:2203.05115. arxiv.org/abs/2203.05115 — Demonstrates how internet-augmented LLMs produce different factual outputs from fixed-training models. Supports the index’s central premise that Perplexity and ChatGPT produce structurally different brand citation distributions.

12. Methodology changelog

Every change to the query panel, scoring weights, normalisation method, platform selection, or archetype thresholds is documented here with a version number. Scores from different methodology versions are not directly comparable without explicit adjustment notes in the monthly report.

Version history

v1.0 (March 2026). Initial publication. GEO Brand Citation Index Run #1. 3 verticals (SEO & Marketing, CRM & Sales, AI & LLM Tools), 6 queries per vertical, 28 brands, 3 platforms (ChatGPT GPT-4o mini with browsing off; Perplexity default mode with live search on; Gemini 2.5 Flash with default grounding). Position weights 5/3/2/1 plus +2 URL citation bonus. Normalised 0–100 within vertical per platform. Delta = Perplexity − ChatGPT. Five archetypes: Dominant Brand, AI Memory Brand, Live Search Brand, Fading Brand, GEO Outlier.