1. Overview
The index was designed to answer a question existing brand tracking tools cannot: is your brand cited by AI systems because of current web presence, or because of historical training data? By comparing a retrieval-augmented platform (Perplexity, which searches the live web before answering) against a fixed-training-data platform (ChatGPT, which does not), the delta surfaces the gap between AI memory and AI retrieval.
This document is the complete public record of all scoring decisions. Any change to the methodology is versioned and recorded in the changelog. Scores from different methodology versions are not directly comparable without explicit adjustment notes in the monthly report.
2. The query panel (18 queries, 54 monthly query runs)
Query selection criteria
A query is included if it meets all three of the following:
- Real evaluation intent. The query must reflect how a user actually compares tools — open-ended, category-level questions that prompt multi-brand responses. Single-brand navigational queries (“What is Semrush?”) are excluded.
- Produces multiple brand mentions. Queries that consistently return single-brand answers yield no comparative data. Queries that generate list-format or comparison responses are preferred because they allow position-scoring across multiple brands per response.
- Stable month-to-month. Queries referencing time-specific events or rapidly shifting contexts are excluded. The panel tests the same intent every month. Any modification requires a versioned methodology update.
The “best alternative to X” query pattern is an intentional exception to the no-brand-naming rule. It is a high-volume, real user query that generates rich multi-brand responses, and it specifically tests whether competing brands surface when a user is actively considering leaving a market leader.
- What are the best SEO tools in 2026?
- Which SEO platform should I use for keyword research?
- What is the best tool for backlink analysis?
- What are the top tools for technical SEO audits?
- Which SEO software is best for content optimisation?
- What is the best alternative to Semrush?
- What is the best CRM for small businesses in 2026?
- Which CRM platform should I use for sales teams?
- What are the top customer relationship management tools?
- What CRM integrates best with email marketing?
- Which CRM is best for managing sales pipelines?
- What is the best alternative to Salesforce?
- What are the best AI writing tools in 2026?
- Which AI assistant is most useful for content creation?
- What is the best AI tool for productivity?
- Which large language model is best for business use?
- What are the top AI tools for marketing teams?
- What is the best alternative to ChatGPT?
Each query is sent verbatim to all three platforms within the same monthly session window, with no paraphrasing or platform-specific modifications, for a total of 18 queries per platform (54 query runs). The panel is fixed because month-over-month comparability requires holding it constant: if queries changed between months, score movements would become uninterpretable, since any movement could reflect a query shift rather than a brand shift.
3. Brand panel — all 28 brands
SEO & Marketing Tools (12 brands): Semrush, Ahrefs, Google Search Console, Moz, Surfer SEO, Ubersuggest, Screaming Frog, Yoast SEO, Majestic, SE Ranking, Mangools, Rank Math
CRM & Sales (8 brands): Salesforce, HubSpot, Zoho CRM, Pipedrive, Freshsales, Insightly, SugarCRM, Keap
AI & LLM Tools (8 brands): ChatGPT, Copilot, Jasper, Claude, Notion AI, Writesonic, Copy.ai, Grammarly
Brand mentions are matched by name including common variants (e.g. “Ahrefs” and “ahrefs.com” both score for Ahrefs). URL citations within a response are tracked separately and contribute the +2 bonus to whichever brand’s domain is cited.
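A minimal sketch of the variant matching described above, assuming a hand-maintained alias table per brand (the aliases shown are an illustrative subset, not the production list):

```python
import re

# Illustrative alias table; real variant lists are maintained per brand
# and are broader than this hypothetical subset.
BRAND_ALIASES = {
    "Ahrefs": ["ahrefs", "ahrefs.com"],
    "Semrush": ["semrush", "semrush.com"],
    "Google Search Console": ["google search console", "search console"],
}

def find_brand_mentions(response_text: str) -> list[str]:
    """Return matched brands ordered by first mention in the response."""
    text = response_text.lower()
    hits = []
    for brand, aliases in BRAND_ALIASES.items():
        positions = [m.start()
                     for alias in aliases
                     for m in re.finditer(re.escape(alias), text)]
        if positions:
            hits.append((min(positions), brand))
    return [brand for _, brand in sorted(hits)]
```

Ordering by first occurrence is what feeds the position weighting in Section 4: the brand whose alias appears earliest in the response text is treated as the 1st mention.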
4. Scoring formula
Position weights

| Mention position | Points |
|---|---|
| 1st | 5 |
| 2nd | 3 |
| 3rd | 2 |
| 4th or later | 1 |
| Not mentioned | 0 |

A URL citation of a brand's domain adds +2 to that brand's position points for the response.
Rationale for position weighting
Position-weighted scoring reflects a well-established finding in information retrieval research: items presented first receive disproportionately more attention and are more likely to influence decisions. This effect — documented as primacy bias in cognitive psychology and position bias in information retrieval — has been replicated across web search (Joachims et al., 2005; Craswell et al., 2008) and shown to apply to long-context LLM outputs (Liu et al., 2023). A brand mentioned first in an AI response has meaningfully different visibility than the same brand mentioned fifth.
The weight values (5, 3, 2, 1) reflect steep initial advantage for first mention, a meaningful but smaller reward for second, diminishing returns for subsequent mentions, and a floor of 1 for any mention beyond third. The URL citation bonus reflects an additional quality signal: a cited domain URL indicates the platform actively retrieved content from that domain — a higher-fidelity signal than a training-data reference alone.
Worked example — scoring one query response
Assume a platform returns:
| Brand | Position | Position pts | URL cited | URL pts | Raw pts (this query) |
|---|---|---|---|---|---|
| Semrush | 1st | 5 | Yes — semrush.com/compare | +2 | 7 |
| Ahrefs | 2nd | 3 | No | — | 3 |
| Surfer SEO | 3rd | 2 | No | — | 2 |
| Screaming Frog | 4th | 1 | No | — | 1 |
| Moz | 5th | 1 | No | — | 1 |
| All others | Not mentioned | 0 | No | — | 0 |
This process runs for all 6 queries in the vertical on that platform. Raw points are summed across all 6 responses per brand. The brand with the highest total raw points becomes the normalisation reference (100). All others are scaled proportionally per Section 5.
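The per-response scoring can be sketched as follows; the function applies the 5/3/2 position weights with a floor of 1 beyond third position, plus the +2 URL citation bonus, and reproduces the worked example above:

```python
POSITION_POINTS = {1: 5, 2: 3, 3: 2}  # 4th and later mentions score 1
URL_BONUS = 2

def score_response(mention_order: list[str],
                   cited_domains: set[str]) -> dict[str, int]:
    """Score one query response: position points plus URL citation bonus.

    mention_order: brands in the order they first appear in the response.
    cited_domains: brands whose domain URL is cited in the response.
    """
    scores = {}
    for pos, brand in enumerate(mention_order, start=1):
        pts = POSITION_POINTS.get(pos, 1)  # floor of 1 beyond 3rd position
        if brand in cited_domains:
            pts += URL_BONUS
        scores[brand] = pts
    return scores

# Worked example from the table above (unmentioned brands score 0):
example = score_response(
    ["Semrush", "Ahrefs", "Surfer SEO", "Screaming Frog", "Moz"],
    cited_domains={"Semrush"},
)
```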
5. Normalisation
Worked example (SEO vertical, Perplexity):
- Semrush accumulates the highest raw point total across all 6 SEO queries on Perplexity. It receives 100.0.
- Ahrefs accumulates approximately 51.9% of Semrush’s raw total across the same 6 queries. It receives 51.9.
- Moz accumulates approximately 3.7% of Semrush’s raw total — mentioned rarely, at low positions. It receives 3.7.
- Majestic was not mentioned in any of the 6 Perplexity SEO responses. It receives 0.0.
Scores are published to one decimal place. A score of 0.0 is not a missing data point — it means the brand was not cited in any response in that vertical on that platform.
Normalisation is performed within verticals rather than globally because different verticals have structurally different response patterns. CRM queries produce shorter brand lists dominated by two or three names. AI tools queries produce longer, more varied lists. Global normalisation would systematically disadvantage CRM brands based on their vertical’s response structure, not their actual citation frequency within category.
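A minimal sketch of within-vertical normalisation. The raw point totals used here are hypothetical values chosen only to reproduce the published ratios; actual raw counts are not published:

```python
def normalise(raw_totals: dict[str, int]) -> dict[str, float]:
    """Scale raw point totals so the vertical leader scores 100.0."""
    top = max(raw_totals.values())
    return {brand: round(100 * pts / top, 1)
            for brand, pts in raw_totals.items()}

# Hypothetical raw totals (assumed for illustration):
raw = {"Semrush": 27, "Ahrefs": 14, "Moz": 1, "Majestic": 0}
```

Because scaling is performed separately per vertical and per platform, each vertical has its own 100.0 reference brand, which is why scores cannot be compared across verticals (Section 10).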
6. Delta calculation
Perplexity is the live web benchmark because it is a retrieval-augmented generation (RAG) system that searches the current web before generating each response (Lewis et al., 2020). ChatGPT (GPT-4o mini, browsing disabled) is the training data baseline — it generates responses from a fixed corpus with a knowledge cutoff and no live retrieval. The two platforms represent opposite ends of the training-data versus live-retrieval spectrum. Gemini is tracked as a third data point but is not used in the primary delta; it occupies a hybrid position and is most useful as a triangulation signal.
Delta interpretation reference
| Delta range | Signal | What it typically means |
|---|---|---|
| +20 or above | Strong live web outperformance | Brand is generating substantial current content. Training data has not absorbed the growth. Advantage is temporary — narrows when models next retrain. |
| +5 to +19 | Moderate live web lead | Brand growing live web presence faster than training data reflects. Monitor for gap narrowing in subsequent runs. |
| −4 to +4 | Platform parity | Consistently cited across training data and live web. Low AI visibility risk. |
| −5 to −19 | Moderate AI Memory gap | Training data reputation ahead of live web presence. Requires monitoring and active generative engine optimisation (GEO) investment to defend position. |
| −20 or lower | Significant AI Memory gap | Heavily cited from training data, rarely from live retrieval. Structural risk: advantage erodes as models retrain and training data reflects a web that no longer prominently features the brand. |
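The delta and its interpretation bands can be sketched directly from the table; how non-integer deltas at the band edges (e.g. +4.5) are bucketed is an assumption, since published scores carry one decimal place:

```python
def delta(perplexity: float, chatgpt: float) -> float:
    """Delta = Perplexity score minus ChatGPT score (live web minus memory)."""
    return round(perplexity - chatgpt, 1)

def delta_band(d: float) -> str:
    """Map a delta onto the interpretation bands above.

    Edge handling between the published integer bounds is assumed.
    """
    if d >= 20:
        return "Strong live web outperformance"
    if d >= 5:
        return "Moderate live web lead"
    if d > -5:
        return "Platform parity"
    if d > -20:
        return "Moderate AI Memory gap"
    return "Significant AI Memory gap"
```

For example, Ahrefs in March 2026: `delta(51.9, 100.0)` gives −48.1, which falls in the Significant AI Memory gap band.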
7. Archetype assignment
| Archetype | Assignment criteria | March 2026 examples (ChatGPT / Perplexity / Gemini / Δ) |
|---|---|---|
| 👑 Dominant Brand | High scores (60+) on all three platforms. Delta within ±15. No significant platform gap. | ChatGPT (100/100/100/0.0), Salesforce (100/100/100/0.0), Semrush (90.5/100/100/+9.5), HubSpot (74.1/70.8/80.0/−3.2) |
| 🧠 AI Memory Brand | ChatGPT significantly higher than Perplexity (delta −20 or lower). Perplexity score not at zero — brand retains some live presence but not matching training data prominence. | Ahrefs (100/51.9/60.7/−48.1), Google Search Console (61.9/33.3/35.7/−28.6) |
| 🔍 Live Search Brand | Perplexity significantly higher than ChatGPT (delta +15 or above). Winning on current web before training data has incorporated the growth. | Claude (13.0/63.2/43.5/+50.1) |
| Fading Brand | Low absolute scores across all platforms AND large negative delta. Losing ground on both training data and live web simultaneously. | Moz (47.6/3.7/32.1/−43.9), SugarCRM (14.8/0.0/0.0/−14.8) |
| ⭐ GEO Outlier | Consistent scores across all platforms above what domain authority or market size alone would predict. Suggests effective GEO or content strategy. | Zoho CRM (40.7/33.3/36.0/−7.4), Copilot (43.5/36.8/47.8/−6.6) |
8. Platform definitions
| Platform | Version used | Mechanism | Role in index |
|---|---|---|---|
| ChatGPT | GPT-4o mini, browsing off | Fixed training corpus. Knowledge cutoff applies. No live web access during response generation. Brands cited based on historical frequency in training data. | Training data baseline. High ChatGPT scores reflect historical brand authority. Subtracted in the delta calculation. |
| Perplexity | Default mode, live search on | Retrieval-augmented generation (RAG). Searches the live web before generating each response. Citations grounded in currently indexed content. Domain URLs cited in responses are tracked for the +2 bonus. | Live web benchmark. Perplexity scores reflect current brand prominence on the indexed web. Added in the delta calculation. |
| Gemini | Gemini 2.5 Flash, grounding default | Hybrid: large training corpus plus optional grounding. Middle position between pure training inference and full live retrieval. | Triangulation. Tracked and published but not used in the primary delta. When Gemini and Perplexity agree against ChatGPT, the AI Memory signal is confirmed. When Gemini and ChatGPT agree against Perplexity, the Perplexity result warrants scrutiny. |
9. March 2026 — complete results
SEO & Marketing Tools · 12 brands · 6 queries per platform · 18 total query runs
| Brand | ChatGPT | Perplexity | Gemini | Δ P−GPT | Archetype |
|---|---|---|---|---|---|
| Semrush | 90.5 | 100.0 | 100.0 | +9.5 | 👑 Dominant Brand |
| Ahrefs | 100.0 | 51.9 | 60.7 | −48.1 | 🧠 AI Memory Brand |
| Google Search Console | 61.9 | 33.3 | 35.7 | −28.6 | 🧠 AI Memory Brand |
| Moz | 47.6 | 3.7 | 32.1 | −43.9 | Fading Brand |
| Surfer SEO | 14.3 | 29.6 | 14.3 | +15.3 | — |
| Ubersuggest | 28.6 | 11.1 | 7.1 | −17.5 | — |
| Screaming Frog | 23.8 | 7.4 | 17.9 | −16.4 | — |
| Yoast SEO | 19.1 | 3.7 | 3.6 | −15.3 | — |
| Majestic | 14.3 | 0.0 | 3.6 | −14.3 | — |
| SE Ranking | 4.8 | 11.1 | 7.1 | +6.3 | — |
| Mangools | 4.8 | 11.1 | 10.7 | +6.3 | — |
| Rank Math | 9.5 | 3.7 | 3.6 | −5.8 | — |
CRM & Sales · 8 brands · 6 queries per platform · 18 total query runs
| Brand | ChatGPT | Perplexity | Gemini | Δ P−GPT | Archetype |
|---|---|---|---|---|---|
| Salesforce | 100.0 | 100.0 | 100.0 | 0.0 | 👑 Dominant Brand |
| HubSpot | 74.1 | 70.8 | 80.0 | −3.2 | 👑 Dominant Brand |
| Zoho CRM | 40.7 | 33.3 | 36.0 | −7.4 | ⭐ GEO Outlier |
| Pipedrive | 18.5 | 33.3 | 20.0 | +14.8 | — |
| Freshsales | 18.5 | 8.3 | 12.0 | −10.2 | — |
| Insightly | 14.8 | 4.2 | 0.0 | −10.6 | — |
| SugarCRM | 14.8 | 0.0 | 0.0 | −14.8 | — |
| Keap | 7.4 | 0.0 | 0.0 | −7.4 | — |
AI & LLM Tools · 8 brands · 6 queries per platform · 18 total query runs
| Brand | ChatGPT | Perplexity | Gemini | Δ P−GPT | Archetype |
|---|---|---|---|---|---|
| ChatGPT | 100.0 | 100.0 | 100.0 | 0.0 | 👑 Dominant Brand |
| Copilot | 43.5 | 36.8 | 47.8 | −6.6 | ⭐ GEO Outlier |
| Jasper | 39.1 | 52.6 | 8.7 | +13.5 | — |
| Claude | 13.0 | 63.2 | 43.5 | +50.1 | 🔍 Live Search Brand |
| Notion AI | 4.3 | 21.1 | 0.0 | +16.7 | — |
| Writesonic | 8.7 | 15.8 | 8.7 | +7.1 | — |
| Copy.ai | 13.0 | 10.5 | 8.7 | −2.5 | — |
| Grammarly | 13.0 | 10.5 | 4.3 | −2.5 | — |
10. Known limitations
The index runs 6 queries per vertical. This is a purposive sample, not an exhaustive survey. Different query choices would produce different scores. A brand could score poorly on these 6 queries and perform well on other valid queries not included. The panel is fixed for comparability, not claimed to be comprehensive. The query selection rationale in Section 2 explains why these specific queries were chosen.
AI platform responses are not fully deterministic — the same query sent twice may produce different responses. The index mitigates this by running the fixed panel in a single session window per platform per month, and by normalising scores so that relative positions are more stable than raw counts. Individual scores should be treated as estimates of citation frequency. Month-over-month trends are more reliable than single-month absolute scores.
AI platforms update continuously. A score change between months may reflect a genuine shift in brand citations or a platform model update that changed response generation behaviour. Where a major known platform update occurred between runs, this is noted in the monthly report. Platform versions used in each run are documented in Section 8.
ChatGPT has a documented tendency to recommend its own products and underrepresent competitors in the AI tools vertical. The ChatGPT score for Claude (13.0) reflects this limitation. It is why the +50.1 delta for Claude is interpreted as a live web signal rather than a training data anomaly — the Perplexity score of 63.2 is the more credible measurement of current web prominence for a direct competitor.
A score of 50 means a brand was cited at half the rate of the top brand in its vertical on that platform — not that it was mentioned 50 times. Absolute citation counts are not published. Scores cannot be compared across verticals. A score of 50 in SEO and a score of 50 in CRM are not equivalent — each is relative to its own vertical’s top performer. Cross-vertical comparisons should use archetypes and delta patterns, not raw scores.
11. References
The methodology draws on established research in information retrieval, position bias, retrieval-augmented generation, and temporal bias in language models.
- Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33, 9459–9474. arxiv.org/abs/2005.11401 — Foundational paper on RAG systems. Informs the distinction between Perplexity (retrieval-based) and ChatGPT (fixed training data) as opposite ends of the citation spectrum.
- Joachims, T., et al. (2005). Accurately Interpreting Clickthrough Data as Implicit Feedback. Proceedings of SIGIR ’05, 154–161. — Seminal evidence for position bias in search results. Justifies the position-weighted scoring model: items presented first receive disproportionately more user attention and influence.
- Craswell, N., et al. (2008). An Experimental Comparison of Click Position-Bias Models. Proceedings of WSDM ’08, 87–94. — Empirical validation of position bias across different result presentation formats. Supports the 5/3/2/1 weight decay applied in this index.
- Liu, N. F., et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. arxiv.org/abs/2307.03172 — Documents primacy and recency bias in LLM output generation. Supports elevated weighting for 1st and 2nd position citations in AI-generated lists.
- Gao, Y., et al. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv preprint arXiv:2312.10997. arxiv.org/abs/2312.10997 — Comprehensive survey of RAG system architectures. Provides technical context for how Perplexity’s live retrieval produces fundamentally different brand citation behaviour from ChatGPT’s static inference.
- Metzler, D., et al. (2021). Rethinking Search: Making Domain Experts out of Dilettantes. ACM SIGIR Forum, 55(1). arxiv.org/abs/2105.02274 — Early framework for AI-native search and how LLMs change information retrieval. Contextualises why different AI platforms produce structurally different brand citation patterns.
- Nakano, R., et al. (2021). WebGPT: Browser-assisted Question-Answering with Human Feedback. arXiv preprint arXiv:2112.09332. arxiv.org/abs/2112.09332 — OpenAI’s foundational work on web-grounded LM responses. Background on the technical distinction between training-data citation and retrieval-based citation.
- Lazaridou, A., et al. (2022). Internet-Augmented Language Models Through Few-Shot Prompting for Open-Domain Question Answering. arXiv preprint arXiv:2203.05115. arxiv.org/abs/2203.05115 — Demonstrates how internet-augmented LLMs produce different factual outputs from fixed-training models. Supports the index’s central premise that Perplexity and ChatGPT produce structurally different brand citation distributions.
12. Methodology changelog
Every change to the query panel, scoring weights, normalisation method, platform selection, or archetype thresholds is documented here with a version number. Scores from different methodology versions are not directly comparable without explicit adjustment notes in the monthly report.
v1.0 — Initial methodology release (current version). No subsequent changes recorded.