AI Crawler User-Agent Directory
Every major AI crawler user-agent string, what it does, and the exact robots.txt rules to allow or block it. Build your robots.txt with one click.
GEOLAB.NET
Understanding AI crawlers
1. Training crawlers vs real-time crawlers
Some AI crawlers collect data to train models (GPTBot, ClaudeBot). Others fetch content in real time to answer queries (PerplexityBot, and Googlebot for AI Overviews). Blocking a real-time crawler prevents that engine from citing your pages. Blocking a training crawler keeps your content out of model training, but an engine that fetches pages through a separate real-time crawler may still cite you.
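The distinction above can be turned into rules mechanically. The following is a minimal sketch, assuming a hand-maintained mapping from user-agent token to purpose; the mapping only covers three crawlers named in the paragraph above, and the function name `rules_for` is hypothetical. Googlebot is left out on purpose, since a rule for it would also affect regular search crawling.

```python
# Illustrative mapping taken from the paragraph above; not an exhaustive directory.
CRAWLER_PURPOSE = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "PerplexityBot": "real-time",
}


def rules_for(blocked_purpose: str) -> str:
    """Return robots.txt text that blocks one purpose and allows the rest."""
    groups = []
    for agent, purpose in CRAWLER_PURPOSE.items():
        # One group per crawler: a full-site Disallow or a full-site Allow.
        directive = "Disallow: /" if purpose == blocked_purpose else "Allow: /"
        groups.append(f"User-agent: {agent}\n{directive}")
    # Blank lines separate groups; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


print(rules_for("training"))
```

Calling `rules_for("training")` produces a Disallow group for GPTBot and ClaudeBot and an Allow group for PerplexityBot, which matches the "block training, keep citations" setup described above.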
2. A block stays in effect until you reverse it
A Disallow rule in robots.txt takes effect the next time the crawler fetches the file, typically within days. Removing the rule allows crawling again. If a crawler was blocked for months, though, your content may need to be re-crawled and re-indexed, which can take additional weeks.
3. Accidentally blocking crawlers is common
Many sites block AI crawlers unintentionally through overly broad Disallow rules, CDN bot protection, or WAF rules. The most common mistake is a “Disallow: /” rule that was meant for spam bots but sits under “User-agent: *”, where it applies to every compliant crawler without a group of its own, AI crawlers included.
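One way to catch this mistake is to parse a robots.txt file and ask which AI crawlers it shuts out. The sketch below uses Python's standard `urllib.robotparser`; the crawler list, the helper name `blocked_crawlers`, and the spam-bot token `ExampleSpamBot` are assumptions made for illustration. It only inspects robots.txt, so it cannot detect blocks that come from CDN bot protection or WAF rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of AI crawler tokens; extend as needed.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawlers that may not fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in AI_CRAWLERS if not parser.can_fetch(name, path)]


# The mistake: a rule meant for spam bots, written against every user-agent.
TOO_BROAD = """\
User-agent: *
Disallow: /
"""

# The fix: name the unwanted bot explicitly (ExampleSpamBot is hypothetical).
TARGETED = """\
User-agent: ExampleSpamBot
Disallow: /
"""

print(blocked_crawlers(TOO_BROAD))  # ['GPTBot', 'ClaudeBot', 'PerplexityBot']
print(blocked_crawlers(TARGETED))   # []
```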
4. Use the builder to generate exact rules
Click Allow or Block on any crawler to add it to the robots.txt builder. Copy the output into your site's robots.txt file, then check the result with a robots.txt tester after every change.
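If no dedicated tester is at hand, a rough local check is possible with Python's standard `urllib.robotparser`. The sketch below compares a robots.txt file against the outcome you intended for each crawler; the helper name `find_mismatches` and the sample rules are assumptions for illustration. Note that the standard-library parser applies the first matching rule within a group, which can differ from the longest-match behavior some crawlers implement, so treat it as a sanity check rather than a definitive verdict.

```python
from urllib.robotparser import RobotFileParser


def find_mismatches(robots_txt: str, expected: dict[str, bool], url: str = "/") -> list[str]:
    """Compare parsed rules with the intended allow (True) / block (False) per crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for agent, should_allow in expected.items():
        allowed = parser.can_fetch(agent, url)
        if allowed != should_allow:
            want = "allow" if should_allow else "block"
            got = "allow" if allowed else "block"
            problems.append(f"{agent}: expected {want}, got {got}")
    return problems


# Sample output as it might be copied from a rule builder (illustrative).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /
"""

# ClaudeBot has no group and there is no wildcard group, so it is allowed.
intended = {"GPTBot": False, "PerplexityBot": True, "ClaudeBot": True}
print(find_mismatches(ROBOTS_TXT, intended))  # []
```

An empty list means every crawler behaves as intended; any entry in the list names a crawler whose rule should be revisited before the file is deployed.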
