Know which AI bots reach your site
AI crawlers decide whether ChatGPT, Claude, Perplexity, and Gemini can read, cite, and recommend your content. This directory catalogues the bots that matter — who runs them, what they do, and whether they respect robots.txt.
33 bots from 22 operators · Last updated June 2026
The AI bot directory
GPTBot
AI Crawler · OpenAI
Crawls public web content to train OpenAI's foundation models. Content owners can block it in robots.txt to opt out of training.
OAI-SearchBot
AI Search · OpenAI
Indexes pages to surface and link them inside ChatGPT search results. Blocking it removes you from ChatGPT search, not training.
ChatGPT-User
AI Assistant · OpenAI
Fetches a specific page when a ChatGPT user (or a custom GPT) asks about it. Triggered by users, not bulk crawling.
ClaudeBot
AI Crawler · Anthropic
Anthropic's primary crawler for collecting web data used to train Claude. Honours robots.txt and crawl-delay.
Claude-User
AI Assistant · Anthropic
Retrieves a page in real time when a Claude user asks a question that requires fetching live content.
Claude-SearchBot
AI Search · Anthropic
Indexes content so Claude can cite and link to it when answering with web search enabled.
anthropic-ai
AI Crawler · Anthropic
Legacy Anthropic user-agent still referenced in many robots.txt files. ClaudeBot is the current crawler.
Googlebot
Search Engine · Google
Google's core search crawler. Its index also powers AI Overviews and the search grounding used by Gemini.
Google-Extended
Opt-Out Token · Google
A robots.txt token, not a crawler. Disallowing it opts your content out of being used to train Gemini models, including those served through Vertex AI, while leaving Google Search indexing intact.
GoogleOther
AI Crawler · Google
A generic crawler used by internal Google teams for research and development, including AI data collection.
Bingbot
Search Engine · Microsoft
Microsoft's search crawler. The Bing index grounds Copilot and other Microsoft AI answer experiences.
Applebot
Search Engine · Apple
Powers Siri and Spotlight Suggestions. Its crawl also feeds Apple Intelligence features.
Applebot-Extended
Opt-Out Token · Apple
A robots.txt token. Disallowing it opts your content out of training Apple's generative models while keeping Siri/Spotlight indexing.
PerplexityBot
AI Search · Perplexity
Indexes pages so Perplexity can surface and cite them in answers. Documented to honour robots.txt.
Perplexity-User
AI Assistant · Perplexity
Fetches a page when a user action requires it. Perplexity states that, because a user requested the fetch, these fetches generally ignore robots.txt.
Meta-ExternalAgent
AI Crawler · Meta
Meta's crawler for collecting training data for Llama and other Meta AI products.
Meta-ExternalFetcher
AI Assistant · Meta
Fetches specific links to support Meta AI assistant features when a user invokes them.
FacebookBot
AI Crawler · Meta
Crawls pages to improve language models that power Meta products such as speech recognition.
Amazonbot
AI Assistant · Amazon
Crawls the web to answer questions through Alexa and to support Amazon's AI services.
DeepSeekBot
AI Crawler · DeepSeek
Crawler associated with DeepSeek for gathering web data to support its AI models. Its user-agent and robots.txt behaviour are not yet formally documented.
Bytespider
AI Crawler · ByteDance
ByteDance's crawler used to gather training data for its AI models. Widely reported to ignore robots.txt and crawl aggressively.
CCBot
AI Crawler · Common Crawl
Builds the open Common Crawl dataset that is a major source of training data for many large language models.
cohere-training-data-crawler
AI Crawler · Cohere
Collects web data used to train Cohere's enterprise language models.
MistralAI-User
AI Assistant · Mistral AI
Retrieves pages on demand to support Le Chat and other Mistral assistant features.
DuckAssistBot
AI Assistant · DuckDuckGo
Supports DuckDuckGo's DuckAssist AI answers by fetching relevant content.
YouBot
AI Search · You.com
Indexes content for the You.com AI search engine and assistant.
AI2Bot
AI Crawler · Allen Institute for AI
Crawls open web content for research datasets used to train open language models such as OLMo.
Diffbot
AI Crawler · Diffbot
Crawls and structures web pages into a knowledge graph that powers AI and data products.
PetalBot
Search Engine · Huawei
Crawler for Huawei's Petal Search and AI assistant features.
ImagesiftBot
AI Crawler · ImageSift (Hive)
Crawls images across the web to power visual search and AI training datasets.
AhrefsBot
SEO Tool · Ahrefs
Crawls the web to build Ahrefs' backlink and SEO index. Often allowed for SEO/AEO analysis.
SemrushBot
SEO Tool · Semrush
Powers Semrush's backlink, keyword, and site-audit datasets.
DataForSeoBot
SEO Tool · DataForSEO
Collects SERP and web data resold to many SEO and AI-visibility tools.
Bot details are compiled from each operator's published documentation and may change as operators update their crawlers.
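To see which of these bots actually reach your site, one rough approach is to match the tokens above against the User-Agent field of your access logs. The Python sketch below assumes you have already extracted that field from each log line. The token subset, the category labels, and the names BOT_CATEGORIES, classify_user_agent, and count_bot_hits are illustrative and do not come from any operator's documentation.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical lookup built from a subset of the directory: token -> category.
# Opt-out tokens such as Google-Extended and Applebot-Extended are left out
# because they only appear in robots.txt, never in request headers.
BOT_CATEGORIES = {
    "GPTBot": "AI Crawler",
    "OAI-SearchBot": "AI Search",
    "ChatGPT-User": "AI Assistant",
    "ClaudeBot": "AI Crawler",
    "Claude-SearchBot": "AI Search",
    "Claude-User": "AI Assistant",
    "PerplexityBot": "AI Search",
    "Perplexity-User": "AI Assistant",
    "Meta-ExternalAgent": "AI Crawler",
    "Bytespider": "AI Crawler",
    "CCBot": "AI Crawler",
    "Googlebot": "Search Engine",
    "Bingbot": "Search Engine",
}

# Try longer tokens first so a more specific token wins if one contains another.
_TOKENS_BY_LENGTH = sorted(BOT_CATEGORIES, key=len, reverse=True)


def classify_user_agent(user_agent: str) -> Optional[tuple[str, str]]:
    """Return (token, category) for the first known bot token in the header, else None."""
    ua = user_agent.lower()  # tokens are matched case-insensitively
    for token in _TOKENS_BY_LENGTH:
        if token.lower() in ua:
            return token, BOT_CATEGORIES[token]
    return None


def count_bot_hits(user_agents: Iterable[str]) -> Counter:
    """Tally requests per known bot token; unrecognised user agents are skipped."""
    hits: Counter = Counter()
    for ua in user_agents:
        match = classify_user_agent(ua)
        if match is not None:
            hits[match[0]] += 1
    return hits
```

User-Agent headers are easy to spoof, so treat these counts as indicative. Where it matters, confirm the source IP against the published IP ranges or reverse-DNS checks that operators such as OpenAI and Google document.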
Blocking the wrong bot can make you invisible to AI
There are three kinds of AI bot, and they call for three different decisions.
Training crawlers
GPTBot and ClaudeBot gather data to train models, and the Google-Extended token controls whether Google may use your content for training. Block them if you don't want your content used for training, but know that on some platforms this may also reduce your chances of being cited later.
Search indexers
OAI-SearchBot, PerplexityBot, and Claude-SearchBot decide whether an AI answer engine can cite and link to you. You almost always want these allowed — they're how you earn AI visibility.
Live fetchers
ChatGPT-User and Claude-User fetch a page the moment a user asks about it. Blocking them stops AI tools from reading your page on a user's behalf — usually a missed opportunity.
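One way to express that three-way split in robots.txt is sketched below, together with a quick check using Python's standard-library urllib.robotparser. The grouping of bots, the /private/ path, and the check_access helper are assumptions for illustration; adjust them to your own goals.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: block training crawlers and the Google-Extended token,
# allow search indexers and live fetchers. The grouping is an assumption.
ROBOTS_TXT = """\
# Training crawlers and opt-out tokens: opt out of model training
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

# Search indexers and live fetchers: allow, so AI answers can read and cite you
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: ChatGPT-User
User-agent: Claude-User
Allow: /

# Everyone else, including Googlebot: normal crawling, minus a private area
User-agent: *
Disallow: /private/
"""


def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report whether each user-agent token may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "Google-Extended", "OAI-SearchBot", "ChatGPT-User", "Googlebot"]
    for agent, allowed in check_access(ROBOTS_TXT, agents, "https://example.com/blog/post").items():
        print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

With this file, GPTBot and Google-Extended come back blocked, while OAI-SearchBot, ChatGPT-User, and Googlebot are allowed. Python's parser is simpler than RFC 9309 (for instance, it applies the first matching rule in a group rather than the longest match, and matches user-agent tokens by substring), so treat it as a sanity check, not a guarantee of how each operator's crawler will behave.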
The takeaway: the goal of AEO isn't to block AI bots — it's to make sure the right ones can read you, and that your content is structured to get cited when they do. Orbilo shows you exactly which AI crawlers hit your site and how often your brand surfaces in their answers.
See which AI bots are reading you — and what they say about your brand
Orbilo tracks AI crawler activity on your site and monitors how ChatGPT, Claude, Perplexity, Gemini, and Grok mention your brand — so you can turn crawls into citations.