Glossary · Mar 15, 2026 · 4 min read

What is Robots.txt for AI?

Robots.txt for AI refers to using the robots.txt file to control which AI crawlers can access your website content for training and retrieval.

Orbilo Team

Definition

Robots.txt for AI refers to the practice of using a website's robots.txt file to control access by AI-specific crawlers — such as GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended (Gemini). While robots.txt has been used for decades to manage search engine crawlers, the emergence of AI crawlers has introduced a new layer of decisions about what content AI platforms can access for training and real-time retrieval.

Why robots.txt AI rules matter

Your robots.txt file determines whether compliant AI crawlers can access your content. This affects:

  • AI training — Blocking training crawlers keeps your content out of future model training runs (content already crawled may persist in existing models)
  • Real-time retrieval — Blocking retrieval crawlers means platforms like Perplexity can't cite your content in live responses
  • Brand presence — A blanket block on all AI crawlers effectively removes your brand from AI-generated answers
  • Content control — Selective rules let you choose which platforms and which content they can access

Common AI robots.txt configurations

Allow all AI crawlers (recommended for AEO):

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Block all AI crawlers:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Hybrid approach (allow retrieval, block training):

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
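The hybrid policy above can be sanity-checked locally with Python's standard-library `urllib.robotparser` before deploying it. A minimal sketch (the policy text mirrors the configuration shown, not any particular live site):

```python
from urllib import robotparser

# Hybrid policy from above: allow retrieval agents, block training agents.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

def may_fetch(user_agent: str, path: str = "/") -> bool:
    """Return True if the given crawler may fetch the path under this policy."""
    rp = robotparser.RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    return rp.can_fetch(user_agent, path)

for agent in ("ChatGPT-User", "PerplexityBot", "GPTBot", "ClaudeBot"):
    print(agent, may_fetch(agent))
```

Note that `can_fetch` returns True for any agent with no matching group, which mirrors robots.txt's default-allow behavior.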

Known AI crawler user-agents

| User-Agent | Operator | Type |
|------------|----------|------|
| GPTBot | OpenAI | Training |
| ChatGPT-User | OpenAI | Retrieval (browsing) |
| ClaudeBot | Anthropic | Training |
| PerplexityBot | Perplexity | Retrieval |
| Google-Extended | Google | Gemini training |
| Bytespider | ByteDance | Training |
| CCBot | Common Crawl | Open dataset |
| FacebookBot | Meta | Training |
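A table like this can drive a small generator that emits a robots.txt block per crawler. The helper below is an illustrative sketch, not an official tool; the type labels are simplified from the table (non-retrieval agents, including dataset crawlers like CCBot, are treated as blockable here):

```python
# Known AI crawler user-agents, mapped to a simplified type label.
AI_CRAWLERS = {
    "GPTBot": "training",
    "ChatGPT-User": "retrieval",
    "ClaudeBot": "training",
    "PerplexityBot": "retrieval",
    "Google-Extended": "training",
    "Bytespider": "training",
    "CCBot": "training",       # open dataset, widely used for training
    "FacebookBot": "training",
}

def robots_rules(allow_types=("retrieval",)) -> str:
    """Allow crawlers whose type is in allow_types; disallow the rest."""
    blocks = []
    for agent, kind in AI_CRAWLERS.items():
        rule = "Allow" if kind in allow_types else "Disallow"
        blocks.append(f"User-agent: {agent}\n{rule}: /")
    return "\n\n".join(blocks)

# Default call produces the hybrid policy: retrieval allowed, training blocked.
print(robots_rules())
```

Calling `robots_rules(allow_types=("retrieval", "training"))` instead would emit an allow-all policy.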

Best practices

  1. Audit your current robots.txt — Check if you're accidentally blocking AI crawlers (or accidentally allowing them)
  2. Align with your AEO strategy — If you want AI visibility, ensure AI crawlers are allowed
  3. Use llms.txt alongside robots.txt — Even if you allow crawling, an llms.txt file gives AI systems a curated summary of your brand
  4. Review regularly — New AI crawlers appear frequently; update your rules as the landscape evolves
  5. Consider per-directory rules — Allow crawling of marketing pages while blocking premium content
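Practice 5 can be expressed with path-scoped rules. A sketch, where the directory names are placeholders for your own site structure:

```
User-agent: GPTBot
Allow: /blog/
Disallow: /premium/

User-agent: PerplexityBot
Allow: /blog/
Disallow: /premium/
```

Per the Robots Exclusion Protocol (RFC 9309), when Allow and Disallow rules conflict, the most specific (longest) matching path wins, so more targeted rules take precedence over broad ones.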
Related terms

  • AI Crawler — The bots that robots.txt rules control
  • llms.txt — A complementary file that tells AI what your brand is (while robots.txt controls access)
  • Training Data — The content corpus that AI crawlers help build

