# OPENAI
# GPTBot - Trains GPT models (ChatGPT, GPT-4, etc.)
User-agent: GPTBot
Disallow: /

# OAI-SearchBot - ChatGPT search feature
User-agent: OAI-SearchBot
Disallow: /

# ChatGPT-User - Direct user URL requests
User-agent: ChatGPT-User
Disallow: /

# ANTHROPIC (CLAUDE)
# ClaudeBot - Primary training crawler
User-agent: ClaudeBot
Disallow: /

# anthropic-ai - General AI research
User-agent: anthropic-ai
Disallow: /

# Claude-Web - Web search/browsing
User-agent: Claude-Web
Disallow: /

# GOOGLE
# Google-Extended - Trains Gemini/Bard (not search indexing)
User-agent: Google-Extended
Disallow: /

# PERPLEXITY AI
# PerplexityBot - AI search engine crawler
User-agent: PerplexityBot
Disallow: /

# Perplexity-User - User-requested content
User-agent: Perplexity-User
Disallow: /

# COMMON CRAWL
# CCBot - Archives web for AI training datasets
User-agent: CCBot
Disallow: /

# AMAZON
# Amazonbot - Alexa AI and services
User-agent: Amazonbot
Disallow: /

# APPLE
# Applebot-Extended - Trains Apple Intelligence
User-agent: Applebot-Extended
Disallow: /

# Applebot - General web indexing
User-agent: Applebot
Disallow: /

# BYTEDANCE (TIKTOK)
# Bytespider - Aggressive AI training crawler
User-agent: Bytespider
Disallow: /

# META/FACEBOOK
# FacebookBot - AI training and analysis
User-agent: FacebookBot
Disallow: /

# Meta-ExternalAgent - Various AI/ML purposes
User-agent: Meta-ExternalAgent
Disallow: /

# DIFFBOT
# Diffbot - AI-powered data extraction
User-agent: Diffbot
Disallow: /

# IMAGE/DATASET CRAWLERS
# ImagesiftBot - Image dataset collection
User-agent: ImagesiftBot
Disallow: /

# Img2Dataset - Large-scale image datasets
User-agent: Img2Dataset
Disallow: /

# CONTENT AGGREGATORS
# Omgilibot - Content aggregation service
User-agent: Omgilibot
Disallow: /

# Omgili - Alternative user agent
User-agent: Omgili
Disallow: /

# YOU.COM
# YouBot - AI-powered search engine
User-agent: YouBot
Disallow: /

# COHERE
# cohere-ai - LLM training data collection
User-agent: cohere-ai
Disallow: /

# XAI (ELON MUSK / GROK)
# GrokBot - Trains Grok chatbot
User-agent: GrokBot
Disallow: /

# xAI-Grok - Alternative user agent
User-agent: xAI-Grok
Disallow: /

# Grok-DeepSearch - Deep search variant
User-agent: Grok-DeepSearch
Disallow: /

# DATA COLLECTION SERVICES
# DataForSeoBot - SEO data and analytics
User-agent: DataForSeoBot
Disallow: /

# FriendlyCrawler - Commercial web scraping
User-agent: FriendlyCrawler
Disallow: /

# DEEPSEEK
# DeepSeekBot - WARNING: May crawl anonymously and ignore robots.txt
User-agent: DeepSeekBot
Disallow: /