
Free Robots.txt Generator: Control Crawlers and Block AI Scrapers

Your robots.txt controls which crawlers can access your site. Our free generator helps you allow search engines, block AI training scrapers like GPTBot and CCBot, and protect private paths.


mubashar

· 2 min read

Your robots.txt file is a set of instructions you leave for web crawlers — search engine bots, AI training scrapers, and other automated agents that visit your site. It sits at yourdomain.com/robots.txt and tells crawlers which parts of your site they can and cannot access. Getting it right protects your crawl budget, keeps sensitive paths out of search results, and — increasingly — blocks AI companies from scraping your content without permission.

Why robots.txt matters in 2025

Two years ago, most developers set User-agent: * Allow: / and forgot about it. In 2025, the calculus has changed. AI companies are aggressively crawling the web to build training datasets. OpenAI's GPTBot, Common Crawl's CCBot, and others consume vast amounts of bandwidth and content without compensation. Many website owners are now opting out.

Our Robots.txt Generator makes it easy to block specific AI crawlers while keeping Google and other search engines you want.

Understanding the format

# Allow all search engines
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml

Each User-agent block applies rules to a specific crawler. * is a wildcard that matches all crawlers. Specific user-agents (like GPTBot) override the wildcard for that bot.
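You can sanity-check how a parser applies these rules with Python's standard-library urllib.robotparser. A minimal sketch using the rules above — note that Python's parser matches rules first-to-last, so a blanket Allow: / placed before the Disallow lines would win; omitting it has the same effect, because paths with no matching rule are allowed by default:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /api/

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Search engines fall under the * block: normal pages are allowed,
# the disallowed prefixes are not.
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))  # allowed
print(rp.can_fetch("Googlebot", "https://example.com/admin/settings"))  # blocked

# GPTBot matches its own, more specific block and is shut out entirely.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))  # blocked
```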

What to always disallow

Regardless of which crawlers you allow, these paths should almost always be in your Disallow list:

  • /admin/ — no benefit to having admin pages indexed
  • /api/ — API endpoints are not useful in search results
  • /accounts/ — login, signup, and account management pages
  • Any staging or draft preview paths
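Put together under a default-allow policy, those bullets translate into a block like this (the paths are illustrative — /drafts/ stands in for whatever your staging or preview prefix actually is):

```
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /accounts/
Disallow: /drafts/
```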

How to serve robots.txt in Django

Rather than a static file (which is hard to version control and easy to forget), serve it as a Django view:

# views.py
from django.http import HttpResponse

def robots_txt(request):
    lines = [
        "User-agent: *",
        "Allow: /",
        "Disallow: /admin/",
        "Sitemap: https://yourdomain.com/sitemap.xml",
    ]
    return HttpResponse("\n".join(lines), content_type="text/plain")

# urls.py
from django.urls import path
from . import views

urlpatterns = [
    path("robots.txt", views.robots_txt, name="robots_txt"),
]
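To fold the AI-bot blocking into the same view, one approach is a small helper that assembles the file from configuration. A sketch — build_robots_txt and its parameters are illustrative names, not a Django API:

```python
def build_robots_txt(blocked_bots, disallowed_paths, sitemap_url):
    """Assemble robots.txt: a default policy, one full-block section
    per AI bot, and a sitemap line."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    for bot in blocked_bots:
        # Each blocked bot gets its own section with a total Disallow.
        lines += ["", f"User-agent: {bot}", "Disallow: /"]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"

body = build_robots_txt(
    blocked_bots=["GPTBot", "CCBot"],  # the AI crawlers named above
    disallowed_paths=["/admin/", "/api/"],
    sitemap_url="https://yourdomain.com/sitemap.xml",
)
print(body)
```

In the view, the hand-built list then becomes return HttpResponse(build_robots_txt(...), content_type="text/plain"), and adding or removing a blocked bot is a one-line change.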

How to use the generator

  1. Visit the Robots.txt Generator
  2. Choose your default crawler policy
  3. Select which AI bots to block (GPTBot, CCBot, etc.)
  4. Enter any paths you want to disallow
  5. Add your sitemap URL
  6. Copy the generated file and deploy it to your root domain

Test your robots.txt after deploying. Google Search Console's robots.txt report (which replaced the old standalone tester) shows when Googlebot last fetched your file, whether it parsed successfully, and any errors it found.


Written by

Mubashar Iqbal

Web developer, SEO expert, and independent maker. I build products, write about what I've learned, and create free tools for developers and marketers.