Free AI Permissions Auditor

Audit how clearly your site states its AI permissions across robots.txt, page-level meta tags, the X-Robots-Tag response header, ai.txt, and the EU TDM Reservation. Distinguishes training crawlers from search and on-demand crawlers — then generates a deploy-ready robots.txt for the policy you actually want.

Audit Your AI Permissions

Builds a per-bot matrix and tells you exactly which AI systems you have actually said yes or no to.

Get a free personalized review with a recommended policy and deploy-ready robots.txt.


I will personally review your site and follow up within 1-2 business days.

Per-Bot Permissions Matrix

Want a deploy-ready robots.txt for the policy you actually want?
Scroll down to the generator — pick a policy stance and copy the output.

AI-Aware robots.txt Generator

Pick the stance you actually want and we'll output correctly formatted directives for every major AI crawler.

Your robots.txt
Deploy: save as /robots.txt at the root of your domain.
Why Explicit AI Permissions Matter
19 distinct AI crawlers actively scrape the web in 2026, each with its own default policy when you say nothing. Silence is not a stance — it is whichever stance each vendor decides for you. The opt-out cases the courts are weighing all turn on the same question: did you say no?

Training, Search, and Answering Are Different Decisions

The single biggest mistake site owners make with AI permissions is treating all AI crawlers as one category. They are not. There are three functionally different jobs, three different sets of bots, and three different commercial outcomes when you say yes or no to each.

Training crawlers

These collect content to train new AI models: GPTBot, ClaudeBot, Google-Extended, CCBot, Applebot-Extended, Bytespider, Meta-ExternalAgent. Allowing them adds your content to model weights — long-term reach, no direct attribution. Blocking them is the center of every recent copyright lawsuit.

Search crawlers

These build the search index the AI consults at answer time: OAI-SearchBot, PerplexityBot, YouBot. Blocking these means your site cannot appear in the answers ChatGPT, Perplexity, and You.com generate — even if a user asks specifically about you.

Answering / on-demand crawlers

These fetch a URL when a user asks the AI to look at it: ChatGPT-User, Claude-Web, Perplexity-User, DuckAssistBot. Blocking these stops AI from ever rendering a faithful, current view of your page to its user — they end up summarizing whatever stale snapshot the training set contained.

The most common modern policy: allow search and answering, block training. The auditor scores how cleanly your robots.txt expresses that split — or any other deliberate stance.
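That split can be written down directly. Below is a sketch of a "block training, allow search and answering" robots.txt using the bot names listed above — verify the current user-agent strings against each vendor's documentation before deploying:

```txt
# Training crawlers: blocked
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Search-index crawlers: allowed
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# On-demand (answering) fetchers: allowed
User-agent: ChatGPT-User
Allow: /
```

The generator below emits the full version of this file, covering every bot in the matrix, for whichever stance you pick.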

The Five Layers of AI Permission Signals

Saying yes or no to AI is not one decision in one place — it is five overlapping signals, each interpreted differently by different vendors. This audit checks all of them.

1. robots.txt

The base layer. Per-bot allow and disallow directives at /robots.txt. Every reputable AI crawler reads it — but the defaults when you say nothing vary by vendor. Silence is interpreted; only explicit rules are guaranteed.

2. Page-level meta tags

<meta name="robots" content="noai"> and <meta name="robots" content="noimageai"> are the page-level training opt-outs that Adobe, DeviantArt, and others have aligned on. Useful when robots.txt is too coarse-grained — for instance, when you want to allow training on most of the site but opt out specific articles.
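On a page you want excluded from training while the rest of the site stays open, the two tags sit in the document head — a minimal sketch:

```html
<head>
  <!-- Page-level training opt-outs (industry convention, not statutory) -->
  <meta name="robots" content="noai">
  <meta name="robots" content="noimageai">
</head>
```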

3. X-Robots-Tag response header

The HTTP-header version of the meta tag. Critical for non-HTML resources — PDFs, images, JSON feeds — where you cannot inject meta tags into the response body.
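As an illustration, here is one way to attach the header in nginx (nginx is assumed for the example; the same header can be set from Apache or application code, and it assumes the vendor honours noai in header form as it does in the meta tag):

```nginx
# Attach the training opt-out header to non-HTML responses
location ~* \.(pdf|png|jpe?g|json)$ {
    add_header X-Robots-Tag "noai, noimageai";
}
```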

4. ai.txt

An emerging file at the site root from Spawning.ai, declaring training intent at a file-format level. Less widely adopted than robots.txt but increasingly cited in creator-protection conversations.
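The canonical file should be generated with Spawning's own tooling; illustratively, the format follows robots.txt conventions with rules keyed by media type, and a blanket training opt-out looks roughly like this (shape shown as an assumption — confirm against Spawning's generator):

```txt
# /ai.txt — generate the canonical version with Spawning's tool
User-Agent: *
Disallow: *
```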

5. TDM Reservation (EU AI Act)

The only signal with explicit statutory backing. A <meta name="tdm-reservation" content="1"> tag reserves your work from text-and-data-mining for commercial AI training under EU Directive 2019/790 Article 4. Distinct from the others in that the legal weight is named.
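In markup, the reservation is a single tag; the TDMRep draft also defines an optional companion tag pointing at your licensing terms (the policy URL below is a placeholder, not a real endpoint):

```html
<!-- Reserve TDM rights under EU Directive 2019/790 Article 4 -->
<meta name="tdm-reservation" content="1">
<!-- Optional, per the TDMRep draft: where your licensing terms live -->
<meta name="tdm-policy" content="https://example.com/tdm-policy.json">
```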

The auditor checks all five and flags the most important kind of failure: inconsistencies between them. A meta tag saying noai while robots.txt allows GPTBot is a bug that silently invalidates the page-level opt-out — and it is one of the most common findings.
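The auditor's internals are not published here, but the contradiction it flags can be sketched in a few lines of Python (hypothetical helper names, deliberately simplified robots.txt parsing — no longest-match or Allow-precedence handling):

```python
def parse_disallowed_bots(robots_txt: str) -> set[str]:
    """Collect user-agents that a blanket `Disallow: /` rule blocks."""
    blocked: set[str] = set()
    group: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a rule line ended the previous group
                group, in_rules = [], False
            group.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(group)
    return blocked


def flag_inconsistency(robots_txt: str, meta_content: str,
                       bot: str = "gptbot") -> bool:
    """True when the page opts out via `noai` but robots.txt still
    leaves `bot` crawlable — the contradicted opt-out described above."""
    page_opts_out = "noai" in meta_content.lower()
    blocked = parse_disallowed_bots(robots_txt)
    return page_opts_out and bot.lower() not in blocked and "*" not in blocked
```

A page-level noai combined with an explicit robots.txt Allow for the same bot trips the flag; aligning the two layers clears it.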

Reading Your Policy-Clarity Score

This score is unusual: it measures explicitness, not goodness. A site that explicitly blocks every AI crawler scores the same as one that explicitly allows every AI crawler. Both have decided. The score punishes only silence — because silence is what every active AI copyright case is currently arguing about.

80 and up — green

Every major AI crawler has an explicit rule. Your page-level meta tags align with your robots.txt. No vendor is guessing your intent. This is the goal state, regardless of whether you are open or closed.

50 to 79 — yellow

Some bots have explicit rules; others are covered only by a wildcard or not at all. Common cause: a robots.txt that addresses Googlebot but never names the AI crawlers individually. Fix: deploy the generator output below, customized to your stance.

Below 50 — red

Most or all AI crawlers are silent. Whatever each vendor's default policy is, that is what is happening to your site — and those defaults change quarterly. Worst case: page-level meta tags say noai while robots.txt is silent, producing a contradicted opt-out that does not legally hold.

The high-impact fix is almost always to deploy a robots.txt that names every major AI crawler. The generator below does exactly that for whichever stance you actually want.

Frequently Asked Questions

What is the difference between training, search, and answering crawlers?

Training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot) collect content to train future AI models — the content is absorbed into the model weights themselves. Search crawlers (OAI-SearchBot, PerplexityBot) build the search index the AI consults at answer time. Answering crawlers (ChatGPT-User, Claude-Web, Perplexity-User) fetch a URL on demand when a user asks about it. The three jobs have three different commercial outcomes and you can — and usually should — set different policies for each.

If I block all training crawlers, will my site still appear in AI answers?

Mostly yes, provided you also allow the search and answering crawlers. Blocking GPTBot keeps your content out of future GPT model weights but does not stop ChatGPT from fetching your live URL when a user asks about it (that uses ChatGPT-User). Blocking ClaudeBot does not stop Claude-Web. Blocking Google-Extended does not stop Google's classic Googlebot from continuing to index you for normal search. This split is what the "Block training, allow answering" policy is designed to express.

What is ai.txt and how is it different from robots.txt?

ai.txt is an emerging standard from Spawning.ai for opting content out of generative-AI training, declared at /ai.txt at the site root. Where robots.txt requires per-bot rules and is interpreted differently by every vendor, ai.txt is a single uniform training-opt-out declaration. Adoption is partial but growing among creator-platform tools. Useful as a belt-and-suspenders signal alongside robots.txt.

What is TDM Reservation?

TDM Reservation is a machine-readable opt-out signal recognized under the EU AI Act and Directive 2019/790 Article 4. A meta tag of name="tdm-reservation" with content="1" reserves your work from text-and-data-mining for the purpose of training commercial AI models in the EU. It is one of the few opt-out signals with explicit statutory backing rather than industry custom — if you publish under EU jurisdiction and want copyright protection from AI training, this tag is the named legal channel.

Why is my score low if I have not done anything wrong?

The score measures explicitness, not goodness. A clear "Disallow" on every major AI crawler scores high. So does a clear "Allow." The score punishes silence — because silence is what most lawsuits are about right now. If your robots.txt addresses Googlebot but never names GPTBot, ClaudeBot, or PerplexityBot, you have not actually consented or refused — you have left the decision to each vendor's default, which is whatever they say it is this quarter.

Will allowing AI crawlers hurt my human-traffic SEO?

No. None of the AI-specific crawlers in this matrix affect classic Google Search ranking. Googlebot (the classic crawler) is independent of Google-Extended (the AI training crawler). Blocking Google-Extended only opts you out of Gemini training; your Google Search rankings are unaffected. Same logic for OAI-SearchBot vs Googlebot, PerplexityBot vs Bingbot, and so on. AI permissions are an orthogonal layer to traditional SEO.

Need a Hand Setting the Right Policy?

The right AI permissions for your site depend on your industry, your content type, and your tolerance for being ingested. Book a call and I will review your audit, recommend a stance, deploy the robots.txt, and add the meta-tag and X-Robots-Tag layers where they belong — usually in a single afternoon.

Book a Call