AI Crawler Controls on Magic Pages

By Jannis, Founder

AI training crawlers and AI assistants now make up a meaningful share of web traffic. Some respect the unwritten rules – they identify themselves clearly, honour robots.txt, and only fetch what they need. Others crawl aggressively, ignore robots.txt, or rotate user-agents to slip past blocks.

If you'd rather your content not be used to train large language models – or returned inside AI-generated answers without a click back to your site – Magic Pages lets you block known AI crawlers from your website with a single toggle.

What this blocks

When you enable AI crawler blocking, we add a rule at our CDN edge that returns 403 Forbidden for any request matching either of two lists:

  • Cloudflare's verified-bot categories: AI Crawler, AI Assistant, and AI Search. These are maintained by Cloudflare and cover ChatGPT, Claude, Perplexity, Google's AI features, and many more. Cloudflare keeps this list fresh, so we don't have to.
  • A vendored list of known AI/ML user-agent strings, sourced from the open-source ai-robots-txt project. This catches crawlers that haven't been formally verified yet. The list refreshes weekly.

Currently on the list: GPTBot, ChatGPT-User, ClaudeBot, Claude-Web, CCBot, Google-Extended, Anthropic-AI, Bytespider, FacebookBot, Meta-ExternalAgent, PerplexityBot, Applebot-Extended, Amazonbot, Cohere-AI, and around forty others.
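
To make the "either of two lists" rule concrete, here is a minimal Python sketch of the decision. It is an illustration under stated assumptions, not our actual edge rule: the `Request` class, its `verified_bot_category` field, and the case-insensitive substring match are all hypothetical, and the user-agent list is only an excerpt of the names above.

```python
from dataclasses import dataclass
from typing import Optional

# Category names as listed above. How the edge exposes a request's
# category is an assumption made for this sketch.
BLOCKED_CATEGORIES = {"AI Crawler", "AI Assistant", "AI Search"}

# Excerpt of the user-agent names mentioned above; the real list is longer.
BLOCKED_UA_TOKENS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "CCBot",
    "Google-Extended", "Bytespider", "PerplexityBot",
]


@dataclass
class Request:
    """Hypothetical stand-in for the request data available at the edge."""
    user_agent: str
    verified_bot_category: Optional[str] = None


def should_block(req: Request) -> bool:
    """Return True if the request matches either list (and would get a 403)."""
    # List 1: the verified-bot category assigned to the request, if any.
    if req.verified_bot_category in BLOCKED_CATEGORIES:
        return True
    # List 2: known AI user-agent names, matched here as
    # case-insensitive substrings (an assumption of this sketch).
    ua = req.user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_UA_TOKENS)
```

In this sketch, a request announcing itself as `GPTBot/1.0` is blocked, while Googlebot or an ordinary browser user-agent passes through because it matches neither list.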

What this doesn't block

Regular search engines – Googlebot, Bingbot, DuckDuckBot – are not affected. Your site stays indexable and shareable. Human visitors browse normally. The block is precisely scoped to AI training and assistant bots. Everything else continues exactly as before.

RSS feeds, the Ghost Admin API, webhooks, and any integrations you have running are also untouched.

How it works

The check happens at the Cloudflare edge, before the request ever reaches your site. That has two practical benefits:

  • AI crawlers don't consume any of your site's resources – Cloudflare absorbs the request entirely. (Admittedly, this is more for our peace of mind than yours.)
  • The block is consistent across every page, including paid posts behind your membership, RSS-only routes, and static assets.

Changes typically propagate within seconds, though in the worst case it can take up to 60 seconds.
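
If you want to confirm the block from the outside after flipping the toggle, a small script can request your site with an AI crawler's user-agent and wait for the 403. The Python sketch below uses only the standard library. The function names, the `GPTBot/1.0` string, and the five-second polling interval are choices made for this example, and it assumes the user-agent list matches on the string alone.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request the URL with the given user-agent and return the HTTP status."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code


def wait_until_blocked(url, user_agent="GPTBot/1.0", deadline_s=60.0,
                       interval_s=5.0, fetch=fetch_status,
                       sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll until the site answers 403, or give up once the deadline passes."""
    start = clock()
    while True:
        if fetch(url, user_agent) == 403:
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

Calling `wait_until_blocked("https://your-site.example/")` (a placeholder URL) returns `True` as soon as a 403 comes back, and `False` if the block has not appeared once the 60-second window has passed.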

Limits and honesty

A few things worth being upfront about:

  • User-agent (UA) detection is honour-based. A determined scraper can spoof a normal browser user-agent and bypass this. The well-behaved crawlers – the ones training the major AI products – identify themselves correctly, which is what this feature relies on.
  • Already-indexed content stays indexed. Blocking AI crawlers today stops new training data, but content crawled before you enabled the toggle may already exist inside trained models.
  • The list evolves. New AI products launch constantly. Both the vendored UA list and Cloudflare's verified-bot categories update automatically. Even so, a brand-new AI crawler may slip through for a few days before either list picks it up.

How to enable

💡 This feature is currently in beta. If you'd like to give it a try, send us a quick message at [email protected].

Open the Customer Portal from inside your Ghost Admin, head to the Configuration tab, and flip the "Block AI crawlers from accessing this site" toggle. For a step-by-step walkthrough with screenshots, see our how-to guide.

Why we built this

Independent creators and small businesses on Magic Pages told us they wanted control over how their work shows up in AI products. Some are fine with their content being part of training data, others aren't. Both positions are valid.

This toggle gives you the choice – at the infrastructure layer, where it actually works – without you having to maintain a robots.txt or wrestle with Cloudflare rules yourself.
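
For comparison, the do-it-yourself robots.txt route would mean keeping a file like the one this Python sketch generates. The helper name `build_robots_txt` and the short agent list are illustrative only. Unlike the edge rule, robots.txt is purely advisory: crawlers that ignore it are not stopped.

```python
def build_robots_txt(agents):
    """Build one robots.txt group that disallows the whole site for each agent."""
    if not agents:
        return ""
    # Several User-agent lines may share a single rule group.
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


# Example with a few of the crawler names mentioned earlier.
print(build_robots_txt(["GPTBot", "ClaudeBot", "CCBot"]))
```

Every new crawler would need a new line in that file, which is the maintenance work the toggle is meant to spare you.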

Still have questions?

We're here to help and want to make sure you get the most out of your Ghost site. Reach out directly and we'll get back to you as soon as possible.
