Cloudflare to block AI crawlers that don't pay publishers

TL;DR: Starting in September, Cloudflare will block AI crawlers by default on pages with ads unless publishers choose to allow them. The aim is to push AI companies to pay for the content they use to train their models.

What happened?

Cloudflare, the web infrastructure company that protects and accelerates millions of sites, has announced that starting in September it will, by default, block the artificial intelligence crawlers that collect content to train models. The measure will apply to any page that displays ads unless the site owner chooses otherwise. In essence, Cloudflare shifts the burden: instead of publishers having to opt out of AI scraping, AI companies will now have to negotiate and pay for access to content. According to The Next Web, Cloudflare argues that we must "stop giving away the web" and that its position as an intermediary for about 20% of global web traffic gives it a unique ability to enforce this restriction.

Context: the battle for training data

Cloudflare's announcement comes amid growing tension between content creators and AI companies. Since 2023, several media outlets and content owners, such as The New York Times, Reuters, and Getty Images, have sued OpenAI and other companies for using their content without permission to train models. In parallel, platforms such as Reddit and Stack Overflow have signed licensing agreements with Google and OpenAI, setting a precedent for paying for data. In 2024 Cloudflare had already introduced a system that lets websites mark their content as off-limits to AI crawlers; the new policy goes a step further by making blocking the default. The move is part of a broader trend: according to industry data, AI crawler traffic accounted for up to 10% of total web traffic in 2025, a share that is expected to grow exponentially.

Why is this important?

Cloudflare's decision could fundamentally change the economics of generative AI. Until now, most models have been trained on data freely scraped from the web, drawing criticism from publishers and creators who saw their work used without compensation. If Cloudflare blocks crawlers by default, AI companies will have to negotiate licensing agreements with millions of sites, which would raise their operating costs and could slow the development of new models. The measure could also inspire other web infrastructure providers, such as Akamai or Fastly, to adopt similar policies. The economic stakes are significant: a Stanford University study estimates that unauthorized data scraping for AI cost publishers over $1.5 billion in lost advertising revenue in 2024. By blocking crawlers on pages with ads, Cloudflare aims to reverse this trend and return control to creators.

What consequences will it have?

For publishers, the measure is a relief: it gives them more control over their content and a way to monetize it. However, small sites that rely on search traffic could be affected if search engines such as Google, which runs its own crawler (Googlebot), also use their crawlers for AI purposes. Cloudflare has said it will block only specific AI crawlers, not search crawlers, but reliably telling the two apart could be technically complex. For AI companies, this means accelerating their licensing efforts or turning to synthetic and open-source datasets. It could also fragment the data available for training, reducing the quality of models trained on web data. Compared with earlier moves, such as Reddit's 2024 block on AI crawlers, which covered a single platform, Cloudflare's measure is much broader: it applies by default to millions of sites. In the long term, a market for web data licenses could emerge, similar to the one that already exists for social media and forum data.
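To make the distinction concrete, here is a minimal Python sketch of the decision described above: block AI training crawlers by default on ad-supported pages unless the owner opts in, while letting search crawlers through. The user-agent token lists, the `Page` type, and the `owner_allows_ai` flag are hypothetical and chosen for illustration; this is not Cloudflare's implementation.

```python
from dataclasses import dataclass

# Illustrative (not exhaustive) user-agent tokens of known AI training crawlers.
# This list is an assumption for the sketch; real deployments must keep it updated.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")

# Illustrative search-engine crawler tokens that should never be caught by the AI block.
SEARCH_CRAWLER_TOKENS = ("Googlebot", "bingbot")


@dataclass
class Page:
    has_ads: bool          # whether the page displays ads
    owner_allows_ai: bool  # hypothetical per-site opt-in flag set by the owner


def _matches(user_agent: str, tokens: tuple[str, ...]) -> bool:
    """Case-insensitive substring match of any token against the User-Agent header."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


def is_ai_crawler(user_agent: str) -> bool:
    # Search crawlers are checked first so they are never classified as AI crawlers.
    if _matches(user_agent, SEARCH_CRAWLER_TOKENS):
        return False
    return _matches(user_agent, AI_CRAWLER_TOKENS)


def should_block(user_agent: str, page: Page) -> bool:
    """Default-deny AI crawlers on ad-supported pages unless the owner has opted in."""
    if not is_ai_crawler(user_agent):
        return False  # humans and search crawlers pass through
    if not page.has_ads:
        return False  # the policy as described only targets pages with ads
    return not page.owner_allows_ai


if __name__ == "__main__":
    ad_page = Page(has_ads=True, owner_allows_ai=False)
    # Sample User-Agent strings, simplified for illustration.
    print(should_block("Mozilla/5.0 (compatible; GPTBot/1.0)", ad_page))     # True
    print(should_block("Mozilla/5.0 (compatible; Googlebot/2.1)", ad_page))  # False
```

Matching on the User-Agent header is the simplest approach but also the weakest, since any client can claim to be Googlebot. That is part of why telling crawlers apart is hard in practice; production bot-management systems typically add signals such as verifying the requester's IP address or reverse DNS.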

What readers should know

If you are a website owner, Cloudflare will let you decide whether or not to allow AI crawlers. The default will be to block them on pages with ads, but you can change this in the settings. If you are an AI user, model quality could suffer in the short term, but in the long term the change could foster a fairer ecosystem in which creators are compensated. It is worth following closely how Cloudflare implements this policy and whether other players join in. The measure could also have legal implications. In the European Union, the Directive on Copyright in the Digital Single Market already lets rightsholders opt out of text and data mining, for example through machine-readable signals, and crawlers must respect those exclusions; Cloudflare could be aligning with this framework. In the United States, the legality of scraping remains a gray area, although cases such as hiQ Labs v. LinkedIn have begun to define some of its limits.
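Independently of Cloudflare's own settings, the long-standing, vendor-neutral way for a site owner to state which crawlers may access their content is a robots.txt file (standardized as RFC 9309). The sketch below uses Python's standard `urllib.robotparser` module to show how a compliant crawler would interpret a sample file that opts out of two AI crawlers while allowing everyone else. The file contents and the URL are illustrative assumptions, and this is not a description of Cloudflare's setting.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt (illustrative): disallow two AI crawlers site-wide,
# allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/articles/some-story"
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        # A compliant crawler checks can_fetch() before requesting the URL.
        print(agent, rp.can_fetch(agent, url))
    # Prints: GPTBot False, CCBot False, Googlebot True
```

The key limitation is that robots.txt only works if the crawler chooses to honor it, which is why a block enforced at the network level, as Cloudflare proposes, is a stronger measure.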

Industry reactions

The news has drawn mixed reactions. Publishers such as the Axel Springer group have applauded the measure, while some AI startups criticize it for stifling innovation. Legal experts point out that the legality of AI scraping is still unclear in many jurisdictions and that Cloudflare may be getting ahead of potential regulation. For now, the September deadline marks the beginning of a new phase in the relationship between the web and artificial intelligence. As a Cloudflare spokesperson told The Next Web: "It's not about blocking innovation, but about ensuring creators are compensated for their work." It remains to be seen whether other major infrastructure players, such as Amazon CloudFront or Google Cloud CDN, will follow suit.
