Cloudflare filters AI crawlers: site owners regain control
The platform lets websites automatically block bots that collect data to train artificial intelligence models, a measure intended to rebalance power between content creators and tech companies.
July 3, 2026 · 3 min read

TL;DR: Cloudflare has launched a tool that automatically filters web crawlers used by AI companies, allowing sites to control how their content is used. This responds to growing concern over unauthorized scraping and could shift the balance of power between creators and big tech.
What happened?
Cloudflare, the web infrastructure platform that handles traffic for millions of sites (approximately 20% of all websites, according to W3Techs data), has introduced a feature that lets administrators automatically block the web crawlers AI companies use to collect data. According to Engadget, the tool is enabled with a single click and filters known bots operated by companies such as OpenAI, Google, and Anthropic, among others. The decision responds to growing concern among content creators about the unauthorized use of their material to train language models and other AI systems. It also arrives at a time when, according to a study by Originality.ai, more than 60% of websites already block OpenAI's GPTBot, and when companies such as The New York Times have sued OpenAI and Microsoft for copyright infringement.
Why is it important?
Cloudflare's announcement arrives amid an intense debate over copyright and intellectual property in the era of generative AI. Until now, websites could block crawlers through the robots.txt file, but its directives are voluntary: many AI companies ignored them or found ways around them. For example, in 2023 a report by The Verge revealed that some AI bots impersonated legitimate browsers to access content. Because Cloudflare operates as an intermediary between a site and its visitors, it can enforce blocking at the network level, making it harder for unauthorized bots to reach the content at all. The measure empowers small and medium-sized publishers, who often lack the technical resources to defend against mass scraping, and it sets a precedent for other infrastructure platforms to take a stand on protecting creators' rights. This is not Cloudflare's first move in this area: in 2022 it launched a tool to block data-scraping bots, but the new feature targets AI crawlers specifically.
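To see why robots.txt alone offers weak protection, it helps to look at what a crawler actually does with it. The sketch below uses Python's standard urllib.robotparser module to evaluate a hypothetical robots.txt that asks OpenAI's GPTBot to stay away while allowing everyone else. The file contents and the example.com URL are illustrative assumptions; the key point is that the "block" only takes effect if the crawler chooses to run a check like this one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: it asks OpenAI's GPTBot
# to stay away entirely while allowing every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before fetching; nothing on the server enforces the answer.
for agent in ("GPTBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/articles/1")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Running it prints `GPTBot: disallowed` and `Googlebot: allowed`. A crawler that skips this step, or reports a different name, is never stopped by the file itself, which is the gap that network-level blocking aims to close.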
What consequences will it have?
Cloudflare's tool could significantly reduce the amount of data available for training AI models, especially models that rely on large-scale web scraping. Companies like OpenAI have already faced criticism and lawsuits for using content without permission; this filter could push them to negotiate licensing agreements with publishers or seek alternative data sources. On the other hand, websites that block crawlers might lose visibility in AI-powered search tools such as Google SGE or Bing Chat, if those tools depend on their own dedicated crawlers. Cloudflare clarifies, however, that the feature blocks only specific AI crawlers, not those of traditional search engines. In the long term, the measure could accelerate the fragmentation of the web, making access to data more restricted and costly and favoring large platforms that already hold vast amounts of proprietary data, such as Facebook or Google. Reflecting this trend, a Gartner report estimates that by 2025, 60% of organizations using generative AI will have implemented data control policies.
What should readers know?
Website owners using Cloudflare can activate the filter from the control panel, in the security section. It is important to review which bots are being blocked, as some may be necessary for legitimate services. Additionally, the tool is not foolproof: crawlers that change their user-agent could evade the filter. Cloudflare has promised to periodically update its list of AI bots, based on data from its network and sources like the community-maintained AI crawler list. For internet users, this measure may mean that some sites become inaccessible through AI assistants, but it also protects authorship and the value of original content. In comparison, in 2020, Google launched a similar tool to control bots for its own search engine, but Cloudflare's initiative is broader and more decentralized. As noted by TheVortiq's chief technology analyst: "This is a victory for content creators who want control over their work, but also a reminder that the open web is at a crossroads." Cloudflare's decision could inspire other infrastructure companies to follow suit, shifting the balance of power in the digital ecosystem.