Should I Block ChatGPT?

If you’re managing a website or doing SEO, there’s a good chance you’ve asked yourself, or been asked: should I block ChatGPT? To clarify, I mean blocking ChatGPT from crawling your website and using your content to train its models. Before we get to the reasons for and against blocking ChatGPT, we should cover how it’s done.

How to block ChatGPT

If you want to block ChatGPT from crawling your website, you’ll need to add what’s known as a directive (in this case, a disallow directive) to your site’s robots.txt file. Add the two lines below to your robots.txt file to block ChatGPT:

User-agent: GPTBot
Disallow: /

The user-agent for the ChatGPT web crawler is GPTBot, so blocking it will help prevent your content from showing up in ChatGPT’s responses. GPTBot is used by OpenAI (the creator of ChatGPT) to crawl websites and “potentially” use content to improve future models.
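If you want to sanity-check the rule before you deploy it, here is a minimal sketch using Python’s standard-library urllib.robotparser. The example.com URL is only a placeholder, and Python’s parser is an approximation of how a real crawler reads your file, so treat this as a rough check rather than proof of how GPTBot will behave.

```python
from urllib.robotparser import RobotFileParser

# The two lines from above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL (an assumption); swap in a page from your own site.
url = "https://example.com/blog/my-post"

gptbot_allowed = parser.can_fetch("GPTBot", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print("GPTBot allowed:", gptbot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

With only those two lines in the file, GPTBot should come back as blocked while other crawlers such as Googlebot are unaffected.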

OpenAI also has another user-agent known as ChatGPT-User, which is not used for crawling but for ChatGPT’s browsing feature. You can block ChatGPT-User from accessing your website if you want by adding these two lines to your robots.txt file:

User-agent: ChatGPT-User
Disallow: /

It’s also worth pointing out that OpenAI doesn’t rely solely on its own crawl data to train its models. In fact, 60% of the data used to train GPT-3 came from Common Crawl, a large open-source crawl data repository. If you want to keep your content out of ChatGPT’s training data, you’ll probably want to block Common Crawl’s crawler, CCBot, as well. Here’s how you do it:

User-agent: CCBot
Disallow: /
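If your site already has a robots.txt file, you will want to add these groups without clobbering what is there. Here is a rough sketch of one way to do that in Python. The add_block_rules helper and the sample /admin/ rule are hypothetical names made up for illustration, and the helper only checks whether a user-agent is already named in the file, not what its existing rules say.

```python
def add_block_rules(existing: str, agents: list[str]) -> str:
    """Append a 'Disallow: /' group for each agent not already named.

    Hypothetical helper: it does not inspect the rules of agents that
    are already present, it just avoids adding a duplicate group.
    """
    named = {
        line.split(":", 1)[1].strip().lower()
        for line in existing.splitlines()
        if line.strip().lower().startswith("user-agent:")
    }
    groups = [
        f"User-agent: {agent}\nDisallow: /"
        for agent in agents
        if agent.lower() not in named
    ]
    if not groups:
        return existing
    # Keep the existing content first, separated by a blank line.
    parts = [existing.rstrip("\n")] if existing.strip() else []
    return "\n\n".join(parts + groups) + "\n"


# Made-up existing file for the example.
existing = "User-agent: *\nDisallow: /admin/\n"
updated = add_block_rules(existing, ["GPTBot", "ChatGPT-User", "CCBot"])
print(updated)
```

Running the helper a second time on its own output should leave the file unchanged, which makes it safe to re-run when you add more crawlers to the list later.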

So now that you know how to block ChatGPT, what are some reasons you may or may not want to?

Reasons not to block ChatGPT

First, let’s start with the easy option – doing nothing. There are plenty of good reasons to kick your feet up and leave your robots.txt file alone. The most obvious reason to let ChatGPT crawl your website is generative engine optimization (GEO). Google and Bing have been blending traditional search and generative AI – which has made optimizing for the latter more attractive. And in the case of Google SGE, there’s currently no way to block crawling without blocking Googlebot outright.

The sheer number of ChatGPT users is also a good reason to let your website be crawled if you’re looking for brand exposure. Having your content appear in ChatGPT responses isn’t necessarily a bad thing if you’re trying to be found. This is especially true for brands that want to “be where their customers are” and appear cutting edge by embracing AI.

Search Engine Land also published a great article last month on three reasons not to block GPTBot that I think is worth checking out.

Reasons to block ChatGPT

There are also plenty of reasons to consider blocking ChatGPT (and other generative AI crawlers, but we’ll get there). In fact, over 26% of the top 1,000 websites blocked GPTBot by September 2023. Some of the most popular websites, including Amazon.com, The New York Times, CNN.com, Wikihow.com and Medium.com, block GPTBot from crawling.

Many news, journalism and blog websites have good reason to block GPTBot from crawling. If you feel your hard work is being plagiarized, then you have every right to stop it. Even as an AI enthusiast, I can see the argument. While I’m not a professional writer, I do spend a fair amount of time blogging and I don’t relish the idea of my work being reused without citation. And sure, generative AI is a lot more complicated than that, but it’s still a reasonable concern.

Simply not trusting AI technology is another common reason for blocking it or regulating its use. Plenty of ethical and privacy concerns surround the development and use of AI. For many, abstaining is viewed as the safer option.

Other generative AI crawlers

If you’re considering blocking GPTBot, then you might want to consider blocking other generative AI crawlers as well. Aside from GPTBot, some of the most common AI bots to block are listed below.

Blocking Google AI from crawling

Blocking Google SGE isn’t possible without blocking Googlebot and tanking your SEO. But you can block Google Gemini (formerly known as Bard) by adding this to your robots.txt:

User-agent: Google-Extended
Disallow: /
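To see why this rule is safe for your regular search visibility, here is a small sketch using Python’s urllib.robotparser. It is only an approximation of how Google reads robots.txt, and the example.com URL is a placeholder, but it illustrates that a group aimed at Google-Extended does not apply to Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Only Google-Extended is named; Googlebot has no group of its own.
GOOGLE_RULES = """\
User-agent: Google-Extended
Disallow: /
"""

google_parser = RobotFileParser()
google_parser.parse(GOOGLE_RULES.splitlines())

# Placeholder URL (an assumption); use a real page from your site.
page = "https://example.com/"

extended_allowed = google_parser.can_fetch("Google-Extended", page)
search_allowed = google_parser.can_fetch("Googlebot", page)

print("Google-Extended allowed:", extended_allowed)
print("Googlebot allowed:", search_allowed)
```

In this sketch Google-Extended is blocked while Googlebot is still allowed, which is the outcome you want if the goal is to opt out of AI training without touching your SEO.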

How to block ClaudeBot

Many of the websites I’ve seen block ChatGPT also block ClaudeBot. ClaudeBot is the crawler for Claude, an AI assistant created by Anthropic. You can block it from crawling your website using:

User-agent: ClaudeBot
Disallow: /

Another user-agent Claude AI is known to use for crawling is Claude-Web. To effectively block Claude AI you should also add this to your robots.txt:

User-agent: Claude-Web
Disallow: /

Lastly, the creator of Claude AI, Anthropic, is known to use anthropic-ai to crawl. Make sure to include this in your robots.txt too:

User-agent: anthropic-ai
Disallow: /

How to block Omgili

Another crawler worth blocking if you want to stop your content from training LLMs is Omgili (oh my god I love it). As Neil Clark points out in his article, Omgili’s parent company webz.io clearly states that they sell their crawling data to train LLMs. Here is how to block Omgili from crawling your website:

User-agent: Omgili
Disallow: /

User-agent: Omgilibot
Disallow: /

Preventing your content from training AI models completely isn’t easy if you’re publishing content on the internet, but you can limit it.
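Once you have added several of these groups, it is easy to lose track of which crawlers you have actually covered. Here is a rough audit sketch in Python. The AI_CRAWLERS list simply collects the user-agents mentioned in this post, unblocked_crawlers is a hypothetical helper name, and Python’s parser matches user-agents by substring, so a real crawler may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# User-agents mentioned in this post (not an exhaustive list).
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "CCBot", "Google-Extended",
    "ClaudeBot", "Claude-Web", "anthropic-ai", "Omgili", "Omgilibot",
]


def unblocked_crawlers(robots_txt, agents=AI_CRAWLERS):
    """Return the agents that may still fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, "/")]


# Made-up robots.txt that only blocks two of the crawlers.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

still_allowed = unblocked_crawlers(sample)
print("Still allowed:", still_allowed)
```

Paste in the contents of your own robots.txt in place of the sample to get a quick list of the crawlers from this post that you have not blocked yet.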

Summary

At the end of the day, blocking ChatGPT and other generative AI crawlers is really a matter of choice. Depending on your website’s purpose and your business model, it may make sense to block them. But in my opinion, the vast majority of sites have nothing to fear from allowing AI crawlers to crawl their site.


Updated 3/23/24: Added information about Common Crawl, details about blocking Claudebot and Omgili to the list of “other generative AI crawlers.”
