How to Prevent OpenAI and Google From Training Their LLMs on Your Website's Data: Difference between revisions
Revision as of 10:31, 25 December 2023
- See also: Guides
You can prevent OpenAI and Google from training their large language models (LLMs) on your website's data, content or information by adding these 4 lines to your website's robots.txt file:
    User-agent: GPTBot
    Disallow: /
    User-agent: Google-Extended
    Disallow: /
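The first 2 lines block GPTBot, OpenAI's web crawler, so your website's content is not collected for training OpenAI's models like ChatGPT, GPT-4 and GPT-5.
The last 2 lines tell Google not to use your website's content for training its models like Bard and Gemini. Google-Extended is a control token rather than a separate crawler, so these lines do not change how Googlebot crawls your site for Search.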
The rules above apply to your entire website. If you only want to block the models from specific sections or directories of your website, replace / with the path of each section:
    User-agent: GPTBot
    Disallow: /name-of-the-section
    User-agent: Google-Extended
    Disallow: /name-of-the-section
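Before uploading the file, it can help to check that the rules match the way you expect. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the example.com URLs are assumptions made for illustration and are not part of this guide. The check only shows how a compliant parser reads the rules; it does not prove that any particular crawler obeys them.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from this guide, copied verbatim.
SITE_WIDE_RULES = """\
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
"""

SECTION_RULES = """\
User-agent: GPTBot
Disallow: /name-of-the-section
User-agent: Google-Extended
Disallow: /name-of-the-section
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if `rules` permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://example.com"  # placeholder domain, replace with your own
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        print(
            agent,
            "| whole site:", is_allowed(SITE_WIDE_RULES, agent, base + "/any/page"),
            "| blocked section:", is_allowed(SECTION_RULES, agent, base + "/name-of-the-section/page"),
            "| other pages:", is_allowed(SECTION_RULES, agent, base + "/other/page"),
        )
```

Disallow paths are matched as prefixes, so /name-of-the-section also covers a path such as /name-of-the-section-archive. Ending the path with a slash (/name-of-the-section/) limits the rule to that directory.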