llms.txt
Last reviewed
May 25, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,491 words
llms.txt is a proposed web standard that defines a small markdown file served at the root path /llms.txt of a website, containing a curated, LLM-friendly summary and a list of links to authoritative content elsewhere on the site.[1] The proposal was published by Jeremy Howard, co-founder of fast.ai and Answer.AI, on 3 September 2024, accompanied by an announcement on the Answer.AI blog and a companion specification site at llmstxt.org.[1][2] A complementary convention, /llms-full.txt, packs an entire documentation set into a single markdown file for direct ingestion into a language model's context window.[3] The standard has been adopted by documentation platforms including Mintlify, which began auto-generating the files for hosted sites in November 2024, and by companies such as Anthropic, Cloudflare, Stripe, and Pydantic, while remaining the subject of skepticism from search engineers, including Google's John Mueller, about whether AI crawlers actually fetch it.[4][5][6]
By mid-2024, large language models had become routine consumers of web documentation through inline retrieval inside coding assistants, browser tools, and answer engines, but the format of typical web pages was poorly matched to that role. Pages built for human readers are HTML wrappers around navigation menus, advertising, JavaScript widgets, and CSS, with the substantive prose distributed across many DOM nodes; an LLM that ingests the raw page wastes tokens on chrome and must guess at the underlying structure.[1][7] The problem is compounded by the limited size of model context windows relative to a full site, which can run into hundreds of thousands of tokens once boilerplate is included.[1]
Howard encountered the problem directly while shipping FastHTML, a Python web framework released by Answer.AI in 2024.[8] Because FastHTML appeared after the training cut-off of the then-current frontier models, developers complained that AI coding assistants could not give accurate guidance about the library, and copy-pasting documentation pages into a chat window produced messy context with low signal-to-noise.[2][8] FastHTML was therefore the first reference implementation for what became the llms.txt convention.[2][8]
The proposal positioned llms.txt as a parallel to /robots.txt and /sitemap.xml, two long-standing root files that web crawlers know how to look for, but with a different goal: instead of allowing or disallowing access, llms.txt provides a small, structured map that a model or agent can consume to find the right detailed pages.[1][9] The choice of markdown rather than JSON, XML, or a custom syntax was deliberate: markdown is the de facto in-band format that current large language models produce and consume best, and a markdown file is human-editable without tooling.[1][2] Howard's announcement noted that the format favors a small file that can be authored by hand and reviewed in a pull request, rather than a machine-generated artifact.[2]
The llms.txt specification is hosted at llmstxt.org and is maintained as an open repository by Answer.AI on GitHub.[1][3] It defines a single markdown file located at the site root, for example https://example.com/llms.txt, with a strict ordering of section types so that the file can be parsed by simple tools rather than requiring a full markdown engine to interpret semantics.[1]
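In order, the file contains:[1]
- An H1 heading with the name of the project or site; this is the only required element.
- A blockquote with a short summary of the project.
- Zero or more markdown sections of any type except headings (paragraphs, lists, and so on) with more detailed information.
- Zero or more sections delimited by H2 headings, each containing a list of links. Each list item is a markdown hyperlink of the form [name](url), optionally followed by a colon and a short note about the file.

The specification reserves one H2 section name, ## Optional, with a special meaning: links inside the Optional section may be skipped by tools that are operating under a tight token budget, while anything outside of it is treated as recommended reading.[1] All other H2 headings are author-chosen and group links thematically, for example by API area, by tutorial topic, or by content type.[1]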
The motivation for the structured form is that small models, plain regex parsers, and ad-hoc scripts can extract sections without needing to model arbitrary markdown.[1] A site can therefore be served to an agent as a series of curated link bundles instead of a single opaque HTML document.[1]
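As an illustration of how little machinery such extraction needs, the following minimal sketch parses an llms.txt document with one regular expression and a line loop. The sample file, the function names `parse_llms_txt` and `required_links`, and the returned dictionary shape are all assumptions made for this example; they are not part of the specification or of any reference tool.

```python
import re

# Hypothetical sample file; the project and URLs are placeholders.
SAMPLE = """\
# Example Project

> A short summary of what the project does.

Free-form notes about conventions go here.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): Install and run
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md): Release history
"""

# A list item of the form "- [name](url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text):
    """Split an llms.txt document into title, summary, and H2 link sections."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["summary"] is None and current is None:
            result["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(
                    {"name": match["name"], "url": match["url"], "note": match["note"] or ""}
                )
    return result


def required_links(parsed):
    """Return links from every section except the reserved 'Optional' one."""
    return [
        link
        for heading, links in parsed["sections"].items()
        if heading != "Optional"
        for link in links
    ]


parsed = parse_llms_txt(SAMPLE)
```

A consumer with a tight token budget could call `required_links(parsed)` and ignore the rest, which is the behavior the reserved Optional heading is meant to allow.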
An example file from the specification opens with # FastHTML, followed by a single-blockquote summary describing the framework, then short paragraphs about the project conventions, then H2 sections such as ## Docs and ## Examples containing link lists, and finally an ## Optional section with reference material that can be dropped when the consumer is short on context.[1] The same skeleton is used by Anthropic's developer documentation, by Stripe's docs, and by Mintlify's auto-generated output, with the H2 section names adapted to each site's information architecture.[4][17][22]
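The skeleton described above can also be produced programmatically. The sketch below renders the same ordering (H1, blockquote, free-form notes, H2 link sections, then Optional) from plain Python data. The function `render_llms_txt`, its parameters, and the example project are hypothetical and exist only to show the layout; the proposal itself favors hand-authored files.

```python
def _link_item(name, url, note=""):
    """Format one link as '- [name](url)' with an optional ': note' suffix."""
    item = f"- [{name}]({url})"
    return f"{item}: {note}" if note else item


def render_llms_txt(title, summary, notes, sections, optional=()):
    """Render an llms.txt body in the section order the specification describes."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for paragraph in notes:
        lines += [paragraph, ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        lines += [_link_item(*link) for link in links]
        lines.append("")
    if optional:
        # The reserved heading goes last so budget-limited tools can drop it.
        lines += ["## Optional", ""]
        lines += [_link_item(*link) for link in optional]
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"


# Placeholder data for a hypothetical project.
text = render_llms_txt(
    title="Example Project",
    summary="A hypothetical library used only for illustration.",
    notes=["Examples target the latest release."],
    sections={
        "Docs": [("Guide", "https://example.com/guide.md", "Start here")],
        "Examples": [("Todo app", "https://example.com/todo.md")],
    },
    optional=[("Changelog", "https://example.com/changelog.md", "Release history")],
)
print(text)
```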
Howard's announcement and the reference repository describe a second, optional convention: a file at /llms-full.txt that contains the full markdown text of the documentation rather than just links.[2][3] This file is intended for use cases in which a developer wants to paste an entire documentation set into a long-context model in one shot, for example when troubleshooting a library or generating code that touches many parts of an API.[4]
The llms-txt repository on GitHub also publishes a Python tool, llms_txt2ctx, that reads a site's llms.txt, follows the listed links, and emits two derived files: llms-ctx.txt, containing only the required sections inlined, and llms-ctx-full.txt, which also pulls in the contents of the ## Optional section.[3] FastHTML's own documentation directory ships both files as reference output.[3][8]
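The idea of deriving a context file with and without the Optional section can be sketched as follows. This is not the llms_txt2ctx implementation and does not reproduce its output format; `iter_links`, `build_context`, and the injected `fetch` callable are assumptions for illustration, and the in-memory `PAGES` dictionary stands in for real HTTP requests.

```python
import re

LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)")


def iter_links(llms_text):
    """Yield (section, name, url) for each link listed under an H2 heading."""
    section = None
    for line in llms_text.splitlines():
        line = line.strip()
        if line.startswith("## "):
            section = line[3:].strip()
        elif section is not None:
            match = LINK_RE.match(line)
            if match:
                yield section, match.group(1), match.group(2)


def build_context(llms_text, fetch, include_optional=False):
    """Inline the content behind each link, skipping Optional unless requested."""
    parts = []
    for section, name, url in iter_links(llms_text):
        if section == "Optional" and not include_optional:
            continue
        parts.append(f"## {name} ({url})\n\n{fetch(url).strip()}\n")
    return "\n".join(parts)


# Hypothetical site content, kept in memory so the example needs no network.
LLMS_TXT = """\
# Example Project

> Summary.

## Docs

- [Quick start](https://example.com/quickstart.md): Install and run

## Optional

- [Changelog](https://example.com/changelog.md)
"""
PAGES = {
    "https://example.com/quickstart.md": "Install with pip, then run the demo.",
    "https://example.com/changelog.md": "Release history for all versions.",
}

ctx = build_context(LLMS_TXT, PAGES.__getitem__)
ctx_full = build_context(LLMS_TXT, PAGES.__getitem__, include_optional=True)
```

Passing the fetcher as an argument keeps the sketch testable offline; a real pipeline would substitute an HTTP client with caching and error handling.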
Several adopters pair the root file with a per-page convention: every documentation URL is also served as a .md file at the same path, so that a model that follows a link from llms.txt can fetch a clean markdown body instead of HTML.[4][10] Mintlify's hosted documentation platform implements this automatically for every page, and Stripe's developer documentation does the same, exposing pages such as https://docs.stripe.com/building-with-llms.md.[4][10] The convention is not part of the llmstxt.org specification proper, but it has become the de facto companion to the root file across the larger adopters.[4][10]
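A client that wants the markdown body for a documentation page can derive the mirror URL by appending .md to the path, as in this minimal sketch. The helper name `markdown_mirror` and its handling of trailing slashes, fragments, and the site root are assumptions; individual sites may map paths differently.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_mirror(url):
    """Return the assumed .md mirror for a documentation page URL."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path:
        # Assumption: the site root has no page path to map, so leave it alone.
        return url
    if not path.endswith(".md"):
        path += ".md"
    # The fragment is dropped because it only matters to a browser.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


print(markdown_mirror("https://docs.stripe.com/building-with-llms"))
# prints https://docs.stripe.com/building-with-llms.md
```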
The visual resemblance between /llms.txt and /robots.txt invites direct comparison, but the two files address different problems and can be deployed together.[1][9]
/robots.txt was standardized as the Robots Exclusion Protocol and codified in RFC 9309 in 2022; it is an access-control file that tells crawlers which paths they may or may not visit, identified by User-agent lines.[11] It does not describe content, and a site that wants to opt out of AI training typically uses /robots.txt to disallow specific user-agents such as GPTBot or ClaudeBot.[11][12] Anthropic, for example, documents three distinct bot identities: ClaudeBot for training, Claude-User for user-initiated fetches, and Claude-SearchBot for search indexing, each of which is governed by entries in /robots.txt.[12]
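To make the access-control role concrete, the sketch below evaluates a small robots.txt with Python's standard `urllib.robotparser` module. The rules and the example.com URLs are invented for illustration and are not any site's actual policy; only the user-agent tokens come from the text above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block ClaudeBot entirely, block GPTBot from one path,
# and allow everything else.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Report whether the parsed rules let this user-agent fetch the URL."""
    return parser.can_fetch(agent, url)


for agent in ("ClaudeBot", "GPTBot", "Claude-User"):
    print(agent, allowed(agent, "https://example.com/docs/"))
```

Nothing in this file says what the pages contain, which is the gap llms.txt is meant to fill.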
/sitemap.xml is a content-discovery file, also delivered at the site root, that lists every URL on a site along with optional metadata about last-modified time, change frequency, and priority.[13] Its purpose is to make it easier for search crawlers to find every page, not to summarize what the pages are about.[13]
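For comparison, a sitemap can be read with the standard library's XML parser, as in this minimal sketch. The sample document and the helper `sitemap_entries` are assumptions for illustration; the namespace URI is the one used by the sitemaps.org schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-page sitemap; the second entry omits optional metadata.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/docs/quickstart</loc>
    <lastmod>2025-03-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/docs/api</loc>
  </url>
</urlset>
"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_entries(xml_text):
    """Return one dict per <url> element; missing optional fields are None."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append(
            {
                "loc": url.findtext("sm:loc", namespaces=NS),
                "lastmod": url.findtext("sm:lastmod", namespaces=NS),
                "changefreq": url.findtext("sm:changefreq", namespaces=NS),
                "priority": url.findtext("sm:priority", namespaces=NS),
            }
        )
    return entries


entries = sitemap_entries(SITEMAP_XML)
```

The result is an exhaustive URL inventory with freshness hints and no description of what each page covers.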
/llms.txt differs from both. It does not authorize or prohibit access (that role remains with robots.txt), and it does not attempt to enumerate every page (that role remains with sitemap.xml). Instead, it provides a small curated set of links and explanatory text aimed at a model that needs to assemble a useful context window from a site, on the assumption that the model has already been allowed to read.[1][9] Search Engine Land described the file as "a treasure map for AI" rather than a successor to robots.txt for that reason.[9] The three files are complementary rather than substitutable.[9]
A persistent confusion in industry coverage during 2025 was to treat llms.txt as a way to control AI training; that interpretation is incorrect, since the proposal does not include any access-control mechanism and Howard has consistently described it as a hint for inference-time tools rather than a directive for training crawlers.[1][2]
Adoption of llms.txt grew in two distinct phases. Through October 2024 the file remained a niche experiment among a small group of developer-tools sites that knew Howard's work.[4][14] Adoption accelerated sharply in November 2024 when Mintlify, a hosted documentation platform used by a large fraction of developer-facing AI companies, began auto-generating /llms.txt, /llms-full.txt, and a .md version of every page for every hosted site.[4][10] That single rollout immediately put the files on thousands of documentation sites without per-customer configuration.[4]
| Adopter | Documentation host | Date file went live | Notes |
|---|---|---|---|
| FastHTML | answer.ai project | September 2024 | Reference implementation; published alongside the proposal[2][8] |
| Mintlify-hosted sites including Anthropic, Cursor, Coinbase, Pinecone, Windsurf | Mintlify | November 2024 | Auto-generated /llms.txt, /llms-full.txt, and .md page mirrors[4][10] |
| Cloudflare developer documentation | Self-hosted | 2025 | One root file plus product-specific llms-full.txt files per product such as Workers, Pages, R2, AI Gateway[15][16] |
| Stripe developer documentation | Self-hosted | March 2025 | Includes a machine-readable instructions section telling agents which API patterns to prefer and which deprecated endpoints to avoid[17][10] |
| Pydantic and PydanticAI documentation | Self-hosted on docs.pydantic.dev and ai.pydantic.dev | 2025 | Two-file convention with llms.txt and llms-full.txt[18] |
| Vercel AI SDK | Self-hosted | 2025 | Listed in the public directory.llmstxt.cloud directory of adopters[14] |
The llms.txt directory hosted at directory.llmstxt.cloud catalogues sites that have shipped the file, with token counts indicating documentation depth; entries include Perplexity, Hugging Face projects, ElevenLabs, Pinecone, and Coinbase, in addition to the larger adopters listed above.[14] Independent counts of adoption vary widely depending on methodology: a BuiltWith scan reported more than 844,000 sites with an llms.txt by late October 2025, while a narrower SE Ranking analysis of 300,000 domains found roughly 10% of them serving the file.[19][20] A measurement from Search Engine Roundtable referencing an OtterlyAI experiment found that out of 62,100 AI-bot requests in one window, only 84 fetched /llms.txt, roughly 0.1%, suggesting that adoption on the publishing side was far ahead of adoption on the consuming side.[21]
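The proportions quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the input figures are the ones reported in the text.

```python
# OtterlyAI experiment as reported: requests for /llms.txt out of all AI-bot requests.
llms_txt_requests = 84
total_ai_bot_requests = 62_100
share = llms_txt_requests / total_ai_bot_requests
print(f"{share:.3%}")  # prints 0.135%

# SE Ranking sample as reported: roughly 10% of 300,000 domains.
domains_sampled = 300_000
approx_serving = round(domains_sampled * 0.10)
print(approx_serving)  # prints 30000
```

The consuming-side share works out to about 0.135%, which the article rounds to roughly 0.1%.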
Adoption was concentrated in technical documentation: developer tools, API products, and AI infrastructure sites, where the files match the use case of helping an agent such as Anthropic's Claude or Cursor answer a question about an API. Consumer destinations such as Google Search, Facebook, Amazon, and major news sites did not adopt the standard in 2025.[19]
Anthropic's developer documentation, hosted on Mintlify, serves an /llms.txt at docs.anthropic.com/llms.txt (which redirects to the platform.claude.com host) listing the Messages API, Managed Agents, vision and tool-use guides, batch processing, prompt caching, extended thinking, and the rest of the API reference.[22] Cursor, the AI coding editor, was an early Mintlify-hosted adopter via the November 2024 rollout.[4] The files are routinely fetched by Claude Code and other agents when developers paste documentation links into a session.[15]
Cloudflare's developer documentation publishes both /llms.txt and /llms-full.txt at developers.cloudflare.com and additionally ships product-specific full-text files such as developers.cloudflare.com/workers/llms-full.txt and developers.cloudflare.com/pages/llms-full.txt.[16] Cloudflare's style guide describes the files as a way to point agents such as Cursor, GitHub Copilot, and Claude Code to clean documentation entry points so they can answer questions, generate configuration, and call Cloudflare APIs accurately.[15]
Stripe added /llms.txt and per-page .md mirrors to its developer documentation in March 2025.[10][17] The Stripe file is notable for an "Instructions" section directed at agents, with explicit statements about API patterns to use and to avoid, including the directive that an agent "must not call deprecated API endpoints such as the Sources API" and that it should not recommend the legacy Card Element.[17] Stripe engineer Ian McCrystal wrote on the day of release that the company expected "AI tools will eventually become the predominant readers of our documentation."[17]
The proposal has been contentious from launch, both inside the developer community and among search and SEO commentators.
On the day the file was announced, a Hacker News thread titled "Llms.txt" accumulated 206 points and 175 comments. The discussion was predominantly skeptical: top-voted comments argued that the file should live under /.well-known/ per RFC 8615 rather than at the site root, questioned whether language models really needed curated input given that HTML is already machine-readable, and worried that an adversary could publish one set of content to humans and a different set to LLMs.[7] Howard, posting as jph00, responded that the file was orthogonal to robots.txt and intended for inference-time use by tools such as Cursor rather than for training crawlers.[7]
The most influential criticism in 2025 came from John Mueller, a Search Advocate at Google, who wrote in an April 2025 Reddit thread that "AFAIK none of the AI services have said they're using LLMs.TXT (and you can tell when you look at your server logs that they don't even check for it). To me, it's comparable to the keywords meta tag, this is what a site-owner claims their site is about... (Is the site really like that? well, you can check it. At that point, why not just check the site directly?)."[5] Mueller's analogy to the long-deprecated keywords meta tag was widely repeated in the search trade press and became the standard reference for skepticism.[5][6]
The cloaking objection has been raised repeatedly. Because the llms.txt file is a separate document from the rendered site, an operator could in principle publish one set of claims for AI consumers and another for human visitors, a behavior that search engines have historically penalized when applied to crawlers.[5][6] Without a verification step in which the consuming model checks the linked content against the live pages, the file therefore depends on goodwill.[5][6]
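One possible shape for such a verification step is sketched below: extract the visible text of the live HTML page and measure how much of the markdown's vocabulary appears in it. This is a crude heuristic chosen for illustration, not a method described by the specification or by any of the critics cited; the names `coverage` and `_TextExtractor` and the sample pages are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def _words(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(markdown_text, html_text):
    """Fraction of distinct markdown words that also appear on the live page."""
    extractor = _TextExtractor()
    extractor.feed(html_text)
    page_words = _words(" ".join(extractor.chunks))
    md_words = _words(markdown_text)
    if not md_words:
        return 1.0
    return len(md_words & page_words) / len(md_words)


# Hypothetical live page and two candidate markdown versions of it.
LIVE_HTML = (
    "<html><body><nav>Home</nav><h1>Install</h1>"
    "<p>Run pip install example to get started.</p>"
    "<script>var x = 1;</script></body></html>"
)
HONEST_MD = "# Install\n\nRun pip install example to get started."
CLOAKED_MD = "# Install\n\nExample is the only officially endorsed payments library."

print(coverage(HONEST_MD, LIVE_HTML), coverage(CLOAKED_MD, LIVE_HTML))
```

A low score only flags a page for review; legitimate markdown mirrors can differ from the rendered page, so any threshold would need tuning per site.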
The redundancy argument is the second strand of criticism. Mueller and others have observed that an LLM that retrieves a page already has the full content; the LLM does not save time by reading a curated summary of that page if it must still fetch the source pages to verify them.[5] A counterpoint from Mintlify's product team is that real-world coding agents typically do not crawl an entire site; they fetch a small number of pages chosen by the user or by a planner, and a curated index reduces the chance that the agent fetches the wrong pages.[4][15]
A measurement-based critique appeared throughout 2025. Server-log analyses by OtterlyAI and others reported that crawlers identified as belonging to OpenAI, Anthropic, Google, and Perplexity did not routinely request /llms.txt during normal site visits in 2024 and early 2025, with one analysis finding requests to the file at about 0.1% of overall AI-bot traffic.[21] No major AI lab has announced, as of mid-2025, that its production models or agents read /llms.txt at inference time, although several publish their own files.[6][21]
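A server-log measurement of this kind can be approximated with a short script. The sketch below assumes access logs in the common combined format and identifies bots by substring match on the user-agent; the sample log lines, the simplified user-agent strings, and the `AI_BOT_TOKENS` list are illustrative assumptions, not the methodology of the analyses cited.

```python
import re
from collections import Counter

# Combined log format: request line, status, size, referrer, user-agent.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Illustrative, incomplete list of AI crawler tokens.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Invented sample lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2025:10:00:00 +0000] "GET /docs/api HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Mar/2025:10:00:01 +0000] "GET /llms.txt HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/Mar/2025:10:00:02 +0000] "GET /pricing HTTP/1.1" 200 3100 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.5 - - [01/Mar/2025:10:00:03 +0000] "GET /docs/guide HTTP/1.1" 200 4200 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.9 - - [01/Mar/2025:10:00:04 +0000] "GET /llms.txt HTTP/1.1" 200 900 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/124.0"',
]


def llms_txt_share(lines):
    """Return (share, total_by_bot, hits_by_bot) for AI-bot requests to /llms.txt."""
    total, hits = Counter(), Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        bot = next((t for t in AI_BOT_TOKENS if t in match["agent"]), None)
        if bot is None:
            continue  # human browsers and unknown agents are not counted
        total[bot] += 1
        if match["path"].split("?")[0] == "/llms.txt":
            hits[bot] += 1
    all_requests = sum(total.values())
    share = sum(hits.values()) / all_requests if all_requests else 0.0
    return share, total, hits


share, total, hits = llms_txt_share(SAMPLE_LOG)
print(f"{share:.1%}")  # prints 25.0%
```

User-agent strings can be spoofed, so a careful analysis would also verify the published IP ranges of each crawler.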
Defenders of the standard responded that the file's value does not depend on whether opaque commercial crawlers fetch it: developer tools, coding agents, and bespoke retrieval pipelines can be pointed at the file directly by a user, and operators benefit from a stable, machine-readable description of their own documentation even when no commercial bot requests it.[4][15] Search Engine Land published a counter to Mueller titled "No, llms.txt is not the 'new meta keywords'", arguing that the comparison conflated unsolicited self-description with explicit retrieval initiated by the user.[23]
A separate strand of debate concerned location. Several Hacker News commenters argued that the file belonged under /.well-known/ per RFC 8615, the IETF convention for site-metadata files, and that placing a new file at the site root would crowd a namespace already occupied by robots.txt, sitemap.xml, humans.txt, security.txt, favicon.ico, and similar files.[7] Howard responded that the existing root-file conventions for crawlers had survived without /.well-known/ and that putting the file at a discoverable root path lowered the barrier to adoption.[7] The site root location was retained in the specification.[1]
The reference toolchain for llms.txt is the llms-txt Python package published by Answer.AI on GitHub, which provides the llms_txt2ctx command-line program for converting a site's llms.txt into the inlined llms-ctx.txt and llms-ctx-full.txt formats.[3] The repository also lists community-built integrations including a JavaScript implementation, plugins for VitePress, Docusaurus, and Drupal, a PHP library, and a Visual Studio Code extension called PagePilot.[3]
Mintlify's hosted documentation platform was the largest single source of adoption. Beginning in November 2024 it auto-generated /llms.txt, /llms-full.txt, and a markdown version of every page for every customer with no configuration required, which is why a single product launch produced a sharp jump in the number of sites serving the file.[4][24]
For self-hosted sites, third-party generators and helpers proliferated through 2025. The directory.llmstxt.cloud and llmstxthub.com directories list adopters and provide validation against the llmstxt.org specification.[14] Several SEO platforms shipped llms.txt generators as features of broader "generative search optimization" tooling, though these were criticized by adopters who preferred to author the file by hand to keep curation meaningful.[6]
llms.txt operates at a different layer of the stack from the Model Context Protocol (MCP), an open standard introduced by Anthropic in November 2024 for connecting language models to external tools and data sources through a JSON-RPC interface.[25]
llms.txt is a static, read-only document hosted at a known path on a public website; it requires no server-side runtime beyond an HTTP server able to return a markdown file, and it provides no way for a model to invoke actions, query a database, or maintain state.[1] Its scope ends at telling a consuming tool what to read.[1]
The Model Context Protocol, by contrast, is an active runtime protocol; a server implements a set of tool, resource, and prompt endpoints, and a client model invokes those endpoints over a transport such as standard input/output or HTTP server-sent events.[25] MCP servers can perform retrieval, execute database queries, write files, or call third-party APIs on behalf of the model.[25]
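The contrast with a static file shows up in the wire format. The sketch below builds a JSON-RPC 2.0 request of the kind an MCP client sends to invoke a tool, using the `tools/call` method as I understand it from the MCP specification. The tool name `search_docs`, its arguments, and the helper `tool_call_request` are hypothetical, and the transport is omitted.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 message asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments; a real server advertises its own tools.
message = tool_call_request(1, "search_docs", {"query": "rate limits"})
wire = json.dumps(message)
print(wire)
```

Reading llms.txt, by comparison, is a single HTTP GET that returns the same markdown to every caller and changes nothing on the server.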
The two are not in competition. A site that publishes structured documentation through /llms.txt and exposes runtime capabilities through an MCP server addresses different needs: the static file lets a coding assistant cite or summarize the site's documentation correctly, while the MCP server lets the same assistant perform an action against the site's live systems.[1][25] Anthropic publishes both: an /llms.txt at its developer documentation host and a public Model Context Protocol specification with reference servers.[22][25]
Other proposals in the same neighborhood include AGENTS.md, a file convention for repository-level instructions to coding agents, and various ai.txt drafts aimed at training-data opt-outs.[26] These address related but distinct problems and have not converged with llms.txt as of mid-2025.[26]