"CustomGPT Instructions for Knowledge (Uploaded Files)" is the informal name for a short block of default instructions that OpenAI appends to a Custom GPT's system prompt whenever the builder attaches files in the Knowledge section of the GPT configuration. The block tells the model how to treat those files: pull from them as a primary source, refer to them as the GPT's knowledge rather than as user uploads, prefer their contents over training data, admit when an answer is not in the documents, and refuse to disclose file names or download links.
The text was first widely seen in late 2023 after researchers extracted hidden system prompts from public GPTs in the GPT Store. The same wording has appeared, with only minor edits, in thousands of leaked prompts collected on GitHub and in the OpenAI Developer Community. OpenAI never published the snippet on its help pages, so builders treat it as a half-documented contract: it shows what the platform tells the model by default, and explains why a Custom GPT will sometimes refuse user requests for raw file contents or file lists.
The default instruction text
The canonical version of the prompt, as repeatedly leaked from production Custom GPTs with knowledge files, reads as follows:
You have files uploaded as knowledge to pull from. Anytime you reference files, refer to them as your knowledge source rather than files uploaded by the user. You should adhere to the facts in the provided materials. Avoid speculations or information not contained in the documents. Heavily favor knowledge provided in the documents before falling back to baseline knowledge or other sources. If searching the documents didn't yield any answer, just say that. Do not share the names of the files directly with end users and under no circumstances should you provide a download link to any of the files.
The phrase "files uploaded as knowledge" is what gives the snippet its informal title. "Knowledge source rather than files uploaded by the user" is a presentational rule: the model should talk about "the documentation" or "the knowledge base" instead of "the file you uploaded." The middle three sentences (adhere to the facts, avoid speculation, heavily favor the documents) are the part builders most often quote when they want the GPT to behave like a closed-book assistant. The last sentence is the privacy clause, which is where most of the platform's file-protection behavior begins.
When the snippet is added
OpenAI inserts the text automatically. Builders never see it inside the Configure tab and cannot edit it directly. The trigger is the presence of files in the Knowledge slot of the GPT editor: as soon as at least one file is attached, the snippet becomes part of the system prompt the model receives at the start of every conversation. Removing all files removes the block on the next save.
The snippet sits below the builder's own instructions. The two are concatenated, so a builder who writes "Always cite section numbers from the manual" ends up with a system prompt where that line is followed by the default passage, and the model treats both as instructions of equal weight.
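The assembly rule described in this section can be modeled in a few lines. This is a minimal sketch, not OpenAI's code. The function name `build_system_prompt` and the blank-line separator are assumptions made for illustration. Only two details come from the observations above: the block appears when at least one file is attached, and it follows the builder's text.

```python
# Hypothetical model of how the Knowledge block joins the builder's text.
# Not OpenAI's implementation; the function name and separator are illustrative.

KNOWLEDGE_BLOCK = (
    "You have files uploaded as knowledge to pull from. "
    "Anytime you reference files, refer to them as your knowledge source "
    "rather than files uploaded by the user. "
    "You should adhere to the facts in the provided materials. "
    "Avoid speculations or information not contained in the documents. "
    "Heavily favor knowledge provided in the documents before falling back "
    "to baseline knowledge or other sources. "
    "If searching the documents didn't yield any answer, just say that. "
    "Do not share the names of the files directly with end users and under "
    "no circumstances should you provide a download link to any of the files."
)


def build_system_prompt(builder_instructions: str, knowledge_files: list[str]) -> str:
    """Return the prompt text a GPT would start each conversation with."""
    parts = [builder_instructions.strip()]
    # The block is added only when at least one knowledge file is attached.
    if knowledge_files:
        parts.append(KNOWLEDGE_BLOCK)
    # Plain concatenation: builder text first, default block after it.
    return "\n\n".join(p for p in parts if p)
```

For example, `build_system_prompt("Always cite section numbers from the manual.", ["manual.pdf"])` returns the builder's line followed by the default passage. With an empty file list, it returns the builder's text unchanged.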
The instructions only describe the policy. The actual mechanism that lets the model read uploaded documents is a separate built-in tool called myfiles_browser, exposed to the GPT alongside browsing, image generation, and the Python sandbox. Researchers reverse-engineering ChatGPT have published the tool's signature many times. It defines a small set of functions: search(query) runs a text query over the attached files; click(id) opens a specific document from those hits; back() returns to the previous list; scroll(amt) advances inside an open document; open_url(url) opens a document by its identifier; and quote_lines(start, end) extracts a passage for citation. Citations in the chat use a 【message_idx†link_text】 format, the same convention ChatGPT uses for browsing results.
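The citation markers have a fixed shape, so a client that post-processes responses can extract them with a regular expression. This is a minimal sketch. It assumes only that neither field contains the delimiter characters † or 】, and it does not check what the index looks like. The names `extract_citations` and `strip_citations` are hypothetical.

```python
import re

# Matches markers of the form 【message_idx†link_text】.
# Assumption: neither field contains the delimiters † or 】.
CITATION_RE = re.compile(r"【([^†】]+)†([^】]+)】")


def extract_citations(text: str) -> list[tuple[str, str]]:
    """Return (message_idx, link_text) pairs in order of appearance."""
    return CITATION_RE.findall(text)


def strip_citations(text: str) -> str:
    """Remove citation markers, e.g. before reusing the answer as plain text."""
    return CITATION_RE.sub("", text)
```

Applied to `"Revenue grew 12%【4†source】."`, `extract_citations` yields `[("4", "source")]` and `strip_citations` yields `"Revenue grew 12%."`.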
The model is told to set the recipient to myfiles_browser and to use Python-style syntax such as search('quarterly revenue'). For longer questions the leaked tool description tells the assistant to start with open_url on the most relevant document; for short factual lookups it should run up to three search calls and answer as soon as a clear hit appears. Combined with the policy text, those hints produce the typical "closed-book" behavior: search, quote, cite, and stop.
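The short-lookup strategy amounts to a bounded loop: run a few targeted searches and stop at the first clear hit. The sketch below simulates it against a stand-in search function. It is hypothetical in three respects: the name `answer_with_budget`, the numeric relevance score, and the 0.8 "clear hit" threshold. The real model judges relevance itself rather than comparing a number.

```python
from typing import Callable, Optional

# A hit is (document_id, passage, relevance score in [0, 1]).
Hit = tuple[str, str, float]

MAX_SEARCHES = 3   # the "up to three search calls" budget
CLEAR_HIT = 0.8    # hypothetical stand-in for "a clear hit"


def answer_with_budget(
    queries: list[str],
    search: Callable[[str], list[Hit]],
) -> Optional[Hit]:
    """Run up to MAX_SEARCHES queries; return the first clear hit, else None."""
    for query in queries[:MAX_SEARCHES]:
        hits = search(query)
        best = max(hits, key=lambda h: h[2], default=None)
        if best is not None and best[2] >= CLEAR_HIT:
            return best  # stop as soon as the answer is clearly found
    return None  # nothing found: the policy says to admit it


# Usage with a stand-in index (hypothetical data):
def fake_search(query: str) -> list[Hit]:
    index = {"refund window": [("policy.pdf", "Refunds within 30 days.", 0.92)]}
    return index.get(query, [])


print(answer_with_budget(["refunds", "refund window"], fake_search))
# ('policy.pdf', 'Refunds within 30 days.', 0.92)
```

Returning `None` instead of a best-effort guess mirrors the "If searching the documents didn't yield any answer, just say that" rule.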
How OpenAI processes uploaded files
The knowledge feature is a retrieval-augmented generation pipeline. OpenAI has not published the exact production parameters used inside ChatGPT, but the closely related file_search tool in the Assistants API uses the same general architecture and serves as the best public reference. Files are split into chunks, each chunk is converted into an embedding vector, and the vectors are stored in a vector database. At query time the user's message is embedded, the closest chunks are retrieved by cosine similarity, and the top hits are pasted back into the context window before the model generates a response.
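The retrieval step can be illustrated end to end with a toy embedding. The sketch substitutes a bag-of-words vector for the learned embedding model a real pipeline would use, and a Python list for the vector database. Only the shape of the process mirrors the description above: embed the chunks, embed the query, rank by cosine similarity, and keep the top k.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    index = [(chunk, embed(chunk)) for chunk in chunks]  # the toy "vector store"
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_context(query: str, chunks: list[str], k: int = 2) -> str:
    """Paste the top hits ahead of the question, as the model would receive them."""
    hits = retrieve(query, chunks, k)
    return "\n---\n".join(hits) + f"\n\nQuestion: {query}"
```

Given chunks about refunds, office hours, and shipping, a query like "how many days for refunds" ranks the refund chunk first because it shares the most terms with the query.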
In the Assistants API the default chunk size is 800 tokens with a 400-token overlap, configurable up to 4,096 tokens per chunk. Custom GPTs are believed to use similar defaults, though the specific values are not exposed. Retrieved chunks share the same overall context limit as the rest of the conversation, which is one reason builders sometimes see only fragments of a long document in any given turn.
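Overlapping chunks are usually produced with a fixed-size sliding window. The sketch below applies the Assistants API defaults quoted above (800-token chunks, 400-token overlap) to a list of token IDs. Tokenization is out of scope. As noted, whether Custom GPTs use these exact values is not public.

```python
def chunk_tokens(tokens: list[int], size: int = 800, overlap: int = 400) -> list[list[int]]:
    """Split tokens into windows of `size`; each window repeats the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not tokens:
        return []
    step = size - overlap  # 400 with the defaults
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return chunks


# A 2,000-token document yields windows starting at 0, 400, 800 and 1,200.
print([c[0] for c in chunk_tokens(list(range(2000)))])  # [0, 400, 800, 1200]
```

With the defaults, most tokens land in two chunks. This doubles the size of the index and explains why retrieved context can contain repeated text.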
| Component | Role in a Custom GPT with knowledge |
|---|---|
| Builder instructions | Free-form text up to about 8,000 characters in the Configure tab. |
| Default knowledge prompt | The "You have files uploaded as knowledge" block, appended automatically. |
| myfiles_browser | Built-in tool that searches and reads the chunks created from uploaded files. |
| Vector store | Embedding index built from the chunks, used for similarity search. |
| Context window | Where retrieved chunks land before the model writes its answer. |
The Knowledge section of a Custom GPT accepts up to 20 files. Each file can be up to 512 MB. Text and document files are additionally capped at 2 million tokens per file, while spreadsheets are exempt from the token cap but still subject to the 512 MB size limit. Once a file is attached, it counts against the 20-file ceiling for that GPT until the builder removes it.
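These limits are simple enough to check before uploading. Below is a minimal pre-flight sketch under three assumptions: each file's token count has already been measured with some tokenizer, spreadsheets can be recognized by extension (.csv, .xls, .xlsx), and "512 MB" means 512 × 1024 × 1024 bytes. The constants restate the figures above and should be rechecked against OpenAI's current help pages. `KnowledgeFile` and `check_knowledge` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import PurePath

MAX_FILES = 20
MAX_BYTES = 512 * 1024 * 1024        # "512 MB", read here as MiB (an assumption)
MAX_TOKENS = 2_000_000               # per text/document file
SPREADSHEET_EXTS = {".csv", ".xls", ".xlsx"}  # assumed spreadsheet detection


@dataclass
class KnowledgeFile:
    name: str
    size_bytes: int
    token_count: int  # measured beforehand with a tokenizer of your choice


def check_knowledge(files: list[KnowledgeFile]) -> list[str]:
    """Return human-readable problems; an empty list means the set looks uploadable."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files attached; the limit is {MAX_FILES}")
    for f in files:
        if f.size_bytes > MAX_BYTES:
            problems.append(f"{f.name}: larger than 512 MB")
        # Spreadsheets are exempt from the token cap but not from the size cap.
        is_sheet = PurePath(f.name).suffix.lower() in SPREADSHEET_EXTS
        if not is_sheet and f.token_count > MAX_TOKENS:
            problems.append(f"{f.name}: {f.token_count:,} tokens exceeds the 2M cap")
    return problems
```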
GPTs accept the common document, spreadsheet, image, text, and code file types. PDF, Microsoft Word (.docx), plain text (.txt), Markdown, comma-separated values (.csv), Microsoft Excel (.xlsx), JSON, and source files in the major programming languages are all routinely used in production GPTs. The available formats can vary depending on the underlying GPT-4 or GPT-4o model selected, and a few formats only become accepted when Code Interpreter & Data Analysis is turned on for the GPT. OpenAI's own guidance, repeated in builder forums since GPTs launched, is to prefer simple text-forward files. Scanned PDFs without an embedded text layer, dense tables that rely on visual layout, and image-heavy slide decks all retrieve poorly because the chunks the embedding model sees are sparse, noisy, or empty.
Why the policy text matters
A Custom GPT without the default block tends to behave like ordinary ChatGPT and answers from training data even when the right document is sitting in the knowledge base. The block pushes the model in the other direction. Four sentences do most of the work.
The "adhere to the facts" and "avoid speculations" sentences narrow the model to material it can actually quote from a retrieved chunk. "Heavily favor knowledge provided in the documents before falling back to baseline knowledge" tells it to retrieve first and answer second. "If searching the documents didn't yield any answer, just say that" gives the assistant explicit permission to say no, which removes some of the pressure to make something up when the knowledge base is silent on a topic.
The presentational rule, treating the documents as the GPT's own knowledge rather than as user uploads, is more about tone. It is the reason a GPT built on a company handbook tends to talk about "the policy" instead of "the PDF you uploaded." Builders who want a different framing usually override it explicitly in their own instructions, for example by writing "Refer to the documentation by its filename whenever you cite it."
The privacy clause
The last sentence of the default block ("Do not share the names of the files directly with end users and under no circumstances should you provide a download link to any of the files") is the source of the platform's most visible file-protection behavior. Ask a typical knowledge-equipped GPT for the names of its files and the model will refuse, often quoting the policy almost word for word. Ask for a download link and it will refuse again.
The protection is real but shallow. The snippet, on its own, does not prevent file extraction. A 2025 academic study, "When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs," mapped at least five leakage vectors: metadata returned by the GPT Store, prompts that surface chunks during initialization, retrieval traces with verbatim text, the sandboxed Python environment from Code Interpreter, and direct prompt-injection attacks. The most reliable vector is Code Interpreter: when Code Interpreter & Data Analysis is enabled, the original files are mounted at /mnt/data inside the sandbox, and a user can list and download them by asking the model to run ls -a /mnt/data and then read the bytes. The paper reports a 95.95% success rate for that specific exploit and notes that nearly 29% of files leaked in the wild are copyrighted material, including books from major publishers and internal documents from public companies.
Research from Palo Alto Networks, Cato Networks, and others has reached similar conclusions. The default policy block is a soft guardrail, not a confidentiality boundary. Builders should not attach files they would not be willing to publish. Organizations that want to ship internal data through a Custom GPT should either disable Code Interpreter (which closes the most reliable leakage vector but not the retrieval-based ones), accept that the source files are recoverable, or move to a hosted retrieval system they control.
Common builder overrides
The default block is conservative and a little vague, so builders frequently layer their own rules on top. The overrides that show up most often in public GPTs fall into three broad classes.
Many builders enforce stricter sourcing by writing rules like "Use only the information retrieved from the uploaded documentation" and "If no relevant information is found in the file, respond with: 'The requested information is not available.'" Citation rules are also common, for example "Clearly cite the section, page, or excerpt from the documentation" or "Every response must reference the knowledge source file." These instructions sit alongside the default block and tend to push the GPT further toward closed-book behavior than the defaults alone would produce.
A second class of override relaxes the privacy clause for cases where the files are public anyway. A research GPT pointing at open-access papers, for example, often includes "You may share the file names and section titles with the user" so that citations work properly. A third class deals with retrieval pathologies: builders who notice that their GPT keeps answering from training data sometimes add hard "Do not answer from prior knowledge" rules, although those instructions, like the defaults, hold only as well as the underlying retrieval does.
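Builders who maintain several GPTs sometimes keep these override rules as reusable snippets. The hypothetical helper below assembles the overrides described above and checks the result against the roughly 8,000-character instruction limit. The rule wording is quoted from the examples in this section. The function, its flags, and the newline separator are illustrative only.

```python
INSTRUCTION_LIMIT = 8_000  # approximate Configure-tab limit, in characters

STRICT_SOURCING = (
    "Use only the information retrieved from the uploaded documentation. "
    "If no relevant information is found in the file, respond with: "
    "'The requested information is not available.'"
)
CITATIONS = "Clearly cite the section, page, or excerpt from the documentation."
SHARE_NAMES = "You may share the file names and section titles with the user."
NO_PRIOR = "Do not answer from prior knowledge."


def build_instructions(base: str, *, strict: bool = True, cite: bool = True,
                       public_files: bool = False, no_prior: bool = False) -> str:
    """Append selected override rules to the builder's base instructions."""
    rules = [base.strip()]
    if strict:
        rules.append(STRICT_SOURCING)
    if cite:
        rules.append(CITATIONS)
    if public_files:
        rules.append(SHARE_NAMES)  # only sensible when the files are public anyway
    if no_prior:
        rules.append(NO_PRIOR)  # holds only as well as retrieval does
    text = "\n".join(r for r in rules if r)
    if len(text) > INSTRUCTION_LIMIT:
        raise ValueError(
            f"instructions are {len(text)} characters; limit is about {INSTRUCTION_LIMIT}"
        )
    return text
```

These rules only add to the builder's text. The default Knowledge block is still appended after them, so the privacy-relaxing rule competes with the default clause rather than replacing it.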
Limitations of instructions alone
A recurring observation in the OpenAI Developer Community is that the default block, and even aggressive builder overrides, do not always force the model to consult the knowledge base. Custom GPTs sometimes answer from training data when a topic is well represented there, especially when the user's question is phrased generically. The most common diagnosis is that retrieval failed: the embedding match did not return chunks the model considered relevant, so it fell back on what it already knew. Better-structured source files, more specific queries, and explicit instructions to call myfiles_browser first usually help, but no prompt makes retrieval perfect. The architecture is RAG, not fine-tuning, and the model has not memorized the documents.
The other recurring complaint is token cost. A single retrieval round can pull in thousands of tokens of chunked content, which is why some users have reported a relatively short answer that consumed 30,000 tokens of duplicated context. In ChatGPT this is mostly invisible. In the Assistants API it shows up directly on the bill.
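Back-of-the-envelope arithmetic shows how quickly retrieved context adds up. The sketch assumes each search returns up to 20 chunks of 800 tokens. These are illustrative inputs, not confirmed ChatGPT settings, and real chunks can be shorter. The function names are hypothetical.

```python
def retrieved_tokens(searches: int, chunks_per_search: int = 20, chunk_tokens: int = 800) -> int:
    """Upper bound on tokens pasted into context by retrieval in one turn (assumed inputs)."""
    return searches * chunks_per_search * chunk_tokens


def duplicated_share(overlap: int = 400, chunk_tokens: int = 800) -> float:
    """Fraction of a chunk repeated from its neighbour when adjacent chunks are both retrieved."""
    return overlap / chunk_tokens


print(retrieved_tokens(1))  # 16000
print(retrieved_tokens(2))  # 32000
print(duplicated_share())   # 0.5
```

Under these assumed settings, two searches reach 32,000 tokens, the same order of magnitude as the reports above. When neighbouring chunks are both retrieved, half of each one repeats text the model has already seen.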
Place in the broader stack
The "Knowledge (Uploaded Files)" snippet is one of several default blocks that OpenAI adds to a Custom GPT's system prompt. Other blocks seen in the same leaked prompts govern web browsing, DALL-E image generation, the Python sandbox, and the Custom GPT's name and description. They all follow the same pattern: a paragraph or two of policy, optionally a tool definition, and a citation format. The knowledge block is the one most builders interact with because it directly shapes how the GPT behaves on questions that depend on the attached documents.
For builders, the snippet is best treated as a baseline rather than as a complete instruction set. It establishes the default behavior, sets the privacy posture, and explains why some user requests get refused. Anything beyond that, like specific citation formats, allowed file references, refusal text, or stricter sourcing, has to be written into the builder's own instructions. The snippet itself is short, fixed, and not part of the official documentation, but knowing what it says removes most of the surprise from working with the Knowledge feature.
References
- OpenAI Help Center, "Knowledge in GPTs," https://help.openai.com/en/articles/8843948-knowledge-in-gpts.
- OpenAI Help Center, "Creating and editing GPTs," https://help.openai.com/en/articles/8554397-creating-and-editing-gpts.
- OpenAI Help Center, "Key Guidelines for Writing Instructions for Custom GPTs," https://help.openai.com/en/articles/9358033-key-guidelines-for-writing-instructions-for-custom-gpts.
- OpenAI Help Center, "File Uploads FAQ," https://help.openai.com/en/articles/8555545-file-uploads-faq.
- LouisShark, "chatgpt_system_prompt: A collection of GPT system prompts and various prompt injection/leaking knowledge," GitHub repository, https://github.com/LouisShark/chatgpt_system_prompt.
- spdustin, "ChatGPT-AutoExpert: _system-prompts/all_tools.md," GitHub, https://github.com/spdustin/ChatGPT-AutoExpert/blob/main/_system-prompts/all_tools.md.
- OpenAI API documentation, "Assistants File Search," https://developers.openai.com/api/docs/assistants/tools/file-search.
- OpenAI Developer Community, "How does the knowledge of custom GPT actually work," https://community.openai.com/t/how-does-the-knowledge-of-custom-gpt-actually-work/517882.
- OpenAI Developer Community, "How to force custom gpt to respond exclusively from content of the knowledge files," https://community.openai.com/t/how-to-force-custom-gpt-to-respond-exclusively-from-content-of-the-knowledge-files/1026751.
- OpenAI Developer Community, "Limits on source uploads for custom GPTs," https://community.openai.com/t/limits-on-source-uploads-for-custom-gpts/1127135.
- OpenAI Developer Community, "What is the knowledge file limit that can be uploaded to train custom GPTs," https://community.openai.com/t/what-is-the-knowledge-file-limit-that-can-be-uploaded-to-train-custom-gpts/653244.
- OpenAI Developer Community, "Why is the My GPT Instructions limited by 8000 characters," https://community.openai.com/t/why-is-the-my-gpt-instructions-limited-by-8000-characters/1006936.
- Shen, X. et al., "When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs," arXiv:2506.00197, 2025, https://arxiv.org/abs/2506.00197.
- Palo Alto Networks Unit 42, "OpenAI Custom GPTs: What You Need to Worry About," 2024, https://www.paloaltonetworks.com/blog/cloud-security/openai-custom-gpts-security/.
- Cato Networks, "How to steal intellectual property from GPTs," https://www.catonetworks.com/blog/how-to-steal-intellectual-property-from-gpts/.
- Gold Penguin, "Custom GPTs Currently Let Anyone Download Leaked Files," https://goldpenguin.org/blog/custom-gpts-currently-let-anyone-download-context/.