Context window

In the field of large language models (LLMs), a context window is a key parameter that refers to the range of tokens or words that an artificial intelligence (AI) model can take into account when generating responses to prompts or user inputs. ^[1] ^[2] ^[3]

A context window works by maintaining a sliding window of tokens, with the most recent tokens kept in focus. As new tokens are added to the input, the older ones may fall outside the context window, and the AI will no longer be able to access them. ^[3]
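The sliding behavior described above can be illustrated with a minimal sketch. It assumes that whitespace-separated words stand in for tokens (real tokenizers split text differently) and that the oldest tokens are simply dropped; the function name `sliding_context` is hypothetical and does not reflect how any particular model manages its context.

```python
from collections import deque


def sliding_context(tokens, window_size):
    """Return the tokens still visible after feeding `tokens` in one at a time."""
    # A deque with maxlen silently discards the oldest item once it is full.
    window = deque(maxlen=window_size)
    for token in tokens:
        window.append(token)
    return list(window)


# Assumption: words stand in for tokens, and the window holds only 4 of them.
conversation = "the quick brown fox jumps over the lazy dog".split()
visible = sliding_context(conversation, window_size=4)
print(visible)  # ['over', 'the', 'lazy', 'dog']
```

In this toy run, the first five words have fallen outside the window and are no longer available.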

Models with small context windows tend to "forget" the content of even recent conversations, which can cause them to go off topic. After a few thousand words, they may also forget their initial instructions and instead extrapolate from the most recent information in their context window. ^[4]

Using GPT-4's 32k model as an example, if it is given a 6,000-token input and produces a 4,000-token response, the total is 10,000 tokens, which falls within the model's context window. If the total exceeds the 32k limit, the model loses access to some tokens, which hampers its ability to respond based on the full context. ^[3]
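The arithmetic in this example can be written as a small check. This is only a sketch: the limit is taken as exactly 32,000 tokens, following the figure used in this article, and `fits_in_context` is a hypothetical helper, not part of any vendor's API.

```python
# Assumption: the 32k model's limit is treated as exactly 32,000 tokens.
CONTEXT_LIMIT = 32_000


def fits_in_context(prompt_tokens, response_tokens, limit=CONTEXT_LIMIT):
    """Return the combined token count and whether it stays within the limit."""
    total = prompt_tokens + response_tokens
    return total, total <= limit


print(fits_in_context(6_000, 4_000))   # (10000, True)
print(fits_in_context(30_000, 4_000))  # (34000, False)
```

The second call shows the overflow case, where the input and response together exceed the window.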

GPT-4's context window

OpenAI's GPT-4 supports a context window of up to 32,000 tokens in its largest model (about 25,000 words, or around 50 pages), a significant improvement over GPT-3 (4,000 tokens, approximately 3,000 words). ^[3] ^[4]

This increase in the size of the context window affects the model's performance and utility across different applications. Larger context windows improve the handling of more complex and lengthy inputs, such as processing entire documents, understanding the full scope of an article, or creating content based on a broader set of information. They also result in more accurate and contextually relevant responses. ^[3]

A broader context window opens up a wide range of applications, including the processing of complex documents such as academic articles, legal contracts, or novels. Conversational AI is another area that will benefit, with improvements in maintaining context during extended conversations and in providing more engaging, coherent, and appropriate responses. ^[3]

Below are some example areas that will benefit from the expanded context window in GPT-4.

Customer support;

Data analysis and decision-making;

Language learning and translation;

Creative writing and content generation. ^[3]

Claude's context window

The 100k context window in Anthropic's language model Claude was considered a remarkable development in the AI world. The update from 9,000 to 100,000 tokens (approximately 75,000 words) means that the model can process and analyze very large documents or maintain extended conversations. As an example, this model can read the novel "The Great Gatsby" and respond to questions about it in less than 30 seconds. ^[2] ^[5] ^[6]

According to Anthropic's blog, "Beyond just reading long texts, Claude can help retrieve information from the documents that help your business run. You can drop multiple documents or even a book into the prompt and then ask Claude questions that require synthesis of knowledge across many parts of the text. For complex questions, this is likely to work substantially better than vector search based approaches." ^[6]

Claude's expanded 100k context window enables it to:

Digest, summarize, and explain complex documents (e.g. financial statements or research papers);

Analyze strategic risks and opportunities based on a company's annual reports;

Evaluate the advantages and disadvantages of a piece of legislation;

Assist in academic research and writing;

Support lifelong learning and personal growth;

Identify risks and themes in legal documents;

Read hundreds of pages of developer documentation and provide answers to technical questions;

Quickly prototype by submitting an entire codebase into the context and building on or modifying it;

Summarize long audio content. ^[6] ^[7]

100k Claude vs. 32k GPT-4

Claude's larger context window allows it to process more text than GPT-4. It can analyze around 75,000 words, compared to GPT-4's 25,000 words. Claude can therefore provide a more comprehensive understanding of complex subjects, while GPT-4 may be better suited for tasks that require a more focused analysis and synthesis of information. However, Claude's larger context window may lead to slower processing times compared to GPT-4. ^[7]
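As a rough illustration of this comparison, the sketch below estimates which window a document of a given length would fit into. It assumes a ratio of about 0.75 words per token, as implied by the "100,000 tokens, approximately 75,000 words" figure; actual ratios depend on the tokenizer and the text, and the names `WINDOWS`, `estimated_tokens`, and `models_that_fit` are hypothetical.

```python
# Assumption: roughly 0.75 words per token, from "100,000 tokens ~ 75,000 words".
WORDS_PER_TOKEN = 0.75

# Window sizes as quoted in this article.
WINDOWS = {"GPT-4 32k": 32_000, "Claude 100k": 100_000}


def estimated_tokens(word_count):
    """Convert a word count into an approximate token count."""
    return round(word_count / WORDS_PER_TOKEN)


def models_that_fit(word_count):
    """List the models whose context window could hold the whole document."""
    needed = estimated_tokens(word_count)
    return [name for name, limit in WINDOWS.items() if needed <= limit]


print(models_that_fit(20_000))  # ['GPT-4 32k', 'Claude 100k']
print(models_that_fit(60_000))  # ['Claude 100k']
print(models_that_fit(90_000))  # []
```

Because the ratio is only an approximation, results near a limit should be treated with caution.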

The 100k context window is ideal for analyzing big documents like annual reports, legal briefs, and research papers, while GPT-4's 32k tokens are better suited for tasks that require less information like customer service, content generation, and social media management. ^[7]

Finally, for education and research, Claude's context window allows it to process vast amounts of academic and research material. The smaller context window of GPT-4 may be better suited for everyday educational tasks, assisting students and researchers in finding specific information, summarizing articles, and answering questions on a smaller scale. ^[7]

References

1. Ratner, N, Levine, Y, Belinkov, Y, Ram, O, Abend, O, Karpas, E, Shashua, A, Leyton-Brown, K and Shoham, Y (2022). Parallel Context Windows Improve In-Context Learning of Large Language Models. arXiv:2212.10947v1
2. Kyei, K (2023). The Expanding Frontier of AI: A Deep Dive into Context Windows and Overcoming Hallucinations. LinkedIn. https://www.linkedin.com/pulse/expanding-frontier-ai-deep-dive-context-windows-overcoming-kwame-kyei
3. Kristian (2023). GPT-4 Prompt Engineering: Why Larger Context Window is a Game-Changer. All About AI. https://www.allabtai.com/gpt-4-prompt-engineering-why-larger-context-window-is-a-game-changer/
4. Wiggers, K (2023). OpenAI is Testing a Version of GPT-4 that Can ‘Remember’ Long Conversations. TechCrunch. https://techcrunch.com/2023/03/14/openai-is-testing-a-version-of-gpt-4-that-can-remember-long-conversations/
5. Zhang, H (2023). Claude’s 100k Context Windows: An Impressive First-Hand Exploration. GenerativeAI. https://generativeai.pub/claudes-100k-context-windows-my-impressive-first-hand-exploration-6a6e9a47287f
6. Anthropic (2023). Introducing 100K Context Windows. Anthropic. https://www.anthropic.com/index/100k-context-windows
7. Ramlochan, S (2023). Claude's 100K Context Window: How This Will Transform Business, Education, and Research. Prompt Engineering Institute. https://www.promptengineering.org/claudes-100k-context-window-how-this-will-transform-business-education-and-research/
