Context Window

Also known as: context length, context size, token limit

Audience: technical beginner

What is a Context Window?

A context window is the maximum amount of text (measured in tokens) that a language model can process in a single interaction. It includes everything: the system prompt, conversation history, retrieved documents, tool definitions, the user’s message, and the model’s response. If the total input exceeds the context window, older content must be truncated or summarized. Context windows have grown dramatically — from 4,096 tokens in early GPT-3.5 to 200,000 tokens in Claude and up to 2 million tokens in Gemini — enabling models to work with entire codebases, long documents, and extended conversations.
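Since everything in the prompt draws from the same budget, it can help to see how the pieces add up. The sketch below is illustrative only: the model limit, component texts, and the character-based token estimate are all assumptions, and real tokenizers (e.g. tiktoken for OpenAI models) produce different counts.

```python
# Rough sketch of how prompt components consume a shared context budget.
# approx_tokens is a crude heuristic, not a real tokenizer.

def approx_tokens(text: str) -> int:
    # Heuristic: roughly 1 token per 4 characters of English text.
    return max(1, len(text) // 4)

CONTEXT_WINDOW = 8192  # hypothetical model limit

components = {
    "system_prompt": "You are a helpful assistant...",
    "tool_definitions": "search(query): ... calculator(expr): ...",
    "history": "User: hi\nAssistant: hello\n" * 50,
    "user_message": "Summarize the attached report.",
}

used = sum(approx_tokens(t) for t in components.values())
reserved_for_response = 1024  # leave room for the model's output
remaining = CONTEXT_WINDOW - used - reserved_for_response
print(f"used={used} tokens, remaining for retrieved docs={remaining}")
```

Note that the response itself counts against the window, which is why a portion of the budget is reserved before deciding how many retrieved documents will fit.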

Why Context Window Size Matters

The context window defines the practical boundary of what a model can “see” and reason about at once. A larger context window means the model can process longer documents without chunking, maintain longer conversation histories without losing earlier context, and work with more tools and instructions simultaneously. For agent applications, context window size determines how complex a task an agent can handle in a single session — more tools, more state, and more history all consume tokens. However, bigger is not always better: models may struggle to attend equally to all information in very long contexts, and processing more tokens costs more time and money.

Practical Considerations

For practitioners, context window management is a core engineering challenge. Strategies include summarizing older conversation turns, using RAG to retrieve only the most relevant documents instead of including everything, compressing tool definitions, and prioritizing which information is most important for the current task. The “lost in the middle” phenomenon — where models pay more attention to information at the beginning and end of the context than in the middle — means that information placement within the context window also matters. Effective context window utilization is less about having the biggest window and more about putting the right information in the right place.
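One of these strategies, dropping the oldest conversation turns first while keeping the system prompt and the latest user message intact, can be sketched as follows. This is a minimal illustration under assumed names (`fit_history`, `approx_tokens`), not a production implementation; real systems typically summarize evicted turns rather than discard them.

```python
# Sketch: fit a conversation into a token budget by evicting the oldest
# turns first. approx_tokens is a crude stand-in for a real tokenizer.

def approx_tokens(text: str) -> int:
    # Heuristic: roughly 1 token per 4 characters of English text.
    return max(1, len(text) // 4)

def fit_history(system_prompt: str, turns: list[str],
                user_message: str, budget: int) -> list[str]:
    # The system prompt and current user message are always kept.
    spent = approx_tokens(system_prompt) + approx_tokens(user_message)
    kept = []
    # Walk turns newest-first so recent context survives truncation.
    for turn in reversed(turns):
        cost = approx_tokens(turn)
        if spent + cost > budget:
            break  # older turns beyond this point are dropped
        spent += cost
        kept.append(turn)
    kept.reverse()  # restore chronological order
    return [system_prompt, *kept, user_message]
```

Placing the system prompt first and the user message last also works with, rather than against, the "lost in the middle" effect: the most important instructions sit at the positions the model attends to most.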

Related Terms

  • Scaling Laws - How context window growth relates to model scaling