Context Window
コンテキスト・ウィンドウ
A context window is the amount of input and output a model can consider in one request. Larger windows help with long material, but relevance, cost, latency, and distraction still matter.
What it means
A context window is the token range a model can process in a single request. System instructions, user input, conversation history, retrieved documents, tool results, and generated output all consume that range. A larger window can support longer documents and multi-document tasks, but adding everything is not automatically better. Low-value context can hide important evidence, increase latency, and raise cost. Practical systems use summarization, chunking, retrieval, prioritization, and history compaction to make the window useful.
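The shared budget described above can be sketched as a small accounting helper. This is a minimal illustration, not a real tokenizer: `estimate_tokens` is a hypothetical stand-in that counts whitespace-separated words, and the `ContextBudget` class and its field names are assumptions introduced only for this example.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Hypothetical stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


@dataclass
class ContextBudget:
    token_limit: int          # total window size (illustrative value, not a real model limit)
    reserved_for_output: int  # headroom kept free for the generated answer

    def input_budget(self) -> int:
        """Tokens left for instructions, history, documents, and tool results."""
        return self.token_limit - self.reserved_for_output

    def usage_by_part(self, parts: dict[str, str]) -> dict[str, int]:
        """Estimated tokens consumed by each named part of the request."""
        return {name: estimate_tokens(text) for name, text in parts.items()}

    def fits(self, parts: dict[str, str]) -> bool:
        """True if all parts together fit inside the input budget."""
        return sum(self.usage_by_part(parts).values()) <= self.input_budget()


parts = {
    "system": "You are a helpful assistant",
    "user": "Summarize the policy",
}
print(ContextBudget(token_limit=20, reserved_for_output=4).fits(parts))
```

In a real system the word count would be replaced by the tokenizer that matches the model in use, since token counts differ between models.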
How to calculate it
Context usage is evaluated by token use, headroom, and evidence quality.
| Lens | Formula / treatment | What it indicates |
|---|---|---|
| Utilization | Used tokens / token limit | Indicates risk of truncation and cost |
| Evidence density | Useful reference tokens / total reference tokens | Measures context noise |
| Compression savings | Tokens before compaction - tokens after compaction | Shows summary impact |
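The three formulas in the table translate directly into code. The sketch below assumes token counts have already been measured elsewhere; the function names are illustrative, and how tokens are judged "useful" is left to the caller.

```python
def utilization(used_tokens: int, token_limit: int) -> float:
    """Used tokens / token limit. Values near 1.0 signal truncation risk."""
    if token_limit <= 0:
        raise ValueError("token_limit must be positive")
    return used_tokens / token_limit


def evidence_density(useful_reference_tokens: int, total_reference_tokens: int) -> float:
    """Useful reference tokens / total reference tokens. Low values mean noisy context."""
    if total_reference_tokens == 0:
        return 0.0  # no reference material, so nothing to measure
    return useful_reference_tokens / total_reference_tokens


def compression_savings(tokens_before: int, tokens_after: int) -> int:
    """Tokens before compaction minus tokens after compaction."""
    return tokens_before - tokens_after


# Illustrative numbers only.
print(utilization(6000, 8000))          # 0.75
print(evidence_density(300, 1200))      # 0.25
print(compression_savings(5000, 1200))  # 3800
```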
What counts / what does not
A context window is a working area for one request, not a permanent memory or knowledge base.
| Item | Treatment | Why it matters |
|---|---|---|
| Include | Instructions, history, documents, search results, tool output, generated answer | Used in one request |
| Exclude | Permanent memory, entire databases, permissions, source guarantees | Requires other systems |
| Make explicit | What to include, summarize, retrieve, or drop | Determines quality and cost |
What moves the number
Results depend on selection and structure, not only on window size.
| Driver | What to do or expect |
|---|---|
| Priority | Put evidence relevant to the task first |
| Chunking | Split long material into meaningful units |
| History compaction | Preserve decisions and remove stale noise |
| Cost | Larger inputs can increase latency and spend |
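The priority and chunking drivers can be combined in a simple packing step: split material into units, then fill the budget with the highest-priority units first. The following is a sketch under stated assumptions. Paragraph breaks stand in for "meaningful units", relevance scores are supplied by the caller, and `estimate_tokens` is a hypothetical word-count approximation rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Hypothetical stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_by_paragraph(text: str) -> list[str]:
    """Split text on blank lines and drop empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def pack_by_priority(scored_chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedy packing: take chunks in descending score order, skipping any that do not fit."""
    selected = []
    remaining = budget
    for _score, chunk in sorted(scored_chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(chunk)
        if cost <= remaining:
            selected.append(chunk)
            remaining -= cost
    return selected


scored = [(0.2, "a b c"), (0.9, "d e f g"), (0.5, "h i")]
print(pack_by_priority(scored, budget=6))  # ['d e f g', 'h i']
```

Greedy packing is only one possible policy. It keeps the most relevant evidence first in the output, which matches the priority row above, but it does not try to find the best overall combination of chunks.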
When it helps
- Teams can decide whether to pass long material directly or retrieve only relevant sections.
- Conversation products can decide when to summarize, compact, or drop history.
- Failures from missing context can be separated from failures caused by context noise.
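One way to picture the history decision is a compaction pass that keeps the most recent turns verbatim and retains older turns only if they record a decision. This is a minimal sketch, not a production approach: the `Turn` type and its `is_decision` flag are hypothetical, and a real product might use model-generated summaries instead of a manual flag.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str
    is_decision: bool = False  # hypothetical flag marking a decision or constraint worth keeping


def compact_history(turns: list[Turn], keep_recent: int = 2) -> list[Turn]:
    """Keep the last `keep_recent` turns as-is; from older turns keep only decisions."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    cut = max(len(turns) - keep_recent, 0)
    older, recent = turns[:cut], turns[cut:]
    return [t for t in older if t.is_decision] + recent


history = [
    Turn("user", "Use metric units", is_decision=True),
    Turn("assistant", "Understood"),
    Turn("user", "Hi again"),
    Turn("assistant", "Hello"),
    Turn("user", "How far is it?"),
]
print([t.text for t in compact_history(history, keep_recent=2)])
# ['Use metric units', 'Hello', 'How far is it?']
```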
How to use it
- The context window is the model's request-time working area.
- Long windows are useful, but relevance and structure matter.
- Instructions, history, documents, tools, and output share the same budget.
- RAG, summarization, chunking, and compaction improve effective use.
- Avoid designs that simply include everything.
Decision cautions
More context does not always mean better answers.
- Irrelevant documents can bury important evidence.
- Long conversations need conflict resolution between old and new constraints.
- Check logging and retention rules before placing confidential data in context.
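The conflict-resolution caution can be illustrated with a "latest wins" rule, in which a newer constraint replaces an older one on the same topic. This policy is an assumption chosen for simplicity, not the only reasonable one, and the constraint keys shown are hypothetical.

```python
def resolve_constraints(updates: list[tuple[str, str]]) -> dict[str, str]:
    """Apply constraint updates in chronological order; later values override earlier ones."""
    resolved: dict[str, str] = {}
    for key, value in updates:
        resolved[key] = value
    return resolved


updates = [
    ("language", "English"),
    ("length", "short"),
    ("language", "Japanese"),  # newer instruction replaces the earlier one
]
print(resolve_constraints(updates))  # {'language': 'Japanese', 'length': 'short'}
```

Passing only the resolved constraints into the context, rather than every historical instruction, avoids asking the model to reconcile contradictory requirements on its own.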
Read with
Context windows should be read with RAG, prompting, and LLM design.
| Term | Role | Why read together |
|---|---|---|
| RAG | Retrieves only needed material | Improves context efficiency |
| Prompt Engineering | Structures the input | Uses limited space well |
| Large Language Model | Processes the context | Limits vary by model |
Example
A team builds a Q&A assistant over a 100-page policy document. Passing the whole document is slow and sometimes mixes old and new sections. The team chunks the policy by section and uses retrieval to include only the most relevant passages. Conversation history is compacted to the decisions and constraints that still matter. Inputs shrink and citation becomes easier. The system improves because it selects the right context, not because it blindly uses the largest possible window.
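The retrieval step in this example might look roughly like the sketch below. It scores each policy section by keyword overlap with the question, which is a deliberately simple stand-in for whatever retrieval method the team actually uses (for instance, embedding search). The section titles and text are invented for illustration.

```python
import re


def keywords(text: str) -> set[str]:
    """Lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(sections: dict[str, str], question: str, top_k: int = 2) -> list[str]:
    """Return titles of up to `top_k` sections sharing the most words with the question."""
    query = keywords(question)
    scored = []
    for title, body in sections.items():
        overlap = len(query & keywords(title + " " + body))
        if overlap > 0:  # skip sections with no shared words
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for _, title in scored[:top_k]]


# Hypothetical policy sections.
policy = {
    "Travel": "Employees book travel through the portal.",
    "Expenses": "Submit expense reports within 30 days.",
    "Security": "Rotate passwords every 90 days.",
}
print(retrieve(policy, "When must I submit expense reports?"))  # ['Expenses']
```

Returning section titles alongside the text also supports the citation benefit mentioned above, since each passage placed in context can be traced back to a named section.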
Compare with
| Term | What it is | Role |
|---|---|---|
| Context Window | Request-time working area | Sets input and output range |
| Memory | Longer-term retained information | Product-design dependent |
| RAG | Retrieves external information | Selects what enters context |
Common mistakes
- Longer is not always better. Noise can reduce answer quality.
- A context window is not the same as memory. It is not permanent storage.
- Putting all documents in context does not guarantee accurate citation.
Frequently asked questions
Is a larger context window always better?
No. It helps with long material, but irrelevant context can increase cost and reduce quality.
Is it the same as memory?
No. It is the request-time range a model can use. Memory is a product-level persistence feature.
How does RAG relate?
RAG retrieves relevant information and places it into the context window so the window is used more efficiently.