Business Term

Context Window

Japanese term: コンテキスト・ウィンドウ

A context window is the amount of input and output a model can consider in one request. Larger windows help with long material, but relevance, cost, latency, and distraction still matter.

Formula: Used tokens / token limit
Use when: Teams need to decide whether to pass long material directly or retrieve only relevant sections.
Watch out: Instructions, history, documents, search results, tool output, and the generated answer all draw on the same token budget.
Updated: 07/04/2026 | Quality: Reviewed | Page tier: Reviewed article | Sources: 2

What it means

A context window is the token range a model can process in a single request. System instructions, user input, conversation history, retrieved documents, tool results, and generated output all consume that range. A larger window can support longer documents and multi-document tasks, but adding everything is not automatically better. Low-value context can hide important evidence, increase latency, and raise cost. Practical systems use summarization, chunking, retrieval, prioritization, and history compaction to make the window useful.
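The shared budget described above can be sketched as simple accounting. This is a minimal illustration, not a specific vendor's API: `ContextBudget`, its fields, and every token count below are hypothetical, and real counts would come from the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical helper: tracks how one request's token budget is spent."""

    token_limit: int      # assumed model limit for a single request
    reserved_output: int  # tokens held back for the generated answer

    def headroom(self, input_parts: dict[str, int]) -> int:
        """Tokens left once every input part and the reserved output are counted."""
        return self.token_limit - self.reserved_output - sum(input_parts.values())

    def fits(self, input_parts: dict[str, int]) -> bool:
        """True when the request stays inside the window."""
        return self.headroom(input_parts) >= 0


# Illustrative numbers only; real counts come from the model's tokenizer.
budget = ContextBudget(token_limit=8000, reserved_output=1000)
parts = {
    "system_instructions": 400,
    "conversation_history": 2500,
    "retrieved_documents": 3200,
    "tool_results": 600,
    "user_input": 150,
}
print(budget.headroom(parts))  # 150
print(budget.fits(parts))      # True
```

Reserving output tokens up front is one possible design choice; it keeps a long input from crowding out the answer.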

How to calculate it

Context usage is evaluated by token use, headroom, and evidence quality.

Lens | Formula / treatment | What it shows
Utilization | Used tokens / token limit | Indicates risk of truncation and cost
Evidence density | Useful reference tokens / total reference tokens | Measures context noise
Compression savings | Tokens before compaction - tokens after compaction | Shows summary impact
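The three lenses above translate directly into arithmetic. The sketch below is one way to express them; the function names and sample numbers are illustrative assumptions, and deciding which reference tokens count as "useful" is a judgment the code does not make for you.

```python
def utilization(used_tokens: int, token_limit: int) -> float:
    """Used tokens / token limit."""
    if token_limit <= 0:
        raise ValueError("token_limit must be positive")
    return used_tokens / token_limit


def evidence_density(useful_reference_tokens: int, total_reference_tokens: int) -> float:
    """Useful reference tokens / total reference tokens (0.0 when nothing was referenced)."""
    if total_reference_tokens == 0:
        return 0.0
    return useful_reference_tokens / total_reference_tokens


def compression_savings(tokens_before: int, tokens_after: int) -> int:
    """Tokens before compaction - tokens after compaction."""
    return tokens_before - tokens_after


# Illustrative values, not measurements.
print(utilization(6000, 8000))          # 0.75
print(evidence_density(750, 3000))      # 0.25
print(compression_savings(2500, 600))   # 1900
```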

What counts / what does not

A context window is a working area for one request, not a permanent memory or knowledge base.

Item | Treatment | Why it matters
Include | Instructions, history, documents, search results, tool output, generated answer | Used in one request
Exclude | Permanent memory, entire databases, permissions, source guarantees | Requires other systems
Make explicit | What to include, summarize, retrieve, or drop | Determines quality and cost

What moves the number

Results depend on selection and structure, not only on window size.

Driver | Metric impact
Priority | Put evidence relevant to the task first
Chunking | Split long material into meaningful units
History compaction | Preserve decisions and remove stale noise
Cost | Larger inputs can increase latency and spend
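Chunking into meaningful units can be sketched as follows. This assumes, purely for illustration, that sections are separated by blank lines and that the first line of each section is its heading; `chunk_by_section` is a hypothetical helper, and real documents usually need format-aware parsing.

```python
import re


def chunk_by_section(text: str) -> list[dict[str, str]]:
    """Split text on blank lines; treat each block's first line as its title."""
    chunks = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue  # skip empty blocks
        chunks.append({"title": lines[0], "body": " ".join(lines[1:])})
    return chunks


# Made-up sample text for illustration.
policy = """Leave policy
Employees accrue leave monthly.
Unused leave expires in March.

Expense policy
Submit receipts within 30 days."""

for chunk in chunk_by_section(policy):
    print(chunk["title"], "->", chunk["body"])
```

Keeping the title with each chunk makes it easier to cite the section a passage came from.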

When it helps

  • Teams can decide whether to pass long material directly or retrieve only relevant sections.
  • Conversation products can decide when to summarize, compact, or drop history.
  • Failures from missing context can be separated from failures caused by context noise.

How to use it

  • The context window is the model's request-time working area.
  • Long windows are useful, but relevance and structure matter.
  • Instructions, history, documents, tools, and output share the same budget.
  • RAG, summarization, chunking, and compaction improve effective use.
  • Avoid designs that simply include everything.

Decision cautions

More context does not always mean better answers.

  • Irrelevant documents can bury important evidence.
  • Long conversations need conflict resolution between old and new constraints.
  • Check logging and retention rules before placing confidential data in context.
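Conflict resolution between old and new constraints can be as simple as "the newest statement wins". The sketch below assumes constraints have already been extracted from each turn as key/value pairs by some upstream step (hypothetical here); newest-wins is one possible policy, not the only one.

```python
def compact_history(turns: list[dict]) -> dict[str, str]:
    """Keep only the most recent value for each constraint key.

    `turns` is ordered oldest to newest; each turn may carry a
    "constraints" dict (an assumed, pre-extracted structure).
    """
    latest: dict[str, str] = {}
    for turn in turns:
        for key, value in turn.get("constraints", {}).items():
            latest[key] = value  # later turns overwrite earlier ones
    return latest


# Made-up conversation for illustration.
turns = [
    {"text": "Use EU pricing.", "constraints": {"region": "EU"}},
    {"text": "Thanks, that helps."},
    {"text": "Switch to US pricing and answer in a table.",
     "constraints": {"region": "US", "format": "table"}},
]
print(compact_history(turns))  # {'region': 'US', 'format': 'table'}
```

The compacted result replaces the full transcript in the next request, so stale constraints no longer compete with current ones.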

Read with

Context windows should be read with RAG, prompting, and LLM design.

Term | Role | Why read together
RAG | Retrieves only needed material | Improves context efficiency
Prompt Engineering | Structures the input | Uses limited space well
Large Language Model | Processes the context | Limits vary by model

Example

A team builds a Q&A assistant over a 100-page policy document. Passing the whole document is slow and sometimes mixes old and new sections. The team chunks the policy by section and uses retrieval to include only the most relevant passages. Conversation history is compacted to the decisions and constraints that still matter. Inputs shrink and citation becomes easier. The system improves because it selects the right context, not because it blindly uses the largest possible window.
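The selection step in this example might look like the sketch below. It is a deliberately crude stand-in: relevance is scored by keyword overlap rather than a real retriever, token cost is approximated by word count, and `terms`, `estimate_tokens`, and `select_passages` are hypothetical names.

```python
def terms(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {word.strip(".,?!:;") for word in text.lower().split()}


def estimate_tokens(text: str) -> int:
    """Rough assumption: one token per whitespace-separated word."""
    return len(text.split())


def select_passages(question: str, passages: list[str], token_budget: int) -> list[str]:
    """Pick the passages that overlap most with the question, within a budget."""
    query = terms(question)
    scored = [(len(query & terms(p)), p) for p in passages]
    scored = [pair for pair in scored if pair[0] > 0]     # drop irrelevant passages
    scored.sort(key=lambda pair: pair[0], reverse=True)   # stable sort keeps tie order
    selected, used = [], 0
    for _, passage in scored:
        cost = estimate_tokens(passage)
        if used + cost <= token_budget:
            selected.append(passage)
            used += cost
    return selected


# Made-up policy passages for illustration.
passages = [
    "Remote work requires manager approval",
    "Travel expenses are reimbursed within 30 days",
    "Parking permits renew annually",
]
print(select_passages("How are travel expenses reimbursed?", passages, token_budget=10))
```

Only the expense passage is selected here; the other two share no words with the question, so they never enter the context.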

Compare with

Term | Difference | Why read together
Context Window | Request-time working area | Sets input and output range
Memory | Longer-term retained information | Product-design dependent
RAG | Retrieves external information | Selects what enters context

Common mistakes

  • Longer is not always better. Noise can reduce answer quality.
  • Context window is not the same as memory. It is not permanent storage.
  • Putting all documents in context does not guarantee accurate citation.

Frequently asked questions

Is a larger context window always better?

No. It helps with long material, but irrelevant context can increase cost and reduce quality.

Is it the same as memory?

No. It is the request-time range a model can use. Memory is a product-level persistence feature.

How does RAG relate?

RAG retrieves relevant information and places it into the context window so the window is used more efficiently.

Sources

Source | Kind
NIST: AI RMF | tier_s
NIST: Generative AI Profile | tier_s
Context Window | YogoQ Core