Skip to content
Business Term

Prompt Injection

プロンプト・インジェクション

Prompt injection is an attack or failure pattern where untrusted text tries to override the AI system's intended instructions. It matters most when models read external content, retrieve documents, or use tools.

Formula
Number of untrusted-input paths
Use when
Teams can define trust boundaries before letting RAG or agents read external content.
Watch out
Malicious instructions embedded in web pages, emails, PDFs, tickets, or chat
Updated: 07/04/2026Quality: ReviewedPage tier: Reviewed articleSources: 3

What it means

Prompt injection occurs when user input, retrieved documents, web pages, emails, tickets, or other untrusted text contains instructions that attempt to redirect the model away from its intended system or developer instructions. It is difficult to treat as ordinary input validation because language models interpret text as both data and instruction. The risk grows in RAG, browsing, tool use, MCP integrations, and AI agents. Mitigation requires separating trusted instructions from untrusted content, minimizing tool permissions, adding human confirmation for high-impact actions, and logging tool calls and evidence.

How to calculate it

Risk is evaluated by exposure to untrusted input, tool permissions, and unconfirmed high-impact actions. Attack exposure | Number of untrusted-input paths | Grows with web, email, and document ingestion Impact | Tool authority x data sensitivity | Measures potential damage Defense pass rate | Blocked or confirmed dangerous cases / test cases | Measures mitigation coverage

LensFormula / treatmentWhen to use it
Attack exposureNumber of untrusted-input pathsGrows with web, email, and document ingestion
ImpactTool authority x data sensitivityMeasures potential damage
Defense pass rateBlocked or confirmed dangerous cases / test casesMeasures mitigation coverage

What counts / what does not

Prompt injection is not just a bad question; it is untrusted content being treated as instruction. Include | Malicious instructions embedded in web pages, emails, PDFs, tickets, or chat | External input risk Exclude | Ordinary mistakes, typos, generic model hallucination | Different failure path Make explicit | Trust boundary, tool permissions, confirmation UI, logs, test cases | Required for defense design

ItemTreatmentWhy it matters
IncludeMalicious instructions embedded in web pages, emails, PDFs, tickets, or chatExternal input risk
ExcludeOrdinary mistakes, typos, generic model hallucinationDifferent failure path
Make explicitTrust boundary, tool permissions, confirmation UI, logs, test casesRequired for defense design

What moves the number

Risk changes with external content, execution authority, permissions, UI, and logs. External input | More external reading means more exposure to adversarial text Tool permissions | Write or send tools increase impact Confirmation UI | Human review can stop dangerous actions Logs | Evidence and tool-call history support incident analysis

DriverMetric impact
External inputMore external reading means more exposure to adversarial text
Tool permissionsWrite or send tools increase impact
Confirmation UIHuman review can stop dangerous actions
LogsEvidence and tool-call history support incident analysis

When it helps

Teams can define trust boundaries before letting RAG or agents read external content. Tool scopes can be split into read-only, draft, execute, and external-send levels. Attack cases can be added to evaluation before launch.

  • Teams can define trust boundaries before letting RAG or agents read external content.
  • Tool scopes can be split into read-only, draft, execute, and external-send levels.
  • Attack cases can be added to evaluation before launch.

How to use it

  • Prompt injection is the risk that untrusted content becomes an instruction to the AI.
  • It is especially important in RAG, browsing, tool use, and agents.
  • Prompt wording alone cannot fully prevent it.
  • Least privilege, human confirmation, input separation, logs, and tests are required.
  • High-impact actions should not execute solely on model judgment.

Decision cautions

Assume external documents may be adversarial. Show users the evidence and action before calling high-impact tools. Do not treat text such as ignore previous instructions as privileged instruction when it comes from untrusted content. Use separate approval layers for secrets, external sending, deletion, purchases, and permission changes.

  • Show users the evidence and action before calling high-impact tools.
  • Do not treat text such as ignore previous instructions as privileged instruction when it comes from untrusted content.
  • Use separate approval layers for secrets, external sending, deletion, purchases, and permission changes.

Read with

Prompt injection should be read with AI agents, MCP, and tool use. AI Agent | Reads external information and uses tools | Higher potential impact MCP | Connects tools and resources | Requires scoped exposure and confirmation AI Evaluation | Tests attack cases | Measures mitigation effectiveness

MetricRoleWhy read together
AI AgentReads external information and uses toolsHigher potential impact
MCPConnects tools and resourcesRequires scoped exposure and confirmation
AI EvaluationTests attack casesMeasures mitigation effectiveness

Example

An AI agent reads web pages for competitor research. One page contains hidden text telling the model to send internal notes outside the company. If the agent has no external-send tool, the attack has little impact; if it has send permission, the risk is high. The team changes the design so external page text is never treated as trusted instruction, send tools can only draft, human confirmation is required before sending, tool-call logs are kept, and attack strings are added to the evaluation set.

Compare with

Prompt Injection | Hidden instruction in input redirects AI | External input plus tools Hallucination | Model gives incorrect content | Handled by grounding and evaluation Authorization flaw | User can do what they should not | Handled by access control

MetricDifferenceWhy read together
Prompt InjectionHidden instruction in input redirects AIExternal input plus tools
HallucinationModel gives incorrect contentHandled by grounding and evaluation
Authorization flawUser can do what they should notHandled by access control

Common mistakes

  • A stronger system prompt is not enough. Permissions and confirmation still matter.
  • The risk is not only malicious users. Web pages and documents can contain hostile instructions.
  • Reviewing final output is not enough. Tool calls and data access must be audited.

Frequently asked questions

Can prompt engineering prevent prompt injection?

It can reduce some cases, but permissions, tool design, confirmation UI, logs, and evaluation are also needed.

Does this matter for RAG?

Yes. Retrieved documents can contain adversarial instructions that the model may follow if controls are weak.

What is the first mitigation?

Separate untrusted input from trusted instructions, minimize tool permissions, and require confirmation for high-impact actions.

Sources

SourcesKindLink
NIST: Generative AI Profiletier_sOpen
NIST: AI RMFtier_sOpen
Model Context Protocol: Toolstier_sOpen
Prompt Injection | YogoQ Core