Business Term

Prompt Injection

プロンプト・インジェクション

Prompt injection is an attack or failure pattern where untrusted text tries to override the AI system's intended instructions. It matters most when models read external content, retrieve documents, or use tools.

Formula

Number of untrusted-input paths

Use when

Teams can define trust boundaries before letting RAG or agents read external content.

Watch out

Malicious instructions embedded in web pages, emails, PDFs, tickets, or chat

Updated: 07/04/2026Quality: ReviewedPage tier: Reviewed articleSources: 3

What it means

Prompt injection occurs when user input, retrieved documents, web pages, emails, tickets, or other untrusted text contains instructions that attempt to redirect the model away from its intended system or developer instructions. It is difficult to treat as ordinary input validation because language models interpret text as both data and instruction. The risk grows in RAG, browsing, tool use, MCP integrations, and AI agents. Mitigation requires separating trusted instructions from untrusted content, minimizing tool permissions, adding human confirmation for high-impact actions, and logging tool calls and evidence.

How to calculate it

Risk is evaluated by exposure to untrusted input, tool permissions, and unconfirmed high-impact actions. Attack exposure | Number of untrusted-input paths | Grows with web, email, and document ingestion Impact | Tool authority x data sensitivity | Measures potential damage Defense pass rate | Blocked or confirmed dangerous cases / test cases | Measures mitigation coverage

Lens	Formula / treatment	When to use it
Attack exposure	Number of untrusted-input paths	Grows with web, email, and document ingestion
Impact	Tool authority x data sensitivity	Measures potential damage
Defense pass rate	Blocked or confirmed dangerous cases / test cases	Measures mitigation coverage

What counts / what does not

Prompt injection is not just a bad question; it is untrusted content being treated as instruction. Include | Malicious instructions embedded in web pages, emails, PDFs, tickets, or chat | External input risk Exclude | Ordinary mistakes, typos, generic model hallucination | Different failure path Make explicit | Trust boundary, tool permissions, confirmation UI, logs, test cases | Required for defense design

Item	Treatment	Why it matters
Include	Malicious instructions embedded in web pages, emails, PDFs, tickets, or chat	External input risk
Exclude	Ordinary mistakes, typos, generic model hallucination	Different failure path
Make explicit	Trust boundary, tool permissions, confirmation UI, logs, test cases	Required for defense design

What moves the number

Risk changes with external content, execution authority, permissions, UI, and logs. External input | More external reading means more exposure to adversarial text Tool permissions | Write or send tools increase impact Confirmation UI | Human review can stop dangerous actions Logs | Evidence and tool-call history support incident analysis

Driver	Metric impact
External input	More external reading means more exposure to adversarial text
Tool permissions	Write or send tools increase impact
Confirmation UI	Human review can stop dangerous actions
Logs	Evidence and tool-call history support incident analysis

When it helps

Teams can define trust boundaries before letting RAG or agents read external content. Tool scopes can be split into read-only, draft, execute, and external-send levels. Attack cases can be added to evaluation before launch.

Teams can define trust boundaries before letting RAG or agents read external content.
Tool scopes can be split into read-only, draft, execute, and external-send levels.
Attack cases can be added to evaluation before launch.

How to use it

Prompt injection is the risk that untrusted content becomes an instruction to the AI.
It is especially important in RAG, browsing, tool use, and agents.
Prompt wording alone cannot fully prevent it.
Least privilege, human confirmation, input separation, logs, and tests are required.
High-impact actions should not execute solely on model judgment.

Decision cautions

Assume external documents may be adversarial. Show users the evidence and action before calling high-impact tools. Do not treat text such as ignore previous instructions as privileged instruction when it comes from untrusted content. Use separate approval layers for secrets, external sending, deletion, purchases, and permission changes.

Show users the evidence and action before calling high-impact tools.
Do not treat text such as ignore previous instructions as privileged instruction when it comes from untrusted content.
Use separate approval layers for secrets, external sending, deletion, purchases, and permission changes.

Read with

Metric	Role	Why read together
AI Agent	Reads external information and uses tools	Higher potential impact
MCP	Connects tools and resources	Requires scoped exposure and confirmation
AI Evaluation	Tests attack cases	Measures mitigation effectiveness

Example

An AI agent reads web pages for competitor research. One page contains hidden text telling the model to send internal notes outside the company. If the agent has no external-send tool, the attack has little impact; if it has send permission, the risk is high. The team changes the design so external page text is never treated as trusted instruction, send tools can only draft, human confirmation is required before sending, tool-call logs are kept, and attack strings are added to the evaluation set.

Compare with

Metric	Difference	Why read together
Prompt Injection	Hidden instruction in input redirects AI	External input plus tools
Hallucination	Model gives incorrect content	Handled by grounding and evaluation
Authorization flaw	User can do what they should not	Handled by access control

Common mistakes

A stronger system prompt is not enough. Permissions and confirmation still matter.
The risk is not only malicious users. Web pages and documents can contain hostile instructions.
Reviewing final output is not enough. Tool calls and data access must be audited.

Frequently asked questions

Can prompt engineering prevent prompt injection?

It can reduce some cases, but permissions, tool design, confirmation UI, logs, and evaluation are also needed.

Does this matter for RAG?

Yes. Retrieved documents can contain adversarial instructions that the model may follow if controls are weak.

What is the first mitigation?

Separate untrusted input from trusted instructions, minimize tool permissions, and require confirmation for high-impact actions.

Sources

Sources	Kind	Link
NIST: Generative AI Profile	tier_s	Open
NIST: AI RMF	tier_s	Open
Model Context Protocol: Tools	tier_s	Open

On this page

What it means How to calculate it What counts / what does not What moves the number When it helps How to use it Decision cautions Read with Example Compare with Common mistakes Frequently asked questions Sources Related topics

Trust

Quality: Reviewed
Page tier: Reviewed article
Updated: 07/04/2026
COI: None
Sources: 3

This page is reference information for research and learning. For accounting, legal, finance, health, security, or other individual decisions, confirm against primary sources or qualified professionals.

Read editorial policy Send a correction

AI-readable

Read-only preview for Reviewed terms.

JSON Markdown

Trust

Quality: Reviewed
Page tier: Reviewed article
Updated: 07/04/2026
COI: None
Sources: 3

Read editorial policy Send a correction

AI-readable

Read-only preview for Reviewed terms.

JSON Markdown