Go up one level Meta

Close submenu Main menu

BACK

Clear

Try Meta AI

FEATURED

Agents Rule of Two: A Practical Approach to AI Agent Security

October 31, 2025•

14 minute read

Imagine a personal AI agent, Email-Bot, that’s designed to help you manage your inbox. In order to provide value and operate effectively, Email-Bot might need to:

While the automated email assistant can be of great help, this hypothetical bot can also demonstrate how AI agents are introducing novel risks. Notably, one of the biggest challenges for the industry is that of agents’ susceptibility to prompt injection.

Prompt injection is a fundamental, unsolved weakness in all LLMs. With prompt injection, certain types of untrustworthy strings or pieces of data — when passed into an AI agent’s context window — can cause unintended consequences, such as ignoring the instructions and safety guidelines provided by the developer or executing unauthorized tasks. This vulnerability could be enough for an attacker to take control of the agent and cause harm to the AI agent’s user.

Using our Email-Bot example, if an attacker puts a prompt injection string in an email to the targeted user, they might be able to hijack the AI agent once that email is processed. Example attacks could include exfiltrating sensitive data, such as private email contents, or taking unwanted actions, such as sending phishing messages to the target’s friends.

Like many of our industry peers, we’re excited by the potential for agentic AI to improve people’s lives and enhance productivity. The path to reach this vision involves granting AI agents like Email-Bot more capabilities, including access to:

At Meta, we’re thinking deeply about how agents can be most useful to people by balancing the utility and flexibility needed for this product vision while minimizing bad outcomes from prompt injection, such as exfiltration of private data, forcing actions to be taken on a user’s behalf, or system disruption. To best protect people and our systems from this known risk, we’ve developed the Agents Rule of Two. When this framework is followed, the severity of security risks is deterministically reduced.

Inspired by the similarly named policy developed for Chromium, as well as Simon Willison’s “lethal trifecta, our framework aims to help developers understand and navigate the tradeoffs that exist today with these new powerful agent frameworks.

Agents Rule of Two

At a high level, the Agents Rule of Two states that until robustness research allows us to reliably detect and refuse prompt injection, agents must satisfy no more than two of the following three properties within a session to avoid the highest impact consequences of prompt injection.

[A] An agent can process untrustworthy inputs

[B] An agent can have access to sensitive systems or private data

[C] An agent can change state or communicate externally

It’s still possible that all three properties are necessary to carry out a request. If an agent requires all three without starting a new session (i.e., with a fresh context window), then the agent should not be permitted to operate autonomously and at a minimum requires supervision — via human-in-the-loop approval or another reliable means of validation.

How the Agents Rule of Two Stops Exploitation

Let’s return to our example Email-Bot to see how applying the Agents Rule of Two can prevent a data exfiltration attack.

Attack Scenario: Prompt injection within a spam email contains a string that instructs a user’s Email-Bot to gather the private contents of the user’s inbox and forward them to the attacker by calling a Send-New-Email tool.

This attack is successful because:

With the Agents Rule of Two, this attack can be prevented in a few different ways:

With the Agents Rule of Two, agent developers can compare different designs and their associated tradeoffs (such as user friction or limits on capabilities) to determine which option makes the most sense for their users’ needs.

Hypothetical Examples and Implementations of the Agents Rule of Two

Let’s look at three other hypothetical agent use cases to see how they might choose to satisfy the framework.

Travel Agent Assistant [AB]

Web Browsing Research Assistant [AC]

High-Velocity Internal Coder [BC]

As is common for general frameworks, the devil is ultimately in the details. In order to enable additional use cases, it can be safe for an agent to transition from one configuration of the Agents Rule of Two to another within the same session. One concrete example would be starting in [AC] to access the internet and completing a one-way switch to [B] by disabling communication when accessing internal systems.

While all of the specific ways this can be done have been omitted for brevity, readers can infer when this can be safely accomplished through focus on disrupting the exploit path — namely preventing an attack from completing the full chain from [A] → [B] → [C].

Limitations

It’s important to note that satisfying the Agents Rule of Two should not be viewed as sufficient for protecting against other threat vectors common to agents (e.g., attacker uplift, proliferation of spam, agent mistakes, hallucinations, excessive privileges, etc.) or lower consequence outcomes of prompt injection (e.g., misinformation in the agent’s response).

Similarly, applying the Agents Rule of Two should not be viewed as a finish line for mitigating risk. Designs that satisfy the Agents Rule of Two can still be prone to failure (e.g., a user blindly confirming a warning interstitial), and defense in depth is a critical component towards mitigating the highest risk scenarios when the failure of a single layer may be likely. The Agents Rule of Two is a supplement — and not a substitute — for common security principles such as least-privilege.

Existing Solutions

For further AI protection solutions that complement the Agents Rule of Two, read more about our Llama Protections. Offerings include Llama Firewall for orchestrating agent protections, Prompt Guard for classifying potential prompt injections, Code Shield to reduce insecure code suggestions, and Llama Guard for classifying potentially harmful content.

What’s Next

We believe the Agents Rule of Two is a useful framework for developers today. We’re also excited by its potential to enable secure development at scale.

With the adoption of plug-and-play agentic tool-calling through protocols such as Model Context Protocol (MCP), we see both emerging novel risks and opportunities. While blindly connecting agents to new tools can be a recipe for disaster, there’s potential for enabling security-by-default with built-in Rule of Two awareness. For example, by declaring an Agents Rule of Two configuration in supporting tool calls, developers can have increased confidence that an action will succeed, fail, or request additional approval in accordance with their policy.

We also know that as agents become more useful and capabilities grow, some highly sought-after use cases will be difficult to fit cleanly into the Agents Rule of Two, such as a background process where human-in-the-loop is disruptive or ineffective. While we believe that traditional software guardrails and human approvals continue to be the preferred method of satisfying the Agents Rule of Two in present use cases, we’ll continue to pursue research towards satisfying the Agents Rule of Two’s supervisory approval checks via alignment controls, such as oversight agents and the open source LlamaFirewall platform. We look forward to sharing more in the future.


Share:


Our latest updates delivered to your inbox

Subscribe to our newsletter to keep up with Meta AI news, events, research breakthroughs, and more.

Join us in the pursuit of what’s possible with AI.

See all open positions

Our approach

About AI at Meta

People

Careers

Research

Infrastructure

Resources

Demos

Meta AI

Explore Meta AI

Get Meta AI

AI Studio

Latest news

Blog

Newsletter

Foundational models

Llama









Our approach

Our approach About AI at Meta People Careers

Research

Research Infrastructure Resources Demos

Meta AI

Meta AI Explore Meta AI Get Meta AI AI Studio

Latest news

Latest news Blog Newsletter

Foundational models

Llama









Privacy Policy

Terms

Cookies

Meta © 2025









Facebook