
AI · Authorisation

OAuth scopes weren't built for AI agents: the delegation model that holds up under prompt injection

OAuth scopes assume a human approves once, an app does narrow work, and the trust horizon is months. AI agents break every part of that assumption. The architecture that holds is a two-principal model with short-lived delegation tokens, ReBAC for structure, ABAC for context, and per-action consent gating destructive operations. Here is the design and the rollout.

Mathew Sayed
· 20 min read

The recent agent security incidents most people have read about — the ones where an agent posted something the user did not ask it to post, exfiltrated data the user did not know it could read, or executed an action across an integration the user thought was idle — share a single architectural failure. The platforms involved authorised their agents the same way they authorised traditional applications: OAuth scopes, granted once, durable for the life of the integration, broad enough to cover whatever the agent might want to do. That model was correct for the world it was designed for. It is not correct for AI agents acting on behalf of users, and the longer a platform leans on it, the larger the surface that an inevitable prompt injection or instruction conflict gets to operate against.

This piece is the architecture we recommend for platforms adding AI agents that read user data and take actions on connected systems. It is opinionated about the model — two principals, short-lived delegation tokens, ReBAC for structural permissions, ABAC for contextual constraints, per-action consent for destructive operations — and about the rollout cadence. It is also opinionated about what to build and what to buy, because the wrong build/buy line is how teams end up reinventing a relationship store at the same time as they are trying to ship an agent.

The MCP-specific authorisation surface — how Model Context Protocol exposes internal tools to agents and what teams typically miss when wiring it up — is a narrower problem and is covered in a companion post. What follows here is the broader architecture, of which the MCP integration is one component.

Why OAuth scopes are the wrong primitive

The OAuth pattern most platforms ship today goes like this. The application declares the scopes it needs — documents.read, messages.send, social.write. The user is shown a consent screen, clicks Allow once, and the application receives a long-lived access token. From that point until the user revokes it, any call within those scopes is authorised. That model was correct for a world where the application was a deterministic piece of code, written by a known developer, making a small and predictable number of calls per session.

An AI agent breaks every assumption in that sentence. The agent is not deterministic — its decisions are probabilistic and influenced by the user’s prompt, by content the agent reads, and by instructions injected into that content by third parties. The number of distinct actions per session is not small or predictable; an agent given a vague task may make tens of API calls before completing it. Adversarial input is not rare; for any agent that reads user-supplied content, prompt injection is a constant operating condition. The trust horizon is not months; the right horizon for an agent acting on a specific task is minutes. And the audit question is not “which application did this” — it is “which agent, on behalf of which user, did this, in response to which prompt, with reference to which consent.”

A coarse, durable, application-level scope cannot answer those questions. It cannot constrain the agent to specific resources within a session, cannot bind the agent’s actions to the consent the user actually gave, and cannot meaningfully be revoked at the speed at which an agent operates. Treating OAuth scopes as a sufficient authorisation primitive for agents is the architectural bet that produces the failures.

The two-principal model

The single design call that matters most: the agent is a distinct principal from the user. They appear in audit logs, in policy queries, and in token claims as separate identities. The user authenticates and grants the platform a set of permissions over their resources. The agent authenticates against the platform with its own credentials, has its own lifecycle, and is registered separately. When the agent acts on behalf of the user, what flows is not the user’s token but a delegation token — a strict reduction of the user’s grants, scoped to a specific session, a specific task, and a specific set of resources.

Three properties follow from treating these as distinct principals.

The user never gives the agent their token. A token that grants “everything user U can do” cannot exist for the agent. The only token the agent ever holds is one the platform has minted specifically for this delegation, and that token is a strict subset of the user’s grants. If the agent loses the token to a prompt injection or a runtime compromise, the blast radius is bounded by what was in the token, not by what the user can do.

The agent has its own auditable identity. Every call carries both principals — the agent acting, the user being acted for. The audit log answers “which agent did this” separately from “which user was involved,” which is the only way to ask sensible questions like “did agent X do something its operators did not intend” or “did user U’s session cover this action.” Without two principals, you can answer one of those questions or the other, but not both.

Revocation has multiple axes. A user revokes the agent's right to act on their behalf when the task is done. An agent operator revokes the agent globally when the agent itself is compromised. A platform admin revokes a specific delegation when an anomaly is detected. These are different operations with different blast radii, and they are only expressible as separate operations if the principals are separate.
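As a minimal sketch of those axes, assuming an in-memory store and illustrative names, each axis is a separate revocation set with its own blast radius, and the gateway consults all three on every check:

# Sketch: three revocation axes with different blast radii (names illustrative).
revoked_delegations: set[str] = set()          # one delegation, by token jti
revoked_pairs: set[tuple[str, str]] = set()    # one agent, for one user
revoked_agents: set[str] = set()               # one agent, everywhere

def revoke_delegation(jti: str) -> None:
    """Platform admin kills a single anomalous delegation."""
    revoked_delegations.add(jti)

def revoke_agent_for_user(agent_id: str, user_id: str) -> None:
    """User ends the agent's right to act on their behalf."""
    revoked_pairs.add((agent_id, user_id))

def revoke_agent_globally(agent_id: str) -> None:
    """Agent operator pulls a compromised agent entirely."""
    revoked_agents.add(agent_id)

def is_revoked(jti: str, agent_id: str, user_id: str) -> bool:
    return (jti in revoked_delegations
            or (agent_id, user_id) in revoked_pairs
            or agent_id in revoked_agents)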

Why ReBAC for structure, ABAC for context

Once two principals exist, the next question is what authorisation model expresses their relationship to resources. The three candidates are familiar — RBAC, ABAC, ReBAC — and the answer for agents is a hybrid.

RBAC is too coarse on its own. Roles cannot natively express “this user owns this specific document and the agent should be allowed to read this one and only this one.” The workaround — a role per resource — produces the role explosion every RBAC implementation eventually meets. For agent authorisation, where the unit of access is typically a specific resource within a specific session, RBAC alone is structurally wrong.

ABAC alone is hard to debug. ABAC’s strength is policies over arbitrary attributes — device posture, session age, time of day, the presence of a specific consent receipt. Its weakness is that “why was I denied?” requires re-running the policy engine with full context, which is a poor experience for users and an expensive one for incident response. ABAC works well as a layer, not as the sole foundation.

ReBAC fits the shape of the data. Most platforms that add AI agents already have a relationship graph as their underlying data model — users own documents, documents live in folders, folders belong to teams, teams have shared resources, channels are connected to user accounts. ReBAC, the model Google described in the Zanzibar paper and now widely implemented, expresses permissions as queries over that graph. “Is this agent, acting on behalf of this user, ever allowed to read this specific document?” is a graph query, and a fast one with the right indexes.

The recommendation is a hybrid. Use ReBAC as the structural layer — the relationship store answers “is this agent ever allowed to act on this resource on this user’s behalf?” Layer ABAC on top via a policy engine — the policy answers “is it allowed right now, given the session is two minutes old, the user has consented to this category of action, the request comes from a known device, and the agent’s reasoning trace does not match a flagged injection pattern?” The two layers are reasoned about independently and composed at the gateway.
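To make the composition concrete, here is a minimal sketch of the two layers meeting at the gateway, with an in-memory tuple set standing in for the relationship store and the contextual checks reduced to two attributes. Everything here is illustrative rather than any vendor's API:

import time

# Zanzibar-style tuples standing in for the relationship store (illustrative).
TUPLES = {
    ("agent:a1", "acts_for", "user:u1"),
    ("user:u1", "owner", "document:doc_uuid_1"),
}

def rebac_allows(agent: str, user: str, resource: str) -> bool:
    # Structural layer: is this agent, on this user's behalf, ever allowed
    # to act on this resource? Two graph lookups in this toy version.
    return ((f"agent:{agent}", "acts_for", f"user:{user}") in TUPLES
            and (f"user:{user}", "owner", f"document:{resource}") in TUPLES)

def abac_allows(session_started_at: float, consented: bool) -> bool:
    # Contextual layer: is it allowed right now? Here: session under five
    # minutes old and the user has consented to this action category.
    return (time.time() - session_started_at) < 300 and consented

def authorize(agent: str, user: str, resource: str,
              session_started_at: float, consented: bool) -> bool:
    # Composition at the gateway: both layers must agree; deny wins.
    return (rebac_allows(agent, user, resource)
            and abac_allows(session_started_at, consented))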

For tooling, this is now mostly a buy decision. SpiceDB and OpenFGA are the production-grade open-source Zanzibar-style relationship stores; the choice between them is a managed-versus-self-hosted question. Open Policy Agent (OPA) is the obvious choice for the policy layer — it is CNCF-graduated, runs as a sidecar with sub-millisecond decision latency, and lets policies be authored in Rego rather than custom code. AWS Cedar is a credible alternative if the platform is AWS-native and willing to take on the language. Building a relationship store in-house is a last resort; the bar Google set with Zanzibar is high enough that most teams that try this regret it within a year.

What goes in the delegation token

The delegation token is the artefact every API call ultimately checks. Its claims need to express six things — the agent and user as distinct principals, the strict scope reduction, the consent envelope, the session context, the audience, and the lifetime — and they need to express them using standards rather than ad-hoc fields.

The token is short-lived. Five minutes is the right starting point — long enough for an agent to complete a coherent task, short enough that revocation through token expiry is meaningful even before active revocation kicks in. Use PASETO over JWT for new builds where the ecosystem permits — the cryptographic footprint is smaller and the implementation pitfalls are fewer — but JWT with strict algorithm enforcement is acceptable.

The principal claims follow RFC 8693 token exchange: sub is the user being acted for, and the nested act claim identifies the agent as the actor to whom authority has been delegated. Using the standard rather than inventing a custom field matters; downstream services and SIEMs will eventually parse these tokens, and the standard claims are what they will look for.

The scope reduction is explicit and resource-shaped. Rather than documents.read covering every document the user owns, the token names specific resources and specific actions — read this document, post to this channel, send this message to this recipient. If the agent attempts something not in the token, the gateway denies it. The reduction is computed at delegation time from the user’s grants and the task; the agent runtime does not get to expand it.
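A sketch of that computation, under the assumption that both the user's grants and the task's request arrive as sets keyed by resource class (the shapes are illustrative):

def compute_reduction(user_grants: dict[str, set[str]],
                      task_request: dict[str, set[str]]) -> dict[str, list[str]]:
    # Strict reduction: per resource class, the token carries only the
    # intersection of what the user holds and what the task asked for.
    # Anything the task requests that the user does not hold drops out.
    return {klass: sorted(user_grants.get(klass, set()) & wanted)
            for klass, wanted in task_request.items()}

# Example: the user owns three documents; the task names two of them plus
# one it has no business touching.
reduction = compute_reduction(
    {"documents": {"doc_1", "doc_2", "doc_3"},
     "actions": {"read", "post_to_channel"}},
    {"documents": {"doc_1", "doc_4"}, "actions": {"read"}},
)
# -> {"documents": ["doc_1"], "actions": ["read"]}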

The consent envelope binds the user’s most recent consent to the token. It records which actions the user explicitly approved at consent time, and which categories require step-up authentication if the agent attempts them. If the agent tries an action outside the envelope — even one its scope might cover — the gateway returns a step-up requirement and the UI surfaces a fresh consent dialog before the agent can proceed.

Context constraints bind the token to a session — the device identity, optionally a network range, a maximum invocation count, a session identifier that the audit log will join on. Audience binding (aud) ensures a token issued for the documents API cannot be replayed against the messaging API. A unique token identifier (jti) supports replay prevention for high-risk single-use actions.

The token is minted by a single token service. No application backend issues these tokens directly. Centralising the token mint is what makes the delegation pattern auditable; if every service is allowed to mint its own delegation tokens, the consistency the model depends on is gone.

A token shaped this way looks roughly like the following — the exact field names will vary by platform, but the structure is what teams should be comparing their own implementation against.

{
  "iss": "auth.platform.example",
  "aud": "documents-api",
  "sub": "agent_a1b2c3",
  "act": {
    "sub": "user_d4e5f6",
    "session_id": "sess_g7h8i9"
  },
  "scope_reduction": {
    "documents": ["doc_uuid_1", "doc_uuid_2"],
    "actions": ["read", "post_to_channel"],
    "channels": ["channel:user-handle"]
  },
  "consent_envelope": {
    "consented_actions": ["post_to_channel"],
    "high_risk_actions_require_step_up": true
  },
  "context_constraints": {
    "device_id": "dev_uuid",
    "ip_range": "203.0.113.0/24",
    "max_invocations": 10
  },
  "iat": 1714780800,
  "exp": 1714781100,
  "jti": "tok_unique_id"
}

Three properties of this shape are worth pointing out. First, sub is the user and act.sub is the agent: that is the RFC 8693 token-exchange pattern in claim form, with the act claim naming the actor to whom authority has been delegated, and it is what makes the two-principal model express itself in every API call. Second, scope_reduction lists specific resource identifiers, not categories — the agent cannot read a document outside that list, even if the user owns it. Third, the consent_envelope is decoupled from scope_reduction deliberately; the scope says what the agent might do, the consent envelope says what the user has actually approved, and the gap between them is where step-up authentication lives.
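For comparison against the shape above, a minimal sketch of the token service's mint step. PyJWT and a symmetric key are used purely for brevity (the recommendation above is PASETO where the ecosystem permits, or asymmetric JWT signing), and the reduction is assumed to be computed upstream as shown earlier:

import time
import uuid

import jwt  # PyJWT, for illustration; swap for a PASETO library in new builds

def mint_delegation_token(user_id: str, agent_id: str, session_id: str,
                          reduction: dict, envelope: dict, context: dict,
                          signing_key: str) -> str:
    now = int(time.time())
    claims = {
        "iss": "auth.platform.example",
        "aud": "documents-api",            # audience binding per API surface
        "sub": user_id,                    # party being acted for (RFC 8693)
        "act": {"sub": agent_id},          # actor holding delegated authority
        "session_id": session_id,          # the join key for the audit log
        "scope_reduction": reduction,      # computed by the platform, never the agent
        "consent_envelope": envelope,
        "context_constraints": context,
        "iat": now,
        "exp": now + 300,                  # five-minute lifetime
        "jti": f"tok_{uuid.uuid4().hex}",  # replay prevention for single-use actions
    }
    # Pin the algorithm at mint; verifiers must enforce the same allow-list.
    return jwt.encode(claims, signing_key, algorithm="HS256")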

Consent UX: three tiers, one rule

The hardest design call is consent UX. Get it wrong by surfacing too often, and users banner-blind themselves into approving anything. Get it wrong by surfacing too rarely, and the architecture’s protection against destructive actions evaporates. The pattern that works is three tiers with one rule.

Low-risk, repeatable read actions. Reading the user’s own documents, listing their folders, retrieving their connected accounts. Single consent at agent install, cached for the agent’s lifetime against the user’s account. The cache is invalidated on scope change, role change, account change, abuse signal, or device change.

Medium-risk, repeatable read actions with broader sensitivity. Reading brand assets, listing connected channels, accessing organisation-shared resources. Single consent per session, cached for the session, surfaced again on the next session.

High-risk, destructive, irreversible actions. Posting to a connected channel, sending a message, deleting a record, sharing a resource publicly, altering a permission. Per-action consent. Never cached. Summarised in the user’s language at the moment of the action — “your agent is about to post this image to this channel; approve?” — with the actual content of the action visible to the user before they approve.

The one rule that holds the model together: an action outside the consent envelope requires step-up authentication, no exceptions, regardless of what the scope allows. This is the line that defends against prompt injection. Even if injected content tricks the agent into attempting a destructive action, and even if the scope reduction was generous enough that the action is technically within scope, the consent envelope check forces a fresh human approval before the action lands. The user sees the dialog, the user reads what is about to happen, the user says no.
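A sketch of that rule at the gateway, assuming the token's signature, audience, and expiry have already been verified, and using the claim names from the example token:

def gate_action(claims: dict, action: str) -> str:
    # The scope says what the agent might do; the envelope says what the
    # user actually approved. The gap between them is always step-up.
    if action not in claims["scope_reduction"]["actions"]:
        return "deny"                 # outside scope: never reaches the user
    if action in claims["consent_envelope"]["consented_actions"]:
        return "allow"                # scoped and consented
    return "step_up_required"         # scoped but unapproved: fresh human
                                      # consent before the action lands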

That is the senior point about authorisation in the agent era: prompt injection is an authorisation problem, not an isolation problem. Sandboxing the agent, restricting its outbound network, scrubbing its inputs — these are useful, but none of them stop an agent that has been instructed to do something the user did not intend. The thing that stops it is an authorisation layer that does not believe the agent’s claim that the user wanted this, and that requires fresh evidence before destructive actions complete.

Reference architecture

The architecture is six components and a small number of well-defined boundaries.

Identity provider. The user’s authentication, MFA, and session establishment. This is a buy decision in almost every case — the OIDC providers in the market handle this better than any internal team will.

Agent registry. The lifecycle store for agents — who deployed this agent, what version it is, what its own credentials are, what categories of action it declares, and what its operational status is. Built in-house because it is tightly coupled to the platform’s billing, ownership, and lifecycle models.

Token service. The single component that mints delegation tokens. Takes the user’s session context, the agent’s request, and the task scope as input; computes the strict reduction; mints the short-lived token. Built in-house because the reduction logic is platform-specific.

Relationship store. SpiceDB or OpenFGA. Holds the graph of who relates to what — users own documents, documents live in folders, agents act on behalf of users, channels are connected to user accounts. Bought as managed service or self-hosted; not built.

Policy engine. OPA sidecars at the API gateway, running policy in Rego, consuming snapshots of the relationship graph. Every API call consults the engine before serving. Bought as the engine; the policies are built in-house because they encode the platform’s specific model of consent, risk, and context.

Audit log. Append-only, hash-chained, archived to immutable storage. Records both principals on every entry — agent acting, user being acted for — plus the delegation token’s identifier, the policy decision, and the action outcome. Bought as the log infrastructure; the schema is built in-house because the agent surface adds fields that generic audit systems do not have. This is also the artefact a regulator demands when an account does something unexpected, and the artefact a customer demands when their integration produces an outcome they did not approve.
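A minimal sketch of the hash-chaining, with the schema reduced to the fields named above; a real implementation anchors the chain in immutable storage rather than process memory:

import hashlib
import json
import time

_chain: list[dict] = []  # stand-in for the append-only store

def append_audit(agent_id: str, user_id: str, token_jti: str,
                 decision: str, action: str, outcome: str) -> dict:
    # Each entry commits to its predecessor's hash, so any rewrite of
    # history breaks every hash that follows it.
    prev_hash = _chain[-1]["entry_hash"] if _chain else "0" * 64
    entry = {
        "ts": time.time(),
        "agent": agent_id,        # principal acting
        "user": user_id,          # principal being acted for
        "token_jti": token_jti,   # joins the entry to its delegation
        "decision": decision,
        "action": action,
        "outcome": outcome,
        "prev_hash": prev_hash,
    }
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    _chain.append(entry)
    return entry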

The boundaries that matter: the API never trusts the agent’s identity for resource access — the delegation token is the only thing the API checks, and an agent runtime call without a valid delegation token is denied. The policy engine is consulted on every API call, with caching strictly bounded by request lifetime for medium-risk actions and not at all for destructive ones. The audit log is the system of record; if a decision did not land in the log, it did not happen.

The hard problems and how they are answered

Four problems break this architecture if not designed for at the outset.

The confused deputy. The classic AuthZ pitfall — a privileged service acts on behalf of an unprivileged user but uses its own permissions, accidentally letting the user do something they should not. The defence is the inversion described above: the API never trusts the agent runtime’s own identity for resource access. The delegation token is the only basis for authorisation. Agent identity is informational on every call; the token’s claims are normative.

Revocation latency. A user clicks “revoke this agent” and the question is how long until in-flight requests stop. The bar is sub-second. The implementation pattern is: the token service publishes revocation events to a low-latency pub/sub channel; policy engine sidecars subscribe and maintain an in-memory bloom filter of revoked tokens; every check consults the filter; the filter is rebuilt periodically from the authoritative store. The bloom filter has acceptable false-positive rates and zero false-negatives, which is the right trade-off for revocation. Short-lived tokens cap the worst case even when active revocation lags.
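A self-contained sketch of the sidecar's filter, with sizes and hash counts chosen for illustration rather than tuned:

import hashlib

class RevocationFilter:
    # Sketch: in-memory bloom filter of revoked token jtis. False positives
    # (rarely treating a live token as revoked) are the acceptable cost;
    # false negatives (honouring a revoked token) cannot occur.
    def __init__(self, size_bits: int = 1 << 23, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, jti: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{jti}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, jti: str) -> None:
        # Called on each revocation event from the pub/sub channel.
        for pos in self._positions(jti):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def maybe_revoked(self, jti: str) -> bool:
        # Consulted on every policy check; the filter is rebuilt periodically
        # from the authoritative store to shed expired entries.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(jti))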

Multi-tenancy in the relationship store. The store holds graphs for every tenant. A cross-tenant relationship leak is catastrophic. The defences are structural: the tenant identifier is a first-class part of every relationship tuple, never inferred from context; policy queries are tenant-scoped at the gateway and the engine never sees a cross-tenant query; a regression suite asserts, for every public API, that a user from tenant A cannot resolve to a resource in tenant B. These tests are run on every deploy.
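The regression suite's shape, sketched pytest-style with the api_client fixture and the seeded identifiers assumed rather than shown:

import itertools

import pytest

TENANT_A_USERS = ["user_a1", "user_a2"]          # seeded in tenant A
TENANT_B_RESOURCES = ["doc_b1", "channel_b1"]    # seeded in tenant B

@pytest.mark.parametrize(
    "user,resource", itertools.product(TENANT_A_USERS, TENANT_B_RESOURCES))
def test_no_cross_tenant_resolution(api_client, user, resource):
    # For every public API: a tenant-A principal must never resolve a
    # tenant-B resource, whatever tuples exist in the relationship store.
    response = api_client.get(f"/resources/{resource}", acting_user=user)
    assert response.status_code in (403, 404)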

Performance. Every API call hits the policy engine, which sounds expensive. With local sidecars consuming snapshots of the relationship graph and decision caching scoped to a single request, decisions are sub-millisecond in process. The authoritative store is consulted on changes, not on every read; Zanzibar’s “zookie” pattern (named in the original paper) provides bounded staleness with explicit consistency choices per action class — destructive actions consult fresh; reads can tolerate seconds-old. SpiceDB and OpenFGA both implement this pattern.
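The per-action-class consistency choice reduces to a small mapping, sketched here in vendor-neutral form; the enum values are illustrative, not any client library's API:

from enum import Enum

class Consistency(Enum):
    FULLY_CONSISTENT = "fully_consistent"    # consult the authoritative store
    BOUNDED_STALENESS = "bounded_staleness"  # snapshot no older than the zookie

DESTRUCTIVE = {"post_to_channel", "send_message", "delete_record",
               "share_publicly", "set_permission"}

def consistency_for(action: str) -> Consistency:
    # Destructive actions pay the latency of a fresh check; reads tolerate
    # a seconds-old snapshot of the relationship graph.
    return (Consistency.FULLY_CONSISTENT if action in DESTRUCTIVE
            else Consistency.BOUNDED_STALENESS)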

What to build and what to buy

Most of this architecture is now buyable. The pattern that holds is to buy the engines and build the policy and the consent UX, because the engines are commodity and the platform’s specific model of consent and risk is not.

Buy the OIDC provider, the relationship store, the policy engine, the audit log infrastructure, and the SIEM rules base for known agent-attack signatures. These are commodity, mature, and well-supported; reinventing any of them is a multi-year detour.

Build the agent registry, the token service, the policy content, the consent UX, the audit schema, and the agent-specific anomaly detection. These are where the platform’s model of how it wants agents to behave is encoded; they are the part regulators, customers, and incident responders will eventually scrutinise; and they are the part that differentiates one platform’s agent surface from another.

The build/buy line is a small number of decisions but they compound. A team that builds its own relationship store will spend a quarter on the store and arrive at the policy work with the budget gone. A team that builds its own consent UX and skips the relationship store will spend a quarter on consent dialogs and end up with a generic policy engine that does not fit its data model. The line above is the one we recommend for platforms with mid-market scale and a credible roadmap to enterprise.

Rollout cadence

The rollout that ships and survives looks like this.

First quarter. Token service issuing delegation tokens for one agent type — the obvious one already on the roadmap, typically a content-and-channel agent that reads user documents and posts to a connected channel. Relationship store modelling the basic relationships (user owns document, agent acts on behalf of user, channel is connected to user account). OPA sidecars at the gateway for the two services that agent type touches. Per-action consent on the destructive operation only; cached consent on reads. Audit log streaming to tamper-evident storage. Beta with a small, paying, opt-in cohort.

Day 30 after first ship. Step-up authentication wired into the consent envelope. Bloom-filter-based revocation in production. Consent receipts visible to users in their security dashboard — every active agent, every active delegation, every action taken. Cross-tenant regression suite running on every deploy. Public docs for the consent model.

Day 90. Agent runtime integration extended to the broader set of agent types on the roadmap. Anomaly detection on agent reasoning patterns, on token-denial rates, and on consent-dialog dismissal patterns. A customer-facing API for enterprise tenants to write their own agent policies in Rego — the multi-tenant policy authoring story is what unblocks the enterprise sales motion.

Day 180. Compliance mapping complete — the existing controls for SOC 2 and ISO 27001 extended with the agent-specific evidence regulators are starting to ask for. Audit log federation to enterprise SIEMs over a streaming protocol. A revocation latency SLO published externally. Public security paper on the agent authorisation model.

The rollout is deliberately conservative on the first quarter because the cost of a public agent failure on a model not yet hardened is greater than the cost of a slower release. Teams that compress the first quarter into six weeks find this out the hard way.

Metrics that tell the truth

The metric set that matters is small and falls into three groups.

Security metrics. Per-tenant revocation latency at p99 — target sub-500ms. Cross-tenant access attempts blocked — non-zero is news, and the news is either a bug or an attack. Step-up consent dialogs shown versus satisfied — a UX-health and abuse signal in one. Agent reasoning patterns matching known injection signatures — research-grade signal, raw count, reviewed weekly.

Product metrics. Percentage of agent installs that complete a first successful action without abandonment — the cleanest signal that consent UX is comprehensible. Percentage of agent invocations blocked at authorisation — a non-zero rate is correct; a persistently high rate signals false positives and a consent-UX problem. Mean tokens issued per agent session — an efficiency and billing-alignment signal.

Platform metrics. Time from “an internal team wants to introduce a new agent type” to “agent type is live with full delegation tokens, consent UX, and audit coverage.” This is the metric that tells you whether the platform is making it easy to do the right thing. If the time is measured in months, the architecture has succeeded as a paper artefact and failed as an operational one.

Where this lands against Australian regulation

For platforms operating in Australia, three regulatory hooks are already live and a fourth is approaching.

Privacy Act post-2024 reforms. Where an agent’s action constitutes a substantially automated decision that affects an individual, the Privacy Act’s automated decision-making disclosure obligation applies. A delegation model that records, per action, which agent acted on whose behalf and which consent was held at the time is the only practical way to evidence compliance during a complaint or a Commissioner inquiry. The audit log schema described above is what that evidence looks like.

APRA CPS 234. Information security obligations require access to information assets to be limited to what is necessary for the user’s role and authorised. An agent operating on broad, durable OAuth scopes is hard to defend against the “necessary and authorised” test in an audit. A scope-reduced delegation token tied to a specific session and resource is the access control pattern that survives the audit conversation.

ISO 42001. For organisations implementing or planning to implement ISO 42001, Annex A’s controls on operational planning, control of inputs, transparency, and accountability map directly onto the components of this architecture. The relationship store and policy engine are the operational planning evidence; the consent envelope is the accountability evidence; the audit log is the transparency evidence.

EU AI Act extraterritorial reach. For platforms with EU users, the AI Act’s transparency and accountability requirements for agents that interact with users on a substantive basis are coming into force on a published timetable. The two-principal model and the audit schema are the artefacts that evidence the “human in the loop where required” obligation; without them, the evidence is reconstructive and brittle.

The pattern across all four: the regulators do not require this exact architecture, but they do require the artefacts this architecture produces. Building the artefacts deliberately, by way of an opinionated authorisation model, is materially cheaper than reverse-engineering them later under audit pressure.

The thesis, in one paragraph

The agent and the user are distinct principals. The agent presents a short-lived delegation token that is a strict reduction of the user’s grants, scoped to specific resources and specific actions, bound to the consent the user actually gave and to the context of the current session. The structural permissions live in a relationship store; the contextual constraints live in a policy engine; per-action consent gates the destructive operations; every decision lands in a tamper-evident audit log that records both principals. The platform buys the engines, builds the policy and the consent experience, and rolls the model out behind a conservative first release. That is the architecture that holds up under prompt injection, under regulatory inquiry, and under the operational pressure of an agent surface growing faster than the security review cycle that surrounds it. Anything less is a bet that the failures other platforms have publicly absorbed will not arrive on yours, and that bet has not aged well.

Get started

Bring AI risk under board oversight in two weeks.

A thirty-minute discovery call costs nothing. We confirm fit, scope, and timing, then issue a fixed-fee statement of work within two business days.