
Digital employees on the platform: the eight integration decisions nobody briefs


Mathew Sayed · 13 min read

When a business unit deploys a digital employee — an agent that acts in your environment under its own credentials, completing tasks that previously required a human worker — the platform engineering team finds out later. Usually after the agent has been provisioned through procurement rather than identity, after data has flowed through it for a few weeks, and after the first audit question makes the deployment visible. The integration decisions that protect the platform team and the deploying business unit at the same time are not the ones the AI vendor’s solution architect will brief you on. They are the questions that surface the moment the deployment touches your IAM, your audit log, your cost centre, your secrets vault, and your incident response runbook — which is to say, the moment it stops being an experiment.

This post is the platform engineer’s view of the integration. Eight decisions, in roughly the order they have to be made, with the consequences of getting each one wrong.

We’ve written previously about the governance posture for digital employees and the authorisation surface that MCP introduces. This piece is the platform engineering layer that sits between those two — what the controls look like once they meet your actual infrastructure.

1. Identity: workload, not human

The most common digital employee deployment we audit was provisioned with a service account, which is the wrong answer for the same reasons it has been the wrong answer since roughly 2010: service accounts are typically named generically, shared across more than one consumer, hard to attribute in logs, slow to rotate, and outside the joiner-mover-leaver process that governs every other identity in your environment.

The correct primitive is workload identity — an identity that authenticates as a non-human, with the cryptographic proof of identity that comes from the platform itself rather than a credential the agent holds. In Microsoft environments this is Entra workload identity (with federated credentials for non-Azure consumers), in AWS this is IAM roles with federation via OIDC, in GCP this is workload identity federation. Each lets the agent assert who it is without holding a long-lived secret, which removes the secret rotation question, the credential exfiltration concern, and the question of which agent took an action.
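
As a concrete shape, here is a minimal sketch on AWS: the agent exchanges a platform-issued OIDC token for short-lived STS credentials. The role ARN, session name, token path, and duration below are illustrative, not prescriptive.

```python
# Minimal sketch, assuming AWS: exchange a platform-issued OIDC token for
# short-lived STS credentials. Role ARN and token path are illustrative.
import boto3

def assume_agent_role(role_arn: str, token_path: str) -> dict:
    # The OIDC token is injected by the platform (e.g. a projected service
    # account token), not a secret the agent stores or rotates.
    with open(token_path) as f:
        web_identity_token = f.read().strip()

    sts = boto3.client("sts")  # this call is unsigned; no bootstrap credential needed
    resp = sts.assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName="agent-customer-onboarding-prod",
        WebIdentityToken=web_identity_token,
        DurationSeconds=900,  # short-lived by design
    )
    return resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration

creds = assume_agent_role(
    "arn:aws:iam::123456789012:role/agent-customer-onboarding-prod",
    "/var/run/secrets/oidc/token",
)
```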

The follow-on decision is naming and lifecycle. We recommend treating each digital employee deployment as a discrete identity, named by purpose (agent-customer-onboarding-prod), tagged to a business owner, with a retention policy attached to the procurement record. When the contract terminates, the identity terminates. The same joiner-mover-leaver discipline that governs your human staff has to govern your digital ones.
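
The contract-termination half of that is enforceable with a small scheduled job. A sketch, assuming a directory client and tag names of our own invention (get_identities, contract_end, disable); this is the shape of the control, not any specific IAM vendor's API.

```python
# Sketch of the leaver check for digital employees: disable any agent
# identity whose contract_end tag (stamped from the procurement record)
# has passed. The directory client and its methods are hypothetical.
from datetime import date

def disable_expired_agents(directory) -> None:
    for identity in directory.get_identities(prefix="agent-"):
        end = identity.tags.get("contract_end")  # ISO date, e.g. "2026-06-30"
        if end and date.fromisoformat(end) < date.today():
            directory.disable(identity.id)
            print(f"disabled {identity.name}: contract ended {end}")
```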

The follow-on of the follow-on is the supervisor relationship — covered in the governance post — which lives at the identity layer because that is where the audit log will need to find it.

2. Scope: the privilege envelope

The scope conversation has the same shape as it does for human privileged access — defined access, time-bounded if possible, justified, logged — with two complications specific to agents.

First, agents amplify a scoping mistake. A human with broad access who slips up exposes the surface they happened to touch that day. An agent with broad access exercises the entire surface within its task envelope, every time. If the digital employee can read the entire CRM, every action it takes will read the entire CRM, and every prompt injection that successfully redirects it will redirect it across the entire CRM. Scope at the agent level is a load-bearing control in a way it is not at the human level.

Second, the scoping primitive most teams reach for — RBAC — is a coarse fit for what an agent actually does. Agents act on context, and context-aware authorisation is the natural shape of attribute-based access control. The agent that summarises a customer email may need to read the email and the related case record; it should not be able to read every case in the queue, even though the role-level RBAC permission usually grants it that. ABAC policies — commonly via policy-as-code or a vendor-native equivalent — close the gap.
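
As a concrete illustration of the email example, here is a minimal attribute check. The task and resource attribute names are ours, and a real deployment would express this in a policy-as-code engine rather than inline application code.

```python
# Minimal ABAC sketch for the email-summarisation example: the agent may read
# a case record only when it belongs to the customer on the email in hand.
# The attribute names are illustrative.
from dataclasses import dataclass

@dataclass
class Task:
    kind: str          # e.g. "summarise-email"
    customer_id: str   # the customer the current email belongs to

@dataclass
class Resource:
    kind: str
    customer_id: str

def agent_may_read(task: Task, resource: Resource) -> bool:
    # Role-level RBAC would grant read on every case in the queue; the
    # attribute check narrows access to the case in context.
    return (
        resource.kind == "case"
        and task.kind == "summarise-email"
        and task.customer_id == resource.customer_id
    )

assert agent_may_read(Task("summarise-email", "C-4421"), Resource("case", "C-4421"))
assert not agent_may_read(Task("summarise-email", "C-4421"), Resource("case", "C-9001"))
```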

For Microsoft 365 deployments, the practical scope question is sensitivity labels. Most organisations have sensitivity labels configured for human use; few have audited them against the agent’s effective access, and the Copilot data access surface is everything a human can reach, including everything that human never bothers to read. The agent will read the ten-year-old SharePoint site nobody visits anymore. Plan for it.

3. Secrets: ephemeral by default

If you have stood up workload identity correctly (decision 1), the secrets question is mostly resolved at the access tier. The agent does not hold a long-lived API key for the systems it calls; it presents a short-lived token derived from its workload identity, and the receiving system validates the token against the issuer.

The residual secret surface is whatever the agent needs that is not yet on workload identity — typically a SaaS API that still authenticates via API key, an internal service that has not adopted OIDC yet, or a vendor with its own bespoke credential model. Those secrets need to be in your central secrets vault (Vault, AWS Secrets Manager, Azure Key Vault, GCP Secret Manager) and accessed by the agent at runtime, not bundled into the deployment manifest. The pattern is well-established for human-facing services and translates directly to agents.
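
Here is the runtime-fetch pattern sketched against HashiCorp Vault's KV v2 store with the hvac client. The Vault URL, JWT auth role, and secret path are hypothetical; the same shape applies to the other vaults.

```python
# Runtime fetch from HashiCorp Vault, authenticating with the agent's
# platform-issued OIDC token via a Vault JWT auth role. URL, role, and
# secret path are hypothetical.
import hvac

def fetch_crm_api_key(oidc_token: str) -> str:
    client = hvac.Client(url="https://vault.internal.example.com")
    # Log in with the workload identity token, not a stored Vault token.
    login = client.auth.jwt.jwt_login(
        role="agent-customer-onboarding-prod", jwt=oidc_token
    )
    client.token = login["auth"]["client_token"]
    resp = client.secrets.kv.v2.read_secret_version(
        path="agents/customer-onboarding/crm-api-key"
    )
    return resp["data"]["data"]["api_key"]  # the vaulted SaaS credential
```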

The failure mode we see most often is the credential bundled into the agent’s runtime configuration — pasted into the YAML that ships with the deployment, hard-coded into the environment variable list, or stored in the agent framework’s “secrets” UI, which is a config field with a slightly different label. The reason this keeps happening is that the agent vendor’s documentation usually shows it that way, and the platform team is not in the room when the deployment is configured. Make the secrets handoff a platform-team review gate. The cost of catching this on day one is trivial; the cost of finding it during an audit eighteen months later is materially higher.
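
The review gate can start as cheap as a CI step that refuses manifests carrying anything key-shaped. A crude sketch; the patterns and file globs are starting points, and a dedicated secret scanner (gitleaks, for example) does this job properly.

```python
# A crude review gate: fail CI when a deployment manifest appears to carry
# an inline credential. Patterns and globs are starting points only.
import pathlib
import re
import sys

SUSPICIOUS = re.compile(
    r"(api[_-]?key|secret|token|password)\s*[:=]\s*['\"]?[A-Za-z0-9+/_\-]{16,}",
    re.IGNORECASE,
)

def scan(paths):
    hits = []
    for path in paths:
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if SUSPICIOUS.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()[:80]}")
    return hits

if __name__ == "__main__":
    findings = scan(pathlib.Path(".").rglob("*.y*ml"))
    if findings:
        print("\n".join(findings))
        sys.exit(1)  # block the deployment until the secret is vaulted
```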

This is the same pattern we wrote about in secrets sprawl in modern stacks — agents are not a new failure mode, they are a familiar failure mode at higher concurrency.

4. Observability: the agent-versus-human distinction

The audit log question for digital employees is not whether you have logs — most platforms log generously — but whether your logs allow you to reconstruct who (or what) took the action and on whose authority. The two things that have to be distinguishable in the same log line are: was this action taken by a human, an agent, or an agent acting on behalf of a human; and which one.

The schema decision matters. If your access logs look like user=alice@company action=read resource=/customers/4421, that is a fine human-action log and a useless agent-action log, because nothing in the line tells you whether alice@company was Alice herself, the customer onboarding agent operating under its own identity, or the customer onboarding agent that Alice just authorised to act on her behalf for the next thirty seconds. The CPS 230 audit reconstruction question — show me the action taken at 14:32 UTC by the system that made the decision in case 4421 — is genuinely hard to answer from a log without the agent distinction in the schema.

The schema we recommend has, at minimum:

  • actor.id — the identity that took the action
  • actor.type — human, agent, or agent-on-behalf-of-human
  • actor.delegation_chain — present when an agent is acting on a human’s authority, the chain of consent
  • agent.deployment_id — when actor.type is agent or agent-on-behalf-of-human, the specific deployment
  • agent.model.version — model version that produced the decision
  • agent.prompt_id — reference to the prompt template version used
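
A single event under that schema might look like the following; every value is illustrative.

```python
# One agent-on-behalf-of-human event under the schema above. All values are
# illustrative, including field names outside the schema's minimum set.
event = {
    "timestamp": "2025-03-05T14:32:07Z",
    "actor": {
        "id": "agent-customer-onboarding-prod",
        "type": "agent-on-behalf-of-human",
        "delegation_chain": ["alice@company"],  # the chain of consent
    },
    "agent": {
        "deployment_id": "dep-7f3a",
        "model": {"version": "vendor-model-2025-01-15"},
        "prompt_id": "onboarding-prompt@v14",
    },
    "action": "read",
    "resource": "/customers/4421",
}
```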

This is not the OpenTelemetry span schema — those are application performance traces. This is your audit log, which is a different system, retained on a different cadence (typically seven years for financial services, longer if litigation hold applies). If your platform writes both into the same store, you will pay for retaining application performance data for seven years; this is rarely the intended outcome.

5. Lifecycle: agents are software artifacts

The mental model that fails most consistently is treating the digital employee as a configuration. Configurations are static, deployed once, occasionally updated. The agent is a software artifact — its prompt is code, its tool list is code, its model version is a dependency, its evaluation suite is a test suite, and the whole thing has versions, releases, rollbacks, and a deployment pipeline.

In practice that means:

  • The agent’s prompt is in source control. Changes go through code review.
  • The tool registry available to the agent is in source control, with policy attached to who can register a new tool.
  • The model version is pinned, not floated, in production. We will return to this in decision 6.
  • The agent has tests. The tests are evals — see our evals as a risk control post for what useful ones look like.
  • The deployment pipeline has the same gates as your other production software: review, build, test, staged rollout, observability, rollback. The same principles as our CI/CD post apply.
  • The agent has a version visible in the audit log (decision 4) so a regression can be tied to a release.

The temptation is to treat the agent’s prompt as content rather than code, particularly in low-code agent platforms where the prompt is edited in a UI. The audit consequence is that you cannot answer “what did the agent know on the fifth of March” because the prompt has been edited a dozen times since and there is no version history. The remedy is to treat prompt edits as deployments, version them, and attach the version to the audit log.
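
One low-ceremony way to do that is to derive the prompt version from a content hash at release time and record it where the audit log can reference it. A sketch, with the file path and registry format of our own choosing.

```python
# Treat a prompt edit as a release: hash the prompt content, record the
# version, and carry it in the audit log as agent.prompt_id (decision 4).
# File path and registry format are our own choices for the sketch.
import hashlib
import json
import pathlib
from datetime import datetime, timezone

def release_prompt(prompt_path: str, registry_path: str = "prompt_releases.jsonl") -> str:
    content = pathlib.Path(prompt_path).read_bytes()
    version = hashlib.sha256(content).hexdigest()[:12]
    prompt_id = f"{pathlib.Path(prompt_path).stem}@{version}"
    record = {"prompt_id": prompt_id,
              "released_at": datetime.now(timezone.utc).isoformat()}
    with open(registry_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return prompt_id  # answers "what did the agent know on the fifth of March"
```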

6. Change management: vendor model updates are dependencies

Every digital employee depends on a foundation model that the vendor controls. The vendor will update that model. Sometimes they will tell you. Sometimes they will tell you with notice you cannot meaningfully act on. Sometimes the update will materially change the agent’s behaviour on your specific use case, and the regression will surface in production before your evaluation suite finds it.

The platform engineering response is to treat vendor model versions like every other dependency:

Pin in production. If your agent vendor offers a way to pin to a specific model version, take it. If they do not, raise it as a contract issue. The default of “we’ll update you to the latest” is acceptable for an autocomplete plugin and unacceptable for a system that takes business actions on your behalf.

Update on a deliberate cadence. New model versions are tested in your non-production environment against your evaluation suite before being promoted. The cadence depends on your tolerance — most organisations should be on a quarterly review for non-critical agents and a per-update review for material ones.
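
The promotion decision then reduces to a gate the pipeline can run. A sketch, assuming an evaluation harness of your own (run_eval_suite here is a placeholder) and a regression tolerance chosen deliberately.

```python
# Promotion gate for a vendor model update: the candidate version is promoted
# only if it does not regress against the pinned version on your evaluation
# suite. run_eval_suite and the tolerance are placeholders for your harness.
from typing import Callable

def promote_model(
    candidate: str,
    pinned: str,
    run_eval_suite: Callable[[str], float],  # returns a pass rate, 0.0 to 1.0
    max_regression: float = 0.02,
) -> str:
    baseline = run_eval_suite(pinned)
    score = run_eval_suite(candidate)
    if score < baseline - max_regression:
        return pinned  # keep the pin, raise the regression with the vendor
    return candidate
```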

Have a rollback plan. If the update degrades your agent’s performance, you should be able to revert to the prior version within a defined window. If your vendor only supports forward updates, you have a single point of failure that your CPS 230 scenario testing should be exercising.

The CPS 230 paragraph 53 minimum on notification of material change is the contract clause that makes this enforceable. Most vendor agreements we audit either do not have it or have it written in a way that excludes model updates from “material change”. Negotiate that clause. The runbook includes a worked clause you can lift.

7. Cost, quota, and fairness

Digital employees are unusually good at consuming resources unevenly. A misconfigured agent, a prompt regression, or a feedback loop with another agent can produce ten thousand model calls in an afternoon. The platform team will see the spike on the bill before the deploying business unit notices anything is wrong.

The platform-side controls are familiar from any multi-tenant infrastructure:

Per-deployment quota. Each digital employee deployment has a defined ceiling — calls per minute, total calls per day, dollar spend per month. Hitting the ceiling triggers an alert and either degrades the agent gracefully or stops it.
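
A minimal sketch of the admission check follows. In production the counter would live in shared state (an API gateway or Redis, say) rather than process memory, and the deployment names and ceilings are illustrative.

```python
# Per-deployment daily call ceiling with an alert on breach. Shared state
# belongs in a gateway or Redis in production; this is the shape only.
from collections import defaultdict
from datetime import date

DAILY_CEILING = {"agent-customer-onboarding-prod": 5_000}
_calls: dict[tuple[str, date], int] = defaultdict(int)

def admit_call(deployment_id: str, alert) -> bool:
    key = (deployment_id, date.today())
    _calls[key] += 1
    if _calls[key] > DAILY_CEILING.get(deployment_id, 1_000):
        alert(f"{deployment_id} exceeded its daily call ceiling")
        return False  # degrade gracefully or stop, per the deployment's policy
    return True
```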

Per-action rate limits. Some actions cost more than others. A summary of an email is cheap; a research agent that opens twenty tabs and synthesises across them is not. Rate limits at the action level catch the second case before it becomes the bill.

Chargeback. Costs are attributed to the deploying business unit, not absorbed into the platform budget. This is partly an accounting discipline and mostly a behavioural one — business units that pay for their agents tend to scope them better.

Cost anomaly detection. A 5x spend day from a deployment that historically averages a known baseline is a signal. Most cloud cost platforms (AWS Cost Anomaly Detection, Azure Cost Management anomaly alerts, GCP Cloud Billing anomaly detection) have this primitive built in; few teams have configured it for AI line items.

The reason this matters beyond the bill is that uncapped agents are an availability concern for the rest of your platform. If your agent is hitting your CRM API at high volume because of a runaway loop, your CRM rate limits will start firing, the agent’s actions will start failing, and the human users of the CRM will notice it before the agent’s supervisor does.

8. Incident response: agents in the playbook

The eighth decision is whether your incident response playbooks know that digital employees exist. For most platform teams we work with, the answer is no — the playbooks were written for human users, the SIEM detection content fires on human-shaped patterns, and the response steps assume a human attacker behind a compromised credential.

The agent-specific incident scenarios that need to be in the playbook:

Prompt injection that redirects the agent. The agent is doing what it was told by an attacker who got their content into the agent’s input. Detection: anomalous tool calls, calls outside the normal envelope, calls in unusual sequences. Response: revoke the agent’s session, freeze the deployment, retain the input that triggered the redirection.

Agent acting on stale or revoked authority. The human delegated authority has been revoked, but the agent has cached or buffered actions queued. Detection: agent actions taken after the principal’s session was revoked. Response: terminate any in-flight agent activity tied to the revoked principal.
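
Given the audit schema from decision 4, the detection here is a join between agent events and session revocations. A sketch, with the event and revocation shapes assumed rather than taken from any particular SIEM.

```python
# Detection sketch over the decision-4 audit schema: flag agent-on-behalf-
# of-human events whose delegating principal's session was already revoked.
# Event and revocation shapes are assumptions, not a specific SIEM's schema.
from datetime import datetime

def _ts(value: str) -> datetime:
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def stale_authority_events(events: list[dict], revocations: dict[str, str]) -> list[dict]:
    flagged = []
    for event in events:
        if event["actor"]["type"] != "agent-on-behalf-of-human":
            continue
        principal = event["actor"]["delegation_chain"][-1]  # immediate delegator
        revoked_at = revocations.get(principal)
        if revoked_at and _ts(event["timestamp"]) > _ts(revoked_at):
            flagged.append(event)  # action taken on revoked authority
    return flagged
```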

Model misuse via the agent. The agent is being used to extract data via clever prompt construction (the “summarise everything you can read” pattern). Detection: read patterns that don’t match the agent’s task profile, particularly broad enumeration. Response: rate-limit, then audit the requesting principal.

Agent supply chain compromise. A model update, a prompt change pushed through the wrong gate, or a tool added to the agent’s registry that should not have been there. Detection: pipeline alerting, change auditing. Response: rollback the deployment and audit what the agent did between the change and the rollback.

This connects to our ransomware response and CPS 230 post on the broader incident decision frame — the digital employee scenarios are a subset of the operational risk scenarios you should be exercising in your scenario testing program.

What “ready” looks like

The shorthand we use in fractional engagements: a digital employee deployment is platform-ready when the answer to each of these is yes, with documented evidence:

  1. The agent has a workload identity, named by purpose, in your IAM
  2. Its access scope is documented, ABAC where possible, and reviewed quarterly
  3. Its credentials are short-lived; persistent secrets are vaulted
  4. The audit log distinguishes its actions from human actions, with the schema above
  5. The agent is versioned, tested, and deployed through a pipeline
  6. The model version is pinned and updated deliberately
  7. Per-deployment quotas, rate limits, and chargeback are configured
  8. Incident response playbooks have agent-specific scenarios

If the deployment is not yet at that bar — most are not, on first review — the runbook gives you the phased twelve-month plan for getting there, with the configuration baselines and the acceptance criteria for each phase. The free readiness checklist is the diagnostic version of the same controls, in eighteen lines.

The platform engineering team is the leverage point for digital employee governance. The deployment that arrives without their involvement will eventually arrive at their desk anyway, in the form of an audit finding, a cost spike, an outage, or a regulator query. The work is cheaper before that.
