AI · Architecture
Long-term memory for agnostic agents: a working architecture on Bedrock AgentCore
Bedrock AgentCore gives you a managed runtime, a long-term memory store, and a deliberately framework- and model-agnostic SDK — which means the agent code stays portable while the memory plane becomes the new audit liability. Here is the architecture that uses AgentCore Memory properly, the governance controls memory-poisoning and Privacy Act obligations force on top of it, and the rollout cadence that keeps the deployment defensible.
Most agent deployments that reach production stall in the same place. The pilot worked because every session was self-contained — the agent read the user’s prompt, called a few tools, returned an answer, and forgot the encounter as soon as the response landed. The production requirement breaks that assumption. The agent has to remember the user’s preferences across sessions, recall the outcomes of previous workflows, learn from interactions, and behave in a way that is recognisably continuous to the human on the other side of the conversation. Memory is what closes that gap, and memory is also where the deployment quietly acquires a new attack surface and a new audit liability that the original threat model did not contemplate.
AWS released Amazon Bedrock AgentCore — first announced at AWS Summit New York in mid-2025 and now broadly available — as a managed control plane for the parts of an agent deployment that are operationally undifferentiated. The piece most teams reach for first is AgentCore Memory, the managed short-term and long-term memory service. The piece most teams under-appreciate is that AgentCore is deliberately framework- and model-agnostic — the runtime, memory, identity, and gateway primitives are designed to host an agent written in Strands, LangGraph, CrewAI, LlamaIndex, Google ADK, the OpenAI Agents SDK, or any custom framework, running on any foundation model inside or outside Bedrock. The “Show & Tell” walkthrough AWS published on building your first production-ready AI agent with AgentCore makes that posture explicit: you do not adopt AgentCore by rewriting your agent in an AWS-native framework. You adopt it by wrapping the agent you already have in AgentCore’s runtime contract.
That posture changes the architectural conversation. The question is no longer “do I rewrite my agent in Strands or stay on LangGraph?” — both are first-class on AgentCore. The question is what to put on the managed plane, what to keep portable, and how to design the memory model so it is operationally defensible under prompt injection, Privacy Act access requests, and the cross-tenant leakage tests that mid-market deployments eventually face.
This piece is the architecture we recommend for organisations adopting AgentCore for agent workloads — particularly in regulated environments. It assumes a working knowledge of agent frameworks and the AWS account model. It covers the AgentCore Memory data model, the strategies that populate the long-term store, the agnostic-agent integration pattern, the governance controls that the memory plane forces on top, and the rollout cadence that lets the deployment land without the failures other teams have already absorbed.
Why memory is harder than the agent
A stateless agent has a small and well-understood threat model. The user supplies a prompt, the agent reasons over it with tools, and the result returns. Adversarial input has one place to land — the prompt and the tool outputs the agent reads during the current invocation. Once the session ends, the influence is gone. The audit story is simple: every action is a function of the inputs visible in the session record.
A stateful agent breaks that model in three ways at once.
Influence has a longer half-life. Content that enters the agent’s long-term memory store — extracted facts, summarised episodes, user preference inferences — informs future sessions. An adversarial instruction embedded in a document the agent read on Tuesday can shape the agent’s behaviour on Thursday, in a session the Tuesday content is no longer visible in. Prompt injection becomes memory poisoning, and the half-life is no longer “until end of session” — it is “until the memory record is invalidated or the agent stops trusting that strategy.”
The memory store accumulates sensitive data the system was not designed to hold. A semantic extraction strategy that pulls factual content from conversations will, by default, pull whatever factual content the conversation contains — including the personal information the user happened to mention, the contractual terms the user pasted from a sensitive document, the internal identifiers the user used to refer to a system. The memory record becomes a derived data store whose contents are a function of what users happened to say, which is rarely what the data minimisation policy assumed.
The audit question gets harder. For a stateless agent, the auditor’s question — “why did the agent do that?” — is answered by inspecting the session. For a stateful agent, the answer requires reconstructing the long-term memory state at the moment of the decision. If the memory store is not append-only and the extraction process is not traceable, the question is not answerable with the evidence the system retains.
These are the three pressures the architecture has to absorb. AgentCore’s memory primitives give you the shape to absorb them. They do not absorb them on your behalf.
What AgentCore Memory actually is
AgentCore Memory exposes two distinct layers — short-term memory and long-term memory — and a small number of organisational primitives that determine how data flows through them. Both layers live behind a single managed API; the difference is what they store and how they are populated.
Short-term memory is the system of record for conversation events. Every turn the agent sees — the user message, the assistant response, tool calls, tool results, system events — is written synchronously to AgentCore Memory as an immutable event, scoped by an actor identifier and a session identifier. The actor identifier names the entity the conversation is with (typically the end user, but possibly another agent or a system); the session identifier scopes a coherent unit of interaction. Each actor has isolated event storage, which is the first structural defence against cross-actor leakage.
The short-term store is the answer to the operational question “what did the agent see?” — it is the conversation log, complete and tamper-evident, with ordering preserved. For the audit conversation, this is the artefact. For the agent runtime, the short-term store is also the source of context the agent retrieves on each new turn — list-events queries return recent raw context, summaries return condensed session history.
Long-term memory is the layer most discussions of agent memory mean when they say “memory.” It is populated asynchronously from the short-term store by memory strategies — declarative rules that govern what is extracted, how it is processed, and where it is stored. The strategies run in the background, using foundation-model invocations under the hood, and write derived records into a vector-indexed long-term store that supports semantic search at retrieval time.
The key architectural property is the asynchrony. The short-term write path is synchronous and fast — the agent’s turn-by-turn experience is not gated on long-term ingestion. The long-term records appear seconds to minutes after the event that produced them. That decoupling is what makes AgentCore Memory operationally viable; it is also what makes the governance controls non-trivial, because the extraction process is happening between when the agent saw the content and when the long-term consequence shows up in a future session.
The strategies that populate the long-term store
AgentCore ships four built-in strategy types and supports custom strategies for domain-specific extraction. The choice of which to enable is the single most consequential decision in the memory design, because it determines what derived state about the user and the work the agent will accumulate over time.
Semantic strategy. Extracts factual statements and contextual knowledge from conversations. The user mentioned their company is in the renewable energy sector; the user is the CFO; the user is currently working on a board paper for a March meeting. These become discrete factual records, vector-indexed, retrievable by semantic similarity at the start of future sessions. This is the strategy that delivers the “agent remembers what I told it” experience most users expect. It is also the strategy that quietly accumulates personal information across sessions, which is the Privacy Act exposure.
Summarization strategy. Compresses conversation history into running summaries. Where the semantic strategy stores discrete facts, the summarization strategy stores narrative context — what the previous session was about, what was decided, what remained open. Useful for agents that need to resume work where they left off without re-reading the full event log.
User preference strategy. Identifies and extracts preferences, choices, and stylistic decisions — the user prefers terse responses, prefers metric units, does not want suggestions about pricing. Builds a per-user profile that shapes the agent’s style and behaviour. Operationally the least sensitive of the four built-ins, because the data shape is bounded; the user is not casually pasting medical information into “preferences.”
Episodic strategy. Captures interactions as structured episodes — scenario, intent, thoughts, actions taken, outcome, artefacts produced. Closer in shape to a workflow trace than a conversation summary. Most useful for agents that need to learn from the success or failure of previous attempts at similar tasks — the agent recognises a new request as similar to an episode where a particular tool sequence worked, or where it failed in a recognisable way.
Custom strategies. Where the built-in strategies do not fit, you define your own extraction logic — a Lambda function or model-driven pipeline that consumes short-term events and writes long-term records in a shape specific to the domain. The platform allows one strategy of each built-in type and multiple custom strategies per memory resource, subject to the overall cap.
The design rule that holds: enable the smallest set of strategies that meets the product requirement. Every strategy you enable becomes a data store of derived facts about the user, with the data minimisation and access posture that implies. The reflex of enabling all four built-ins because the product roadmap “might need them” produces the largest derived-data footprint, the broadest deletion exposure, and the widest surface for memory poisoning — for the smallest marginal benefit at retrieval time.
Namespaces, actors, sessions — the organisational primitives
The control surface for memory is three concepts that compose: the actor, the session, and the namespace.
Actor. Identifies the entity interacting with the agent. Typically the end user; can also be another agent or a system. Short-term events are scoped by actorId + sessionId and the platform guarantees isolation — events for actor A are not retrievable in queries scoped to actor B. The actor identifier is the unit at which Privacy Act access and deletion requests operate; getting this scheme right is what makes those requests cheap to serve.
Session. Scopes a coherent unit of interaction within an actor. The session is the boundary at which short-term events are organised and at which summary records are produced; sessions can be ended, archived, and queried independently. For agents that operate in distinct task contexts — a contact-centre agent that handles multiple unrelated tickets for the same customer, a coding assistant that works on multiple repositories — the session is how those contexts stay distinguishable in the audit log.
Namespace. A logical grouping within long-term memory under which records produced by a strategy are saved. Namespaces let you separate, for example, semantic facts about the user from semantic facts about the work, even if both are populated by the same strategy. They are the operational primitive that determines what a retrieval query can see — a query targeting one namespace cannot accidentally pull records from another.
The discipline that makes these primitives work as a control plane: the namespace scheme is set deliberately at memory-resource creation, the actor identifier is always the authoritative user identity (never a derived or aliased identifier), and the session identifier is rotated at logical task boundaries rather than artificially extended. Teams that treat the session identifier as a connection identifier — same session for the lifetime of the user’s web session, regardless of what work happens inside it — collapse the audit story; they cannot answer “what did the agent see for this task” because the task boundary is not represented anywhere.
The agnostic posture: framework, model, and what it actually buys you
AgentCore is documented and marketed as both framework-agnostic and model-agnostic. Both claims are real, but they buy different things, and the difference matters for the architecture.
Framework-agnostic means the runtime accepts agents written in Strands Agents, LangGraph, CrewAI, LlamaIndex, Google ADK, the OpenAI Agents SDK, or any custom Python or Node framework. The contract is a small SDK that exposes the runtime’s primitives — memory, identity, gateway, observability — through framework-neutral primitives. You wrap your existing agent in the AgentCore SDK, deploy it to the runtime, and the framework choice becomes an internal implementation detail. The portability story this produces is the practical one: you can change agent frameworks without changing the deployment, the memory store, or the identity model. The investment in AgentCore is not an investment in any specific agent framework.
Model-agnostic means the runtime does not constrain which foundation model the agent uses. You can run an agent on Bedrock-hosted Claude, Amazon Nova, Llama, or Mistral; on OpenAI’s API; on Anthropic’s API direct; on Google’s Gemini. The agent’s model calls go wherever your framework directs them — AgentCore is concerned with the runtime, memory, identity, and observability layers around those calls, not with the call itself. This buys you the ability to switch models for cost, capability, or compliance reasons without rebuilding the surrounding infrastructure, and to run multi-model agents — a Claude planner with a Llama executor, for example — on a single managed plane.
What “agnostic” does not buy you, and this is the architecturally important caveat: portability of the memory data. The memory store lives in AgentCore, which lives in your AWS account, which is AWS-managed infrastructure. The agent code is portable; the agent’s accumulated memory of the user is not, at least not without a migration exercise. For most deployments this is the right trade — you wanted the managed memory plane precisely because building your own was a multi-quarter detour — but it should be a deliberate trade, not a discovery after the fact.
The agnostic posture also does not absolve the architecture of memory governance. The same controls apply regardless of which framework the agent is written in, because the controls operate at the AgentCore primitive layer (strategies, namespaces, actors), not at the framework layer. A LangGraph agent and a Strands agent that share an AgentCore Memory resource share the same exposure to memory poisoning, the same Privacy Act deletion obligations, and the same audit requirements. The framework choice is genuinely internal.
Reference architecture
The architecture is five components and a small number of well-defined boundaries.
Agent runtime. Your agent code, written in your framework of choice, deployed to AgentCore Runtime — the managed serverless runtime that supports long-running agent invocations and isolates sessions at the container level. The runtime is opaque to the framework; the agent calls memory, identity, and gateway through the AgentCore SDK regardless of how it is internally structured.
Memory plane. AgentCore Memory, configured with the smallest necessary set of strategies, a deliberate namespace scheme, and an actor identifier that is always the authoritative user identity. Short-term writes are synchronous; long-term extraction runs asynchronously in the background; retrieval is by list_events, get_summary, or semantic_search against named namespaces.
Identity plane. AgentCore Identity for both the workload identity of the agent and the OAuth/OIDC integration with downstream services. The two-principal model — the user is one principal, the agent is another — that we described in the agent authorisation post maps directly onto AgentCore Identity’s primitives. The agent has its own credentials, the user’s grants flow through delegation, and the audit log records both.
Tool plane. AgentCore Gateway for exposing internal APIs as MCP tools to the agent, with the authorisation surface and audit posture covered in the companion piece on MCP authorisation. The managed gateway means you stop running an MCP server per integration; the security review you still owe is the per-tool authorisation model, which the gateway does not decide for you.
Observability. AgentCore Observability emitting OpenTelemetry traces to CloudWatch, with the trace spans covering every memory read, every memory extraction, every tool invocation, and every model call. This is the artefact the auditor demands when a deployment produces an outcome someone disputes.
The boundaries that matter: the agent never bypasses the memory plane to read or write memory directly; every memory operation goes through the AgentCore SDK, which means every operation is logged. The strategy configuration is the only thing that decides what gets written to long-term memory; agents cannot write derived records on their own behalf. The actor identifier is set from the authenticated user identity at session start and is never overridden by the agent.
Memory governance: the five controls that are not optional
The architecture above is necessary and not sufficient. Five governance controls sit on top of it, and the deployment is not defensible without them.
Pre-extraction redaction. The semantic and episodic strategies will extract whatever the conversation contains. If the conversation contains primary keys, medical identifiers, financial account numbers, or other sensitive personal information, those will land in the long-term store. The defence is a pre-extraction pipeline that redacts or tokenises sensitive content before it reaches the strategy’s extraction model. For Australian deployments handling personal information under the Privacy Act, this is the difference between a memory store that can be defensibly retained and one that triggers an APP 11.1 data security obligation the design cannot meet.
Strategy filters. A custom strategy can encode the extraction rule the built-in strategies do not — extract facts about the work, not the user; never extract content from documents tagged with a specific sensitivity label; never extract content from a session where the consent state did not include long-term retention. Custom strategies are the operational lever that makes the long-term store comply with the platform’s data minimisation policy.
Deletion workflow. A Privacy Act access or correction request, an APP 12 deletion, or an enterprise customer’s right-to-be-forgotten clause requires you to remove a user’s information from both the short-term events and the long-term records. The actor identifier makes the scope of the query trivially answerable, but the deletion process needs to traverse short-term events, summaries derived from those events, semantic records that may have aggregated content from many sessions, and the embeddings indexed for retrieval. Building this workflow at the start, when you have one strategy enabled, is materially cheaper than building it after the deployment has four strategies populated across two years of users.
Memory poisoning defences. The same controls that defend against prompt injection in the immediate session apply, with extensions, to the long-term store. The pre-extraction pipeline filters obvious injection patterns. Strategy configuration excludes content from low-trust sources from extraction entirely — content from emails, attachments, or shared documents is read in-session but not extracted. The retrieval path applies a trust filter — long-term records produced from low-trust source content are surfaced to the agent with provenance metadata, and the agent’s prompt frames them as “the user previously said” rather than “it is a fact that.” This is also where the broader eval discipline earns its keep — the regression suite includes adversarial inputs designed to poison the memory and verifies that the agent’s behaviour in subsequent sessions is not coerced.
Cross-actor leakage tests. The platform guarantees actor isolation, but the application’s use of the actor identifier is the thing that has to be right. A regression suite that constructs scenarios where actor A’s content could leak into actor B’s session — same user identifier across tenants, mis-scoped retrieval queries, shared namespace records that should have been per-actor — and asserts that the leakage does not occur is the only way to know the application’s discipline matches the platform’s guarantee. Run this on every deploy. A failure is not a flaky test; it is a Privacy Act breach in slow motion.
What to build, what to buy
The architecture is mostly a buy decision at the infrastructure layer, and almost entirely a build decision at the policy layer.
Buy the runtime, the memory store, the identity primitives, the gateway, and the observability backbone. These are the components AgentCore is designed to provide and where the AWS investment compounds on a roadmap your team cannot match. Trying to build any of them — a managed long-running runtime, a vector-indexed memory store with automatic extraction strategies, an OAuth/OIDC plane wired to Cognito — is a multi-quarter detour for parity with a managed offering.
Build the namespace scheme, the strategy configuration, the pre-extraction redaction pipeline, the custom strategy logic, the deletion workflow, the audit aggregation across short-term events and long-term records, and the cross-actor regression suite. These are the components that encode your platform’s specific model of what should be remembered, what should not, and who can ask for it to be forgotten. They are also the components a regulator will read when the deployment is reviewed.
The line is sharper than for traditional cloud workloads because the managed primitives are more opinionated. You do not get to choose how AgentCore Memory stores its records; you do get to choose what goes into them. The build investment is in the policy and the data hygiene, and the build investment is what differentiates a defensible deployment from a fast one.
Rollout cadence
The rollout that lands and survives looks like this.
First quarter. One agent, one framework (whichever you already use; this is not the moment to swap to Strands “because AWS recommends it”), one foundation model, deployed to AgentCore Runtime. Short-term memory enabled. One long-term strategy — typically semantic — with a tight namespace scheme. Pre-extraction redaction in place for known sensitive content types. Deletion workflow implemented end-to-end and tested against a real test actor. Observability emitting to CloudWatch with traces visible. No production traffic yet — internal cohort only.
Day 30 after first ship. User preference strategy added if the product requires it. Custom strategy authoring environment in place for the first domain-specific extractor. Cross-actor regression suite running on every deploy. Privacy Act access request workflow tested with a real test user submitting a real request and the response returned within the policy SLA. Memory poisoning eval suite running with a baseline set of adversarial inputs.
Day 90. Episodic strategy added if the workflow learning requirement materialises. Multi-framework support exercised — at least one secondary agent in a different framework deployed to the same AgentCore plane, sharing identity and observability, to verify the agnostic posture in practice. Tool plane migrated to AgentCore Gateway with MCP authorisation controls in place. External customer pilot opened.
Day 180. Compliance mapping complete — short-term events archive policy, long-term record retention policy, deletion SLAs, and audit log federation all evidenced in writing. Multi-model exercise complete — at least one workload exercising a non-Bedrock model to verify the model-agnostic posture is real for your specific architecture, not just for AWS’s marketing surface. Production cutover with the residual risk position approved by the relevant risk committee.
The first quarter is deliberately narrow. The temptation to enable all four built-in strategies and run a multi-framework pilot in the first month is the temptation that produces the failures other teams have absorbed. A single strategy, with a deliberate namespace scheme and a working deletion workflow, is a defensible foundation. A four-strategy pilot with no deletion workflow is a future audit finding waiting for the breach notification email that surfaces it.
Where this lands against Australian regulation
For organisations operating in Australia, four regulatory hooks bear on the AgentCore Memory architecture.
Privacy Act and the post-2024 reforms. Memory strategies produce derived personal information, and that derived information sits within the scope of the Privacy Act in the same way as collected personal information. APP 5 notification obligations apply at collection — the user needs to be informed that their interactions will produce a persistent memory record. APP 11 security obligations apply to the memory store. APP 12 access and APP 13 correction obligations apply to both short-term events and long-term records, which is what makes the deletion workflow non-optional. The automated decision-making disclosure obligation under the post-2024 reforms applies whenever the agent’s action constitutes a substantially automated decision affecting the individual — and memory-shaped decisions are squarely within that scope.
APRA CPS 234. The long-term memory store is an information asset under CPS 234. The information security capability has to be sized to the criticality and sensitivity of the information held, the access controls have to be limited to what is necessary, and the testing program has to verify the controls work. The cross-actor regression suite is one piece of that evidence; the strategy filters and redaction pipeline are another; the observability trace is the third.
APRA CPS 230. Memory is now a third-party dependency for the agent service, even though it is delivered through your own AWS account. The CPS 230 operational resilience requirements — service definition, tolerance levels, scenario testing, business continuity — apply to the agent service as a whole, and the memory dependency is part of the service definition. The CPS 230 AI tooling checklist is the structured way to position this.
ISO 42001. Annex A’s controls on data management, transparency, accountability, and lifecycle apply to the memory plane. The strategy configuration is the data management evidence; the deletion workflow is the lifecycle evidence; the observability trace and audit aggregation are the transparency and accountability evidence. ISO 42001 readiness work maps cleanly onto the components of the reference architecture above; the work is not duplicative.
The pattern across all four is the same as for the rest of the AI estate. The regulators do not specify the exact architecture; they specify the artefacts the architecture has to produce. Building those artefacts deliberately, by way of an opinionated memory design, is materially cheaper than reverse-engineering them later under audit pressure.
A note on the “Strands or LangGraph” question
The framework choice that consumes a disproportionate share of architectural debate inside teams adopting AgentCore is the question of whether to standardise on Strands Agents — AWS’s own open-source framework, with a deliberately minimalist model — or on LangGraph, CrewAI, or another framework with broader open-source mindshare. The agnostic posture means this is a much smaller decision than the debate implies.
Three observations hold. First, the choice is reversible on AgentCore in a way it would not be on a framework-specific platform — you can run two frameworks on the same memory and identity plane, migrate one workload at a time, and converge on a single framework only once the production experience tells you which one fits. Second, the cost difference at the framework layer is small compared to the cost of the surrounding architecture; the memory configuration, the redaction pipeline, the deletion workflow, and the cross-actor regression suite cost the same in any framework. Third, the framework’s primary contribution is to the agent’s internal control flow — the loops, the tool dispatch, the planner-executor patterns. The contribution to the production properties — memory, identity, observability, deployment — is small once AgentCore is the substrate.
The pragmatic recommendation: pick the framework your team already knows; verify it works on AgentCore before committing; revisit the choice in the second year on the basis of operational evidence, not architectural taste.
The thesis, in one paragraph
Bedrock AgentCore gives you a managed runtime, a memory plane with short-term events and strategy-driven long-term records, an identity model that supports the two-principal pattern agent deployments need, and a gateway that subsumes the per-tool MCP server you would otherwise build. The framework- and model-agnostic posture means the agent code stays portable while the memory data acquires a managed home in your AWS account. The architectural work that matters is at the policy layer above the primitives: the smallest necessary set of memory strategies, a deliberate namespace and actor scheme, a pre-extraction redaction pipeline, custom strategies that encode the platform’s data minimisation policy, a deletion workflow that traverses both short-term events and long-term records, and a regression suite that asserts cross-actor isolation on every deploy. The regulatory artefacts — Privacy Act access responses, CPS 234 information security evidence, CPS 230 service definition, ISO 42001 data management documentation — fall out of the architecture if the architecture is right, and have to be reverse-engineered under audit pressure if it is not. The first quarter ships narrow, the rollout adds strategies and frameworks only as the operational evidence supports it, and the deployment that emerges is one the regulators, the customers, and the on-call engineers can all defend.
Continue reading
Related pieces
AI · Channel security
OpenClaw, WhatsApp and Telegram: the phone-linked AI agent threat model, the attacks already in the wild, and the gold-standard alternatives
OpenClaw lets anyone wire a personal WhatsApp or Telegram account to an AI agent in ten minutes. Bitsight found 30,000 instances exposed on the public internet in a fortnight. This is the architecture, the attacks, the config that breaks it, and the official-API pattern that holds.
3 June 2026
AI · Authorisation
OAuth scopes weren't built for AI agents: the delegation model that holds up under prompt injection
OAuth scopes assume a human approves once, an app does narrow work, and the trust horizon is months. AI agents break every part of that assumption. The architecture that holds is a two-principal model with short-lived delegation tokens, ReBAC for structure, ABAC for context, and per-action consent gating destructive operations. Here is the design and the rollout.
3 May 2026
Board reporting
Reporting AI risk to the board: a one-page position summary that actually works
What the board actually wants on the AI risk page is the answer to four specific questions. Most AI risk reports answer different questions. Here is the structure that lands, four worked examples by sector, and a template you can lift verbatim.
2 May 2026