AI Agent Async
Shows how to run async operations with the Strands SDK and Amazon Bedrock AgentCore, using Lambda durable functions to wait at zero cost. The client is a native macOS SwiftUI app, and all communication runs over a single WebSocket.
Architecture
The Problem
MCP tool calls are synchronous: the agent calls a tool and blocks until it gets a response. But what if the tool needs to:
- Send a command to the user's local machine (no response needed)
- Query real-time status from the client and wait for the answer
- Collect logs from the client for the agent to analyze
The Solution
A single Lambda durable function (lambda_mcp_proxy_server.py) implements all three MCP tools. The AgentCore Gateway has one target per tool (Gateway limitation: 1 tool_schema per target), but they all point to the same Lambda. Argument-based routing determines which handler runs.
The entire Lambda is decorated with @durable_execution from the AWS Durable Execution SDK. This means:
- All three tools run inside a durable execution context
- run_client_command simply doesn't use the DurableContext, so it returns immediately
- get_vpn_status and analyze_logs use context.create_callback() + callback.result() to suspend at zero cost
| Pattern | Tool | Uses Durable? | Wait Time | How It Works |
|---|---|---|---|---|
| Fire-and-forget | run_client_command | No | 0s | Push command via WebSocket, return immediately |
| Durable callback (fast) | get_vpn_status | Yes | ~1-2s | Suspend at zero cost, client responds instantly via callback |
| Durable callback (slow) | analyze_logs | Yes | ~5-30s | Suspend at zero cost, client collects logs and sends back via callback |
    @durable_execution
    def handler(event: dict, context: DurableContext):
        # Route on the argument keys, since all Gateway targets share this Lambda.
        if "command" in event:
            result = handle_run_client_command(event)  # fire-and-forget
        elif event.get("query") == "vpn_status":
            result = handle_get_vpn_status(event, context)  # durable callback
        elif "log_type" in event:
            result = handle_analyze_logs(event, context)  # durable callback
        else:
            # Fallback so an unrecognized event does not leave `result` unbound.
            result = {"status": "error", "message": "unrecognized tool arguments"}
        return {"content": [{"type": "text", "text": json.dumps(result)}]}
Each AgentCore Gateway Target maps to one tool but points to the same Lambda alias. The Lambda infers which tool was called by checking the arguments.
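The argument-based routing can be isolated into a pure function, which makes it testable without a durable context. This is a minimal sketch rather than the project's code: `infer_tool` is a hypothetical helper name, and only the argument keys (`command`, `query`, `log_type`) and the tool names come from this README.

```python
from typing import Optional


def infer_tool(arguments: dict) -> Optional[str]:
    """Return the tool implied by the argument keys, or None if nothing matches."""
    # The checks run in the same order as the handler, so an event that
    # carries several of these keys resolves the same way in both places.
    if "command" in arguments:
        return "run_client_command"
    if arguments.get("query") == "vpn_status":
        return "get_vpn_status"
    if "log_type" in arguments:
        return "analyze_logs"
    return None


if __name__ == "__main__":
    for args in (
        {"command": "connect_vpn", "session_id": "abc-123"},
        {"query": "vpn_status", "session_id": "abc-123"},
        {"log_type": "vpn", "session_id": "abc-123"},
        {"session_id": "abc-123"},
    ):
        print(args, "->", infer_tool(args))
```

Because routing depends only on key presence, the tool schemas need to keep their argument names disjoint. A schema change that adds `command` to another tool would silently reroute its calls.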
Components
| Component | Lambda / Service | Purpose |
|---|---|---|
| WebSocket API Gateway | - | Single client-facing endpoint for all communication |
| AgentCore Runtime | - | Hosts the Strands agent (pay-per-use container, per-session instances) |
| AgentCore Gateway (MCP) | - | Exposes tools via MCP with Cognito OAuth |
| Cognito | - | OAuth client_credentials flow for Gateway auth |
| WS Manager | lambda-ws-manager | Handles all WS routes: chat, callback, connect/disconnect |
| MCP Proxy Server | lambda-mcp-proxy-server | Lambda durable function that implements all 3 MCP tools with argument-based routing |
| DynamoDB ws-connections | - | Tracks WebSocket connections by session_id (GSI) |
| CloudWatch + X-Ray | - | OTEL traces, metrics, GenAI observability dashboard |
Pattern 1: Fire-and-Forget (run_client_command)
Does NOT use Lambda durable functions. The handler runs inside the @durable_execution decorator but never calls context.create_callback(), so it behaves like a normal Lambda invocation.
What it does
Pushes a command to the user's macOS client via WebSocket. Returns immediately with an acknowledgement. The client executes the command locally (e.g., launches Cisco Secure Client to connect VPN).
Supported commands
| Command | Client action |
|---|---|
| connect_vpn | Opens Cisco Secure Client and initiates VPN connection via AppleScript |
| disconnect_vpn | Disconnects VPN via Cisco CLI |
Step-by-step
User: "Help me connect to my VPN"
1. Client sends WS message: {action: "chat", prompt: "Help me connect to my VPN", session_id: "abc-123"}
2. lambda-ws-manager receives chat route
3. ws-manager calls InvokeAgentRuntime(runtimeSessionId=session_id)
4. Agent gets OAuth token from Cognito for Gateway auth
5. Agent calls run_client_command(command="connect_vpn", session_id="abc-123") via MCP
6. Gateway invokes lambda-mcp-proxy-server (durable function, but no callback used)
7. MCP Lambda queries ws-connections by session_id, calls PostToConnection
8. Client receives WS push: {event: "client_action", data: {action: "run_client_command", command: "connect_vpn"}}
9. macOS client launches Cisco Secure Client via AppleScript and clicks Connect
10. Lambda returns {status: "sent_to_client"} to agent — tool call complete
Total tool execution time: <1 second. The agent doesn't wait for the VPN to actually connect.
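Steps 7 to 10 can be sketched as a function that looks up the connection, pushes the message, and returns without waiting for the client. This is an illustrative sketch under stated assumptions: `run_client_command_sketch`, `lookup`, and `push` are hypothetical names, the two callables stand in for the ws-connections query and PostToConnection, and the error shape for a missing connection is not documented in this README.

```python
import json
from typing import Callable, Optional

# Hypothetical injection points. In a deployment, `lookup` would wrap the
# ws-connections GSI query and `push` would wrap PostToConnection.
Lookup = Callable[[str], Optional[str]]  # session_id -> connection_id or None
Push = Callable[[str, bytes], None]      # (connection_id, payload bytes)


def run_client_command_sketch(event: dict, lookup: Lookup, push: Push) -> dict:
    """Push a command to the client and return without waiting for the outcome."""
    connection_id = lookup(event["session_id"])
    if connection_id is None:
        # Assumed error shape; this case is not covered by the README.
        return {"status": "error", "message": "no connection for session"}
    message = {
        "event": "client_action",
        "data": {"action": "run_client_command", "command": event["command"]},
    }
    push(connection_id, json.dumps(message).encode("utf-8"))
    return {"status": "sent_to_client"}


if __name__ == "__main__":
    connections = {"abc-123": "conn-1"}  # in-memory stand-in for DynamoDB
    sent = []
    result = run_client_command_sketch(
        {"command": "connect_vpn", "session_id": "abc-123"},
        connections.get,
        lambda conn, payload: sent.append((conn, payload)),
    )
    print(result, sent)
```

Passing the lookup and push functions in keeps the handler free of AWS clients, so the acknowledgement path can be exercised locally.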
Pattern 2: Durable Callback — Fast (get_vpn_status)
Uses Lambda durable functions. The handler creates a callback, pushes a status request to the client via WebSocket, then suspends at zero cost until the client responds.
What it does
Queries the real-time VPN connection state from the user's macOS client. The client runs vpn state via the Cisco Secure Client CLI and sends the result back through the durable callback.
Step-by-step
Phase 1 — Agent calls tool, durable function suspends
| Step | From → To | What happens |
|---|---|---|
| 1 | macOS Client → WS API GW | Client sends: {action: "chat", prompt: "What's my VPN status?", session_id: "abc-123"} |
| 2 | WS API GW → lambda-ws-manager | Routes chat action to ws-manager |
| 3 | lambda-ws-manager → AgentCore Runtime | Calls InvokeAgentRuntime(runtimeSessionId=session_id) |
| 4 | AgentCore Runtime → Cognito | Agent fetches OAuth token |
| 5 | AgentCore Runtime → AgentCore Gateway | Agent calls get_vpn_status(query="vpn_status", session_id="abc-123") via MCP. This is a synchronous blocking call: the agent waits for the tool to return. |
| 6 | AgentCore Gateway → lambda-mcp-proxy-server | Gateway invokes the durable function |
| 7 | lambda-mcp-proxy-server | Calls context.create_callback(name="vpn-status-{id}"), which generates a callback_id |
| 8 | lambda-mcp-proxy-server → DynamoDB | Queries GSI by session_id to find client's connection_id |
| 9 | lambda-mcp-proxy-server → WS API GW | Pushes {action: "get_vpn_status", callback_id: "cb-xxx"} to client |
| 10 | lambda-mcp-proxy-server | Calls callback.result() and SUSPENDS at zero cost |
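Steps 7 to 10 follow a create, push, wait sequence that can be sketched with a test double in place of the durable context. Treat this as a sketch under assumptions rather than the project's handler: `get_vpn_status_sketch`, `FakeContext`, and `FakeCallback` are hypothetical, the `callback_id` attribute name and the use of `session_id` in the callback name are assumed, and the result is assumed to be the JSON string the client sends as `content`.

```python
import json


def get_vpn_status_sketch(event: dict, context, push) -> dict:
    """Create a callback, ask the client for its status, then wait on the callback."""
    # The README shows name="vpn-status-{id}"; using session_id as the id is an assumption.
    callback = context.create_callback(name=f"vpn-status-{event['session_id']}")
    push(
        event["session_id"],
        {"action": "get_vpn_status", "callback_id": callback.callback_id},
    )
    # In a real durable execution the function suspends here until the callback completes.
    raw = callback.result()
    # Assumption: the result is the JSON string the client sent as `content`.
    status = json.loads(raw)["vpn_status"]
    return {"status": status, "message": "VPN status retrieved from client."}


class FakeCallback:
    """Test double that returns a canned client answer instead of suspending."""

    def __init__(self, callback_id: str, canned: str):
        self.callback_id = callback_id
        self._canned = canned

    def result(self) -> str:
        return self._canned


class FakeContext:
    """Test double for the durable context; always hands out the same callback."""

    def create_callback(self, name: str) -> FakeCallback:
        return FakeCallback("cb-xxx", json.dumps({"vpn_status": "connected"}))


if __name__ == "__main__":
    pushed = []
    out = get_vpn_status_sketch(
        {"query": "vpn_status", "session_id": "abc-123"},
        FakeContext(),
        lambda session_id, message: pushed.append((session_id, message)),
    )
    print(out, pushed)
```

The push has to happen before the wait, because the client can only complete a callback whose id it has already received.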
Phase 2 — Client responds, durable function resumes
| Step | From → To | What happens |
|---|---|---|
| 11 | macOS Client | Runs vpn state via Cisco CLI, gets "connected" or "disconnected" |
| 12 | macOS Client → WS API GW | Sends: {action: "async_callback", callback_id: "cb-xxx", content: "{\"vpn_status\": \"connected\"}"} |
| 13 | WS API GW → lambda-ws-manager | Routes async_callback to ws-manager |
| 14 | lambda-ws-manager → Lambda API | Calls SendDurableExecutionCallbackSuccess(CallbackId=callback_id, Result=payload) |
| 15 | lambda-mcp-proxy-server | RESUMES: callback.result() returns the VPN status |
| 16 | lambda-mcp-proxy-server → Agent | Returns {status: "connected", message: "VPN status retrieved from client."} |
Total tool execution time: ~1-2 seconds. The client responds almost instantly since it just runs a CLI command.
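On the ws-manager side, steps 13 and 14 amount to dispatching on the WebSocket route and handing the callback payload to whatever completes the durable callback. The sketch below assumes the route keys match the `action` values (for example via a `$request.body.action` route selection expression), which this README does not state. `handle_ws_event`, `on_chat`, and `on_callback` are hypothetical names, and `on_callback` stands in for the SendDurableExecutionCallbackSuccess call.

```python
import json


def handle_ws_event(event: dict, on_chat, on_callback) -> dict:
    """Dispatch a WebSocket API Gateway event to the chat or callback handler."""
    route = event["requestContext"]["routeKey"]
    if route in ("$connect", "$disconnect"):
        # Connection bookkeeping is omitted from this sketch.
        return {"statusCode": 200}
    message = json.loads(event.get("body") or "{}")
    if route == "chat":
        on_chat(message["session_id"], message["prompt"])
    elif route == "async_callback":
        # `content` is forwarded unchanged as the callback result bytes.
        on_callback(message["callback_id"], message["content"].encode("utf-8"))
    else:
        return {"statusCode": 400}
    return {"statusCode": 200}


if __name__ == "__main__":
    event = {
        "requestContext": {"routeKey": "async_callback"},
        "body": json.dumps(
            {
                "action": "async_callback",
                "callback_id": "cb-xxx",
                "content": json.dumps({"vpn_status": "connected"}),
            }
        ),
    }
    completed = []
    response = handle_ws_event(
        event,
        lambda session_id, prompt: None,
        lambda callback_id, result: completed.append((callback_id, result)),
    )
    print(response, completed)
```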
Timeout chain
The synchronous blocking at step 5 means multiple timeouts are in play simultaneously:
| Layer | Timeout | What happens when exceeded |
|---|---|---|
| WebSocket API GW integration | 29 seconds (AWS hard limit) | Client receives "Endpoint request timed out" — but the ws-manager Lambda keeps running |
| WS Manager Lambda | 300 seconds (5 min) | Lambda terminates — but AgentCore Runtime + durable function keep running |
| AgentCore Runtime → Gateway (MCP call) | Blocks until Lambda returns | Strands Agent instance is busy; concurrent call on same session returns error from agent code (not platform) |
| MCP Proxy Lambda (durable execution) | 900 seconds (15 min) | Durable function times out — callback is abandoned |
For get_vpn_status, the client responds in ~1-2 seconds so none of these timeouts are hit. For analyze_logs (5-30s), the 29s WebSocket timeout can be exceeded — the ws-manager Lambda continues running behind the scenes and delivers the response when ready.
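To reason about which limits a given wait would cross, the fixed timeouts from the table can be checked with a few lines of code. This sketch makes a simplifying assumption: it treats every limit as if it were measured from the same starting point, whereas in practice each layer starts its own clock. `exceeded_layers` is a hypothetical helper.

```python
from typing import List

# Fixed limits taken from the timeout table. The AgentCore Runtime to Gateway
# call has no fixed limit of its own, so it is not listed here.
TIMEOUTS_S = {
    "WebSocket API GW integration": 29,
    "WS Manager Lambda": 300,
    "MCP Proxy Lambda (durable execution)": 900,
}


def exceeded_layers(elapsed_s: float) -> List[str]:
    """Return the layers whose timeout is shorter than the elapsed time."""
    return [layer for layer, limit in TIMEOUTS_S.items() if elapsed_s > limit]


if __name__ == "__main__":
    for elapsed in (2, 30, 400):
        print(elapsed, "s ->", exceeded_layers(elapsed))
```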
How SendDurableExecutionCallbackSuccess works
The ws-manager does not rely on a boto3 method for this call, because the boto3 version bundled with the Lambda runtime may not include it yet. Instead, it signs a raw HTTP request with SigV4 and sends it:
    from botocore.auth import SigV4Auth
    from botocore.awsrequest import AWSRequest

    # POST /2025-12-01/durable-execution-callbacks/{CallbackId}/succeed
    # encoded_cb, result_payload, credentials, and region are defined elsewhere in the handler.
    url = f"https://lambda.{region}.amazonaws.com/2025-12-01/durable-execution-callbacks/{encoded_cb}/succeed"
    aws_req = AWSRequest(method="POST", url=url, data=result_payload, headers={"Content-Type": "application/octet-stream"})
    SigV4Auth(credentials, "lambda", region).add_auth(aws_req)
Pattern 3: Durable Callback — Slow (analyze_logs)
Uses Lambda durable functions. Same pattern as get_vpn_status but the client takes longer to respond because it collects logs from multiple sources.
What it does
Requests logs from the user's macOS client for the agent to analyze. The client collects VPN logs, network diagnostics, or system logs depending on the log_type parameter, then sends them back via the durable callback. The agent receives the raw logs inline and analyzes them using Claude Sonnet.
Supported log types
| log_type | What the client collects |
|---|---|
| vpn | Cisco Secure Client state + stats (vpn state, vpn stats) |
| network | Network interfaces (ifconfig) + routing table (netstat -rn) |
| system | macOS system logs filtered for VPN processes (log show --predicate ...) |
Step-by-step
Phase 1 — Agent calls tool, durable function suspends
Same as get_vpn_status (steps 1-10), except:
- Step 5: Agent calls analyze_logs(log_type="vpn", session_id="abc-123")
- Step 9: Pushes {action: "request_logs", log_type: "vpn", callback_id: "cb-xxx"} to client
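The step 9 message can be built by a small helper that also rejects unknown log types before the function suspends. Without such a check, the function could end up waiting for a callback the client never sends. This is a sketch under assumptions: `build_log_request` is a hypothetical name, and validating on the server side is a design suggestion, not something this README describes.

```python
SUPPORTED_LOG_TYPES = ("vpn", "network", "system")


def build_log_request(log_type: str, callback_id: str) -> dict:
    """Build the WebSocket message that asks the client to collect logs."""
    if log_type not in SUPPORTED_LOG_TYPES:
        raise ValueError(
            f"unsupported log_type {log_type!r}; expected one of {SUPPORTED_LOG_TYPES}"
        )
    return {"action": "request_logs", "log_type": log_type, "callback_id": callback_id}


if __name__ == "__main__":
    print(build_log_request("vpn", "cb-xxx"))
```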
Phase 2 — Client collects logs (5-30 seconds)
| Step | From → To | What happens |
|---|---|---|
| 11 | macOS Client | Runs log collection commands locally. For vpn: runs vpn state + vpn stats. For network: runs ifconfig + netstat -rn. For system: queries macOS unified log. |
| 12 | macOS Client → WS API GW | Sends: {action: "async_callback", callback_id: "cb-xxx", content: "<log data up to 8KB>"} |
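The "up to 8KB" limit in step 12 implies the log text is capped before it is sent. The real client is written in Swift, so the following Python is only an illustration of one way to cap a string by encoded size without splitting a multi-byte character. It assumes 8KB means 8192 bytes of UTF-8 and that the head of the log is kept, neither of which this README specifies. `truncate_utf8` is a hypothetical helper.

```python
def truncate_utf8(text: str, limit: int = 8 * 1024) -> str:
    """Return `text` cut so that its UTF-8 encoding is at most `limit` bytes."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # errors="ignore" drops a trailing partial character left by the byte cut.
    return data[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    capped = truncate_utf8("x" * 10_000)
    print(len(capped.encode("utf-8")))
```

For log analysis the most recent lines are often the most relevant, so keeping the tail instead of the head may be the better choice in practice.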
Phase 3 — Resume + agent analysis
| Step | From → To | What happens |
|---|---|---|
| 13-15 | Same as get_vpn_status | ws-manager calls SendDurableExecutionCallbackSuccess, durable function resumes |
| 16 | lambda-mcp-proxy-server → Agent | Returns {status: "logs_received", log_data: "...", message: "Please analyze the log_data for issues."} |
| 17 | Agent → Bedrock (Claude Sonnet) | Agent feeds the raw logs to the LLM for analysis — identifies errors, warnings, root causes, and provides actionable recommendations |
| 18 | Agent → Client | Final response pushed via WebSocket with the analysis |
Total tool execution time: ~5-30 seconds depending on log collection. The durable function is suspended at zero cost during this entire wait.
Why Lambda Durable Functions?
| Approach | Wait Cost | Max Wait | Complexity |
|---|---|---|---|
| Lambda sleep | Lambda billing | 15 min | Low |
| SQS long poll | Lambda billing | 20s per call | Medium |
| DDB polling | ~$0.00001/poll | Unlimited | Medium |
| Step Functions waitForTaskToken | $0 | 1 year | High (SFN + dispatcher + analyzer Lambdas) |
| Lambda durable callback | $0 | 15 min | Low (single Lambda) |
Lambda durable functions replace the previous Step Functions approach. Instead of needing a state machine + dispatcher Lambda + analyzer Lambda + task-tokens DynamoDB table, the entire async flow lives in a single Lambda function. The durable function suspends at zero cost when calling callback.result() and resumes when SendDurableExecutionCallbackSuccess is called.
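To put the "Lambda billing" entries in the table into numbers, the cost of holding a Lambda open during a wait can be estimated from its memory size and duration. The price below is an assumption (an x86 on-demand rate per GB-second) and should be checked against current pricing for your region. `sleep_wait_cost` is a hypothetical helper, and request charges are ignored.

```python
# Assumed x86 on-demand price per GB-second; verify against current AWS pricing.
PRICE_PER_GB_SECOND = 0.0000166667


def sleep_wait_cost(wait_s: float, memory_mb: int = 512) -> float:
    """Estimate the compute cost of keeping a Lambda running for `wait_s` seconds."""
    return (memory_mb / 1024) * wait_s * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    # A 30-second log collection at the 512 MB memory size used in this project.
    print(f"${sleep_wait_cost(30):.6f} per call while blocking")
```

The per-call figure is small, but it scales linearly with wait time and call volume, whereas a suspended durable function accrues no compute charge during the wait.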
Terraform configuration
The Lambda is configured with durable_config to enable durable execution:
    resource "aws_lambda_function" "mcp_proxy" {
      function_name = "${var.stack_name}-mcp-proxy"
      runtime       = "python3.13"
      timeout       = 300
      memory_size   = 512

      durable_config {
        execution_timeout = 900 # 15 min max durable execution
        retention_period  = 1   # 1 day checkpoint retention
      }
    }
Gateway targets must point to a Lambda alias (qualified ARN), not the unqualified function:
    resource "aws_lambda_alias" "mcp_proxy_prod" {
      name             = "prod"
      function_name    = aws_lambda_function.mcp_proxy.function_name
      function_version = aws_lambda_function.mcp_proxy.version
    }
Project Structure
    strands-agentcore-agent/
        strands_agentcore_agent.py    # Strands agent (AgentCore Runtime + MCP Gateway)
        Dockerfile                    # Container image for AgentCore deployment
        requirements.txt              # Python dependencies
    client-macos/
        Sources/
            AIAgentAsyncApp.swift     # App entry point
            ContentView.swift         # SwiftUI chat interface
            ChatViewModel.swift       # Message handling, VPN commands (AppleScript), log collection
            WebSocketManager.swift    # WebSocket connection + durable callback responses
            Models.swift              # Data models
            Config.swift              # WebSocket URL + session configuration
    lambda-mcp-proxy-server/
        lambda_mcp_proxy_server.py    # MCP tool handler (Lambda durable function)
                                      # (argument-based routing for all 3 tools)
    lambda-ws-manager/
        lambda_ws_manager.py          # All WS routes: chat, async_callback, connect/disconnect
                                      # Retry logic for busy AgentCore sessions
                                      # Raw SigV4-signed SendDurableExecutionCallbackSuccess
    scripts/
        build-image.sh                # Docker build via CodeBuild
    terraform/
        agentcore.tf                  # AgentCore Runtime + endpoint
        gateway.tf                    # AgentCore Gateway + Cognito OAuth + MCP tool targets
        websocket.tf                  # WebSocket API GW (the only client-facing API)
        lambda.tf                     # WS Manager Lambda + MCP Proxy Lambda (durable function)
        dynamodb.tf                   # ws-connections table with session_id GSI
        s3.tf                         # Agent source code S3 bucket
        ecr.tf                        # ECR repository for agent Docker image
        codebuild.tf                  # CodeBuild project for image builds
        iam.tf                        # IAM roles and policies
        observability.tf              # CloudWatch dashboards, X-Ray, alarms
        vpc.tf                        # VPC + subnets (ElastiCache)
        elasticache.tf                # Redis for agent session memory
        variables.tf                  # Input variables
        outputs.tf                    # Output values (URLs, ARNs)
        versions.tf                   # Provider versions and config
        terraform.tfvars.example      # Example variable values
    generated-diagrams/               # Architecture and flow diagrams (PNG)
Prerequisites
- AWS account with Bedrock model access for us.anthropic.claude-sonnet-4-20250514-v1:0
- Python 3.10+
- Terraform >= 1.6
- Swift 5.9+ and Xcode (for the macOS client)
- Docker (for building the agent container image)
- AWS CLI configured with appropriate credentials
Setup
macOS Client
    cd client-macos
    # Edit Sources/Config.swift with your WebSocket URL and Cognito settings
    swift run
On first run, the app will request Accessibility permissions (System Settings → Privacy & Security → Accessibility) for VPN AppleScript automation.
Production Deployment
    cd terraform
    # First time only: copy and edit variables
    cp terraform.tfvars.example terraform.tfvars
    # Edit terraform.tfvars with your AWS account details, stack name, etc.
    terraform init
    terraform plan
    terraform apply
After terraform apply, the outputs will include:
- WebSocket URL (wss://...)
- AgentCore Runtime endpoint
- Cognito client credentials
- ECR repository URI
To build and push the agent Docker image:
    bash scripts/build-image.sh
License
MIT



