kpx-dev/strands-agentcore-async-demo

Demo to show Strands AgentCore Async

AI Agent Async

Shows how to run async operations with the Strands SDK and AWS Bedrock AgentCore, using Lambda durable functions to wait at zero cost. The demo includes a native macOS SwiftUI client, and all communication goes over a single WebSocket.

Architecture

The Problem

MCP tool calls are synchronous: the agent calls a tool and blocks until it gets a response. But what if the tool needs to:

  • Send a command to the user's local machine (no response needed)
  • Query real-time status from the client and wait for the answer
  • Collect logs from the client for the agent to analyze

The Solution

A single Lambda durable function (lambda_mcp_proxy_server.py) implements all three MCP tools. The AgentCore Gateway has one target per tool (Gateway limitation: 1 tool_schema per target), but they all point to the same Lambda. Argument-based routing determines which handler runs.

The entire Lambda is decorated with @durable_execution from the AWS Durable Execution SDK. This means:

  • All three tools run inside a durable execution context
  • run_client_command simply doesn't use the DurableContext — it returns immediately
  • get_vpn_status and analyze_logs use context.create_callback() + callback.result() to suspend at zero cost

| Pattern | Tool | Uses Durable? | Wait Time | How It Works |
|---|---|---|---|---|
| Fire-and-forget | run_client_command | No | 0s | Push command via WebSocket, return immediately |
| Durable callback (fast) | get_vpn_status | Yes | ~1-2s | Suspend at zero cost, client responds instantly via callback |
| Durable callback (slow) | analyze_logs | Yes | ~5-30s | Suspend at zero cost, client collects logs and sends back via callback |

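import json

# Import path assumed from the AWS Durable Execution SDK for Python; verify against your installed version
from aws_durable_execution_sdk_python import DurableContext, durable_execution

@durable_execution
def handler(event: dict, context: DurableContext):
    if "command" in event:
        result = handle_run_client_command(event)          # fire-and-forget
    elif event.get("query") == "vpn_status":
        result = handle_get_vpn_status(event, context)     # durable callback
    elif "log_type" in event:
        result = handle_analyze_logs(event, context)       # durable callback
    else:
        result = {"error": "unrecognized tool arguments"}  # illustrative fallback; avoids an unbound `result`
    return {"content": [{"type": "text", "text": json.dumps(result)}]}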

Each AgentCore Gateway Target maps to one tool but points to the same Lambda alias. The Lambda infers which tool was called by checking the arguments.
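
The routing rule can be isolated as a small pure function, which makes it easy to unit test without a durable context. This is only a sketch: the helper name `infer_tool` is hypothetical and not part of the repository, and it assumes the argument keys shown in the handler above are the only discriminators.

```python
from typing import Optional


def infer_tool(event: dict) -> Optional[str]:
    """Return the tool implied by the argument keys, or None if nothing matches."""
    if "command" in event:
        return "run_client_command"
    if event.get("query") == "vpn_status":
        return "get_vpn_status"
    if "log_type" in event:
        return "analyze_logs"
    return None


# Each Gateway target sends a different argument shape to the same Lambda.
print(infer_tool({"command": "connect_vpn", "session_id": "abc-123"}))
print(infer_tool({"query": "vpn_status", "session_id": "abc-123"}))
print(infer_tool({"log_type": "vpn", "session_id": "abc-123"}))
print(infer_tool({"session_id": "abc-123"}))
```

Because the check order matters when an event carries more than one of these keys, keeping the rule in one function keeps the precedence explicit.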

Components

| Component | Lambda / Service | Purpose |
|---|---|---|
| WebSocket API Gateway | | Single client-facing endpoint for all communication |
| AgentCore Runtime | | Hosts the Strands agent (pay-per-use container, per-session instances) |
| AgentCore Gateway (MCP) | | Exposes tools via MCP with Cognito OAuth |
| Cognito | | OAuth client_credentials flow for Gateway auth |
| WS Manager | lambda-ws-manager | Handles all WS routes: chat, callback, connect/disconnect |
| MCP Proxy Server | lambda-mcp-proxy-server | Lambda durable function that implements all 3 MCP tools with argument-based routing |
| DynamoDB | ws-connections | Tracks WebSocket connections by session_id (GSI) |
| CloudWatch + X-Ray | | OTEL traces, metrics, GenAI observability dashboard |

Pattern 1: Fire-and-Forget (run_client_command)

run_client_command Flow

Does NOT use a durable callback. The handler still runs under the @durable_execution decorator, but it never calls context.create_callback(), so it behaves like a normal Lambda invocation.

What it does

Pushes a command to the user's macOS client via WebSocket. Returns immediately with an acknowledgement. The client executes the command locally (e.g., launches Cisco Secure Client to connect VPN).

Supported commands

| Command | Client action |
|---|---|
| connect_vpn | Opens Cisco Secure Client and initiates VPN connection via AppleScript |
| disconnect_vpn | Disconnects VPN via Cisco CLI |

Step-by-step

User: "Help me connect to my VPN"

1. Client sends WS message: {action: "chat", prompt: "Help me connect to my VPN", session_id: "abc-123"}
2. lambda-ws-manager receives chat route
3. ws-manager calls InvokeAgentRuntime(runtimeSessionId=session_id)
4. Agent gets OAuth token from Cognito for Gateway auth
5. Agent calls run_client_command(command="connect_vpn", session_id="abc-123") via MCP
6. Gateway invokes lambda-mcp-proxy-server (durable function, but no callback used)
7. MCP Lambda queries ws-connections by session_id, calls PostToConnection
8. Client receives WS push: {event: "client_action", data: {action: "run_client_command", command: "connect_vpn"}}
9. macOS client launches Cisco Secure Client via AppleScript and clicks Connect
10. Lambda returns {status: "sent_to_client"} to agent — tool call complete

Total tool execution time: <1 second. The agent doesn't wait for the VPN to actually connect.
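
Steps 7, 8, and 10 can be sketched as follows. This is not the repository's code: the connection lookup and the PostToConnection call are passed in as plain callables (`find_connection`, `post_to_connection`, both hypothetical names) so the logic can run without AWS, and the `no_client_connected` status is an assumed value for the case the README does not describe.

```python
import json
from typing import Callable, Optional


def build_client_action(command: str) -> dict:
    """Build the WebSocket push shown in step 8."""
    return {
        "event": "client_action",
        "data": {"action": "run_client_command", "command": command},
    }


def handle_run_client_command(
    event: dict,
    find_connection: Callable[[str], Optional[str]],
    post_to_connection: Callable[[str, bytes], None],
) -> dict:
    """Push the command to the client and return without waiting for it to run."""
    connection_id = find_connection(event["session_id"])
    if connection_id is None:
        return {"status": "no_client_connected"}  # assumed status, not from the source
    payload = json.dumps(build_client_action(event["command"])).encode("utf-8")
    post_to_connection(connection_id, payload)
    return {"status": "sent_to_client"}


# In-memory stand-ins for the ws-connections lookup and PostToConnection.
connections = {"abc-123": "conn-1"}
sent = []
result = handle_run_client_command(
    {"command": "connect_vpn", "session_id": "abc-123"},
    find_connection=connections.get,
    post_to_connection=lambda cid, data: sent.append((cid, data)),
)
print(result)
```

In a deployed Lambda the two callables would wrap a DynamoDB query on the session_id GSI and the API Gateway Management API's PostToConnection call.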


Pattern 2: Durable Callback — Fast (get_vpn_status)

get_vpn_status Flow

Uses Lambda durable functions. The handler creates a callback, pushes a status request to the client via WebSocket, then suspends at zero cost until the client responds.

What it does

Queries the real-time VPN connection state from the user's macOS client. The client runs vpn state via the Cisco Secure Client CLI and sends the result back through the durable callback.

Step-by-step

Phase 1 — Agent calls tool, durable function suspends

| Step | From → To | What happens |
|---|---|---|
| 1 | macOS Client → WS API GW | Client sends: {action: "chat", prompt: "What's my VPN status?", session_id: "abc-123"} |
| 2 | WS API GW → lambda-ws-manager | Routes chat action to ws-manager |
| 3 | lambda-ws-manager → AgentCore Runtime | Calls InvokeAgentRuntime(runtimeSessionId=session_id) |
| 4 | AgentCore Runtime → Cognito | Agent fetches OAuth token |
| 5 | AgentCore Runtime → AgentCore Gateway | Agent calls get_vpn_status(query="vpn_status", session_id="abc-123") via MCP. This is a synchronous blocking call: the agent waits for the tool to return. |
| 6 | AgentCore Gateway → lambda-mcp-proxy-server | Gateway invokes the durable function |
| 7 | lambda-mcp-proxy-server | Calls context.create_callback(name="vpn-status-{id}"), which generates a callback_id |
| 8 | lambda-mcp-proxy-server → DynamoDB | Queries GSI by session_id to find client's connection_id |
| 9 | lambda-mcp-proxy-server → WS API GW | Pushes {action: "get_vpn_status", callback_id: "cb-xxx"} to client |
| 10 | lambda-mcp-proxy-server | Calls callback.result() and SUSPENDS at zero cost |
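
Steps 7 to 10 can be sketched as below. This is a sketch under stated assumptions rather than the repository's handler: the `callback_id` attribute name, the use of session_id in the callback name, and the assumption that `callback.result()` returns the client's JSON string are all guesses, and `push_to_client` is a hypothetical helper standing in for the DynamoDB lookup plus WebSocket push. Stub classes replace the real DurableContext so the example runs locally.

```python
import json
from typing import Callable


def handle_get_vpn_status(
    event: dict, context, push_to_client: Callable[[str, dict], None]
) -> dict:
    session_id = event["session_id"]
    # Step 7: create the callback (the name format is an assumption).
    callback = context.create_callback(name=f"vpn-status-{session_id}")
    # Steps 8-9: ask the client for its status, passing the callback id along.
    push_to_client(
        session_id,
        {"action": "get_vpn_status", "callback_id": callback.callback_id},
    )
    # Step 10: in the real SDK this is where the execution suspends.
    raw = callback.result()
    status = json.loads(raw)["vpn_status"]
    return {"status": status, "message": "VPN status retrieved from client."}


class StubCallback:
    """Local stand-in; returns what the client would send back."""
    callback_id = "cb-xxx"

    def result(self) -> str:
        return '{"vpn_status": "connected"}'


class StubContext:
    def create_callback(self, name: str) -> StubCallback:
        return StubCallback()


pushed = []
response = handle_get_vpn_status(
    {"query": "vpn_status", "session_id": "abc-123"},
    StubContext(),
    push_to_client=lambda sid, msg: pushed.append((sid, msg)),
)
print(response)
```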

Phase 2 — Client responds, durable function resumes

| Step | From → To | What happens |
|---|---|---|
| 11 | macOS Client | Runs vpn state via Cisco CLI, gets "connected" or "disconnected" |
| 12 | macOS Client → WS API GW | Sends: {action: "async_callback", callback_id: "cb-xxx", content: "{\"vpn_status\": \"connected\"}"} |
| 13 | WS API GW → lambda-ws-manager | Routes async_callback to ws-manager |
| 14 | lambda-ws-manager → Lambda API | Calls SendDurableExecutionCallbackSuccess(CallbackId=callback_id, Result=payload) |
| 15 | lambda-mcp-proxy-server | RESUMES: callback.result() returns the VPN status |
| 16 | lambda-mcp-proxy-server → Agent | Returns {status: "connected", message: "VPN status retrieved from client."} |

Total tool execution time: ~1-2 seconds. The client responds almost instantly since it just runs a CLI command.
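
On the ws-manager side, steps 13 and 14 start by pulling the callback id and payload out of the client's message. A minimal sketch, assuming the message body is the JSON shown in step 12 and that `content` is forwarded as UTF-8 bytes; the function name `parse_async_callback` is hypothetical:

```python
import json
from typing import Tuple


def parse_async_callback(body: str) -> Tuple[str, bytes]:
    """Extract (callback_id, payload) from an async_callback WebSocket message."""
    message = json.loads(body)
    if message.get("action") != "async_callback":
        raise ValueError("not an async_callback message")
    callback_id = message["callback_id"]
    # `content` is itself a JSON string; it is forwarded unchanged as bytes.
    payload = message["content"].encode("utf-8")
    return callback_id, payload


body = json.dumps(
    {
        "action": "async_callback",
        "callback_id": "cb-xxx",
        "content": json.dumps({"vpn_status": "connected"}),
    }
)
callback_id, payload = parse_async_callback(body)
print(callback_id, payload)
```

The resulting pair maps onto the CallbackId and Result arguments of SendDurableExecutionCallbackSuccess.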

Timeout chain

The synchronous blocking at step 5 means multiple timeouts are in play simultaneously:

| Layer | Timeout | What happens when exceeded |
|---|---|---|
| WebSocket API GW integration | 29 seconds (AWS hard limit) | Client receives "Endpoint request timed out", but the ws-manager Lambda keeps running |
| WS Manager Lambda | 300 seconds (5 min) | Lambda terminates, but AgentCore Runtime + durable function keep running |
| AgentCore Runtime → Gateway (MCP call) | Blocks until Lambda returns | Strands Agent instance is busy; concurrent call on same session returns error from agent code (not platform) |
| MCP Proxy Lambda (durable execution) | 900 seconds (15 min) | Durable function times out and the callback is abandoned |

For get_vpn_status, the client responds in ~1-2 seconds, so none of these timeouts are hit. For analyze_logs (5-30s), the 29-second WebSocket integration timeout can be exceeded. In that case the ws-manager Lambda keeps running in the background and delivers the response over the WebSocket when it is ready.
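
To reason about which layers a given wait would trip, the fixed limits from the table can be put in a small lookup. This is an illustrative sketch only: it treats the wait as the total time the request stays open at each layer, and it leaves out the MCP call, which has no fixed limit of its own.

```python
from typing import List

# Fixed limits from the timeout table, in seconds, ordered shortest first.
TIMEOUTS_SECONDS = [
    ("WebSocket API GW integration", 29),
    ("WS Manager Lambda", 300),
    ("MCP Proxy Lambda (durable execution)", 900),
]


def exceeded_layers(wait_seconds: float) -> List[str]:
    """Return the layers whose timeout is shorter than the given wait."""
    return [name for name, limit in TIMEOUTS_SECONDS if wait_seconds > limit]


print(exceeded_layers(2))    # typical get_vpn_status wait
print(exceeded_layers(30))   # slow analyze_logs wait
print(exceeded_layers(301))
```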

How SendDurableExecutionCallbackSuccess works

The ws-manager does not rely on the boto3 client method for this call, because the boto3 version bundled with the Lambda runtime may not include it yet. Instead, it signs and sends a raw HTTP request:

from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# POST /2025-12-01/durable-execution-callbacks/{CallbackId}/succeed
url = f"https://lambda.{region}.amazonaws.com/2025-12-01/durable-execution-callbacks/{encoded_cb}/succeed"
aws_req = AWSRequest(method="POST", url=url, data=result_payload, headers={"Content-Type": "application/octet-stream"})
SigV4Auth(credentials, "lambda", region).add_auth(aws_req)  # signs the request in place
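
The snippet uses `encoded_cb` without showing how it is derived. One plausible approach, offered as an assumption and not as the repository's code, is to percent-encode the callback id so that characters such as `/` cannot alter the URL path. The helper name `callback_success_url` is hypothetical.

```python
from urllib.parse import quote


def callback_success_url(region: str, callback_id: str) -> str:
    """Build the callback-success URL with the callback id percent-encoded."""
    # safe="" also encodes "/", which quote() would otherwise leave untouched.
    encoded_cb = quote(callback_id, safe="")
    return (
        f"https://lambda.{region}.amazonaws.com"
        f"/2025-12-01/durable-execution-callbacks/{encoded_cb}/succeed"
    )


print(callback_success_url("us-east-1", "cb/x y"))
```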

Pattern 3: Durable Callback — Slow (analyze_logs)

analyze_logs Flow

Uses Lambda durable functions. Same pattern as get_vpn_status but the client takes longer to respond because it collects logs from multiple sources.

What it does

Requests logs from the user's macOS client for the agent to analyze. The client collects VPN logs, network diagnostics, or system logs depending on the log_type parameter, then sends them back via the durable callback. The agent receives the raw logs inline and analyzes them using Claude Sonnet.

Supported log types

| log_type | What the client collects |
|---|---|
| vpn | Cisco Secure Client state + stats (vpn state, vpn stats) |
| network | Network interfaces (ifconfig) + routing table (netstat -rn) |
| system | macOS system logs filtered for VPN processes (log show --predicate ...) |

Step-by-step

Phase 1 — Agent calls tool, durable function suspends

Same as get_vpn_status (steps 1-10), except:

  • Step 5: Agent calls analyze_logs(log_type="vpn", session_id="abc-123")
  • Step 9: Pushes {action: "request_logs", log_type: "vpn", callback_id: "cb-xxx"} to client

Phase 2 — Client collects logs (5-30 seconds)

| Step | From → To | What happens |
|---|---|---|
| 11 | macOS Client | Runs log collection commands locally. For vpn: runs vpn state + vpn stats. For network: runs ifconfig + netstat -rn. For system: queries macOS unified log. |
| 12 | macOS Client → WS API GW | Sends: {action: "async_callback", callback_id: "cb-xxx", content: "<log data up to 8KB>"} |

Phase 3 — Resume + agent analysis

| Step | From → To | What happens |
|---|---|---|
| 13-15 | Same as get_vpn_status | ws-manager calls SendDurableExecutionCallbackSuccess, durable function resumes |
| 16 | lambda-mcp-proxy-server → Agent | Returns {status: "logs_received", log_data: "...", message: "Please analyze the log_data for issues."} |
| 17 | Agent → Bedrock (Claude Sonnet) | Agent feeds the raw logs to the LLM, which identifies errors, warnings, and root causes and provides actionable recommendations |
| 18 | Agent → Client | Final response pushed via WebSocket with the analysis |

Total tool execution time: ~5-30 seconds depending on log collection. The durable function is suspended at zero cost during this entire wait.
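
The shape of the client's reply in step 12, including the 8KB cap, can be sketched as follows. The real client is written in Swift; this Python version only illustrates the message format. It assumes that 8KB means 8192 bytes of UTF-8 and that the start of the log is kept when truncating, neither of which the source states. The function names are hypothetical.

```python
import json

MAX_LOG_BYTES = 8 * 1024  # assumption: "8KB" read as 8192 bytes


def truncate_utf8(text: str, max_bytes: int = MAX_LOG_BYTES) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return data[:max_bytes].decode("utf-8", errors="ignore")


def build_log_callback(callback_id: str, log_data: str) -> str:
    """Build the async_callback message carrying the collected logs."""
    return json.dumps(
        {
            "action": "async_callback",
            "callback_id": callback_id,
            "content": truncate_utf8(log_data),
        }
    )


message = json.loads(build_log_callback("cb-xxx", "x" * 10000))
print(len(message["content"].encode("utf-8")))
```

Keeping the tail of the log in place of the head may suit some log types better, since the most recent entries are often the relevant ones.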


Why Lambda Durable Functions?

| Approach | Wait Cost | Max Wait | Complexity |
|---|---|---|---|
| Lambda sleep | Lambda billing | 15 min | Low |
| SQS long poll | Lambda billing | 20s per call | Medium |
| DDB polling | ~$0.00001/poll | Unlimited | Medium |
| Step Functions waitForTaskToken | $0 | 1 year | High (SFN + dispatcher + analyzer Lambdas) |
| Lambda durable callback | $0 | 15 min | Low (single Lambda) |

Lambda durable functions replace the previous Step Functions approach. Instead of needing a state machine + dispatcher Lambda + analyzer Lambda + task-tokens DynamoDB table, the entire async flow lives in a single Lambda function. The durable function suspends at zero cost when calling callback.result() and resumes when SendDurableExecutionCallbackSuccess is called.

Terraform configuration

The Lambda is configured with durable_config to enable durable execution:

resource "aws_lambda_function" "mcp_proxy" {
  function_name = "${var.stack_name}-mcp-proxy"
  runtime       = "python3.13"
  timeout       = 300
  memory_size   = 512

  durable_config {
    execution_timeout = 900  # 15 min max durable execution
    retention_period  = 1    # 1 day checkpoint retention
  }
}

Gateway targets must point to a Lambda alias (qualified ARN), not the unqualified function:

resource "aws_lambda_alias" "mcp_proxy_prod" {
  name             = "prod"
  function_name    = aws_lambda_function.mcp_proxy.function_name
  function_version = aws_lambda_function.mcp_proxy.version
}

Project Structure

strands-agentcore-agent/
  strands_agentcore_agent.py  # Strands agent (AgentCore Runtime + MCP Gateway)
  Dockerfile                  # Container image for AgentCore deployment
  requirements.txt            # Python dependencies

client-macos/
  Sources/
    AIAgentAsyncApp.swift     # App entry point
    ContentView.swift         # SwiftUI chat interface
    ChatViewModel.swift       # Message handling, VPN commands (AppleScript), log collection
    WebSocketManager.swift    # WebSocket connection + durable callback responses
    Models.swift              # Data models
    Config.swift              # WebSocket URL + session configuration

lambda-mcp-proxy-server/
  lambda_mcp_proxy_server.py  # MCP tool handler — Lambda durable function
                              # (argument-based routing for all 3 tools)

lambda-ws-manager/
  lambda_ws_manager.py        # All WS routes: chat, async_callback, connect/disconnect
                              # Retry logic for busy AgentCore sessions
                              # Raw SigV4-signed SendDurableExecutionCallbackSuccess

scripts/
  build-image.sh              # Docker build via CodeBuild

terraform/
  agentcore.tf                # AgentCore Runtime + endpoint
  gateway.tf                  # AgentCore Gateway + Cognito OAuth + MCP tool targets
  websocket.tf                # WebSocket API GW (the only client-facing API)
  lambda.tf                   # WS Manager Lambda + MCP Proxy Lambda (durable function)
  dynamodb.tf                 # ws-connections table with session_id GSI
  s3.tf                       # Agent source code S3 bucket
  ecr.tf                      # ECR repository for agent Docker image
  codebuild.tf                # CodeBuild project for image builds
  iam.tf                      # IAM roles and policies
  observability.tf            # CloudWatch dashboards, X-Ray, alarms
  vpc.tf                      # VPC + subnets (ElastiCache)
  elasticache.tf              # Redis for agent session memory
  variables.tf                # Input variables
  outputs.tf                  # Output values (URLs, ARNs)
  versions.tf                 # Provider versions and config
  terraform.tfvars.example    # Example variable values

generated-diagrams/           # Architecture and flow diagrams (PNG)

Prerequisites

  • AWS account with Bedrock model access for us.anthropic.claude-sonnet-4-20250514-v1:0
  • Python 3.10+
  • Terraform >= 1.6
  • Swift 5.9+ and Xcode (for the macOS client)
  • Docker (for building the agent container image)
  • AWS CLI configured with appropriate credentials

Setup

macOS Client

cd client-macos
# Edit Sources/Config.swift with your WebSocket URL and Cognito settings
swift run

On first run, the app will request Accessibility permissions (System Settings → Privacy & Security → Accessibility) for VPN AppleScript automation.

Production Deployment

cd terraform

# First time only — copy and edit variables
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your AWS account details, stack name, etc.

terraform init
terraform plan
terraform apply

After terraform apply, the outputs will include:

  • WebSocket URL (wss://...)
  • AgentCore Runtime endpoint
  • Cognito client credentials
  • ECR repository URI

To build and push the agent Docker image:

bash scripts/build-image.sh

License

MIT
