When you call hexr_tool("aws_s3"), you get back a fully authenticated boto3 S3 client — with no AWS access keys anywhere in your code, your environment, or your container image. Hexr exchanges your agent’s cryptographic SPIFFE identity for real, time-limited cloud credentials through a three-tier caching system that makes repeated calls nearly instantaneous.

How the cache works

The first time your agent calls a cloud tool in a session, Hexr performs a full credential exchange. Every subsequent call returns from cache:
# This is all you write:
s3 = hexr_tool("aws_s3")
bucket = s3.list_buckets()

# What actually happens:
# 1. Check in-memory cache (L1) → ~0.001ms
# 2. Check Valkey cache (L2)    → ~1-3ms
# 3. Full credential exchange (L3) → ~50-200ms
#    JWT-SVID → OPA check → STS AssumeRoleWithWebIdentity

L1: In-memory cache

~0.001ms latency. Credentials live in the Python process’s memory using ContextVar-based storage. The TTL equals the credential expiry minus a 10-minute buffer. The cache is cleared when the process restarts.
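The expiry-minus-buffer arithmetic can be sketched as a minimal ContextVar-backed cache. All names below are illustrative, not the actual SDK internals:

```python
import time
from contextvars import ContextVar

# Maps service name -> (credentials, expiry_epoch); layout is an assumption.
_l1_cache: ContextVar[dict] = ContextVar("hexr_l1_cache", default={})

BUFFER_SECONDS = 10 * 60  # treat credentials as stale 10 minutes early

def l1_get(service: str):
    entry = _l1_cache.get().get(service)
    if entry is None:
        return None
    creds, expiry = entry
    if time.time() >= expiry - BUFFER_SECONDS:
        return None  # soon-to-expire credentials count as a miss
    return creds

def l1_put(service: str, creds: dict, ttl_seconds: int):
    cache = dict(_l1_cache.get())  # copy: never mutate the shared default
    cache[service] = (creds, time.time() + ttl_seconds)
    _l1_cache.set(cache)
```

Because the store is a ContextVar, each execution context sees its own snapshot, which is what makes the cache safe under async concurrency.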

L2: Valkey distributed cache

~1-3ms latency. Shared across all pods in your cluster (3-node HA). Key format: cred:{spiffe-id}:{service}:{region}. If another agent on the same cluster already fetched S3 credentials for your role, your call returns immediately from this cache.
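A sketch of the L2 lookup, using the documented `cred:{spiffe-id}:{service}:{region}` key layout. The client can be anything with `get`/`setex` (for example valkey-py or redis-py pointed at the HA cluster); the JSON encoding is an assumption:

```python
import json

def l2_key(spiffe_id: str, service: str, region: str) -> str:
    # Documented key format: cred:{spiffe-id}:{service}:{region}
    return f"cred:{spiffe_id}:{service}:{region}"

def l2_get(client, spiffe_id: str, service: str, region: str):
    raw = client.get(l2_key(spiffe_id, service, region))
    return json.loads(raw) if raw else None

def l2_put(client, spiffe_id: str, service: str, region: str,
           creds: dict, ttl_seconds: int):
    # SETEX gives the entry a server-side TTL matching credential expiry
    client.setex(l2_key(spiffe_id, service, region),
                 ttl_seconds, json.dumps(creds))
```

The server-side TTL means stale credentials age out of Valkey on their own, with no sweeper needed.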

L3: Full credential exchange

~50-200ms latency. The complete round-trip: Agent → Envoy (mTLS) → Credential Injector → OPA policy check → Cloud STS AssumeRoleWithWebIdentity. Happens only on first access or after credential expiry.
On a cache hit, the call returns in microseconds (L1) or low single-digit milliseconds (L2); only a miss at both tiers triggers the 50-200ms exchange.
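The tier-by-tier fallthrough amounts to the following sketch, with each tier passed in as a (get, put) pair of callables. This is illustrative glue, not the SDK's actual code:

```python
def get_credentials(service, l1, l2, exchange):
    """l1 and l2 are (get, put) pairs; exchange runs the full L3 flow."""
    l1_get, l1_put = l1
    l2_get, l2_put = l2
    creds = l1_get(service)        # ~0.001 ms
    if creds is not None:
        return creds
    creds = l2_get(service)        # ~1-3 ms
    if creds is not None:
        l1_put(service, creds)     # promote to L1 for the next call
        return creds
    creds = exchange(service)      # ~50-200 ms full exchange
    l2_put(service, creds)
    l1_put(service, creds)
    return creds
```

Note the promotion step: an L2 hit warms L1, so the second call in the same process never leaves memory.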

The full exchange flow

When both L1 and L2 miss — on first access or after expiry — here’s the complete sequence:

hexr_tool('aws_s3') called

Your agent calls hexr_tool("aws_s3"). The SDK checks L1 (memory) — miss. Checks L2 (Valkey) — miss.

Request sent via Envoy

The agent sends POST /exchange {service: "aws_s3"} to the Envoy sidecar on localhost. Envoy adds your X.509-SVID as the client certificate and forwards the request over mTLS to the Credential Injector in hexr-system.
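Client-side, this step is plain HTTP to the local sidecar; Envoy supplies the X.509-SVID client certificate and the mTLS upstream, so the sketch below stays certificate-free. The sidecar port and response shape are assumptions:

```python
ENVOY_SIDECAR = "http://127.0.0.1:15001"  # sidecar port is illustrative

def request_exchange(post, service: str) -> dict:
    """post is an HTTP POST callable, e.g. requests.post."""
    resp = post(f"{ENVOY_SIDECAR}/exchange", json={"service": service})
    resp.raise_for_status()  # surface policy denials and exchange failures
    return resp.json()
```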

JWT-SVID verification

The Credential Injector verifies your agent’s JWT-SVID via the SPIRE Workload API, confirming that the calling process has a legitimate SPIFFE identity.
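Signature verification itself is delegated to the SPIRE trust bundle and is not reimplemented here, but the claim-level checks the injector must make can be sketched as follows. The trust domain is taken from the SPIFFE IDs elsewhere in this page:

```python
import time

def check_svid_claims(claims: dict, trust_domain: str = "hexr.cloud") -> str:
    """Return the caller's SPIFFE ID if the (already verified) claims are acceptable."""
    sub = claims.get("sub", "")
    if not sub.startswith(f"spiffe://{trust_domain}/"):
        raise ValueError(f"subject {sub!r} is outside trust domain {trust_domain!r}")
    if claims.get("exp", 0) <= time.time():
        raise ValueError("JWT-SVID has expired")
    return sub
```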

OPA policy check

The Credential Injector queries OPA: {spiffe_id, service: "aws_s3", tenant: "acme-corp"}. OPA evaluates your Rego policies and returns ALLOW or DENY. If denied, the call fails immediately with a policy error.
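The query uses OPA's standard Data API (`POST /v1/data/<package path>`); the host below is illustrative, and the path matches the `hexr.credentials` package in the policy example later on this page:

```python
OPA_URL = "http://opa.hexr-system:8181/v1/data/hexr/credentials/allow"

def check_policy(post, spiffe_id: str, service: str, tenant: str) -> bool:
    """post is an HTTP POST callable, e.g. requests.post."""
    payload = {"input": {"spiffe_id": spiffe_id,
                         "service": service,
                         "tenant": tenant}}
    result = post(OPA_URL, json=payload).json()
    # OPA omits "result" entirely when the rule is undefined -> deny
    return result.get("result", False) is True
```

Treating an undefined rule as a deny is the fail-closed default you want for credential issuance.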

Cloud STS exchange

The Credential Injector calls AssumeRoleWithWebIdentity on AWS STS, presenting your JWT-SVID as the web identity token. AWS trusts Hexr’s OIDC endpoint and returns temporary credentials.
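In boto3 this is the `assume_role_with_web_identity` STS call. The role ARN below is a placeholder, and deriving the session name from the SPIFFE ID (trimmed to AWS's 64-character limit) is an assumption about the injector:

```python
def build_sts_request(jwt_svid: str, role_arn: str, spiffe_id: str) -> dict:
    return {
        "RoleArn": role_arn,
        # Session names surface in CloudTrail; AWS caps them at 64 chars
        "RoleSessionName": spiffe_id.rsplit("/", 1)[-1][:64],
        "WebIdentityToken": jwt_svid,
        "DurationSeconds": 900,  # the 15-minute default TTL
    }

def exchange_with_sts(jwt_svid: str, role_arn: str, spiffe_id: str) -> dict:
    import boto3  # deferred so the request builder stays dependency-free
    sts = boto3.client("sts")
    resp = sts.assume_role_with_web_identity(
        **build_sts_request(jwt_svid, role_arn, spiffe_id))
    return resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration
```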

Credentials cached and client returned

{AccessKeyId, SecretAccessKey, SessionToken} with a 15-minute TTL flows back through Envoy to your agent. The SDK stores them in L1 and L2 cache, then creates and returns an authenticated boto3.client('s3').
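The three STS fields map directly onto boto3 client keyword arguments; a sketch of that final step (the helper names are illustrative):

```python
def boto3_kwargs(creds: dict) -> dict:
    # STS response fields -> boto3.client keyword arguments
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

def make_client(service_name: str, creds: dict):
    import boto3  # deferred so the mapping above stays dependency-free
    return boto3.client(service_name, **boto3_kwargs(creds))
```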

Supported cloud providers

AWS

Exchange: JWT-SVID → STS AssumeRoleWithWebIdentity
Services: S3, EC2, DynamoDB, SQS, Lambda, Bedrock, and any AWS SDK service.
Credential TTL: 15 minutes (configurable up to 12 hours)

GCP

Exchange: JWT-SVID → Workload Identity Federation → Service Account token
Services: BigQuery, Cloud Storage, Vertex AI, Pub/Sub, and any Google Cloud API.
Credential TTL: 1 hour

Azure

Exchange: JWT-SVID → Federated Token → Managed Identity token
Services: Blob Storage, Cosmos DB, Azure OpenAI, and any Azure SDK service.
Credential TTL: 1 hour

Multi-cloud in one agent

A single agent can access multiple cloud providers simultaneously. Each tool call goes through its own exchange and cache entry:
@hexr_agent(
    name="multi-cloud-analyst",
    tenant="acme-corp",
    resources=["aws_s3", "gcp_bigquery", "azure_storage"]
)
def analyze():
    # Each call goes through the same credential exchange
    # but targets a different cloud provider's STS
    s3 = hexr_tool("aws_s3")              # → AWS STS
    bq = hexr_tool("gcp_bigquery")        # → GCP WIF
    blob = hexr_tool("azure_storage")     # → Azure Federated Token

    # All three clients are authenticated and ready
    rows = bq.query("SELECT * FROM dataset.table").to_dataframe()
    s3.put_object(Bucket="results", Key="output.json",
                  Body=rows.to_json(orient="records"))

OPA policy enforcement

Every credential exchange is gated by an OPA policy check. You can write Rego policies that control which agent roles can access which services:
# Example policy: only allow S3 access for data-pipeline agents
package hexr.credentials

default allow = false

allow {
    input.service == "aws_s3"
    startswith(input.spiffe_id, "spiffe://hexr.cloud/agent/acme-corp/data-pipeline")
}

# Deny all EC2 access
deny {
    input.service == "aws_ec2"
}
Policies are distributed via Kubernetes ConfigMaps and reload within 30 seconds.

Proactive credential refresh

A background daemon proactively refreshes credentials before they expire, so your agents never hit a credential expiry error during a long-running task:
Check interval: every 60 seconds
Refresh buffer: 10 minutes before expiry
Behavior: silent background refresh with no disruption to the agent
Timeline:
  T=0:00  → Credential issued (TTL: 15 min)
  T=4:00  → Background check: 11 min remaining (OK)
  T=5:00  → Background check: 10 min remaining (REFRESH!)
  T=5:01  → New credential fetched, cached in L1 + L2
  T=15:00 → Old credential would have expired (already replaced)
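The timeline above boils down to a 60-second check loop with a 10-minute buffer test. A sketch of the daemon, with the enumeration and refresh callables as illustrative stand-ins:

```python
import threading
import time

CHECK_INTERVAL = 60        # seconds between background checks
REFRESH_BUFFER = 10 * 60   # refresh once 10 minutes or less remain

def needs_refresh(expiry_epoch, now=None):
    now = time.time() if now is None else now
    return expiry_epoch - now <= REFRESH_BUFFER

def refresh_loop(list_entries, refresh_one, stop):
    """list_entries yields (service, expiry_epoch); stop is a threading.Event."""
    # Event.wait doubles as the 60-second sleep and as the shutdown signal
    while not stop.wait(CHECK_INTERVAL):
        for service, expiry in list_entries():
            if needs_refresh(expiry):
                refresh_one(service)  # silent; the agent is never interrupted
```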

Observability

Every cache lookup and exchange emits OpenTelemetry spans you can inspect in Jaeger or Grafana:
Span: hexr.cache.lookup
  ├── tier: "L1" | "L2" | "L3"
  ├── hit: true | false
  ├── service: "aws_s3"
  └── duration_ms: 0.001 | 2.3 | 150

Span: hexr.credential.exchange
  ├── provider: "aws" | "gcp" | "azure"
  ├── service: "aws_s3"
  ├── spiffe_id: "spiffe://hexr.cloud/agent/..."
  └── duration_ms: 150
Grafana dashboards show cache hit rates, exchange latencies, and credential refresh patterns in real time.
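A cache lookup can be instrumented along the lines below using the OpenTelemetry Python API, with attribute names matching the span layout above. The shim fallback is only there to keep the sketch self-contained when the API package is absent:

```python
import time
from contextlib import contextmanager

try:
    from opentelemetry import trace  # provided by opentelemetry-api
    _tracer = trace.get_tracer("hexr.sdk")
    span_ctx = lambda name: _tracer.start_as_current_span(name)
except ImportError:
    class _NoopSpan:
        def set_attribute(self, key, value): pass
    @contextmanager
    def span_ctx(name):
        yield _NoopSpan()

def traced_lookup(tier, service, lookup):
    with span_ctx("hexr.cache.lookup") as span:
        span.set_attribute("tier", tier)        # "L1" | "L2" | "L3"
        span.set_attribute("service", service)  # e.g. "aws_s3"
        start = time.perf_counter()
        result = lookup()
        span.set_attribute("hit", result is not None)
        span.set_attribute("duration_ms", (time.perf_counter() - start) * 1000)
        return result
```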