Hexr’s observability layer instruments your agents automatically. Every tool call, LLM request, credential exchange, and agent-to-agent message generates OpenTelemetry spans and metrics — without any configuration or extra code from you. The `@hexr_agent` decorator sets up all OTel providers at decoration time, so adding observability is as simple as using the SDK.

## How telemetry flows

All telemetry from your agents and platform services flows through a single OpenTelemetry Collector, then routes to dedicated backends:

| Source | What it emits |
|---|---|
| Python SDK (`hexr_llm`, `hexr_tool`, `@hexr_agent`) | Agent spans, LLM metrics, tool invocations |
| Envoy proxies | mTLS metrics, connection counts, TLS handshake latency |
| A2A sidecars | Task lifecycle, message throughput, SSE connections |
| Platform services (Vault, Gateway, Credential Injector) | Operation rates, latency, error counts |

## Automatic instrumentation

You get complete observability without writing any instrumentation code. The SDK emits spans for every operation:
```python
import openai

import hexr
from hexr import hexr_agent, hexr_llm, hexr_tool  # import path assumed

@hexr_agent(name="analyst", tenant="acme")
def analyze(topic: str):
    # Span: hexr.agent.invoke (auto)

    s3 = hexr_tool("aws_s3")
    # Span: hexr.tool.invoke {service: aws_s3}
    # Span: hexr.cache.lookup {tier: L1|L2|L3}
    # Span: hexr.credential.exchange (if cache miss)

    client = hexr_llm(openai.OpenAI())
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Analyze {topic}"}]
    )
    # Span: hexr.llm.chat {model: gpt-4o, tokens_in: 42, tokens_out: 256}

    secret = hexr.vault.get("openai/api-key")
    # Span: hexr.vault.get {path: openai/api-key}

    return response
```
Zero configuration is required: all OTel providers are set up by `@hexr_agent` at decoration time.

## Trace spans

Every SDK operation generates a span you can inspect in Jaeger with full agent identity context:

| Span name | Key attributes | Source |
|---|---|---|
| `hexr.agent.invoke` | `agent_name`, `tenant`, `framework`, `status` | `@hexr_agent` decorator |
| `hexr.tool.invoke` | `service`, `region`, `cache_tier` | `hexr_tool()` |
| `hexr.cache.lookup` | `tier` (L1/L2/L3), `hit`, `duration_ms` | Credential cache |
| `hexr.credential.exchange` | `provider`, `service`, `spiffe_id` | Credential Injector client |
| `hexr.llm.chat` | `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens` | `hexr_llm()` proxy |
| `hexr.vault.get` | `path`, `tenant` | `hexr.vault` module |
| `hexr.gateway.call` | `tool_name`, `arguments` | `hexr.gateway` module |
| `hexr.a2a.client.send` | `target_agent`, `task_id`, `task_state` | `A2AClient` |
| `hexr.a2a.bridge.execute` | `source_agent`, `task_id` | A2A bridge |
| `hexr.sandbox.exec` | `language`, `timeout`, `exit_code` | `hexr.sandbox` |
| `hexr.browser.browse` | `url`, `actions_count` | `hexr.browser` |
| `hexr.guard.scan` | `scan_type` (prompt/output), `is_valid` | `hexr.guard` |
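As an illustration of how you might work with these spans after export, the sketch below models a trace as a list of plain Python dicts carrying the name/attribute pairs from the table. This is a simplification for readability, not the real OTLP export format:

```python
# Illustrative only: spans modeled as plain dicts with the name/attribute
# pairs from the table above. Real exports use the OTLP span format.
def find_spans(trace, name):
    """Return every span in a trace with the given span name."""
    return [span for span in trace if span["name"] == name]

trace = [
    {"name": "hexr.agent.invoke",
     "attributes": {"agent_name": "analyst", "tenant": "acme", "status": "ok"}},
    {"name": "hexr.tool.invoke",
     "attributes": {"service": "aws_s3", "region": "us-east-1", "cache_tier": "L1"}},
    {"name": "hexr.llm.chat",
     "attributes": {"gen_ai.request.model": "gpt-4o",
                    "gen_ai.usage.input_tokens": 42,
                    "gen_ai.usage.output_tokens": 256}},
]

# Sum token usage across all LLM spans in the trace.
total_tokens = sum(
    span["attributes"]["gen_ai.usage.input_tokens"]
    + span["attributes"]["gen_ai.usage.output_tokens"]
    for span in find_spans(trace, "hexr.llm.chat")
)
print(total_tokens)  # 298
```

The same kind of filtering works in any trace backend; in Jaeger you would query by service and operation name instead.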

## LLM Guard span attributes

When LLM Guard blocks a prompt or response, additional attributes are set on the parent `hexr.llm.chat` span:

| Attribute | Type | Description |
|---|---|---|
| `hexr.guard.prompt_blocked` | bool | `true` if the input prompt was blocked |
| `hexr.guard.scanners` | string | Scanner results that triggered the block |
| `hexr.guard.output_blocked` | bool | `true` if the LLM response was blocked |
| `hexr.guard.output_scanners` | string | Scanner results that triggered the output block |
Blocked requests set the span status to ERROR with a description of which guard triggered.
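A consumer of these attributes (an alerting hook, say) only needs dictionary lookups. A minimal sketch; the attribute names come from the table above, while the helper itself is hypothetical:

```python
def guard_block_reason(attributes):
    """Return a short description of why LLM Guard blocked a span, or None."""
    if attributes.get("hexr.guard.prompt_blocked"):
        return f"prompt blocked: {attributes.get('hexr.guard.scanners', 'unknown')}"
    if attributes.get("hexr.guard.output_blocked"):
        return f"output blocked: {attributes.get('hexr.guard.output_scanners', 'unknown')}"
    return None

blocked = {"hexr.guard.prompt_blocked": True, "hexr.guard.scanners": "PromptInjection"}
clean = {"hexr.guard.prompt_blocked": False, "hexr.guard.output_blocked": False}
print(guard_block_reason(blocked))  # prompt blocked: PromptInjection
print(guard_block_reason(clean))    # None
```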

## Metrics

### Agent metrics

| Metric | Type | Description |
|---|---|---|
| `hexr.agent.invocations` | Counter | Total agent invocations |
| `hexr.agent.active` | UpDownCounter | Currently active invocations |
| `hexr.agent.duration` | Histogram | Invocation duration in seconds |

### Tool and credential metrics

| Metric | Type | Description |
|---|---|---|
| `hexr.tool.invocations` | Counter | Total tool calls by service |
| `hexr.tool.duration` | Histogram | Tool call duration |
| `hexr.cache.hits` | Counter | Cache hits by tier (L1/L2/L3) |
| `hexr.cache.misses` | Counter | Cache misses |
| `hexr.cache.lookup.duration` | Histogram | Cache lookup latency |
| `hexr.credential.exchanges` | Counter | Full credential exchanges |
| `hexr.credential.failures` | Counter | Failed exchanges |
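From the cache counters above you can derive an overall hit ratio. A sketch in plain Python that treats counter readings as numbers; in practice you would compute this over rates in your metrics backend:

```python
def cache_hit_ratio(hits_by_tier, misses):
    """Overall cache hit ratio from per-tier hit counts and a miss count."""
    hits = sum(hits_by_tier.values())
    total = hits + misses
    return hits / total if total else 0.0

# hexr.cache.hits carries a tier label; hexr.cache.misses is a single counter.
# The counter values here are made up for illustration.
hits = {"L1": 900, "L2": 80, "L3": 15}
print(round(cache_hit_ratio(hits, misses=5), 3))  # 0.995
```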

### LLM metrics

| Metric | Type | Description |
|---|---|---|
| `hexr.llm.calls` | Counter | Total LLM API calls |
| `hexr.llm.call_errors` | Counter | Failed LLM calls |
| `hexr.llm.call.duration` | Histogram | LLM call latency |
| `hexr.llm.input_tokens` | Counter | Total input tokens consumed |
| `hexr.llm.output_tokens` | Counter | Total output tokens generated |
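The token counters make rough cost attribution straightforward. A sketch only: the per-token prices below are placeholders, not real provider rates:

```python
def estimate_cost(input_tokens, output_tokens,
                  price_in_per_1k=0.0025, price_out_per_1k=0.01):
    """Rough spend estimate from token counters (placeholder prices)."""
    return (input_tokens / 1000) * price_in_per_1k \
        + (output_tokens / 1000) * price_out_per_1k

# e.g. readings from hexr.llm.input_tokens / hexr.llm.output_tokens
print(round(estimate_cost(120_000, 40_000), 2))  # 0.7
```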

### A2A metrics

| Metric | Type | Description |
|---|---|---|
| `hexr.a2a.sends` | Counter | Messages sent |
| `hexr.a2a.send_failures` | Counter | Failed sends |
| `hexr.a2a.send.duration` | Histogram | Send latency |
| `hexr.a2a.bridge.executions` | Counter | Bridge handler calls |

### LLM Guard metrics

| Metric | Type | Description |
|---|---|---|
| `hexr_guard_scans_total` | Counter | Total scans by direction (input/output) and scanner |
| `hexr_guard_blocks_total` | Counter | Total blocks by direction and scanner |
| `hexr_guard_scan_duration_seconds` | Histogram | Scan latency by direction |
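Because both guard counters are labeled by direction and scanner, a per-scanner block rate falls out of a simple division. A sketch with counters modeled as dicts keyed by `(direction, scanner)`; the sample readings are invented:

```python
def block_rates(scans, blocks):
    """Per-(direction, scanner) block rate: blocks / scans for each label pair."""
    return {key: blocks.get(key, 0) / count for key, count in scans.items() if count}

scans = {("input", "PromptInjection"): 200, ("output", "Toxicity"): 100}
blocks = {("input", "PromptInjection"): 4}
rates = block_rates(scans, blocks)
print(rates[("input", "PromptInjection")])  # 0.02
print(rates[("output", "Toxicity")])        # 0.0
```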

## Pre-built Grafana dashboards

Hexr ships with two Grafana dashboards covering 42 panels out of the box — no setup required.

### Platform overview (23 panels)

Covers system-wide health across all your agents:
- Agent pod status and container health
- Credential exchange rates and cache hit ratios
- mTLS connection counts and TLS handshake latency
- SPIRE entry counts and SVID rotation rates
- OTel Collector throughput (traces/sec, metrics/sec)
- Vault operation rates and latency
- Gateway tool invocation rates

### A2A communication (19 panels)

Covers inter-agent messaging for multi-agent workflows:
- Task lifecycle — submitted → working → completed / failed
- Message throughput per agent pair
- Task duration histograms
- SSE streaming connection counts
- Valkey task store operations
- Error rates by task state transition
- Cross-namespace communication patterns

## GenAI semantic conventions

`hexr_llm()` follows the OpenTelemetry GenAI semantic conventions, so your traces are compatible with any OTel-native LLM observability tool:

| Attribute | Example value |
|---|---|
| `gen_ai.system` | `openai`, `anthropic`, `google_genai`, `cohere`, `mistral` |
| `gen_ai.request.model` | `gpt-4o`, `claude-3-opus`, `gemini-pro` |
| `gen_ai.response.model` | `gpt-4o-2024-08-06` |
| `gen_ai.usage.input_tokens` | `1200` |
| `gen_ai.usage.output_tokens` | `800` |
| `gen_ai.response.id` | `chatcmpl-abc123` |
| `gen_ai.response.finish_reasons` | `["stop"]` |
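A quick way to sanity-check that a span carries the convention's core attributes is to diff its keys against the expected names. A minimal subset check, not full conformance, and the helper itself is hypothetical:

```python
GEN_AI_CORE = ("gen_ai.system", "gen_ai.request.model",
               "gen_ai.usage.input_tokens", "gen_ai.usage.output_tokens")

def missing_genai_attrs(attributes):
    """List which core GenAI attributes a span is missing."""
    return [key for key in GEN_AI_CORE if key not in attributes]

span_attrs = {"gen_ai.system": "openai", "gen_ai.request.model": "gpt-4o",
              "gen_ai.usage.input_tokens": 1200, "gen_ai.usage.output_tokens": 800}
print(missing_genai_attrs(span_attrs))  # []
```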

## Prometheus scrape targets

Prometheus scrapes metrics from your agent pods and all Hexr platform services automatically. The pre-configured targets include:

| Target | Metrics |
|---|---|
| Agent pods (per tenant) | Task lifecycle, message throughput |
| Credential Injector | Exchange rates, OPA decisions |
| Gateway | Tool calls, import counts |
| Vault | Secret operations, encryption |
| OTel Collector | Collector health, pipeline stats |
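If you want to spot-check a scrape target by hand, the exposition format is line-oriented and easy to parse. A rough stdlib-only parser that handles the simple `name{labels} value` shape (not the full exposition format):

```python
import re

SAMPLE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
                    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_samples(text):
    """Parse `name{labels} value` lines, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE.match(line)
        if match:
            samples[(match["name"], match["labels"] or "")] = float(match["value"])
    return samples

scrape = """\
# HELP hexr_guard_scans_total Total scans
hexr_guard_scans_total{direction="input"} 128
hexr_guard_blocks_total{direction="input"} 3
"""
samples = parse_samples(scrape)
print(samples[("hexr_guard_scans_total", 'direction="input"')])  # 128.0
```

For anything beyond a quick check, point `promtool` or a real Prometheus instance at the endpoint instead.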