Every time you wrap an LLM client with hexr_llm(), Hexr automatically instruments it with OpenTelemetry spans that capture token counts, cost in USD, latency, model version, and the agent role that made the call. No additional configuration is needed — the traces flow to Grafana the moment your agent is deployed. This guide shows what gets captured, how costs are attributed per agent and role, and how to compare providers side by side.

Zero-config tracing

Wrap any LLM client with hexr_llm() and every call generates a full OpenTelemetry span automatically:
my_agent.py
from hexr import hexr_agent, hexr_llm

@hexr_agent(name="my-agent", tenant="acme-corp")
def main():
    # This call generates a full OpenTelemetry span automatically
    response = hexr_llm(
        provider="openai",
        model="gpt-4o",
        prompt="Summarize the latest AI research",
    )
    return response
No exporters to configure, no sampling rules to write, no SDKs to initialize.

What gets captured

Each LLM span includes the following attributes:
Attribute                     Example value
gen_ai.system                 openai
gen_ai.request.model          gpt-4o
gen_ai.response.model         gpt-4o-2024-08-06
gen_ai.usage.input_tokens     152
gen_ai.usage.output_tokens    487
gen_ai.usage.total_tokens     639
hexr.agent.name               my-agent
hexr.agent.tenant             acme-corp
hexr.agent.role               researcher
hexr.llm.cost_usd             0.0047
hexr.llm.duration_ms          1234
Hexr follows the OpenTelemetry GenAI Semantic Conventions, so traces are compatible with any OTel-compatible backend — Datadog, New Relic, Honeycomb, or Grafana Cloud.
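For intuition, hexr.llm.cost_usd is derivable from the token-count attributes and a per-model price table. The sketch below is illustrative only: the rates and the estimate_cost_usd helper are assumptions, not Hexr's actual pricing logic, so it won't reproduce the 0.0047 in the table exactly.

```python
# Illustrative per-1M-token rates (assumed numbers, not official provider pricing).
PRICING_USD_PER_MTOK = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough cost estimate from span token counts (hypothetical helper)."""
    rates = PRICING_USD_PER_MTOK[model]
    cost = (
        input_tokens / 1_000_000 * rates["input"]
        + output_tokens / 1_000_000 * rates["output"]
    )
    return round(cost, 6)

# Using the token counts from the table above (152 input, 487 output):
print(estimate_cost_usd("gpt-4o", 152, 487))
```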

Cost attribution per agent and role

Costs are tracked per agent, per role, and per model. For a CrewAI crew running researcher, writer, and editor roles, you’d see a breakdown like this in Grafana:
Cost attribution example
Agent: content-crew (tenant: acme-corp)
├── researcher
│   ├── gpt-4o:      $2.34 (1,247 calls)
│   └── claude-3.5:  $1.12 (423 calls)
├── writer
│   ├── gpt-4o:      $3.67 (2,100 calls)
│   └── gpt-4o-mini: $0.23 (890 calls)
└── editor
    └── gpt-4o:      $0.89 (312 calls)

Total: $8.25 / 4,972 calls
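The breakdown above is an aggregation over span attributes. As a minimal sketch, assuming spans are exported as dicts keyed by the attribute names from the table earlier, the per-role, per-model rollup could be computed like this (cost_by_role_and_model is a hypothetical helper, not part of the Hexr SDK, and the span values are made up):

```python
from collections import defaultdict

# Toy span records mirroring the attribute names from the table above.
spans = [
    {"hexr.agent.role": "researcher", "gen_ai.request.model": "gpt-4o", "hexr.llm.cost_usd": 0.004},
    {"hexr.agent.role": "researcher", "gen_ai.request.model": "gpt-4o", "hexr.llm.cost_usd": 0.006},
    {"hexr.agent.role": "writer", "gen_ai.request.model": "gpt-4o-mini", "hexr.llm.cost_usd": 0.001},
]

def cost_by_role_and_model(spans: list[dict]) -> dict[str, dict[str, float]]:
    """Sum cost per (role, model), like the Grafana breakdown (hypothetical helper)."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for span in spans:
        role = span["hexr.agent.role"]
        model = span["gen_ai.request.model"]
        totals[role][model] += span["hexr.llm.cost_usd"]
    return {role: dict(models) for role, models in totals.items()}

print(cost_by_role_and_model(spans))
```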

Grafana dashboard

The built-in LLM Costs dashboard shows:
  • Token usage over time (input vs. output)
  • Cost per tenant (bar chart)
  • Model distribution (pie chart)
  • Latency percentiles (p50, p95, p99)
  • Error rate by provider
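The latency percentiles on that dashboard can be reproduced from the hexr.llm.duration_ms values of individual spans. A minimal sketch using Python's standard statistics module, with made-up sample durations:

```python
import statistics

# Hypothetical hexr.llm.duration_ms samples pulled from recent spans.
durations_ms = [820, 950, 980, 1010, 1100, 1234, 1250, 1500, 2100, 3400]

def latency_percentiles(samples: list[int]) -> dict[str, float]:
    # quantiles(n=100) yields 99 cut points; index 49 is p50, 94 is p95, 98 is p99.
    qs = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

print(latency_percentiles(durations_ms))
```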

Comparing providers

Track the same prompt across multiple providers. Each call is traced separately so you can compare cost and latency in Grafana:
provider_comparison.py
from hexr import hexr_agent, hexr_llm

@hexr_agent(name="provider-comparison", tenant="acme-corp")
def main():
    providers = [
        ("openai", "gpt-4o"),
        ("anthropic", "claude-sonnet-4-20250514"),
        ("google", "gemini-1.5-pro"),
    ]

    for provider, model in providers:
        response = hexr_llm(
            provider=provider,
            model=model,
            prompt="Explain quantum computing in one paragraph",
        )
        # Each call traced separately — compare cost/latency in Grafana
Use the Model distribution panel in Grafana to spot which models are driving the most cost across your tenant, then optimize by routing cheaper tasks to smaller models.
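Once the Model distribution panel shows where cost concentrates, routing can be as simple as a heuristic in front of hexr_llm(). The choose_model helper and its word-based token estimate below are illustrative assumptions, not a Hexr feature; a real router would use the provider's tokenizer.

```python
def choose_model(prompt: str, max_cheap_tokens: int = 200) -> str:
    """Naive cost router: send short prompts to a smaller model (hypothetical heuristic)."""
    # Very rough words-to-tokens estimate; replace with a real tokenizer in practice.
    estimated_tokens = len(prompt.split()) * 4 // 3
    return "gpt-4o-mini" if estimated_tokens <= max_cheap_tokens else "gpt-4o"
```

Each routed call still goes through hexr_llm(), so the resulting cost split shows up in the same per-model breakdown in Grafana.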

Next steps

Multi-framework agents

See how per-role cost attribution works in CrewAI and LangChain deployments.

Secure secrets

Store LLM provider API keys with SPIFFE-scoped access and a full audit trail.

Agent-to-agent communication

Trace task delegation across A2A agents alongside LLM spans.

SDK reference

Full reference for hexr_llm and supported providers.