ai agent observability

see what your agents actually did

agentis is an observability platform for ai developers. it condenses noisy agent logs into readable snapshots, traces every execution path, and lets you ask an llm to debug what your agent actually did.

free while we’re in mvp. no card, one url to start sending logs.

agentis.ogbuilds.ai/agents

support-triage

snapshot timeline

  1. 11:58:00 AM · 7 steps

    classified ticket #4821 as billing, drafted a refund reply, escalated to a human for approval.

  2. 11:54:00 AM · error

    tool call to the billing service timed out after 3 retries; could not load the customer's invoices.

  3. 11:49:00 AM · 4 steps

    answered a password-reset question from the knowledge base, no escalation needed.

  4. 11:40:00 AM · error

    looped 6 times re-asking the same clarifying question before giving up — possible prompt issue.

  5. 11:33:00 AM · 6 steps

    routed an enterprise complaint to the priority queue and notified the on-call lead.

error · jun 19, 11:54

agent classified ticket #4825 as a billing dispute and loaded the customer, but the billing.getInvoices tool timed out after 3 retries. it fell back to a holding reply and escalated to a human.

agentVersion: 1.4.2
model: claude-haiku-4-5
environment: production
ticketId: #4825
user: customer 88213

execution path

  1. action · 12ms

    received ticket #4825 from webhook

  2. llm call · 840ms

    classify intent → "billing dispute"

  3. tool call · 1.9s

    crm.getCustomer(id=88213)

  4. tool call · error · 30.0s

    billing.getInvoices(customer=88213)

    ETIMEDOUT after 3 retries: billing service unreachable

  5. decision

    retry policy exhausted → branch to fallback

  6. llm call · 760ms

    draft holding reply to customer

  7. action · 40ms

    escalate to human queue (priority: high)

llm debug

high confidence

the run failed because billing.getInvoices timed out (ETIMEDOUT) after 3 retries — the billing service was unreachable, not a bug in the agent's reasoning. the agent handled it correctly by falling back and escalating.

root cause: billing.getInvoices → ETIMEDOUT (30s, 3 retries) against https://billing.internal/invoices

  1. check whether the billing service was down or slow at 11:54; this is an upstream dependency, not the agent
  2. lower the per-call timeout so the agent fails fast instead of burning 30s on a dead endpoint
  3. add a circuit breaker so repeated billing failures skip straight to the human queue

generated by claude-opus-4-8
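
the circuit-breaker suggestion above can be sketched in a few lines. this is a minimal illustration, not how agentis or any particular agent framework implements it: `CircuitBreaker`, `CircuitOpenError` and `fetch_invoices_or_escalate` are hypothetical names, and the sketch assumes the billing tool raises python's built-in `TimeoutError` once its own (shortened) per-call timeout expires.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then rejects calls
    until `reset_after` seconds have passed (hypothetical helper)."""

    def __init__(self, max_failures=3, reset_after=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, skipping call")
            # cool-down elapsed: allow one trial call; a single failure reopens
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def fetch_invoices_or_escalate(breaker, get_invoices, customer_id):
    """Returns ("ok", invoices) or ("escalate", reason)."""
    try:
        return "ok", breaker.call(get_invoices, customer_id)
    except CircuitOpenError:
        # breaker is open: go straight to the human queue without waiting
        return "escalate", "billing circuit open"
    except TimeoutError:
        return "escalate", "billing timed out"
```

once the breaker is open, later runs stop paying for a timeout on a dead endpoint and escalate straight away. `get_invoices` stands in for whatever function wraps the billing tool; the fail-fast timeout itself belongs inside that function.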

raw logs

11:54:00 AM INFO action received ticket #4825 from webhook {"source":"zendesk","priority":"normal"}
11:54:01 AM INFO llm_call classify intent {"model":"claude-haiku","result":"billing dispute","tokens":412}
11:54:01 AM INFO tool_call crm.getCustomer(id=88213) {"ok":true,"ms":1900}
11:54:33 AM ERROR tool_call billing.getInvoices failed: ETIMEDOUT {"retries":3,"ms":30000,"endpoint":"https://billing.internal/invoices"}
11:54:33 AM WARN decision retry policy exhausted, switching to fallback path
the problem

agents fail in walls of logs

your agent calls a tool, retries, branches, calls an llm, branches again. when it goes wrong you scroll thousands of lines trying to reconstruct what happened. you can’t tell which step failed, why the model chose what it chose, or whether the next run will do the same thing.

how agentis helps

one readable snapshot per run

agentis collapses each run into a snapshot: a plain-language summary, the full execution path step by step, and the raw logs and context right there when you need them. when something breaks, ask the built-in llm to read the snapshot and tell you what went wrong.

condensed snapshots

noisy runs, distilled

every agent run becomes a snapshot you can actually read: a summary at the top, the steps below, errors flagged. scan your agents at a glance and drop into the one that's misbehaving.

  • a summary that says what the agent did in one line
  • errors and retries surfaced, not buried
  • history per agent so you can compare runs over time
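
to make "condensing" concrete, here is a rough sketch of turning raw log lines into a small snapshot with errors surfaced. it is an illustration only, not agentis's actual pipeline: the line format is taken from the raw logs in the demo above, and `parse_line`, `condense` and the snapshot keys are assumptions.

```python
import json
import re

# time, level, step kind, message, then an optional trailing JSON object
LINE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2} [AP]M) (?P<level>\w+) (?P<kind>\w+) "
    r"(?P<message>.*?)(?: (?P<data>\{.*\}))?$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["data"] = json.loads(event["data"]) if event["data"] else {}
    return event


def condense(lines):
    """Collapse a run's log lines into a snapshot with errors pulled out."""
    events = [e for e in map(parse_line, lines) if e is not None]
    errors = [e for e in events if e["level"] == "ERROR"]
    return {
        "started": events[0]["time"] if events else None,
        "steps": len(events),
        "status": "error" if errors else "ok",
        "errors": [e["message"] for e in errors],
        "events": events,
    }


RAW_LOGS = [
    '11:54:00 AM INFO action received ticket #4825 from webhook {"source":"zendesk","priority":"normal"}',
    '11:54:01 AM INFO llm_call classify intent {"model":"claude-haiku","result":"billing dispute","tokens":412}',
    '11:54:01 AM INFO tool_call crm.getCustomer(id=88213) {"ok":true,"ms":1900}',
    '11:54:33 AM ERROR tool_call billing.getInvoices failed: ETIMEDOUT {"retries":3,"ms":30000,"endpoint":"https://billing.internal/invoices"}',
    '11:54:33 AM WARN decision retry policy exhausted, switching to fallback path',
]

snapshot = condense(RAW_LOGS)
print(snapshot["status"], snapshot["errors"])
```

the one-line plain-language summary is left out on purpose: that part needs a model, while the structure above is plain parsing.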
agentis.ogbuilds.ai/agents

agents

3 agents · 2,105 snapshots this week

add agent
  • support-triage · 37 errors · active 4d ago · 1,284 snapshots
  • research-deepdive · 4 errors · active 4d ago · 612 snapshots
  • code-migrator · 19 errors · active 4d ago · 209 snapshots
execution path tracing

follow every step end to end

agentis traces the whole path: each tool call, each llm call, each branch and retry, in order, with timings. you see exactly where the run went sideways instead of guessing from timestamps.

  • the full call tree, in execution order
  • inputs, outputs and context for each step
  • the failing step marked so you start where it broke
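
as a rough picture of what a trace records, the sketch below wraps each step in a context manager that notes order, timing and failure. it is an assumption-laden illustration, not agentis's tracer: `Trace`, `step` and `first_failure` are hypothetical names, and steps are kept as a flat list rather than a tree.

```python
import time
from contextlib import contextmanager


class Trace:
    """Records steps in execution order with timings (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.steps = []

    @contextmanager
    def step(self, kind, label):
        record = {"kind": kind, "label": label, "status": "ok", "error": None, "ms": None}
        self.steps.append(record)  # appended on entry, so order is execution order
        start = self.clock()
        try:
            yield record
        except Exception as exc:
            record["status"] = "error"
            record["error"] = f"{type(exc).__name__}: {exc}"
            raise  # the agent's own error handling still sees the exception
        finally:
            record["ms"] = round((self.clock() - start) * 1000, 1)

    def first_failure(self):
        """Return the first failed step, or None if every step succeeded."""
        return next((s for s in self.steps if s["status"] == "error"), None)


trace = Trace()
with trace.step("action", "received ticket from webhook"):
    pass
try:
    with trace.step("tool call", "billing.getInvoices(customer=88213)"):
        raise TimeoutError("ETIMEDOUT after 3 retries")
except TimeoutError:
    with trace.step("decision", "retry policy exhausted, branch to fallback"):
        pass

print(trace.first_failure()["label"])
```

`first_failure` is what lets a reader start where the run broke instead of scanning from the top.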
llm debug

ask the model what went wrong

point the built-in llm at a snapshot and it reads the path, the context and the error, then tells you the likely cause and what to change. it's a second pair of eyes on every failed run, on demand.

  • plain-language root-cause for a failed run
  • concrete suggestions you can act on
  • grounded in your real logs, not a generic guess
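
one way to picture "grounded in your real logs" is a prompt built only from the snapshot itself. the sketch below is an assumption about the general shape, not the prompt agentis actually sends: `build_debug_prompt` and the snapshot keys (`summary`, `steps`, `kind`, `label`, `error`) are hypothetical, and no model is called.

```python
def build_debug_prompt(snapshot):
    """Assemble a debugging prompt from a snapshot's summary and steps."""
    lines = [
        "You are debugging one run of an AI agent.",
        f"Summary: {snapshot['summary']}",
        "",
        "Execution path:",
    ]
    for number, step in enumerate(snapshot["steps"], start=1):
        marker = " [FAILED]" if step.get("error") else ""
        lines.append(f"{number}. {step['kind']}: {step['label']}{marker}")
        if step.get("error"):
            lines.append(f"   error: {step['error']}")
    lines += [
        "",
        "Explain the most likely root cause and suggest concrete changes.",
        "Base the answer only on the steps and errors listed above.",
    ]
    return "\n".join(lines)


example = {
    "summary": "billing.getInvoices timed out; agent fell back and escalated.",
    "steps": [
        {"kind": "tool call", "label": "crm.getCustomer(id=88213)"},
        {
            "kind": "tool call",
            "label": "billing.getInvoices(customer=88213)",
            "error": "ETIMEDOUT after 3 retries",
        },
    ],
}
prompt = build_debug_prompt(example)
print(prompt)
```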
1 url

to point your agent's logs at and start

every step

of a run traced from start to finish

free

for everything while we're in mvp

questions

the short answers

  • what is agent observability?
    it's being able to see what your ai agent actually did on a run — which tools it called, what the model decided, where it failed — instead of inferring it from raw logs. agentis turns that into snapshots and traces you can read.
  • how do i send logs to agentis?
    register an agent, get an api key, and post your agent's logs to a single ingest endpoint over plain http. agentis groups them into snapshots and traces the execution path for you.
  • which frameworks are supported?
    any of them. agentis ingests over a generic http api, so it doesn't care whether you're on langchain, a custom loop, or something you wrote this morning. if your agent can make a request, it can send logs.
  • do you use my logs to train models?
    no. your logs are yours. we use them to build your snapshots and to run the llm debug you explicitly ask for — never to train models.
  • is it really free?
    yes — the mvp is free while we build it out. no card to get started. we'll be clear and give plenty of notice before any of that changes.
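
for a sense of what "post your logs over plain http" could look like, here is a minimal sketch using only python's standard library. the real ingest url, auth header and payload shape are not given on this page, so everything specific here is an assumption: the bearer-token `Authorization` header, the `{"agent": ..., "logs": [...]}` body, and the helper names `build_ingest_request` and `send_logs`. use the values you get when you register an agent.

```python
import json
import urllib.request


def build_ingest_request(ingest_url, api_key, agent, log_lines):
    """Build (but do not send) a JSON POST carrying one batch of log lines."""
    body = json.dumps({"agent": agent, "logs": log_lines}).encode("utf-8")
    return urllib.request.Request(
        ingest_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def send_logs(ingest_url, api_key, agent, log_lines, timeout=5.0):
    """Send the batch and return the HTTP status code."""
    request = build_ingest_request(ingest_url, api_key, agent, log_lines)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

because it is just an http request, the same few lines work from a langchain callback, a custom loop, or a shell script with any http client.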

start watching your agents

register an agent, point it at one endpoint, and read your first snapshot in minutes. free while we're in mvp.