Zot
monitor: scanning · last scan 22s ago

Somewhere in your logs is the reason your next customer leaves.

Zot reads your agent's traces, analyzes every session, and flags hallucinated policies, tool loops, and blown latency budgets. P0s go straight to Slack; everything else lands in the dashboard.

6 trace backends · first scan reaches back 24h · P0→P3 severity · live in 5 min
findings · last 24h
P0
Acme Assistant invented a 90-day refund policy
KB lookup (kb_returns) returned no hit; the model filled the gap. Actual policy is 30 days.
hallucination · conf: high
P1
Order-lookup tool timing out under EU peak
order_lookup exceeded its 8s budget on 12% of sessions during the EU morning peak.
tool_error · conf: high
P1
Acme Assistant cited a non-existent KB article
Referenced KB article #4471 for a warranty question. No such article exists; the citation was fabricated.
hallucination · conf: medium
P3
Overly verbose greetings inflating token cost
~180 tokens of pleasantries before any useful content, across 388 chats.
verbosity · conf: low
Reads your traces from Langfuse, SigNoz, Datadog, Elasticsearch, LangSmith, and Helicone

It starts with one bad interaction.
Fix your agent before customers churn.

“70% of consumers would switch brands after just one bad AI experience.”

Acquire BPO, 2024

// the loop

Poll. Analyze. Alert. Around the clock.

No SDK to install, no code to change. Point Zot at the tracing backend you already run. The first scan starts within seconds of connecting and reaches back 24 hours.

01 / HARVEST

Pull the traces

Zot polls each connected backend and normalizes every provider's traces (Langfuse, SigNoz, Datadog, Elastic, LangSmith, Helicone) into one canonical record. Seen-id dedupe means nothing gets processed twice.
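As a rough illustration of the harvest step, here is a minimal Python sketch of normalizing provider payloads into one record shape and skipping ids that were already seen. The `CanonicalTrace` fields, the `normalize` and `harvest` functions, and the payload keys are hypothetical, not Zot's actual schema; real backends expose different fields.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CanonicalTrace:
    # Hypothetical canonical record; the field names are assumptions.
    trace_id: str
    provider: str
    session_id: str


def normalize(provider: str, raw: dict) -> CanonicalTrace:
    # Assumed payload keys ("id", "session"); each real backend differs.
    return CanonicalTrace(
        trace_id=f"{provider}:{raw['id']}",
        provider=provider,
        session_id=str(raw.get("session", "")),
    )


def harvest(
    batches: Iterable[tuple[str, list[dict]]], seen: set[str]
) -> Iterator[CanonicalTrace]:
    # Yield each trace once; ids already present in `seen` are skipped.
    for provider, payloads in batches:
        for raw in payloads:
            trace = normalize(provider, raw)
            if trace.trace_id in seen:
                continue
            seen.add(trace.trace_id)
            yield trace
```

Prefixing the id with the provider name is one way to keep ids from different backends from colliding; persisting `seen` between polls is what would keep a trace from being processed twice across scans.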

02 / ANALYZE

Run the analyst

Deterministic detectors flag cost, latency, errors, tool-cycling and context stuffing. Then Zot reads the conversation itself for hallucinations, RAG misses, off-topic drift, stale info and frustrated users, each with a confidence score.

03 / ALERT

Surface the finding

Each issue becomes a priority-classified finding (P0–P3) with a title, the evidence, and a recommended fix. P0s hit Slack the moment they're found; everything else queues in the dashboard for triage.
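The routing rule described above can be sketched in a few lines of Python. The `Finding` class and `route` function are hypothetical names used for illustration only; the sketch returns a destination label rather than calling Slack or any dashboard API.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Fields mirror the description above: priority, title, evidence, fix.
    priority: str  # "P0" through "P3"
    title: str
    evidence: str
    recommended_fix: str


def route(finding: Finding) -> str:
    # P0s go to Slack immediately; P1 to P3 wait in the dashboard queue.
    if finding.priority not in {"P0", "P1", "P2", "P3"}:
        raise ValueError(f"unknown priority: {finding.priority}")
    return "slack" if finding.priority == "P0" else "dashboard"
```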

// detectors

The failures you can't see in a dashboard of green checkmarks.

Your agent returns a 200 and a confident paragraph. Whether that paragraph was true is a different question. Zot answers it.

P2 · hallucination

Made-up facts & policies

The KB lookup missed and the model invented an answer. Zot grounds every specific claim (numbers, policies, citations) against what retrieval actually returned.

P0 · agent failure

Hard failures

An error rate of 50% or more with no final response, fatal execution failures, or delivery failures. Posted to Slack the moment they're detected.

P1 · operational

Cost & latency spikes

Runs over $10, sessions over 120s, or an error rate above 15%, each attributed to the dominant component. Thresholds are tunable per team.
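To make the threshold checks concrete, here is a minimal Python sketch using the default values quoted above. The `Thresholds` class, `operational_flags`, and `dominant_component` are hypothetical names, and treating "dominant" as "largest share of cost" is an assumption, not a description of how Zot attributes spikes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    # Defaults taken from the text above; a team could override any of them.
    max_run_cost_usd: float = 10.0
    max_session_seconds: float = 120.0
    max_error_rate: float = 0.15


def operational_flags(
    cost_usd: float,
    duration_s: float,
    error_rate: float,
    thresholds: Optional[Thresholds] = None,
) -> list[str]:
    # Return the name of every limit that was strictly exceeded.
    t = thresholds or Thresholds()
    flags = []
    if cost_usd > t.max_run_cost_usd:
        flags.append("cost")
    if duration_s > t.max_session_seconds:
        flags.append("latency")
    if error_rate > t.max_error_rate:
        flags.append("error_rate")
    return flags


def dominant_component(cost_by_component: dict[str, float]) -> str:
    # Assumption: the dominant component is the one with the largest cost.
    return max(cost_by_component, key=cost_by_component.get)
```

A team with a higher cost ceiling would pass its own values, for example `Thresholds(max_run_cost_usd=20.0)`.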

P1 · tool_cycling

Tool loops & context stuffing

The same tool called with the same args three-plus times, or per-step input tokens doubling. A stuck agent burning money.
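Both checks are simple enough to sketch in Python. This version makes two assumptions the text does not settle: repeated calls are counted across the whole session rather than only when consecutive, and "doubling" means a step's input tokens are at least twice the previous step's. The function names are hypothetical.

```python
import json
from collections import Counter


def has_tool_loop(calls: list[tuple[str, dict]], min_repeats: int = 3) -> bool:
    # Count identical (tool, args) pairs; args are serialized with sorted
    # keys so that key order does not affect the comparison.
    counts = Counter(
        (name, json.dumps(args, sort_keys=True)) for name, args in calls
    )
    return any(n >= min_repeats for n in counts.values())


def has_context_stuffing(input_tokens_per_step: list[int]) -> bool:
    # Flag any step whose input tokens are at least double the previous step's.
    return any(
        prev > 0 and cur >= 2 * prev
        for prev, cur in zip(input_tokens_per_step, input_tokens_per_step[1:])
    )
```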

P2 · rag_miss

Retrieval misses

The same chunks fetched again and again, self-reported "I couldn't find that", or stale info surfacing over a newer authoritative source.

P1 · user_frustrated

Thumbs-down & frustration

Every thumbs-down gets a tailored LLM diagnosis, not just a counter. Self-reported refusals get categorized too.

// why zot

Your best eval set is already in production.

Eval suites and prompt optimizers run at dev time, on a dataset someone has to hand-build and maintain. Zot starts from the incident.

Dev-time eval & prompt tools
  • Run before launch, against test cases
  • Need a dataset someone hand-builds, then keeps current
  • Catch the failures you already thought to check for
Zot
  • Starts from the production incident itself
  • Ships a recommended fix with every finding, in the same Slack thread as the alert
  • Files every failure by category and topic, so you accumulate a labeled record of what your agent actually gets wrong
// delivery

It lands where your team already is.

A P0 in Slack the moment it's found. The dashboard for the full incident, the transcript, and the trace.

#agent-alerts · real-time P0
Zot APP 9:41 AM
🔴 P0: Acme Assistant invented a 90-day refund policy
On a returns question the assistant told the customer Acme offers a 90-day no-questions-asked refund. The actual policy (kb_returns) is 30 days. The KB lookup returned no hit and the model filled the gap.
Fix: Ground refund answers in kb_returns; refuse to state a policy when the lookup misses instead of guessing.
trace tr_3f9c…2a11  ·  session sess_3f9c2a  ·  conf high
@-mention Zot in the thread to ask follow-ups. It pulls the trace and answers.
app.zot.bot · overview
1 P0 · 3 P1 · 388 chats · 8 findings
findings / hr · P1 spike · EU peak
Filter by priority, confidence, category. Drill into the session timeline, the transcript, and LLM spend on the cost dashboard.
// ask zot

Talk to your own production data.

Ask Zot lives in Slack and the dashboard. It queries recent findings, searches live traces, and pulls up the exact thumbs-down that's bugging you. It only reports what its tools return; if they come back empty, it says so.

YOU
What were the worst P0s in the last day?
ZOT
One P0 in the last 24h: Acme Assistant invented a 90-day refund policy. The kb_returns lookup missed and the model guessed. I pulled the trace: the retrieval step returned zero chunks. Recommend grounding policy answers and refusing on a miss.
YOU
Show me the thumbs-down from this morning.
ZOT
3 thumbs-down since 6am. Top comment: "this isn't my order." All three trace back to order_lookup timing out under EU peak, the same root cause as the open P1.
// read-only · secure · live in 5 minutes

If you can't shake the feeling
that something's off.

Built for engineers watching agents, users, and prod. Conversational setup, a test-connection check that verifies your keys before saving, and a first scan that reaches back 24 hours. No deploy, no SDK.