Arize AI (@arizeai) / X

1,602 posts

Arize AI

@arizeai

The AI engineering platform for teams shipping reliable AI agents and LLM applications. Also home to @ArizePhoenix.

San Francisco, CA

Joined January 2020

Pinned
Arize AI
@arizeai
Jun 5
Observe 2026 is a wrap. Yesterday we shared what’s next for Arize AX and our vision for the AI factory for self-improving agents. The focus: helping teams turn production behavior into a repeatable loop for finding issues, investigating root cause, testing fixes, and improving
412K
Arize AI
@arizeai
Jun 19
Make sure your FIFA World Cup winner prediction agent isn't hallucinating! Ship agents that work. #AgentEvals #LLMHallucination #ArizeAI #FIFA2026 #WorldCup2026
266
Arize AI
@arizeai
Jun 18
When an agent fails, have you considered looking at the harness? The loop around the model decides how tasks are decomposed, how tools are called, how context is managed, how errors are recovered from, and what gets traced. @seldo explains why harnesses are replacing agent
What is an agent harness? Why harnesses are replacing agent frameworks
From arize.com
126K
Arize AI reposted
Elizabeth Hutton
@ehutt_
Jun 17
Yesterday, exactly one year after joining @arizeai, I got to speak about my work on agent evaluation at the @databricks AI Summit in San Francisco. Every seat was taken and there was a LINE of people standing at the back! I got great questions and had so many interesting
965
Arize AI
@arizeai
Jun 17
Recently @AnthropicAI shipped Dreams. @OpenAI shipped Dreaming V3. Same word, opposite architectures. A UIUC paper from @dylan_works_ landed the same week showing one of these patterns drops accuracy from 100% to 54% on ARC-AGI. One left itself an escape hatch. One did not.
Two labs started dreaming, and they built two different architectures
From arize.com
212
Arize AI reposted
Aparna Dhinakaran
@aparnadhinak
Jun 17
Article
When code costs nothing to produce, how do you review it all?
Recently, three economists tracked more than 100,000 GitHub developers, matched against telemetry showing exactly when each one adopted AI coding tools. Developers using autonomous coding agents wrote...
3K
Arize AI
@arizeai
Jun 16
Most agent orchestration debates are arguing about the wrong layer. 👀 Frameworks answer how agent control flow is expressed. Runtimes answer how agents recover, resume, and survive long tasks. Observability answers how teams find out what actually happened. @seldo explains why
What is agent orchestration? Frameworks, runtimes, and observability explained
From arize.com
131K
Arize AI
@arizeai
Jun 15
Cursor users! A dozen incredibly helpful Arize skills are now available directly in Cursor from the Agent Marketplace! Select "Customize" from the agents sidebar to see the marketplace and click to get them automatically installed. cursor.com/marketplace/ar…
390
Arize AI
@arizeai
Jun 15
Agent traces are most useful when you can join them with the rest of your data. Arize Data Fabric now supports @databricks, so teams can sync production traces, evals, and annotations into customer-owned storage, register them in Unity Catalog, and query them with lakehouse
Bring production agent traces from Arize into Databricks Unity Catalog
From arize.com
208K
Arize AI
@arizeai
Jun 15
Psst - we’re also going to be at the Data + AI Summit by @databricks! @ehutt_ (who's behind @ArizePhoenix) is speaking on Agent as a Judge, AI error analysis, and scaling evaluation for agent apps. RSVP here: app.ingo.me/q/gqiit #DataAISummit
Data + AI Summit 2026
From app.ingo.me
167
Arize AI
@arizeai
Jun 12
Three AI labs shipped something called "memory" this week. Apple paid Google a billion a year for one version of it. None of them is what users mean by the word. @jimbobbennett wrote a field map of the four kinds of memory shipping right now: → Retrieval, dressed up as memory
Memory is still a missing primitive: Cataloguing what the field is actually shipping
From arize.com
211
Arize AI
@arizeai
Jun 12
London is having a moment, and we're showing up for it. Arize is sponsoring @Londonmaxxing 003, a one-day hackathon at Ramen Space, Dalston, July 4th. Build something that makes London better to live in or build in. £1k+ prize pool + credits. Apply:
Londonmaxxing 003: Maxxing London Hackathon · Luma
From luma.com
4.4K
Arize AI
@arizeai
Jun 11
Observe 2026. 1 day at Shack15, San Francisco. 700+ AI engineers, researchers, founders, and builders. 6 new Arize AX products, live demos, and countless hallway conversations. The future of AI is self-improving agents. This year's Observe focused on the infrastructure
495
Arize AI
@arizeai
Jun 11
Our cofounder @aparnadhinak tested whether AI agents should use databases through filesystem abstractions. PostgresFS exposed docs as virtual files. A SQL skill queried Postgres, wrote results locally, and let the agent continue with Bash. Result: SQL skill 99/100. PostgresFS
375
Arize AI
@arizeai
Jun 11
Replying to @arizeai
The SQL skill paid the database cost once. It queried for the relevant slice, wrote that data to a local file, and let the agent use normal shell tools from there. That gave the agent a writable, rereadable, composable workspace.
150
Arize AI
@arizeai
Jun 11
The lessons for developers building agent harnesses:
- Use the database for broad retrieval.
- Use local files for iterative analysis.
- Measure by question shape.
Watch for abstractions that feel familiar while quietly increasing maintenance cost. Full experiment:
PostgresFS vs. SQL skills: should AI agents fake a filesystem?
From arize.com
152
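
The thread above describes the SQL skill pattern as: query the database once for the relevant slice, write it to a local file, then keep working on that file with ordinary tools. A minimal sketch of that shape follows. It is not the code from the experiment: it assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for Postgres, and the `docs` table, the `export_slice` and `read_local` helpers, and the file name are all hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_slice(conn, query, params, out_path):
    """Run one query and write the result set to a local CSV file."""
    cursor = conn.execute(query, params)
    headers = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)
    return len(rows)


def read_local(path):
    """Re-read the local file; this never touches the database."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical docs table, built in memory so the sketch is self-contained.
# SQLite stands in for Postgres here (an assumption, not the original setup).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, topic TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "billing", "refund policy"),
        (2, "billing", "invoice schedule"),
        (3, "auth", "token rotation"),
    ],
)

workdir = Path(tempfile.mkdtemp())
slice_path = workdir / "billing_docs.csv"

# Step 1: pay the database cost once for the relevant slice.
row_count = export_slice(
    conn,
    "SELECT id, topic, body FROM docs WHERE topic = ? ORDER BY id",
    ("billing",),
    slice_path,
)
conn.close()  # everything after this line works from the local file only

# Step 2: iterate on the local file as often as needed.
rows = read_local(slice_path)
refund_rows = [r for r in rows if "refund" in r["body"]]
```

Closing the connection before the analysis step makes the split explicit: the file on disk is the writable, rereadable workspace, and an agent could just as well inspect it with shell tools such as `grep` or `head` instead of Python.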