Braintrust (@braintrust) / X

Braintrust

769 posts

@braintrust

The observability layer for production AI.

Joined August 2023

Pinned
Braintrust
@braintrust
Jun 1
Topics is now GA on all plans. Continuously find the patterns worth investigating across your production traffic.
3.1K
Braintrust
@braintrust
Jun 26
What's new:
- Smarter evals with role-based score visibility and contextual rubrics.
- Secure access to OpenAI, Anthropic, Google, and Azure without long-lived credentials.
- Faster debugging with a redesigned trace experience that surfaces the most critical content.
- Higher
950
Braintrust
@braintrust
Jun 26
Read more →
Product updates - Braintrust
From braintrust.dev
104
Braintrust
@braintrust
Jun 26
Stateful agents that do real work are worth investing in, but they're also more difficult to eval. The hard part isn't scoring, it's getting the state right. The best approach is to be deliberate about what needs to be real and what can be mocked in the eval, so you balance cost
5.9K
Braintrust
@braintrust
Jun 26
Read more → braintrustdata.link/stateful-agent…
407
Braintrust
@braintrust
Jun 25
Cost-efficiency doesn't mean picking the cheapest model. Model choice, routing, retries, fallbacks, and escalation all determine cost together. If you want to reduce AI cost without trading away quality, the right unit of analysis is cost-per-resolved-request, measured against
316
Braintrust
@braintrust
Jun 25
Read more → braintrustdata.link/cost-efficient…
142
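A minimal sketch of the cost-per-resolved-request idea from the post above, under stated assumptions: the `Attempt` record, its field names, the model labels, and the dollar figures are all hypothetical and are not taken from Braintrust's product or data. The sketch only shows why total spend across retries and escalations is divided by the number of requests that were actually resolved, rather than by the number of model calls.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One model call made while handling a request (hypothetical schema)."""
    request_id: str
    model: str
    cost_usd: float
    resolved: bool


def cost_per_resolved_request(attempts):
    """Total spend across every attempt, divided by distinct resolved requests."""
    total_cost = sum(a.cost_usd for a in attempts)
    resolved_ids = {a.request_id for a in attempts if a.resolved}
    if not resolved_ids:
        # Nothing was resolved, so no finite cost per resolution exists.
        return float("inf")
    return total_cost / len(resolved_ids)


# Made-up traffic: the cheap model alone, with one retry on r2.
cheap_only = [
    Attempt("r1", "small", 0.01, True),
    Attempt("r2", "small", 0.01, False),
    Attempt("r2", "small", 0.01, False),
    Attempt("r3", "small", 0.01, False),
]

# Same requests, but failures escalate to a pricier model.
with_escalation = [
    Attempt("r1", "small", 0.01, True),
    Attempt("r2", "small", 0.01, False),
    Attempt("r2", "large", 0.03, True),
    Attempt("r3", "small", 0.01, False),
    Attempt("r3", "large", 0.03, True),
]

print(f"cheap only:      {cost_per_resolved_request(cheap_only):.3f}")
print(f"with escalation: {cost_per_resolved_request(with_escalation):.3f}")
```

In this toy data the escalation policy spends more in total but resolves three requests instead of one, so its cost per resolved request comes out lower than the cheap-only policy.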
Braintrust
@braintrust
Jun 25
We analyzed 1,781 real agent traces from @huggingface to understand what actually drives agent success across models, benchmarks, and harnesses. What we found:
- The harness matters ~7× more than the model.
- Open-weight models are production-ready for coding.
- Cost per task
9.2K
Braintrust
@braintrust
Jun 25
Read more → braintrustdata.link/agent-traces-hf
302
Braintrust
@braintrust
Jun 24
A self-improving agent harness automates the development loop. Run the agent, capture structured traces, and score the output. You still need strong evals to prevent overfitting, and human review still matters to make everything work well.
356
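The loop described in the post above (run the agent, capture structured traces, score the output, and guard against overfitting) might be sketched as follows. This is an illustrative outline only, not Braintrust's harness: `Trace`, `run_and_score`, `accept_candidate`, the exact-match scorer, and the toy agents are hypothetical names and data, and a held-out case set stands in for "strong evals". Human review is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Structured record of one agent run (hypothetical schema)."""
    case_input: str
    output: str
    score: float


def run_and_score(agent, cases, scorer):
    """Run the agent on each (input, expected) case and capture a scored trace."""
    traces = []
    for case_input, expected in cases:
        output = agent(case_input)
        traces.append(Trace(case_input, output, scorer(output, expected)))
    return traces


def mean_score(traces):
    return sum(t.score for t in traces) / len(traces) if traces else 0.0


def accept_candidate(baseline, candidate, dev_cases, holdout_cases, scorer):
    """Accept a candidate only if it improves on the dev cases
    and does not regress on held-out cases it was never tuned against."""
    dev_gain = (mean_score(run_and_score(candidate, dev_cases, scorer))
                - mean_score(run_and_score(baseline, dev_cases, scorer)))
    holdout_gain = (mean_score(run_and_score(candidate, holdout_cases, scorer))
                    - mean_score(run_and_score(baseline, holdout_cases, scorer)))
    return dev_gain > 0 and holdout_gain >= 0


def exact_match(output, expected):
    return 1.0 if output == expected else 0.0


# Toy task: uppercase the input.
dev_cases = [("a", "A"), ("b", "B")]
holdout_cases = [("c", "C")]


def baseline(s):
    # Handles only some inputs correctly.
    return s.upper() if s in ("b", "c") else s


def overfit(s):
    # Memorizes the dev cases and nothing else.
    return {"a": "A", "b": "B"}.get(s, s)


def general(s):
    return s.upper()


print(accept_candidate(baseline, overfit, dev_cases, holdout_cases, exact_match))
print(accept_candidate(baseline, general, dev_cases, holdout_cases, exact_match))
```

The held-out check is the design point: the memorizing candidate scores perfectly on the dev cases yet is rejected because it regresses on the case it never saw.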
Braintrust
@braintrust
Jun 24
As models develop increasingly nuanced differences in reasoning style, tool use, and context handling, the industry needs rigorous methods of measurement. The five pillars of AI model performance are autonomy, reasoning, speed, cost analysis, and reliability measurement.
389
Braintrust
@braintrust
Jun 24
Read more → braintrustdata.link/model-measurem…
125
Braintrust
@braintrust
Jun 23
Evals test for things you know to look for, but how do you find failures that you don't know about yet? Topics continuously finds patterns in your production traces, reconstructs conversational threads, clusters them, and surfaces the results in a UI built for human review.
359
Braintrust
@braintrust
Jun 23
Read more →
Automate pattern discovery with Topics, now generally available - Blog - Braintrust
From braintrust.dev
137
Braintrust
@braintrust
Jun 23
Braintrust integrates with OpenAI through direct API access, automatic tracing with wrapOpenAI, and workload identity federation for secure cloud connections. Set up OpenAI models in the playground, API, and gateway with either API keys or federated authentication.
297
Braintrust
@braintrust
Jun 23
Read the setup guide → braintrustdata.link/integration-op…
153
Braintrust
@braintrust
Jun 22
The Agent Open panel lineup is live. On Jun 30th in San Francisco Braintrust and friends host a conversation with leaders building the infrastructure for shipping quality agents. Then, it's time for pickleball. First come, first serve. Literally.
376
Braintrust
@braintrust
Jun 22
Join us → luma.com/the-agent-open
222
Braintrust
@braintrust
Jun 22
There have been six generations of AI agents:
- A simple prompt that asks a model a question.
- A fixed pipeline that retrieves context and puts it into the prompt to get a result.
- A ReAct loop, in which the model decides what tools to call and in what order.
- A
470
Braintrust
@braintrust
Jun 19
When you're building AI systems, you need to know what prompt your LLM received, what it returned, and how many tokens it used. And you need to log tool calls, retrieval, reasoning, and handoffs between subagents. OpenTelemetry is an OSS framework for capturing that data using
597
Braintrust
@braintrust
Jun 18
How do you make AI traces readable for non-engineers? Custom trace views in Braintrust transform a raw trace into a format that a subject matter expert can understand. For example, you can turn a customer support trace into a ticket card with the entire conversation, the
262