Braintrust (@braintrust)
The observability layer for production AI.
braintrust.dev · Joined August 2023
769 posts · 55 following · 6,895 followers
  • Pinned · Jun 1
    Topics is now GA on all plans. Continuously find the patterns worth investigating across your production traffic.
  • Jun 26
    What's new:
    - Smarter evals with role-based score visibility and contextual rubrics.
    - Secure access to OpenAI, Anthropic, Google, and Azure without long-lived credentials.
    - Faster debugging with a redesigned trace experience that surfaces the most critical content.
    - Higher…
    Read more → Product updates - Braintrust (braintrust.dev)
  • Jun 26
    Stateful agents that do real work are worth investing in, but they're also more difficult to eval. The hard part isn't scoring; it's getting the state right. The best approach is to be deliberate about what needs to be real and what can be mocked in the eval, so you balance cost…
    Read more → braintrustdata.link/stateful-agent…
  • Jun 25
    Cost-efficiency doesn't mean picking the cheapest model. Model choice, routing, retries, fallbacks, and escalation all determine cost together. If you want to reduce AI cost without trading away quality, the right unit of analysis is cost-per-resolved-request, measured against…
    Read more → braintrustdata.link/cost-efficient…
  • Jun 25
    We analyzed 1,781 real agent traces from @huggingface to understand what actually drives agent success across models, benchmarks, and harnesses. What we found:
    - The harness matters ~7× more than the model.
    - Open-weight models are production-ready for coding.
    - Cost per task…
    Read more → braintrustdata.link/agent-traces-hf
  • Jun 24
    A self-improving agent harness automates the development loop: run the agent, capture structured traces, and score the output. You still need strong evals to prevent overfitting, and human review still matters to make everything work well.
  • Jun 24
    As models develop increasingly nuanced differences in reasoning style, tool use, and context handling, the industry needs rigorous methods of measurement. The five pillars of AI model performance are autonomy, reasoning, speed, cost analysis, and reliability measurement.
    Read more → braintrustdata.link/model-measurem…
  • Jun 23
    Evals test for things you know to look for, but how do you find failures that you don't know about yet? Topics continuously finds patterns in your production traces, reconstructs conversational threads, clusters them, and surfaces the results in a UI built for human review.
    Read more → Automate pattern discovery with Topics, now generally available - Blog - Braintrust (braintrust.dev)
  • Jun 23
    Braintrust integrates with OpenAI through direct API access, automatic tracing with wrapOpenAI, and workload identity federation for secure cloud connections. Set up OpenAI models in the playground, API, and gateway with either API keys or federated authentication.
    Read the setup guide → braintrustdata.link/integration-op…
  • Jun 22
    The Agent Open panel lineup is live. On Jun 30th in San Francisco, Braintrust and friends will host a conversation with leaders building the infrastructure for shipping quality agents. Then, it's time for pickleball. First come, first serve. Literally.
    Join us → luma.com/the-agent-open
  • Jun 22
    There have been six generations of AI agents:
    - A simple prompt that asks a model a question.
    - A fixed pipeline that retrieves context and puts it into the prompt to get a result.
    - A ReAct loop, in which the model decides what tools to call and in what order.
    - A…
  • Jun 19
    When you're building AI systems, you need to know what prompt your LLM received, what it returned, and how many tokens it used. And you need to log tool calls, retrieval, reasoning, and handoffs between subagents. OpenTelemetry is an OSS framework for capturing that data using…
  • Jun 18
    How do you make AI traces readable for non-engineers? Custom trace views in Braintrust transform a raw trace into a format that a subject matter expert can understand. For example, you can turn a customer support trace into a ticket card with the entire conversation, the…
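
The Jun 26 post on stateful agents argues that the hard part of evaluating them is getting the state right, by deciding deliberately what is real and what is mocked. Below is a minimal Python sketch of that idea, not Braintrust's method: the refund agent (`refund_agent`), the `orders` table, and the `payments` client are all hypothetical. The database state is kept real (a freshly seeded in-memory SQLite database per case), while the external payments API is mocked so an eval can never move real money; each case is scored on both the final state and the side effect.

```python
import sqlite3
from unittest.mock import Mock


def refund_agent(order_id: int, db: sqlite3.Connection, payments) -> str:
    """Hypothetical stateful agent step: refund an order at most once."""
    row = db.execute(
        "SELECT amount_cents, status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        return "order not found"
    amount_cents, status = row
    if status == "refunded":
        return "already refunded"
    # External side effect: in an eval this client is a mock, never the real API.
    payments.refund(order_id=order_id, amount_cents=amount_cents)
    db.execute("UPDATE orders SET status = 'refunded' WHERE id = ?", (order_id,))
    db.commit()
    return f"refunded {amount_cents} cents"


def make_fixture_db() -> sqlite3.Connection:
    """Real (in-memory) database, freshly seeded with known state for each case."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER, status TEXT)"
    )
    db.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 4000, "paid"), (2, 1500, "refunded")],
    )
    db.commit()
    return db


def eval_case(order_id: int, expect_refund_call: bool, expect_status: str) -> bool:
    """Score the final state (real DB) and the side effect (mocked API) together."""
    db = make_fixture_db()
    payments = Mock()
    refund_agent(order_id, db, payments)
    (status,) = db.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return payments.refund.called == expect_refund_call and status == expect_status


results = {
    "refunds a paid order": eval_case(1, True, "refunded"),
    "does not double-refund": eval_case(2, False, "refunded"),
}
print(results)  # {'refunds a paid order': True, 'does not double-refund': True}
```

Seeding a fresh database per case keeps cases independent, which is one simple way to make the "real" part of the state reproducible.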
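
The Jun 25 post on cost-efficiency names cost-per-resolved-request as the right unit of analysis, but its full definition is truncated here. The Python sketch below assumes the simplest reading: total spend, including retries, fallbacks, and escalations, divided by the number of requests actually resolved. `RequestOutcome` and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestOutcome:
    """Hypothetical record for one user request.

    cost_usd is the total spend for the request, summed across every model call
    it triggered (first attempt, retries, fallbacks, and escalations).
    """
    cost_usd: float
    resolved: bool


def cost_per_resolved_request(outcomes: list[RequestOutcome]) -> float:
    """Total spend divided by the number of requests that were actually resolved.

    Unresolved requests still add to the numerator, so a cheap model that fails
    often is penalized rather than rewarded.
    """
    total_cost = sum(o.cost_usd for o in outcomes)
    resolved = sum(1 for o in outcomes if o.resolved)
    if resolved == 0:
        return float("inf")  # money was spent but nothing was resolved
    return total_cost / resolved


# Made-up numbers for illustration only.
cheap_model = [RequestOutcome(0.002, r) for r in (True, False, False, False)]
strong_model = [RequestOutcome(0.005, r) for r in (True, True, True, False)]

print(f"cheap:  {cost_per_resolved_request(cheap_model):.4f}")   # cheap:  0.0080
print(f"strong: {cost_per_resolved_request(strong_model):.4f}")  # strong: 0.0067
```

Under these assumed numbers, the model that is cheaper per call costs more per resolved request, which is the post's point that picking the cheapest model is not the same as being cost-efficient.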
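
The Jun 24 post describes a self-improving agent harness as a loop: run the agent, capture structured traces, score the output, and rely on strong evals to prevent overfitting, with human review on top. Below is a minimal Python sketch of that loop under loud assumptions. The agent (`toy_agent`), its config knobs, and the fixed list of candidate changes are hypothetical stand-ins; a real harness might have a model propose changes from the traces. The held-out check is one simple guard against overfitting, not Braintrust's implementation.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical structured trace of one agent run."""
    input: str
    output: str
    expected: str
    config: dict

    @property
    def score(self) -> float:
        # Exact-match scorer; real evals would use richer scorers.
        return 1.0 if self.output == self.expected else 0.0


def toy_agent(text: str, config: dict) -> str:
    """Stand-in agent whose behavior is controlled entirely by `config`."""
    lookup = config.get("lookup", {})
    if text in lookup:
        return lookup[text]
    out = text
    if config.get("strip"):
        out = out.strip()
    if config.get("lower"):
        out = out.lower()
    return out


def evaluate(config: dict, cases: list[tuple[str, str]]) -> tuple[float, list[Trace]]:
    """Run the agent on every case, capture traces, and return the mean score."""
    traces = [Trace(inp, toy_agent(inp, config), exp, config) for inp, exp in cases]
    return sum(t.score for t in traces) / len(traces), traces


def improve(config: dict, candidates: list[dict], dev, holdout):
    """Greedy loop: keep a candidate only if it helps on dev without hurting holdout."""
    best_dev, _ = evaluate(config, dev)
    best_hold, _ = evaluate(config, holdout)
    history = []  # every decision plus its traces, kept for human review
    for cand in candidates:
        dev_score, dev_traces = evaluate(cand, dev)
        hold_score, _ = evaluate(cand, holdout)
        accepted = dev_score > best_dev and hold_score >= best_hold
        history.append((cand, dev_score, hold_score, accepted, dev_traces))
        if accepted:
            config, best_dev, best_hold = cand, dev_score, hold_score
    return config, history


dev = [("  Hello ", "hello"), ("World", "world")]
holdout = [("Foo", "foo"), (" bar", "bar")]
candidates = [
    {"lower": True},
    {"lookup": {"  Hello ": "hello", "World": "world"}},  # memorizes the dev set
    {"strip": True, "lower": True},
]
best, history = improve({}, candidates, dev, holdout)
for cand, dev_score, hold_score, accepted, _ in history:
    print(cand, dev_score, hold_score, "accepted" if accepted else "rejected")
```

The second candidate scores perfectly on the dev set by memorizing it but regresses on the held-out set, so the loop rejects it; the `history` list is what a human reviewer would inspect before trusting the final config.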
