Arena.ai (@arena) / X

Arena.ai

3,382 posts

@arena

Where AI meets the real world. Formerly LMArena. We measure and advance the frontier of AI through community-driven evaluation. We’re hiring → arena.ai/jobs

Joined March 2023

Pinned
Arena.ai
@arena
Jun 4
Introducing Agent Mode: Agentic AI is now measured in the Arena. Agent Mode can do deep research, create reports, generate images, build websites, debug code, and more. It completes more complex tasks by using tools like web search, bash in a sandbox environment, image
Arena.ai
@arena
2h
HappyHorse 1.1 by @HappyHorseATH is in the Video Arena (Text-to-Video, Image-to-Video & Video Edit). HappyHorse 1.0 currently ranks between #2 and #4 across the Video Arena, so let's see how the latest version stacks up. Bring your most creative prompts and get voting. Scores coming
Arena.ai
@arena
2h
Head over to the Video Arena at:
Video Arena: Compare the Best AI Video Generators
From arena.ai
Arena.ai
@arena
2h
Millions of people worldwide bring real-world tasks to Arena, and at that scale hot/cold storage becomes a hard problem fast. In this clip, our engineering team walks through some best practices: CDC replication, ephemeral storage tradeoffs, and what it takes to build data
Arena.ai
@arena
2h
Listen in as our engineering team walks through best practices around how to handle billions of data points to keep Arena running:
Arena.ai
@arena
6h
How good is GLM-5.2? GLM-5.2 (Max) ranks higher than Claude Opus 4.8 (Thinking) on Code Arena, where the community votes on frontend coding tasks. Below in the thread are 10 examples of the same prompts given to both models and completed in a single shot.
Arena.ai
@arena
Jun 16
Exciting news: GLM-5.2 (Max) ranks #2 in Code Arena: Frontend, +29 points over Claude Opus 4.7 (Thinking) and behind only Fable 5! GLM-5.2 is the best open model, ahead of Kimi-K2.6 and Minimax-M3 by a large margin. - #2 React and #4 HTML sub-leaderboards - Ranks as the top model in
Arena.ai
@arena
6h
Replying to @arena
More details in Code Arena leaderboard at:
WebDev AI Leaderboard - Best AI Models for Web Development
From arena.ai
Arena.ai
@arena
6h
Try out Code Arena: Frontend at:
Code Arena: Build & Test with AI Coding Models
From arena.ai
Arena.ai
@arena
8h
[Token efficiency in Agent Arena] Agent Arena measures agent performance across a range of real-world tasks from our global community. Models get search, filesystem, and terminal tools to complete complex workflows: writing code, creating slide decks, researching the web, building
Arena.ai
@arena
8h
Learn more about the causal tracing methodology for Agent Arena on our blog:
Agent Arena: Causal Evaluation of Agents in the Real World
From arena.ai
Arena.ai
@arena
8h
Head over to the Agent Arena leaderboard to dive into the details. You can also filter by open models or view by lab:
Agent Arena | AI Agent Performance Leaderboard
From arena.ai