Arena.ai
3,382 posts
Arena.ai
@arena
Where AI meets the real world. Formerly LMArena. We measure and advance the frontier of AI through community-driven evaluation. We’re hiring → arena.ai/jobs
US
arena.ai
Joined March 2023
215 Following
171.7K Followers
  • Pinned
    Arena.ai
    @arena
    Jun 4
    Introducing Agent Mode: Agentic AI is now measured in the Arena. Agent Mode can do deep research, create reports, generate images, build websites, debug code, and more. It completes more complex tasks by using tools like web search, bash in a sandbox environment, image
    277K
  • Arena.ai
    @arena
    2h
    HappyHorse 1.1 by @HappyHorseATH is in the Video Arena. (Text-to-Video, Image-to-Video & Video Edit) HappyHorse 1.0 currently holds top 2-4 ranks across the Video Arena, so let's see how the latest version stacks up. Bring your most creative prompts and get voting. Scores coming
    4.7K
    Arena.ai
    @arena
    2h
    Head over to the Video Arena at:
    Video Arena: Compare the Best AI Video Generators
    From arena.ai
    1.6K
  • Arena.ai
    @arena
    2h
    Millions of people worldwide bring real-world tasks to Arena, and at that scale hot/cold storage becomes a hard problem fast. In this clip, our engineering team walks through some best practices: CDC replication, ephemeral storage tradeoffs, and what it takes to build data
    5.6K
    Arena.ai
    @arena
    2h
    Listen in as our engineering team walks through best practices around how to handle billions of data points to keep Arena running:
    2.6K
  • Arena.ai
    @arena
    6h
    How good is GLM-5.2? GLM-5.2 (Max) ranks higher than Claude Opus 4.8 (Thinking) on Code Arena, where Frontend coding tasks are being voted on by the community. Below in thread are 10 examples of the same prompts given to both models and completed in a single shot.
    Arena.ai
    @arena
    Jun 16
    Exciting news: GLM-5.2 (Max) ranks #2 in Code Arena: Frontend, with +29pt over Claude Opus 4.7 (Thinking) and only behind Fable 5! GLM-5.2 is the best open model vs Kimi-K2.6 and Minimax-M3 by a large margin. - #2 React and #4 HTML sub-leaderboards - Ranks as the top model in
    15K
    Arena.ai
    @arena
    6h
    Replying to @arena
    More details in Code Arena leaderboard at:
    WebDev AI Leaderboard - Best AI Models for Web Development
    From arena.ai
    1.9K
    Arena.ai
    @arena
    6h
    Try out Code Arena: Frontend at:
    Code Arena: Build & Test with AI Coding Models
    From arena.ai
    1.9K
  • Arena.ai
    @arena
    8h
    [Token efficiency in Agent Arena] Agent Arena measures agent performance across a range of real-world tasks from our global community. Models get search, filesystem, and terminal tools to complete complex workflows: writing code, creating slide decks, researching the web, building
    16K
    Arena.ai
    @arena
    8h
    Learn more about the causal tracing methodology for Agent Arena on our blog:
    Agent Arena: Causal Evaluation of Agents in the Real World
    From arena.ai
    3.6K
    Arena.ai
    @arena
    8h
    Head over to the Agent Arena leaderboard to dive into the details. You can also filter by open models or view by lab:
    Agent Arena | AI Agent Performance Leaderboard
    From arena.ai
    3.1K
