ARC Prize
810 posts
ARC Prize
@arcprize
A North Star for open AGI. Co-founders: @fchollet @mikeknoop. President: @gregkamradt. We're hiring mission-driven builders: arcprize.org/jobs
Earth
arcprize.org
Joined March 2024
187 Following
38.1K Followers
  • Pinned
    ARC Prize
    @arcprize
    Mar 25
    Announcing ARC-AGI-3: the only unsaturated agentic intelligence benchmark in the world. Humans score 100%, AI <1%. This human-AI gap demonstrates we do not yet have AGI. Most benchmarks test what models already know; ARC-AGI-3 tests how they learn.
    742K
  • ARC Prize
    @arcprize
    Jul 10, 2025
    Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9%. This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA.
    7.3M
  • ARC Prize
    @arcprize
    Dec 20, 2024
    New verified ARC-AGI-Pub SoTA! @OpenAI o3 has scored a breakthrough 75.7% on the ARC-AGI Semi-Private Evaluation. And a high-compute o3 configuration (not eligible for ARC-AGI-Pub) scored 87.5% on the Semi-Private Eval. 1/4
    2.5M
  • ARC Prize
    @arcprize
    Jun 11, 2025
    After the o3 price reduction, we retested the o3-2025-04-16 model on ARC-AGI to determine whether its performance had changed. We compared the retest results with the original results and observed no difference in performance.
    435K
  • ARC Prize
    @arcprize
    Mar 24, 2025
    Today we are announcing ARC-AGI-2, an unsaturated frontier AGI benchmark that challenges AI reasoning systems (same relative ease for humans). Grand Prize: 85%, ~$0.42/task efficiency. Current performance: base LLMs 0%; reasoning systems <4%.
    462K
  • ARC Prize
    @arcprize
    Sep 16, 2025
    New SOTA on ARC-AGI. V1: 79.6%, $8.42/task. V2: 29.4%, $30.40/task. Custom submissions by @jeremyberman and @_eric_pang_ are now the best known solutions to ARC-AGI. Both are open source, use Grok 4, and implement program-synthesis outer loops with test-time adaptation.
    7.5M
  • ARC Prize
    @arcprize
    Jul 18, 2025
    Today, we're announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI. We're releasing 3 games (environments), a $10K agent contest, and an AI agents API. Starting scores: Frontier AI 0%, Humans 100%.
    521K
  • ARC Prize
    @arcprize
    Aug 15, 2025
    Analyzing the Hierarchical Reasoning Model by @makingAGI. We verified scores on hidden tasks, ran ablations, and found that performance comes from an unexpected source. ARC-AGI Semi-Private scores: ARC-AGI-1 32%; ARC-AGI-2 2%. Our 4 findings:
    273K
  • ARC Prize
    @arcprize
    Oct 9, 2025
    New ARC-AGI SOTA: GPT-5 Pro. ARC-AGI-1: 70.2%, $4.78/task. ARC-AGI-2: 18.3%, $7.41/task. @OpenAI’s GPT-5 Pro now holds the highest verified frontier LLM score on ARC-AGI’s Semi-Private benchmark.
    563K
  • ARC Prize
    @arcprize
    Apr 16, 2025
    Clarifying o3’s ARC-AGI Performance. OpenAI has confirmed: the released o3 is a different model from what we tested in December 2024; all released o3 compute tiers are smaller than the version we tested; the released o3 was not trained on ARC-AGI data, not even the train
    226K
  • ARC Prize
    @arcprize
    Jan 21, 2025
    Verified DeepSeek performance on ARC-AGI's Public Eval (400 tasks) and Semi-Private (100 tasks), with average cost per task in parentheses. DeepSeek V3: Semi-Private 7.3% ($0.002); Public Eval 14% ($0.002). DeepSeek Reasoner: Semi-Private 15.8% ($0.06); Public Eval 20.5% ($0.05).
    292K
  • ARC Prize
    @arcprize
    Feb 14, 2025
    Introducing SnakeBench, an experimental benchmark side quest. We made 50 LLMs battle each other in head-to-head snake 🐍. 2.8K matches showed which models are the best at snake real-time strategy and spatial reasoning. Here’s the top match between o3-mini and DeepSeek-R1 🧵
    177K
  • ARC Prize
    @arcprize
    Oct 21, 2025
    Grok-4 (Fast Reasoning) on ARC-AGI Semi-Private Eval. ARC-AGI-1: 48.5%, $0.03/task. ARC-AGI-2: 5.3%, $0.06/task. @xai pushes the frontier of performance efficiency on ARC-AGI.
    1.6M
  • ARC Prize
    @arcprize
    Mar 27, 2025
    Gemini-2.5-Pro Experimental Preview results. ARC-AGI-1: Public Eval 24.3%; Semi-Private 12.5%. ARC-AGI-2: Public Eval 0.8%; Semi-Private 1.3%. These results are on par with DeepSeek's R1.
    298K
