agentlang-index

Zero is Vercel Labs' agent-first programming language. The thesis is that frontier models write Zero more accurately than they write TypeScript, Rust, Go, or Python. AgentLang Index is the measurement.

Status

v1.0 corpus shipped 2026-05-18, all 20 tasks byte-exact across five languages. First public benchmark run published 2026-05-19 at truffleagent.com/agentlang covering three OpenAI frontier models. Per-model drill-downs: gpt-5, gpt-4o, gpt-4o-mini.

Headline from the first run: every model scored 0% on Zero and 70-95% on TypeScript, Rust, Go, and Python. The average language tax is -78 percentage points: these models lose 78 points of accuracy when asked to write Zero instead of an established language. That is the gap an agent-first language has to close before the agent-first claim is real.

What it tests

Both one-shot and agent-loop modes per task per language. One-shot is the model alone. Agent-loop is the model plus structured diagnostics and up to five repair attempts.
Five primary metrics: first-attempt pass rate, repair iterations to green, total token cost, wall-time, runtime correctness on hidden tests.
A derived sixth metric, language tax: pass-rate-in-Zero minus pass-rate-in-each-other-language, per model. That is the chart the benchmark exists to produce.
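The language tax defined above is a plain subtraction, which a minimal sketch can make concrete. The function name, the dictionary shape, and the pass rates below are illustrative assumptions, not the harness's code and not benchmark results.

```python
from statistics import mean


def language_tax(pass_rates, target="zero"):
    """Return the target pass rate minus each other language's pass rate.

    Rates are percentages, so each difference is in percentage points.
    A negative value means the model does worse in the target language.
    """
    target_rate = pass_rates[target]
    return {
        lang: target_rate - rate
        for lang, rate in pass_rates.items()
        if lang != target
    }


# Hypothetical pass rates (percent) for a single model, for illustration only.
example = {"zero": 0.0, "typescript": 90.0, "rust": 70.0, "go": 80.0, "python": 80.0}

tax = language_tax(example)
average_tax = mean(tax.values())

print(tax)
print(average_tax)
```

The per-language differences are kept separate because the metric is defined against each other language; the average is only a one-number summary of them.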

What it refuses to be

Not a Zero marketing instrument. Tasks where Zero scores poorly ship in the dashboard with the same prominence as tasks where it scores well. The methodology page explains what went wrong, not how to phrase around it.
Not a closed dataset. Every run is reproducible from truffle-dev/agentlang-index-data, CC-BY-4.0, with the harness git SHA and Zero version pinned per export.
Not an opinion piece. The thesis is testable. The artifact is the answer.

Quickstart

Reproduce a single-model sweep from a clean clone:

git clone https://github.com/truffle-dev/agentlang-index
cd agentlang-index
export OPENAI_API_KEY=sk-...
bun run bench/runner.ts --model gpt-4o-mini

The runner depends only on bun (no npm install step; OpenAI is called via raw fetch). Per-task language runners need tsc, rustc/cargo, go, and python on PATH when the task uses them; missing toolchains surface as spawn ENOENT in the per-task stderr.
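Since a missing toolchain only shows up later as spawn ENOENT in per-task stderr, a preflight check before a sweep can save a wasted run. The following is a sketch, not part of the repository: the tool list is taken from the paragraph above, and the helper name `missing_tools` is hypothetical.

```python
import shutil

# Executables named in this README. On some systems the Python
# interpreter is installed as "python3" rather than "python".
REQUIRED_TOOLS = ["bun", "tsc", "rustc", "cargo", "go", "python"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be resolved on PATH.

    `which` is injectable so the lookup can be stubbed in tests.
    """
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("missing from PATH: " + ", ".join(missing))
else:
    print("all toolchains found")
```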

Output lands under bench/results/<model>/runs/<id>/ with per-task scratch/ (the materialized source the model wrote), stdout, stderr, and result.json. Aggregate a finished sweep into the landing-page JSON with bun run bench/aggregate.ts --site.
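To check a finished run by hand before aggregating, one option is to walk the run directory and count the result.json files. This sketch makes two assumptions that this README does not confirm: that each result.json is a JSON object, and that it carries a boolean field named `passed`. Adjust `pass_key` to whatever the runner actually writes.

```python
import json
from pathlib import Path


def summarize_run(run_dir, pass_key="passed"):
    """Count passing tasks under one run directory.

    Searches recursively for result.json so the exact per-task nesting
    does not matter. `pass_key` is an assumed field name, not a
    documented part of the result schema.
    Returns (passed_count, total_count).
    """
    result_files = sorted(Path(run_dir).rglob("result.json"))
    passed = 0
    for path in result_files:
        data = json.loads(path.read_text())
        if isinstance(data, dict) and data.get(pass_key) is True:
            passed += 1
    return passed, len(result_files)
```

Calling `summarize_run` on a directory of the form bench/results/<model>/runs/<id>/ returns a `(passed, total)` pair; bench/aggregate.ts remains the supported way to build the landing-page JSON.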

The Python harness at harness/ is the longer-form runner with SQLite storage and agent-loop mode; see harness/README.md.
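Agent-loop mode, as described under What it tests, is the model plus structured diagnostics and up to five repair attempts. The control flow can be sketched as below. This is not the harness's implementation: `generate` and `check` are hypothetical callables standing in for the model call and the compile-and-test step, and the sketch reads "up to five repair attempts" as one initial attempt followed by at most five repairs.

```python
MAX_REPAIRS = 5  # "up to five repair attempts"


def agent_loop(generate, check, max_repairs=MAX_REPAIRS):
    """Run one task in agent-loop mode.

    generate(diagnostics) -> source text; diagnostics is None on the
        first attempt and the previous failure report afterwards.
    check(source) -> None when the task is green, otherwise diagnostics
        to feed back to the model.

    Returns (passed, repair_iterations), where repair_iterations is the
    number of repairs used (0 means the first attempt passed).
    """
    diagnostics = None
    for attempt in range(max_repairs + 1):  # initial attempt plus repairs
        source = generate(diagnostics)
        diagnostics = check(source)
        if diagnostics is None:
            return True, attempt
    return False, max_repairs
```

The two return values correspond to two of the primary metrics: a pass with zero repairs is a first-attempt pass, and the repair count is the number of repair iterations to green.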

Layout

agentlang-index/
  bench/                   TypeScript single-attempt runner (bun)
    runner.ts              one-shot per-model entry point
    aggregate.ts           collate runs into models/results JSON
    results/               per-run stdout/stderr/scratch captures
  harness/                 Python agent-loop runner (uv + pytest)
  corpus/                  Task specs, prompts, reference impls, tests
    000-hello-stdout/
    ...
  vendor/
    zero/<version>/        Pinned zero --version + skill data
    ts/                    Pinned tsconfig.json
    rust/                  rust-toolchain.toml
    go/                    go.mod
    python/                .python-version

Companion repos:

truffle-dev/agentlang-index-data — open dataset, CC-BY-4.0.
truffle-dev/agentlang-spec — Zero CLI used during corpus assembly.

Citation

Citation metadata lives in CITATION.cff. GitHub renders a "Cite this repository" button on the sidebar; click it for BibTeX and APA. The harness is Apache-2.0 and the companion dataset referenced from the file is CC-BY-4.0.

License

Apache-2.0. See LICENSE.
