Open dataset of AgentLang Index benchmark runs. The dataset is append-only, with one JSON export per dated run.
Reproducing a row: every record carries a harnessSha that resolves against
truffle-dev/agentlang-index, so the exact harness revision behind any attempt can be recovered.
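As a rough sketch of how the harnessSha values might be pulled out of an export, the snippet below collects the distinct SHAs from runs.json and turns each into a commit link. Only the harnessSha field and the truffle-dev/agentlang-index slug come from this README; the assumption that the harness is hosted on GitHub, the commit URL layout, and the helper names (load_runs, harness_shas, commit_url) are illustrative.

```python
import json
from pathlib import Path

# Assumption: the harness repository is hosted on GitHub under this slug.
HARNESS_REPO = "truffle-dev/agentlang-index"


def load_runs(export_dir):
    """Load runs.json (a flat array, one record per attempt) from an export directory."""
    with open(Path(export_dir) / "runs.json", encoding="utf-8") as fh:
        return json.load(fh)


def harness_shas(records):
    """Return the sorted, de-duplicated harnessSha values found in the records."""
    return sorted({r["harnessSha"] for r in records if r.get("harnessSha")})


def commit_url(sha, repo=HARNESS_REPO):
    """Build a GitHub commit URL for a harness SHA (the URL layout is an assumption)."""
    return f"https://github.com/{repo}/commit/{sha}"


if __name__ == "__main__":
    export_dir = Path("exports/2026-05-19")
    # Only touch the filesystem when the export is actually present.
    if (export_dir / "runs.json").exists():
        for sha in harness_shas(load_runs(export_dir)):
            print(commit_url(sha))
```

Checking out the printed revision in a local clone of the harness would be the natural next step, though the exact reproduction command depends on the harness itself.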
License: CC-BY-4.0. Attribute as AgentLang Index by Truffle, github.com/truffle-dev/agentlang-index-data.
- exports/<YYYY-MM-DD>/manifest.json: harness SHA, Zero version, models, totals
- exports/<YYYY-MM-DD>/dashboard.json: per-model/per-language summary
- exports/<YYYY-MM-DD>/runs.json: flat array, one record per attempt
- exports/<YYYY-MM-DD>/attempts/...: raw response.md / system.md / user.md / result.json
- NOTICE: attribution + harness/Zero version pin
A Parquet mirror under parquet/<YYYY-MM>/runs.parquet is planned for
later releases; v1.0 ships JSON only.
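Since v1.0 ships JSON only, a per-model/per-language summary can be recomputed from the flat runs.json array with the standard library. The following is a minimal sketch, not the harness's own aggregation: the field names model, language, and passed are hypothetical placeholders (the README only confirms harnessSha), so adjust them to the real record schema.

```python
from collections import defaultdict


def pass_rates(records, model_key="model", lang_key="language", passed_key="passed"):
    """Return {(model, language): fraction passed} for a list of attempt records.

    The three key names are hypothetical; adjust them to the real runs.json schema.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for rec in records:
        cell = (rec[model_key], rec[lang_key])
        totals[cell] += 1
        if rec[passed_key]:
            passes[cell] += 1
    # Every cell in totals has at least one attempt, so the division is safe.
    return {cell: passes[cell] / totals[cell] for cell in totals}


if __name__ == "__main__":
    # Tiny synthetic sample, not real benchmark data.
    sample = [
        {"model": "m1", "language": "python", "passed": True},
        {"model": "m1", "language": "python", "passed": False},
        {"model": "m1", "language": "zero", "passed": False},
    ]
    for (model, language), rate in sorted(pass_rates(sample).items()):
        print(f"{model:8s} {language:8s} {rate:.0%}")
```

Keying on a (model, language) tuple keeps the result flat, which mirrors the flat layout of runs.json and is easy to reshape later.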
exports/2026-05-19/ contains 300 attempts across 3
OpenAI models and 20 tasks.
- 3 OpenAI models in one-shot mode: gpt-5, gpt-4o, gpt-4o-mini.
- 20 tasks x 5 languages (zero, ts, rust, go, python) = 100 attempts per model.
- gpt-5 reached 79% overall and 0% on Zero. Per-model and per-language breakdown on the leaderboard: https://truffleagent.com/agentlang/leaderboard/
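The arithmetic behind the run size is 20 tasks x 5 languages = 100 attempts per model, and 3 models x 100 = 300 attempts in total. A small sketch of how a downloaded export might be checked against that grid follows; the model names, language codes, and counts come from the list above, while the record key model and the helper names are assumptions.

```python
from collections import Counter

MODELS = ["gpt-5", "gpt-4o", "gpt-4o-mini"]
LANGUAGES = ["zero", "ts", "rust", "go", "python"]
TASKS = 20

PER_MODEL = TASKS * len(LANGUAGES)  # 20 x 5 = 100 attempts per model
TOTAL = PER_MODEL * len(MODELS)     # 100 x 3 = 300 attempts overall


def attempts_per_model(records, model_key="model"):
    """Count attempt records per model (the key name is a hypothetical placeholder)."""
    return Counter(rec[model_key] for rec in records)


def grid_is_complete(records, model_key="model"):
    """True when every listed model has exactly PER_MODEL attempts and nothing else is present."""
    counts = attempts_per_model(records, model_key)
    per_model_ok = all(counts.get(m, 0) == PER_MODEL for m in MODELS)
    return per_model_ok and sum(counts.values()) == TOTAL
```

A check like this only validates the shape of the export, not the scores; the published figures remain those on the leaderboard.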
Citation metadata lives in CITATION.cff. GitHub renders a "Cite this repository" button in the sidebar; click it for BibTeX and APA. The dataset is licensed CC-BY-4.0, and the harness referenced from the file is licensed Apache-2.0.