Autoscaled RL data
for frontier agents
Post-training data that scales without armies of human annotators
We are an applied research lab working on the next generation of data harvesting techniques. Hand-built environments don't scale, and every task and reward costs valuable human hours.
As agents take on longer-horizon tasks, these costs grow even faster.
To push the frontier, we need environments that can be scaled autonomously and targeted to specific capabilities.
We provide that data.
Unsupervised environment design
We let environments adapt to the agent, generating tasks at the frontier of its current ability. This follows a curriculum learning design that discovers difficulty instead of guessing it, in the lineage of PAIRED and regret-based UED.
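As a rough illustration of what "discovering difficulty" can mean, here is a minimal sketch of regret-weighted task sampling. It is not our production system: `TaskStats`, `reference_return`, and `sampling_weights` are hypothetical names, and the regret estimate (a stronger reference policy's return minus the agent's return) is a simplified stand-in for the antagonist-minus-protagonist gap used in PAIRED.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Hypothetical per-task record; all field names are illustrative."""
    task_id: str
    agent_return: float      # mean return of the agent being trained
    reference_return: float  # mean return of a stronger reference policy (assumed available)


def regret(stats: TaskStats) -> float:
    # Approximate regret: how much better the reference does than the agent.
    # Floored at zero so tasks the agent already solves get no weight.
    return max(0.0, stats.reference_return - stats.agent_return)


def sampling_weights(tasks: list[TaskStats]) -> dict[str, float]:
    """Return a probability for each task, proportional to its estimated regret."""
    if not tasks:
        return {}
    regrets = {t.task_id: regret(t) for t in tasks}
    total = sum(regrets.values())
    if total == 0.0:
        # No task shows any regret: fall back to uniform sampling.
        return {task_id: 1.0 / len(tasks) for task_id in regrets}
    return {task_id: r / total for task_id, r in regrets.items()}


if __name__ == "__main__":
    pool = [
        TaskStats("solved", agent_return=1.0, reference_return=1.0),
        TaskStats("frontier", agent_return=0.2, reference_return=0.8),
        TaskStats("unsolvable", agent_return=0.0, reference_return=0.0),
    ]
    for task_id, weight in sampling_weights(pool).items():
        print(f"{task_id}: {weight:.2f}")
```

Under this weighting, tasks the agent has already mastered and tasks nobody can solve both receive zero weight, so sampling concentrates on tasks that are solvable but not yet solved by the agent.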
Open-endedness
We're building toward environments that keep producing novel, learnable challenges instead of saturating the way human-curated benchmarks do. An agent should never run out of things to learn.
Self-evolving benchmarks
We use coding agents as world-builders that construct environments, tasks, and their verifiers, then rebuild them as models improve. This lets a benchmark grow alongside the models it measures, instead of being outgrown by them.
14K
1B
Evaluations