Project Page | Paper | arXiv
RATs is a multi-agent Code-as-Policy system for lifelong robot skill learning. During free-form play a team of LLM agents invents its own tasks, writes code-as-policy, and distills successful executions into a reusable skill library; at evaluation those skills are reused as planner context — no gradients, no RL, all learning through structured natural-language feedback and code reuse. RATs runs in two stages:
| Stage | What it does |
|---|---|
| Play | Curiosity-driven skill acquisition: a proposer → planner → policy-writer → verifier → failure-diagnoser loop invents tasks, executes code-as-policy, and grows a skill library from what works. |
| Evaluation | Frozen learned skills are injected as planner context and compared head-to-head with CaP-X baselines, in two complementary settings: in-domain (LIBERO-PRO, MolmoSpaces) and cross-environment transfer (reusing the same skills in Robosuite and on a real Franka Panda). |
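The Play stage in the table above is a closed loop: propose a task, plan, write code, execute, verify, and on failure feed a diagnosis back into the next attempt. The sketch below shows only that control flow, under stated assumptions. Every agent is a plain Python callable standing in for an LLM call, and `Skill`, `SkillLibrary`, `play_iteration`, the retry budget, and the `skill_N` naming are hypothetical. They are not RATs' actual classes or API.

```python
# Hypothetical sketch of one curiosity-driven play iteration; NOT the RATs implementation.
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Skill:
    name: str
    code: str  # code-as-policy source that passed verification
    task: str  # the self-proposed task it solved


@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def as_context(self) -> str:
        # Render every stored skill as plain text that a planner prompt can include.
        return "\n\n".join(f"# {s.name}: {s.task}\n{s.code}" for s in self.skills.values())


def play_iteration(
    propose: Callable[[str], str],                # library context -> new task
    plan: Callable[[str, str], str],              # (task, context) -> plan text
    write_policy: Callable[[str, str], str],      # (task, plan) -> policy code
    execute: Callable[[str], str],                # policy code -> execution trace
    verify: Callable[[str, str], bool],           # (task, trace) -> success?
    diagnose: Callable[[str, str, str], str],     # (task, code, trace) -> feedback
    library: SkillLibrary,
    max_retries: int = 2,
) -> Optional[Skill]:
    task = propose(library.as_context())
    feedback = ""
    for _ in range(max_retries + 1):
        plan_text = plan(task, library.as_context() + feedback)
        code = write_policy(task, plan_text)
        trace = execute(code)
        if verify(task, trace):
            # Only verified executions are distilled into the library.
            skill = Skill(name=f"skill_{len(library.skills)}", code=code, task=task)
            library.add(skill)
            return skill
        # Natural-language failure diagnosis is fed back to the planner; no gradients.
        feedback = "\n# Failure diagnosis:\n" + diagnose(task, code, trace)
    return None


# Usage with stub agents (a real run would call LLMs and a simulator instead).
demo_library = SkillLibrary()
demo_skill = play_iteration(
    propose=lambda ctx: "put the bowl on the plate",
    plan=lambda task, ctx: "1. grasp bowl  2. place on plate",
    write_policy=lambda task, plan_text: "grasp('bowl'); place_on('plate')",
    execute=lambda code: "success",
    verify=lambda task, trace: trace == "success",
    diagnose=lambda task, code, trace: "grasp missed the rim",
    library=demo_library,
)
print(demo_skill.name if demo_skill else "no skill learned")  # skill_0
```

Keeping the skill library as plain text rendered into the planner prompt mirrors the README's point that learning happens through feedback and code reuse rather than weight updates.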
RATs uses uv for dependency management and runs on Python 3.10 with a CUDA-capable GPU. The full guide — cache/quota tuning, every submodule, the MolmoSpaces bridge, and the capx-baseline/ environments — is in docs/setup.md.
```bash
# Clone
git clone --branch main --depth 1 https://github.com/Playful-RATs/RATs rats
cd rats

# Root runtime submodules (LIBERO-PRO + MolmoSpaces)
git submodule update --init --depth 1 \
  rats/third_party/LIBERO-PRO \
  rats/third_party/libero_dependencies/robosuite \
  rats/third_party/robosuite \
  rats/third_party/contact_graspnet_pytorch \
  rats/third_party/curobo \
  rats/third_party/sam3

# Root env
uv venv .venv --python 3.10
source .venv/bin/activate
uv sync --frozen --active --extra libero --extra contactgraspnet

# LIBERO prompts for ~/.libero/config.yaml on first import in a fresh shell.
# Pre-create it to keep setup and batch runs non-interactive.
mkdir -p ~/.libero
cat > ~/.libero/config.yaml <<EOF
benchmark_root: $PWD/rats/third_party/LIBERO-PRO/libero/libero
bddl_files: $PWD/rats/third_party/LIBERO-PRO/libero/libero/./bddl_files
init_states: $PWD/rats/third_party/LIBERO-PRO/libero/libero/./init_files
datasets: $PWD/rats/third_party/LIBERO-PRO/libero/datasets
assets: $PWD/rats/third_party/LIBERO-PRO/libero/libero/./assets
EOF
```

The root `.venv` runs both LIBERO-PRO and MolmoSpaces. MolmoSpaces additionally needs a `conda` bridge env and tens of GB of assets; CaP-X baselines, Robosuite transfer, and real-world transfer run from `capx-baseline/` with their own envs. See docs/setup.md for all of it.
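Because LIBERO resolves its benchmark files through the paths in `~/.libero/config.yaml`, a mistyped entry is worth catching before a long run. The check below is a sketch, not part of the repository: `read_libero_config` and `check_libero_config` are hypothetical helpers, the parser only handles the flat `key: value` lines the heredoc above writes (not general YAML), and it assumes every path except `datasets` ships with the LIBERO-PRO submodule, while `datasets` may only appear after a separate download.

```python
# Hypothetical helper (not shipped with RATs): sanity-check ~/.libero/config.yaml.
from pathlib import Path

REQUIRED_KEYS = ("benchmark_root", "bddl_files", "init_states", "datasets", "assets")


def read_libero_config(path: Path) -> dict[str, str]:
    """Parse flat `key: value` lines, as written by the heredoc; not a general YAML parser."""
    config: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        config[key.strip()] = value.strip()
    return config


def check_libero_config(path: Path = Path.home() / ".libero" / "config.yaml") -> list[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    if not path.is_file():
        return [f"{path} does not exist"]
    config = read_libero_config(path)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    for key in REQUIRED_KEYS:
        # Assumption: `datasets` may not exist until datasets are downloaded, so skip it.
        if key in config and key != "datasets" and not Path(config[key]).exists():
            problems.append(f"{key} points to a missing path: {config[key]}")
    return problems


if __name__ == "__main__":
    issues = check_libero_config()
    print("\n".join(issues) or "~/.libero/config.yaml looks OK")
```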
This section assumes the root `.venv` created during installation is active. Set a model provider and the runtime exports first:

```bash
export OPENAI_API_KEY="sk-..."  # and/or GEMINI_API_KEY / OPENROUTER_API_KEY
export PYTHONPATH="$PWD:${PYTHONPATH:-}"
export CAPX_ENV_STACK=libero
export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl
```

Then launch LIBERO-PRO play:

```bash
python scripts/run_rats.py \
  --config env_configs/libero/rats_libero_play.yaml \
  --explore --iterations 50 \
  --output-dir outputs/play_libero
```

MolmoSpaces play with the launcher-managed bridge is covered in docs/play.md.
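These variables are easy to lose in a new shell, so it can help to confirm them before launching a long run. The helper below is hypothetical (it is not shipped with RATs). It checks that at least one of the three provider keys named in the export comment is set, that `CAPX_ENV_STACK`, `MUJOCO_GL`, and `PYOPENGL_PLATFORM` hold the values shown above, and that the repository root is on `PYTHONPATH`. It assumes you run it from the repository root, as the `$PWD` export does.

```python
# Hypothetical preflight check mirroring the exports above; not part of the RATs repo.
import os
from typing import Mapping, Optional

EXPECTED = {"CAPX_ENV_STACK": "libero", "MUJOCO_GL": "egl", "PYOPENGL_PLATFORM": "egl"}
PROVIDER_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "OPENROUTER_API_KEY")


def preflight(env: Optional[Mapping[str, str]] = None, repo_root: Optional[str] = None) -> list[str]:
    """Return a list of problems with the runtime environment; empty means all checks pass."""
    env = os.environ if env is None else env
    repo_root = os.path.abspath(repo_root or os.getcwd())
    problems: list[str] = []
    # At least one model provider key must be non-empty.
    if not any(env.get(key) for key in PROVIDER_KEYS):
        problems.append("set at least one of " + ", ".join(PROVIDER_KEYS))
    # Stack selection and headless EGL rendering, exactly as exported above.
    for key, want in EXPECTED.items():
        if env.get(key) != want:
            problems.append(f"{key} should be {want!r}, got {env.get(key)!r}")
    # PYTHONPATH entries are os.pathsep-separated; empty entries are ignored.
    entries = [os.path.abspath(p) for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if repo_root not in entries:
        problems.append(f"PYTHONPATH does not include the repo root {repo_root}")
    return problems


if __name__ == "__main__":
    for issue in preflight():
        print("preflight:", issue)
```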
The batch driver runs all six LIBERO-PRO suites (6 suites × 10 tasks × 5 trials = 300 episodes) with the learned skills frozen, seeded from the play snapshot written at iteration 50:

```bash
RATS_VERIFIER_STRICT_BENCHMARK=1 python scripts/run_rats_libero_pro_batch.py \
  --seed-skill-library outputs/play_libero/snapshots/iter050/skills.json \
  --extra-rats-flags "--model google/gemini-3.5-flash" \
  --output-dir outputs/rats_libero_pro_iter050seed \
  --gpus 0,1,2,3,4,6,7 --workers 18 --skip-completed
```

CaP-X baselines, the MolmoSpaces eval, and result summarization are covered in docs/evaluation.md.
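In evaluation the learned skills are frozen and injected as planner context. The README does not document the layout of `skills.json`, so the sketch below assumes, purely for illustration, either a JSON list of objects with `name`, `code`, and optional `description` keys, or an object wrapping that list under `skills`. `SkillEntry`, `load_frozen_skills`, and `planner_context` are hypothetical names, not RATs functions; the frozen dataclass and tuple just make the "no updates during evaluation" property explicit in the types.

```python
# Hypothetical sketch: load a skill snapshot read-only and render it as planner context.
# ASSUMPTION: the skills.json schema below is illustrative; check the real file first.
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass(frozen=True)
class SkillEntry:
    name: str
    code: str
    description: str = ""


def load_frozen_skills(path: Union[str, Path]) -> tuple[SkillEntry, ...]:
    """Read skills once into an immutable tuple of frozen entries."""
    data = json.loads(Path(path).read_text())
    if isinstance(data, dict):  # assumed alternative layout: {"skills": [...]}
        data = data.get("skills", [])
    return tuple(
        SkillEntry(name=e["name"], code=e["code"], description=e.get("description", ""))
        for e in data
    )


def planner_context(skills: tuple[SkillEntry, ...]) -> str:
    """Concatenate skills into one text block for a planner prompt."""
    blocks = []
    for s in skills:
        header = f"# skill: {s.name}" + (f"  ({s.description})" if s.description else "")
        blocks.append(f"{header}\n{s.code.rstrip()}")
    return "\n\n".join(blocks)
```

For example, `planner_context(load_frozen_skills("outputs/play_libero/snapshots/iter050/skills.json"))` would render the snapshot passed to `--seed-skill-library`, provided the file matches the assumed schema.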
Reusing LIBERO-learned skills as planner context in Robosuite and on a real Franka Panda is covered in docs/cross-environment.md.
| Guide | Contents |
|---|---|
| Setup | Full environment setup: prerequisites, cache/quota tuning, submodules, LIBERO-PRO / MolmoSpaces / CaP-X-baseline runtimes |
| Play | Skill acquisition in LIBERO-PRO and MolmoSpaces with Contact-GraspNet |
| Evaluation | RATs vs. CaP-X in LIBERO-PRO and MolmoSpaces; output artifacts |
| Cross-Environment Evaluation | Transferring learned skills to Robosuite and a real Franka Panda |
```bibtex
@article{rats2026playful,
  title   = {Playful Agentic Robot Learning},
  author  = {Zhang, Junyi and Ge, Jiaxin and Yoo, Hanjun and Fu, Letian and Yang, Zihan and Liu, Yaowei and Saravanan, Raj and Yin, Shaofeng and Yu, Justin and Niu, Dantong and Wang, Zirui and Herzig, Roei and Goldberg, Ken and Bai, Yutong and Chan, David M. and Stoica, Ion and Kanazawa, Angjoo and Lei, Jiahui and Feng, Haiwen and Darrell, Trevor},
  journal = {arXiv preprint arXiv:2606.19419},
  year    = {2026}
}
```