Hao Bai<sup>1,2</sup>, Rui Yang<sup>1,2</sup>, Chenlu Ye<sup>1</sup>, Spencer Whitehead<sup>2</sup>,
Aviral Kumar<sup>3</sup>, Tong Zhang<sup>1</sup>

<sup>1</sup>UIUC <sup>2</sup>Microsoft <sup>3</sup>CMU

AsyncWebRL trains vision-language web agents with efficient multi-step reinforcement learning. It is built on the AReaL async RL framework and sets a new open-source state of the art on the WebGym out-of-distribution test split.
Note: for the original WebGym code, see the `webgym` branch.
Asynchronous system — up to 2.9× training-throughput speedup over the previously fastest open synchronous pipeline (WebGym):
- Everlasting rollout pool — rollout workers stay alive across iteration boundaries, so rollout, gradient update, and policy refresh overlap continuously with no per-iteration warm-up bubble.
- Lightweight screenshot handling — per-step image tensors stay in a dedicated in-memory actor; only lightweight references travel over RPC, avoiding the shared object-store serialization that bottlenecks WebGym.
- Decoupled off-policy correction — a decoupled importance-sampling ratio that roughly halves clip-trigger rates under async off-policyness.
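To make the off-policy correction concrete, here is a minimal sketch of one common decoupled form, the decoupled PPO objective. It assumes three per-token log-probabilities: the current policy, a recent "proximal" policy that anchors the clip range, and the stale behavior policy that generated the rollout. The function name, argument names, and `clip_eps` default are hypothetical and are not taken from the AsyncWebRL or AReaL code; the exact objective used in this repository may differ.

```python
import math


def decoupled_ppo_token_loss(logp_new, logp_prox, logp_behav, advantage, clip_eps=0.2):
    """Per-token decoupled PPO loss (illustrative sketch, not the repo's API).

    Returns (loss, clip_triggered).
    """
    # Trust-region ratio: measured against the proximal policy, not the
    # stale behavior policy, so staleness alone does not push it out of range.
    ratio = math.exp(logp_new - logp_prox)
    # Importance weight that corrects for sampling from the behavior policy.
    behav_weight = math.exp(logp_prox - logp_behav)

    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    unclipped_term = ratio * advantage
    clipped_term = clipped_ratio * advantage

    # The clip "triggers" when the pessimistic min picks the clipped term.
    clip_triggered = clipped_term < unclipped_term
    surrogate = min(unclipped_term, clipped_term)
    return -behav_weight * surrogate, clip_triggered


# Hypothetical token probabilities: behavior 0.5, proximal 0.65, current 0.7.
logp_behav, logp_prox, logp_new = math.log(0.5), math.log(0.65), math.log(0.7)

# Standard PPO is the special case where the proximal policy is the behavior policy.
standard_loss, standard_clipped = decoupled_ppo_token_loss(
    logp_new, logp_behav, logp_behav, advantage=1.0
)
decoupled_loss, decoupled_clipped = decoupled_ppo_token_loss(
    logp_new, logp_prox, logp_behav, advantage=1.0
)
```

In this toy case the standard ratio is 0.7 / 0.5 = 1.4, which exceeds 1 + `clip_eps` and is clipped, while the decoupled ratio is 0.7 / 0.65 ≈ 1.08 and is not. This is the mechanism by which a decoupled ratio can lower clip-trigger rates when rollouts lag behind the learner.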
Algorithmic fix — shorter trajectories at the same success rate:
- Diagnoses the per-trajectory step normalizer $1/|\tau_i|$ in multi-step GRPO as the root cause of trajectory- and token-level inefficiency (failures run far longer than successes, so it under-weights the negative gradient on failed tokens).
- Replacing it with a constant $1/k$ breaks this coupling: trajectories contract while aggregate success is preserved, with the largest gains on the harder Medium / Hard OOD slices.
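The effect of the two normalizers can be illustrated with a small sketch. It assumes each step of a trajectory contributes one loss term scaled by the trajectory-level advantage, and it treats `k` as a fixed constant chosen by the user; this page does not say how `k` is set. The function names and the example lengths are hypothetical.

```python
def step_weight(traj_length, normalizer, k=None):
    """Weight applied to each step of a trajectory under a given normalizer."""
    if traj_length <= 0:
        raise ValueError("traj_length must be positive")
    if normalizer == "per_trajectory":
        # 1 / |tau_i|: longer trajectories get a smaller weight per step.
        return 1.0 / traj_length
    if normalizer == "constant":
        # 1 / k: every step gets the same weight regardless of length.
        if k is None or k <= 0:
            raise ValueError("k must be a positive constant")
        return 1.0 / k
    raise ValueError(f"unknown normalizer: {normalizer}")


def trajectory_mass(traj_length, advantage, normalizer, k=None):
    """Total absolute gradient weight a trajectory receives across all its steps."""
    return step_weight(traj_length, normalizer, k) * traj_length * abs(advantage)


# Hypothetical group: a short success (5 steps) and a long failure (20 steps).
success_len, failure_len = 5, 20

per_traj_success = step_weight(success_len, "per_trajectory")
per_traj_failure = step_weight(failure_len, "per_trajectory")

const_success = step_weight(success_len, "constant", k=10)
const_failure = step_weight(failure_len, "constant", k=10)

failure_mass_per_traj = trajectory_mass(failure_len, -1.0, "per_trajectory")
failure_mass_const = trajectory_mass(failure_len, -1.0, "constant", k=10)
```

Under the per-trajectory normalizer, each step of the 20-step failure carries a quarter of the weight of a step in the 5-step success. With a constant normalizer, steps are weighted equally, so the long failure contributes proportionally more negative gradient in total.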
Framework:
- Built on AReaL — FSDP2 / Megatron training, SGLang / vLLM rollout, Qwen3-VL policies.
- Modular and extensible: workflows, engines, rewards, and datasets are independent, swappable components.
Everything beyond this page — installation (Docker image or local uv),
quickstart, configuration reference, and system / algorithm design — lives in
the docs:
| Topic | Link |
|---|---|
| Installation — verified package versions | Installation |
| Run training, adapt the config to your cluster | Quickstart |
| Configuration reference | Configuration |
| Async system design | Async System |
| Algorithm design | Algorithm |
@article{bai2026asyncwebrl,
title = {AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents},
author = {Bai, Hao and Yang, Rui and Ye, Chenlu and Whitehead, Spencer and Kumar, Aviral and Zhang, Tong},
journal = {arXiv preprint arXiv:2606.05597},
year = {2026}
}

The code in this repository is released under the MIT License. Parts of the code are based on AReaL, which is released under the Apache 2.0 License.
