microsoft/webgym

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

Paper · Documentation Status · Project Page · Code License: Apache 2.0 · Python 3.12+

Hao Bai¹,², Rui Yang¹,², Chenlu Ye¹, Spencer Whitehead²,
Aviral Kumar³, Tong Zhang¹

¹UIUC   ²Microsoft   ³CMU

AsyncWebRL trains vision-language web agents with efficient multi-step reinforcement learning. It is built on the AReaL async RL framework and sets a new open-source state of the art on the WebGym out-of-distribution test split.

Note: the original WebGym code is on the webgym branch.

Features

Asynchronous system — up to 2.9× training-throughput speedup over the previously fastest open synchronous pipeline (WebGym):

  • Everlasting rollout pool — rollout workers stay alive across iteration boundaries, so rollout, gradient update, and policy refresh overlap continuously with no per-iteration warm-up bubble.
  • Lightweight screenshot handling — per-step image tensors stay in a dedicated in-memory actor; only lightweight references travel over RPC, avoiding the shared object-store serialization that bottlenecks WebGym.
  • Decoupled off-policy correction: a decoupled importance-sampling ratio that roughly halves the clip-trigger rate when training on off-policy rollouts produced by the asynchronous pipeline.
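The decoupled correction can be illustrated per token. This is a minimal sketch, assuming a decoupled PPO-style objective of the kind used in AReaL: the clipped ratio is taken against a recent proximal policy `pi_prox` instead of the stale behavior policy `pi_behav` that generated the rollout, and a separate weight `pi_prox / pi_behav` corrects for the remaining gap. The function names, the clip range `eps`, and the toy log-probabilities are hypothetical; AsyncWebRL's exact loss may differ.

```python
import math


def ppo_clip_term(ratio: float, advantage: float, eps: float) -> float:
    """Standard PPO pessimistic term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def decoupled_token_loss(logp_theta: float, logp_prox: float, logp_behav: float,
                         advantage: float, eps: float = 0.2) -> float:
    """Per-token decoupled loss (to be minimized).

    ratio        = pi_theta / pi_prox  -> clipped; trust region anchored at pi_prox
    behav_weight = pi_prox / pi_behav  -> importance weight for the stale
                                          behavior policy that sampled the token
    """
    ratio = math.exp(logp_theta - logp_prox)
    behav_weight = math.exp(logp_prox - logp_behav)
    return -behav_weight * ppo_clip_term(ratio, advantage, eps)


def clip_fraction(logp_theta: list[float], logp_anchor: list[float],
                  eps: float = 0.2) -> float:
    """Fraction of tokens whose ratio pi_theta / pi_anchor leaves [1 - eps, 1 + eps]."""
    hits = 0
    for lt, la in zip(logp_theta, logp_anchor):
        r = math.exp(lt - la)
        if r < 1.0 - eps or r > 1.0 + eps:
            hits += 1
    return hits / len(logp_theta)


# Toy log-probs (hypothetical): the behavior policy is several versions stale,
# while the proximal policy is close to the current one.
logp_theta = [-1.00, -1.00, -2.00, -0.50]
logp_prox = [-1.02, -0.98, -2.05, -0.52]
logp_behav = [-1.50, -0.50, -2.10, -0.55]

print("clip fraction vs behavior policy:", clip_fraction(logp_theta, logp_behav))
print("clip fraction vs proximal policy:", clip_fraction(logp_theta, logp_prox))
```

Anchoring the clip at the proximal policy means that staleness is absorbed by the separate `pi_prox / pi_behav` weight rather than pushing the clipped ratio outside its trust region. The toy numbers only show the mechanism and are not meant to reproduce the reported clip-rate reduction.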

Algorithmic fix — shorter trajectories at the same success rate:

  • Diagnoses the per-trajectory step normalizer $1/|\tau_i|$ in multi-step GRPO as the root cause of inefficiency at both the trajectory and the token level: failed trajectories run far longer than successful ones, so dividing by $|\tau_i|$ under-weights the negative gradient on each token of a failed trajectory.
  • Replacing it with a constant $1/k$ breaks this coupling between trajectory length and per-step weight: trajectories get shorter while aggregate success is preserved, with the largest gains on the harder Medium and Hard OOD slices.

Framework:

  • Built on AReaL — FSDP2 / Megatron training, SGLang / vLLM rollout, Qwen3-VL policies.
  • Modular and extensible: workflows, engines, rewards, and datasets are independent, swappable components.
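The idea of independent, swappable components can be pictured with small structural interfaces. The sketch below is hypothetical: `Episode`, `Workflow`, `RewardFn`, `ScriptedWorkflow`, and `collect` are illustrative names chosen here, not AReaL's or AsyncWebRL's actual classes. It only shows that a rollout workflow and a reward function can be replaced separately as long as each honors a small interface; the `screenshot_refs` field stands in for passing lightweight references instead of image tensors.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Episode:
    """One multi-step web trajectory (hypothetical container)."""
    task_id: str
    actions: list[str] = field(default_factory=list)
    screenshot_refs: list[str] = field(default_factory=list)  # handles, not image bytes
    success: bool = False


class Workflow(Protocol):
    """Anything that turns a task into an episode, e.g. a browser agent loop."""
    def run(self, task_id: str) -> Episode: ...


class RewardFn(Protocol):
    """Anything that scores a finished episode."""
    def __call__(self, episode: Episode) -> float: ...


def binary_success(episode: Episode) -> float:
    """Outcome reward: 1.0 for a solved task, 0.0 otherwise."""
    return 1.0 if episode.success else 0.0


class ScriptedWorkflow:
    """Stand-in workflow that replays fixed outcomes, handy for tests."""

    def __init__(self, outcomes: dict[str, tuple[int, bool]]):
        self.outcomes = outcomes  # task_id -> (num_steps, success)

    def run(self, task_id: str) -> Episode:
        steps, ok = self.outcomes[task_id]
        return Episode(
            task_id=task_id,
            actions=[f"step-{i}" for i in range(steps)],
            screenshot_refs=[f"{task_id}/{i}" for i in range(steps)],
            success=ok,
        )


def collect(workflow: Workflow, reward_fn: RewardFn,
            task_ids: list[str]) -> list[tuple[Episode, float]]:
    """Roll out each task and attach its reward; neither component knows the other."""
    results = []
    for tid in task_ids:
        ep = workflow.run(tid)
        results.append((ep, reward_fn(ep)))
    return results


wf = ScriptedWorkflow({"t1": (3, True), "t2": (8, False)})
for ep, r in collect(wf, binary_success, ["t1", "t2"]):
    print(ep.task_id, len(ep.actions), r)
```

Swapping in a real browser-driven workflow or a different reward only requires another object with the same `run` or `__call__` shape; `collect` stays unchanged.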

Documentation

Everything beyond this page — installation (Docker image or local uv), quickstart, configuration reference, and system / algorithm design — lives in the docs:

| Topic | Link |
| --- | --- |
| Installation (verified package versions) | Installation |
| Run training and adapt the config to your cluster | Quickstart |
| Configuration reference | Configuration |
| Async system design | Async System |
| Algorithm design | Algorithm |

Citation

@article{bai2026asyncwebrl,
  title     = {AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents},
  author    = {Bai, Hao and Yang, Rui and Ye, Chenlu and Whitehead, Spencer and Kumar, Aviral and Zhang, Tong},
  journal   = {arXiv preprint arXiv:2606.05597},
  year      = {2026}
}

License

The code in this repository is released under the MIT License. Part of it is based on AReaL, which is released under the Apache 2.0 License.

About

This project includes code for using the AsyncWebRL and WebGym frameworks to train web agent models.
