AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

Hao Bai¹,², Rui Yang¹,², Chenlu Ye¹, Spencer Whitehead²,
Aviral Kumar³, Tong Zhang¹

¹UIUC ²Microsoft ³CMU

AsyncWebRL trains vision-language web agents with efficient multi-step reinforcement learning. It is built on the AReaL async RL framework and sets a new open-source state of the art on the WebGym out-of-distribution test split.

Note: for the original WebGym code, see the webgym branch.

Features

Asynchronous system — up to 2.9× training-throughput speedup over the previously fastest open synchronous pipeline (WebGym):

Everlasting rollout pool — rollout workers stay alive across iteration boundaries, so rollout, gradient update, and policy refresh overlap continuously with no per-iteration warm-up bubble.
Lightweight screenshot handling — per-step image tensors stay in a dedicated in-memory actor; only lightweight references travel over RPC, avoiding the shared object-store serialization that bottlenecks WebGym.
Decoupled off-policy correction — a decoupled importance-sampling ratio that roughly halves clip-trigger rates under async off-policyness.
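
The decoupled correction above can be illustrated with a minimal sketch. It is not the repository's implementation: it assumes the decoupled PPO-style form in which the clipped ratio is taken against a recent "proximal" policy, while a separate unclipped weight corrects for the stale behavior policy that generated the rollout. The function name, argument names, and the numbers in the example are hypothetical.

```python
import math


def decoupled_ppo_token_loss(logp_new, logp_prox, logp_behav, advantage, eps=0.2):
    """Token-level loss for a decoupled PPO-style objective (illustrative only).

    logp_new:   log-prob under the policy being updated
    logp_prox:  log-prob under a recent "proximal" policy (the clipping anchor)
    logp_behav: log-prob under the stale policy that generated the rollout
    Returns (loss, clipped), where clipped is True if the clip was active.
    """
    # Ratio that gets clipped: current policy vs. proximal policy.
    ratio = math.exp(logp_new - logp_prox)
    # Off-policy correction: proximal policy vs. behavior policy (not clipped).
    behav_weight = math.exp(logp_prox - logp_behav)

    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    unclipped_obj = ratio * advantage
    clipped_obj = clipped_ratio * advantage
    objective = min(unclipped_obj, clipped_obj)
    clipped = clipped_obj < unclipped_obj
    return -behav_weight * objective, clipped


# Hypothetical token: the rollout policy is stale, the proximal policy is recent.
logp_behav = math.log(0.50)
logp_prox = math.log(0.70)
logp_new = math.log(0.72)

# Standard PPO is the special case where the proximal policy equals the behavior policy.
loss_std, clipped_std = decoupled_ppo_token_loss(logp_new, logp_behav, logp_behav, 1.0)
# Decoupled form: clip around the proximal policy instead.
loss_dec, clipped_dec = decoupled_ppo_token_loss(logp_new, logp_prox, logp_behav, 1.0)

print(clipped_std, clipped_dec)
```

In this example the standard ratio is 0.72 / 0.50 = 1.44, which exceeds 1 + eps and triggers the clip, while the decoupled ratio is 0.72 / 0.70 and stays inside the clip range. This is the kind of effect that lowers the clip-trigger rate when rollouts lag behind the learner.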

Algorithmic fix — shorter trajectories at the same success rate:

Diagnoses the per-trajectory step normalizer $1/|\tau_i|$ in multi-step GRPO as the root cause of trajectory- and token-level inefficiency (failures run far longer than successes, so it under-weights the negative gradient on failed tokens).
Replacing it with a constant $1/k$ breaks this coupling — trajectories contract while aggregate success is preserved, with the largest gains on the harder Medium / Hard OOD slices.
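
To make the effect of the normalizer concrete, here is a minimal sketch under stated assumptions, not the repository's loss code. It assumes every step of a trajectory carries that trajectory's advantage and compares the per-step weight and the total signed weight under $1/|\tau_i|$ and under a constant $1/k$. The function names, the two-trajectory group, and the choice k = 10 are hypothetical.

```python
def step_weights(lengths, k=None):
    """Per-step loss weight for each trajectory.

    lengths: number of steps |tau_i| in each trajectory.
    k: if None, use the per-trajectory normalizer 1/|tau_i|;
       otherwise use the constant normalizer 1/k.
    """
    if k is None:
        return [1.0 / n for n in lengths]
    return [1.0 / k for _ in lengths]


def trajectory_mass(lengths, advantages, k=None):
    """Total signed weight per trajectory: sum over its steps of weight * advantage."""
    weights = step_weights(lengths, k)
    return [n * w * a for n, w, a in zip(lengths, weights, advantages)]


# Hypothetical group: one short success (5 steps) and one long failure (20 steps).
lengths = [5, 20]
advantages = [1.0, -1.0]

per_traj_weights = step_weights(lengths)          # 1/|tau_i|
const_weights = step_weights(lengths, k=10)       # 1/k
per_traj_mass = trajectory_mass(lengths, advantages)
const_mass = trajectory_mass(lengths, advantages, k=10)

print(per_traj_weights)  # each failed step is weighted 4x less than a success step
print(const_weights)     # every step gets the same weight
print(per_traj_mass)     # success and failure contribute equal magnitude
print(const_mass)        # the longer failure now contributes more negative mass
```

Under $1/|\tau_i|$ each step of the long failure is down-weighted relative to a step of the short success, so extra steps cost the policy nothing. Under $1/k$ every step counts equally, so the long failure carries a proportionally larger negative signal.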

Framework:

Built on AReaL — FSDP2 / Megatron training, SGLang / vLLM rollout, Qwen3-VL policies.
Modular and extensible: workflows, engines, rewards, and datasets are independent, swappable components.

Documentation

Everything beyond this page — installation (Docker image or local uv), quickstart, configuration reference, and system / algorithm design — lives in the docs:

Topic	Link
Installation — verified package versions	Installation
Run training, adapt the config to your cluster	Quickstart
Configuration reference	Configuration
Async system design	Async System
Algorithm design	Algorithm

Citation

@article{bai2026asyncwebrl,
  title     = {AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents},
  author    = {Bai, Hao and Yang, Rui and Ye, Chenlu and Whitehead, Spencer and Kumar, Aviral and Zhang, Tong},
  journal   = {arXiv preprint arXiv:2606.05597},
  year      = {2026}
}

License

The code in this repository is released under the MIT License. Parts of the code are based on AReaL, which is licensed under the Apache 2.0 License.
