MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

Official implementation, training, and evaluation code for MemGUI-Agent.

Demo video: MemGUI-8B-SFT-Demo.mp4

🔥 News

2026-06-23: Open-sourced the code repository.
2026-06-19: Paper preprint is available on arXiv.
2026-06-16: Released the project page, MemGUI-3K dataset, MemGUI-8B-SFT model, and benchmark results.

Main Results

MemGUI-Agent improves results in both the zero-shot 235B setting and the trained 8B setting on long-horizon mobile GUI benchmarks. On MemGUI-Bench, MemGUI-Agent-235B reaches 62.5% Pass@3 and MemGUI-8B-SFT reaches 23.4% Pass@1. On the out-of-distribution MobileWorld GUI-Only benchmark, MemGUI-Agent-235B reaches a 29.1% success rate and MemGUI-8B-SFT reaches 17.9%.
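
As a reading aid for the Pass@k numbers, the sketch below assumes that a task counts as solved under Pass@k when any of its first k attempts succeeds. This is an assumption for illustration; the paper and the MemGUI-Bench repository define the metric authoritatively. The `pass_at_k` helper and the sample results are hypothetical.

```python
from typing import List


def pass_at_k(results: List[List[bool]], k: int) -> float:
    """Share of tasks with at least one success among the first k attempts.

    Hypothetical helper: assumes Pass@k means "any of the first k attempts
    succeeded", which may differ from the benchmark's exact definition.
    """
    if not results:
        raise ValueError("results must contain at least one task")
    solved = sum(any(attempts[:k]) for attempts in results)
    return solved / len(results)


# Made-up outcomes for four tasks, three attempts each.
sample = [
    [False, True, False],
    [False, False, False],
    [True, True, True],
    [False, False, True],
]
p1 = pass_at_k(sample, 1)
p3 = pass_at_k(sample, 3)
print(p1, p3)
```

Under this reading, Pass@3 can only be equal to or higher than Pass@1 on the same attempts, which is why the two figures above should not be compared directly across models.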

Full benchmark trajectories are available on the official pages: MemGUI-Bench and MobileWorld.

For captioned leaderboard tables and paper figures, see the project page.

Overview

MemGUI-Agent is an end-to-end mobile GUI agent for long-horizon tasks that require remembering progress, preserving UI facts, and controlling prompt growth. Its core interface, ConAct (Context-as-Action), makes context management part of each model response instead of an external module.

ConAct maintains three structured fields: Folded Action History, Folded UI State, and Recent Step Record.

In each response, MemGUI-Agent updates the Folded Action History, Folded UI State, and Recent Step Record while producing the next GUI action. We evaluate two settings: a zero-shot 235B ConAct agent with unchanged backbone weights, and MemGUI-8B-SFT, an 8B agent trained on MemGUI-3K.
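
To make the Context-as-Action idea concrete, here is a minimal sketch of a container for the three ConAct fields. The class name `ConActContext`, the field types, and the response keys are assumptions for illustration only; the actual prompt and response format used by MemGUI-Agent may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConActContext:
    """Hypothetical container for the three ConAct fields (types are assumed)."""

    folded_action_history: str = ""  # compressed summary of earlier actions
    folded_ui_state: Dict[str, str] = field(default_factory=dict)  # UI facts to keep
    recent_step_record: List[str] = field(default_factory=list)  # latest raw steps

    def apply(self, response: dict) -> str:
        """Update the context from one model response and return its GUI action."""
        self.folded_action_history = response.get(
            "folded_action_history", self.folded_action_history
        )
        self.folded_ui_state.update(response.get("folded_ui_state", {}))
        self.recent_step_record = list(
            response.get("recent_step_record", self.recent_step_record)
        )
        return response["action"]


# A made-up response: the context update and the action arrive together.
ctx = ConActContext()
action = ctx.apply(
    {
        "folded_action_history": "Opened Settings and enabled Wi-Fi.",
        "folded_ui_state": {"wifi_network": "HomeNet"},
        "recent_step_record": ["tap(Wi-Fi toggle)"],
        "action": "open_app(Notes)",
    }
)
print(action, ctx.folded_ui_state)
```

The point of the sketch is that the context fields and the action come from the same response, so no separate memory module has to run between steps.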

MemGUI-3K

MemGUI-3K contains 2,956 successful mobile GUI trajectories, 82,103 task steps, and 64,430 evaluator-approved reasonable steps.
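
Two derived figures follow directly from these counts. The short calculation below uses only the numbers stated above; the variable names are illustrative.

```python
trajectories = 2956
task_steps = 82103
approved_steps = 64430

# Average trajectory length and the share of steps the evaluator approved.
avg_steps = task_steps / trajectories
approved_share = approved_steps / task_steps

print(f"average steps per trajectory: {avg_steps:.1f}")  # 27.8
print(f"approved share of steps: {approved_share:.1%}")  # 78.5%
```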

Dataset usage: data/memgui3k/README.md.

Repository Layout

MemGUI-Agent/
|-- data/
|   `-- memgui3k/                  # Dataset download, restore, packaging, and conversion tools
|-- evaluation/
|   `-- memgui3k_offline_eval/     # Step-level offline evaluation on MemGUI-3K
|-- scripts/                       # Convenience entrypoints
|-- training/
|   `-- ms_swift/                  # MemGUI-8B-SFT ms-swift LoRA SFT template
|-- website/                       # Project-page notes
|-- requirements.txt
`-- README.md

Quick Start

Install the Python dependencies used by the public utilities:

pip install -r requirements.txt

Download MemGUI-3K from Hugging Face:

bash scripts/download_memgui3k.sh

Restore screenshots into data/MemGUI-3K/images/:

bash scripts/restore_memgui3k_images.sh

Build step-level multimodal training JSONL files:

bash scripts/build_memgui3k_training_data.sh

This writes:

data/MemGUI-3K/training_data/
|-- train_sft.jsonl
`-- test_sft.jsonl
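
A quick sanity check on the generated files might look like the sketch below. It assumes only that each non-empty line is one JSON object; it makes no assumption about field names and simply reports the top-level keys it finds. The `iter_jsonl` and `summarize` helpers are hypothetical and not part of the repository.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(path: Path) -> dict:
    """Count records and collect the top-level keys seen across them."""
    count, keys = 0, set()
    for record in iter_jsonl(path):
        count += 1
        keys.update(record)  # iterating a dict adds its keys
    return {"records": count, "keys": sorted(keys)}


if __name__ == "__main__":
    root = Path("data/MemGUI-3K/training_data")
    for name in ("train_sft.jsonl", "test_sft.jsonl"):
        path = root / name
        if path.exists():  # skip quietly if the build step has not run yet
            print(name, summarize(path))
```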

Training MemGUI-8B-SFT

MemGUI-8B-SFT is trained with ms-swift from Qwen3-VL-8B-Instruct. The released template keeps the paper's key hyperparameters:

| Parameter | Value |
| --- | --- |
| Base model | `Qwen/Qwen3-VL-8B-Instruct` |
| Training type | LoRA SFT |
| Epochs | `1` |
| Learning rate | `1e-4` |
| LoRA rank / alpha | `8 / 32` |
| Target modules | `all-linear` |
| Max length | `32768` |
| Per-device train batch size | `2` |
| Gradient accumulation | `8` |
| GPUs | `8` |
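
Two quantities implied by these settings can be worked out as follows. The calculation assumes plain data parallelism across all 8 GPUs and the standard LoRA scaling of alpha divided by rank; neither assumption is confirmed by this README, so check the training script if the exact values matter.

```python
per_device_batch = 2
grad_accum_steps = 8
num_gpus = 8

# Samples contributing to each optimizer step, assuming pure data parallelism.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 128

lora_rank = 8
lora_alpha = 32

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scale = lora_alpha / lora_rank
print(lora_scale)  # 4.0
```

If you train on fewer GPUs, raising gradient accumulation by the same factor keeps the effective batch size unchanged under these assumptions.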

Run the public template:

bash training/ms_swift/train_memgui_8b_sft.sh

See training/ms_swift/README.md for the full command and environment variables.

Evaluation

The offline evaluation toolkit compares model outputs with MemGUI-3K gold step responses and reports action matching, memory actions, folding quality, and format compliance. See evaluation/memgui3k_offline_eval/README.md.
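
As a rough illustration of step-level action matching, the sketch below scores predictions by normalized exact string match against gold actions. This is a simplified stand-in, not the toolkit's actual metric: the real matching rules (for example, any tolerance on coordinates) are documented in the evaluation README. The `normalize` and `action_match_rate` helpers and the sample actions are hypothetical.

```python
from typing import List


def normalize(action: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(action.lower().split())


def action_match_rate(predicted: List[str], gold: List[str]) -> float:
    """Fraction of steps whose normalized predicted action equals the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of steps")
    if not gold:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


# Made-up action strings for three steps; the third prediction is wrong.
predicted = ["tap(Search)", "type('milk')", "swipe(up)"]
gold = ["tap(search)", "type('milk')", "tap(Add)"]
rate = action_match_rate(predicted, gold)
print(f"{rate:.2f}")
```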

For end-to-end rollout scripts, trajectories, and evaluation results, see:

MemGUI-Bench: https://github.com/lgy0404/MemGUI-Bench
MobileWorld: https://github.com/Tongyi-MAI/MobileWorld

Contact

For questions about the paper, code, or released artifacts, contact guangyiliu@zju.edu.cn or the corresponding author at yongliu@iipc.zju.edu.cn.

Citation

@article{liu2026memgui,
  title={MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management},
  author={Liu, Guangyi and Wu, Gao and Liu, Congxiao and Zhao, Pengxiang and Liu, Liang and Li, Mading and Zhang, Qi and Wang, Mengyan and Guo, Liang and Liu, Yong},
  journal={arXiv preprint arXiv:2606.19926},
  year={2026}
}

License

Code in this repository is released under the Apache License 2.0.
