MobileForge

MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization

Guangyi Liu1,2,*, Pengxiang Zhao1,2,*, Gao Wu1,2,*, Yiwen Yin3,*, Mading Li2,†, Liang Liu2, Congxiao Liu1,2, Zhang Qi2, Mengyan Wang2, Liang Guo2, Yong Liu1,✉

1Zhejiang University    2Kuaishou Technology    3Tsinghua University

*Equal contribution    †Project lead    ✉Corresponding author

Contact: guangyiliu@zju.edu.cn / yongliu@iipc.zju.edu.cn

MobileForge turns real target-app interaction into executable curricula, hierarchical rollout feedback, and hint-contextualized step-level GRPO updates, enabling mobile GUI agents to adapt without human-written tasks, demonstrations, or reward labels.

Core Idea

Annotation-free adaptation from real app interaction.

MobileForge targets two gaps in mobile GUI learning: generated tasks and feedback are often detached from the target app, while sparse rollout outcomes are hard to turn into reusable step-level policy updates.

[Figure: MobileForge teaser showing annotation-free adaptation for mobile GUI agents]
MobileForge grounds generated tasks in target-app interaction and converts hierarchical feedback into reusable policy-improvement signals.

Results

Main performance of MobileForge.

[Figure: MobileForge main performance table]
Main results from the paper: scaling with generated MobileForge tasks, in-domain AndroidWorld adaptation, and out-of-domain MobileWorld GUI-only generalization.
67.24% ForgeOwl-8B Pass@1

AndroidWorld in-domain adaptation with the MobileForge-adapted GUI-Owl-1.5-8B model.

77.59% ForgeOwl-8B Pass@3

Strong AndroidWorld multi-attempt success after annotation-free adaptation.

41.03% MobileWorld SR

Out-of-domain GUI-only success with no MobileWorld rollout used for training.

67.24% ForgeQwen3-8B Pass@3

Qwen3-VL-8B after MobileForge adaptation, close to the closed-data GUI-specialized base model.
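
The Pass@1 and Pass@3 figures above are multi-attempt success rates. This page does not spell out how they are computed, so the sketch below assumes the common reading: a task counts as solved at k if any of its first k attempts succeeds. The function name `pass_at_k` and the toy outcomes are hypothetical and are not results from the paper.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks with at least one success among the first k attempts.

    Assumption: this is the usual empirical pass@k; the paper may define it differently.
    """
    if not attempts:
        raise ValueError("at least one task is required")
    solved = sum(1 for outcomes in attempts.values() if any(outcomes[:k]))
    return solved / len(attempts)


# Toy data for illustration only (not benchmark results).
toy_attempts = {
    "task_a": [True, False, False],
    "task_b": [False, False, True],
    "task_c": [False, False, False],
    "task_d": [False, True, False],
}

pass1 = pass_at_k(toy_attempts, k=1)
pass3 = pass_at_k(toy_attempts, k=3)
print(f"Pass@1 = {pass1:.2%}, Pass@3 = {pass3:.2%}")
```

Under this reading Pass@3 can never be lower than Pass@1 for the same set of attempts, which is why the two numbers are reported side by side.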

Method

MobileGym grounds experience; HiFPO turns feedback into updates.

[Figure: MobileForge overview pipeline]
MobileForge links target-app exploration, curriculum mining, rollout execution, hierarchical evaluation, and hint-contextualized policy optimization in one annotation-free adaptation loop.
[Figure: MobileGym framework]

MobileGym

A unified mobile substrate that mines executable tasks from app interaction traces and evaluates completed rollouts with outcome labels, process feedback, and corrective hints.
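
One way to picture the three kinds of evaluation output named above (outcome label, process feedback, corrective hint) is as a small record per rollout. The sketch below is illustrative only: the class and field names (`StepFeedback`, `RolloutFeedback`, `outcome_success`, `hint`) are hypothetical and are not taken from the MobileForge code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepFeedback:
    """Process-level judgment for one action in a rollout (hypothetical schema)."""
    step_index: int
    is_helpful: bool  # whether the evaluator judged this step as useful


@dataclass
class RolloutFeedback:
    """Feedback for one completed rollout (hypothetical schema)."""
    task_id: str
    outcome_success: bool                       # final task outcome label
    steps: List[StepFeedback] = field(default_factory=list)
    hint: Optional[str] = None                  # corrective hint for later attempts

    def helpful_steps(self) -> List[int]:
        """Indices of the steps judged as useful."""
        return [s.step_index for s in self.steps if s.is_helpful]


# Made-up example: a failed rollout with one unhelpful step and a hint.
feedback = RolloutFeedback(
    task_id="demo-task",
    outcome_success=False,
    steps=[StepFeedback(0, True), StepFeedback(1, False), StepFeedback(2, True)],
    hint="Open the settings menu before searching.",
)
print(feedback.helpful_steps())
```

Keeping the outcome, the per-step judgments, and the hint in separate fields lets a later stage use each signal independently, for example filtering on steps while still passing the hint forward.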

[Figure: HiFPO optimization]

HiFPO

A feedback-guided optimization loop that reuses hints across attempts, filters mastered tasks and noisy steps, and trains on hint-contextualized step-level GRPO samples.
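
To make the filtering and GRPO ideas above concrete, here is a minimal sketch under stated assumptions. It treats a task as "mastered" when every recorded attempt succeeded, and it uses the standard GRPO-style group-relative advantage (reward minus group mean, divided by group standard deviation). The exact mastery criterion, the noisy-step filter, and the step-level reward used by HiFPO are not given on this page, so the function names and rules here are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def is_mastered(outcomes: List[bool]) -> bool:
    """Assumed rule: a task is mastered when all recorded attempts succeeded."""
    return len(outcomes) > 0 and all(outcomes)


def filter_tasks(attempts: Dict[str, List[bool]]) -> List[str]:
    """Keep only tasks that still carry a learning signal (not mastered)."""
    return [task for task, outcomes in attempts.items() if not is_mastered(outcomes)]


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy data for illustration only.
kept = filter_tasks({"task_a": [True, True], "task_b": [True, False]})
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(kept, [round(a, 3) for a in advantages])
```

A group whose rewards are all equal yields zero advantages under this normalization, which gives one intuition for why dropping mastered tasks costs little: they would contribute no gradient signal anyway.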

[Figure: MobileForge corrective hint case study]
Corrective hints from MobileGym-Critic are reused by HiFPO to guide later attempts and training prompts.

Hierarchical Feedback

Making rollout feedback reusable.

MobileGym-Critic separates final task outcome, step-level process quality, and corrective hints. HiFPO uses these signals to keep informative experience, discard mastered tasks, and condition GRPO on feedback from earlier attempts.
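
Conditioning on feedback from earlier attempts can be pictured as prepending accumulated hints to the task prompt. The sketch below is a hypothetical illustration: the prompt layout and the `build_prompt` helper are assumptions, not the format used by MobileForge.

```python
from typing import List, Optional


def build_prompt(task: str, hints: Optional[List[str]] = None) -> str:
    """Build a task prompt, optionally contextualized with earlier hints.

    Assumption: hints are plain strings produced after earlier attempts.
    """
    lines = [f"Task: {task}"]
    if hints:
        lines.append("Hints from earlier attempts:")
        lines.extend(f"- {hint}" for hint in hints)
    return "\n".join(lines)


# First attempt: no hints yet.
first_prompt = build_prompt("Add a new contact")

# Later attempt: reuse a hint returned after the failed first attempt.
retry_prompt = build_prompt(
    "Add a new contact",
    hints=["Tap the plus button in the Contacts tab first."],
)
print(retry_prompt)
```

Using the same builder for later attempts and for training prompts keeps the context the policy sees at training time consistent with what it saw when the experience was collected.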

Citation

Cite MobileForge

@article{liu2026mobileforge,
  title={MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization},
  author={Liu, Guangyi and Zhao, Pengxiang and Wu, Gao and Yin, Yiwen and Li, Mading and Liu, Liang and Liu, Congxiao and Qi, Zhang and Wang, Mengyan and Guo, Liang and Liu, Yong},
  journal={arXiv preprint arXiv:2606.19930},
  year={2026}
}