Dilin is currently Senior Staff at Meta Reality Labs on the 3D Gen AI team.
His work focuses on generative AI for 3D content creation, building models
and systems that turn images and language into deployable 3D assets, scenes,
and interactive worlds.
His research often spans the end-to-end path from data licensing and processing
to model architecture, representation, evaluation, and the engineering needed
to make these models useful in interactive, agentic systems.
More broadly, he is interested in spatial intelligence: 3D generation,
reconstruction, perception, understanding, and reasoning. His work often
connects visual inputs, geometry, language, and real deployment constraints.
Earlier, Dilin completed his PhD in computer science at UT Austin, advised by
Qiang Liu, with research spanning variational inference and generative modeling.
AssetGen converts visual intent into production-ready 3D assets: mesh,
baked normals, color texture, and controlled polygon count. The system is
built for settings where generated assets need to be usable immediately in
games, simulations, and interactive 3D environments. In third-party blind
evaluations, it reaches quality competitive with leading commercial
systems and runs in 30 seconds instead of the several minutes common for baselines;
a Flash variant supports sub-15-second previews. It also outperforms
open-source models such as SAM 3D and Trellis 2.
Technical details are in the paper.
WorldGen generates explicit 3D scenes from text: navigable, render-ready,
editable, and naturally suited for multiplayer interactive experiences.
It combines LLM-driven layout reasoning, procedural generation,
diffusion-based 3D generation, and object-aware scene decomposition. The result
is a functional 3D environment that can be explored, edited, rendered, and used
by agents or players. See the paper for the research version.
Before 3D generation, Dilin worked on power-efficient perception for Quest
and AR, including ML depth estimation for passthrough, a foundational
capability for mixed reality. The work focused on helping hardware understand
3D depth under tight latency, memory, and power constraints.