DiLA: Disentangled Latent Action World Models
This repository hosts pretrained weights for DiLA: Disentangled Latent Action World Models.
DiLA is a disentangled latent-action world model trained from observation-only videos. It separates video features into a structure pathway for dynamics-relevant spatial layouts and latent actions, and a content pathway for appearance, texture, and slowly revealed scene details. Future states are predicted by rolling out latent actions in structure space and fusing the predicted structure with content memory.
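The data flow described above can be sketched as a toy rollout loop. This is only an illustration of the roll-out-then-fuse pattern, not the DiLA architecture: `rollout`, `toy_dynamics`, and `toy_fuse` are hypothetical names, the vectors are plain lists, and the content memory is assumed to stay fixed during the rollout.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def rollout(
    structure0: Vector,
    latent_actions: Sequence[Vector],
    content_memory: Vector,
    dynamics: Callable[[Vector, Vector], Vector],
    fuse: Callable[[Vector, Vector], Vector],
) -> List[Vector]:
    """Apply latent actions step by step in structure space and fuse each
    predicted structure state with the (here fixed) content memory."""
    structure = structure0
    predictions = []
    for action in latent_actions:
        structure = dynamics(structure, action)  # next structure state
        predictions.append(fuse(structure, content_memory))  # future-state estimate
    return predictions


# Hypothetical stand-ins for the learned modules (assumptions, not DiLA code).
def toy_dynamics(structure: Vector, action: Vector) -> Vector:
    return [s + a for s, a in zip(structure, action)]  # additive update


def toy_fuse(structure: Vector, content: Vector) -> Vector:
    return structure + content  # list concatenation


preds = rollout([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], [9.0], toy_dynamics, toy_fuse)
print(preds)
```

In the actual model the dynamics and fusion steps are learned networks; see the code repository for the real modules.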
The released checkpoint is:
model.pt
The checkpoint follows the naming scheme of the public codebase; its parameter keys include structure_encoder.* and content_fusion.*.
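To check that a downloaded file follows this naming scheme, one option is to count parameter keys by their top-level prefix. The sketch below is a minimal example under stated assumptions: `group_keys_by_prefix` and `summarize_checkpoint` are hypothetical helpers, not part of the DiLA codebase, and the checkpoint is assumed to be either a flat state dict or a dict wrapping one under a `"state_dict"` key. The real layout may differ, so follow the GitHub repository for the supported loading path.

```python
from collections import Counter
from typing import Dict, Iterable


def group_keys_by_prefix(keys: Iterable[str]) -> Dict[str, int]:
    """Count parameter keys by the text before their first dot."""
    return dict(Counter(key.split(".", 1)[0] for key in keys))


def summarize_checkpoint(path: str = "model.pt") -> Dict[str, int]:
    """Load a checkpoint on CPU and summarize its key prefixes.

    Assumption: the file is a flat state dict, or a dict that wraps one
    under a "state_dict" key.
    """
    import torch  # imported lazily so the helper above works without PyTorch

    checkpoint = torch.load(path, map_location="cpu")
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        checkpoint = checkpoint["state_dict"]
    return group_keys_by_prefix(checkpoint.keys())


# Example with made-up key names that follow the documented prefixes.
example_keys = [
    "structure_encoder.layer.weight",
    "structure_encoder.layer.bias",
    "content_fusion.proj.weight",
]
print(group_keys_by_prefix(example_keys))
```

If the summary shows prefixes other than the expected ones, the file may come from a different code version or may wrap the weights in an extra container.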
Paper
DiLA: Disentangled Latent Action World Models
Tianqiu Zhang*, Muyang Lyu*, Yufan Zhang, Fang Fang, Si Wu
ICML 2026
- Paper: https://arxiv.org/abs/2605.15725
- Project page: https://disentangled-latent-action-world-models.github.io/
Code
The training and evaluation code is available at:
https://github.com/senngadaisuki/disentangled-latent-action-world-models
Usage
Please see the GitHub repository for environment setup, pretrained RAE preparation, checkpoint loading, training, and evaluation instructions.
For interactive qualitative evaluation, the code repository provides
test.ipynb, which includes autoregressive generation, action transfer, and
rebinding examples on SSv2, RT-1, RECON, and LoopNav.
Training Data
Following the paper, DiLA is trained on observation sequences from:
- Something-Something-V2 (SSv2)
- RT-1 / fractal20220817_data
- RECON
- LoopNav
Third-party datasets are subject to their own licenses and terms of use.
Intended Use
The checkpoint is intended for research on latent action models, video prediction, representation learning, action transfer, content-structure disentanglement, and visual planning.
Citation
@inproceedings{zhang2026dila,
title = {{DiLA}: Disentangled Latent Action World Models},
author = {Zhang, Tianqiu and Lyu, Muyang and Zhang, Yufan and Fang, Fang and Wu, Si},
booktitle = {Forty-third International Conference on Machine Learning},
year = {2026},
url = {https://openreview.net/forum?id=BRBHruBDkb}
}