DiLA: Disentangled Latent Action World Models

This repository hosts pretrained weights for DiLA: Disentangled Latent Action World Models.

DiLA is a disentangled latent-action world model trained from observation-only videos. It separates video features into a structure pathway for dynamics-relevant spatial layouts and latent actions, and a content pathway for appearance, texture, and slowly revealed scene details. Future states are predicted by rolling out latent actions in structure space and fusing the predicted structure with content memory.
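The data flow described above can be sketched with toy stand-ins. This is only an illustration under stated assumptions: `rollout`, `toy_dynamics`, and `toy_fuse` are hypothetical names, the additive dynamics and concatenation-based fusion are placeholders for the learned modules, and content memory is held fixed here for simplicity. None of this reflects the actual DiLA implementation, which lives in the code repository.

```python
from typing import Callable, List

Vector = List[float]


def rollout(
    structure: Vector,
    content_memory: Vector,
    latent_actions: List[Vector],
    dynamics: Callable[[Vector, Vector], Vector],
    fuse: Callable[[Vector, Vector], Vector],
) -> List[Vector]:
    """Roll latent actions forward in structure space, then fuse each
    predicted structure with the content memory to form a future state."""
    predictions = []
    for action in latent_actions:
        # Dynamics act only on the structure pathway.
        structure = dynamics(structure, action)
        # Appearance comes from content memory, not from the rollout.
        predictions.append(fuse(structure, content_memory))
    return predictions


def toy_dynamics(s: Vector, a: Vector) -> Vector:
    # Assumption: additive update, standing in for a learned predictor.
    return [si + ai for si, ai in zip(s, a)]


def toy_fuse(s: Vector, c: Vector) -> Vector:
    # Assumption: plain concatenation, standing in for a learned fusion module.
    return s + c


future_states = rollout(
    structure=[0.0, 0.0],
    content_memory=[9.0],
    latent_actions=[[1.0, 0.0], [0.0, 2.0]],
    dynamics=toy_dynamics,
    fuse=toy_fuse,
)
print(future_states)
```

The point of the sketch is the separation of concerns: latent actions only ever touch the structure state, while content enters at the fusion step.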

The released checkpoint is:

model.pt

Parameter keys in the checkpoint follow the naming scheme of the public codebase, for example structure_encoder.* and content_fusion.*.
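To check which top-level modules a checkpoint contains, one option is to group its keys by prefix. The sketch below is a hedged illustration: `count_keys_by_prefix` is a hypothetical helper, the example key names are made up apart from the structure_encoder and content_fusion prefixes, and the commented loading step assumes model.pt holds a flat state dict, which this card does not confirm. See the GitHub repository for the supported loading procedure.

```python
from collections import Counter
from typing import Iterable


def count_keys_by_prefix(keys: Iterable[str]) -> Counter:
    """Count parameter keys by top-level module prefix (text before the first '.')."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Hypothetical key names for illustration; only the structure_encoder.* and
# content_fusion.* prefixes are taken from this model card.
example_keys = [
    "structure_encoder.layers.0.weight",
    "structure_encoder.layers.0.bias",
    "content_fusion.proj.weight",
]
prefix_counts = count_keys_by_prefix(example_keys)
print(prefix_counts)

# With PyTorch installed, and ASSUMING model.pt stores a flat state dict:
#   import torch
#   state = torch.load("model.pt", map_location="cpu")
#   print(count_keys_by_prefix(state.keys()))
```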

Paper

DiLA: Disentangled Latent Action World Models
Tianqiu Zhang*, Muyang Lyu*, Yufan Zhang, Fang Fang, Si Wu
ICML 2026

Paper: https://arxiv.org/abs/2605.15725
Project page: https://disentangled-latent-action-world-models.github.io/

Code

The training and evaluation code is available at:

https://github.com/senngadaisuki/disentangled-latent-action-world-models

Usage

Please see the GitHub repository for environment setup, pretrained RAE preparation, checkpoint loading, training, and evaluation instructions.

For interactive qualitative evaluation, the code repository provides test.ipynb, which includes autoregressive generation, action transfer, and rebinding examples on SSv2, RT-1, RECON, and LoopNav.

Training Data

Following the paper, DiLA is trained on observation sequences from:

Something-Something-V2 (SSv2)
RT-1 / fractal20220817_data
RECON
LoopNav

Third-party datasets are subject to their own licenses and terms of use.

Intended Use

The checkpoint is intended for research on latent action models, video prediction, representation learning, action transfer, content-structure disentanglement, and visual planning.

Citation

@inproceedings{zhang2026dila,
  title     = {{DiLA}: Disentangled Latent Action World Models},
  author    = {Zhang, Tianqiu and Lyu, Muyang and Zhang, Yufan and Fang, Fang and Wu, Si},
  booktitle = {Forty-third International Conference on Machine Learning},
  year      = {2026},
  url       = {https://openreview.net/forum?id=BRBHruBDkb}
}
