XR-1-Stage1-UVMC

[Project Page] [Paper] [GitHub]

This repository contains the Stage 1 weights for the XR-1 (X Robotic Model 1) project, specifically the Unified Vision-Motion Codes (UVMC) tokenizer.

🤖 Model Description

The UVMC tokenizer is a core component of the XR-1 Vision-Language-Action (VLA) framework. It uses a dual-branch VQ-VAE (Vector Quantized Variational Autoencoder) architecture to encode both environmental visual dynamics and robot actions into a shared, discrete latent space.
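For intuition, here is a minimal, self-contained PyTorch sketch of a dual-branch VQ-VAE with a shared codebook. All module names, layer choices, dimensions, and the straight-through quantizer are illustrative assumptions; they do not correspond to the actual XR-1 implementation, which is defined in the official GitHub repository.

```python
import torch
import torch.nn as nn


class SharedCodebook(nn.Module):
    """Nearest-neighbour quantizer over a single discrete vocabulary (illustrative)."""

    def __init__(self, num_codes=1024, dim=256):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)

    def forward(self, z):  # z: (B, T, dim)
        # Squared distance from each latent to every code vector.
        dist = (z.unsqueeze(-2) - self.embed.weight).pow(2).sum(-1)  # (B, T, num_codes)
        indices = dist.argmin(dim=-1)                 # discrete token ids
        z_q = self.embed(indices)
        z_q = z + (z_q - z).detach()                  # straight-through gradient estimator
        return z_q, indices


class DualBranchVQVAE(nn.Module):
    """Two encoders (visual dynamics, robot actions) sharing one codebook (illustrative)."""

    def __init__(self, img_feat_dim=768, action_dim=7, dim=256, num_codes=1024):
        super().__init__()
        # Vision branch: encodes features of a pair of frames (current, future).
        self.vision_enc = nn.Sequential(
            nn.Linear(img_feat_dim * 2, dim), nn.GELU(), nn.Linear(dim, dim))
        # Motion branch: encodes a chunk of continuous robot actions.
        self.action_enc = nn.Sequential(
            nn.Linear(action_dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.codebook = SharedCodebook(num_codes, dim)   # shared discrete vocabulary
        self.vision_dec = nn.Linear(dim, img_feat_dim)
        self.action_dec = nn.Linear(dim, action_dim)

    def forward(self, frame_feats, actions):
        # frame_feats: (B, T, 2*img_feat_dim), actions: (B, T, action_dim)
        z_q_v, vision_ids = self.codebook(self.vision_enc(frame_feats))
        z_q_a, action_ids = self.codebook(self.action_enc(actions))
        return self.vision_dec(z_q_v), self.action_dec(z_q_a), vision_ids, action_ids


# Toy forward pass: both modalities map to token ids drawn from the same vocabulary.
model = DualBranchVQVAE()
frame_feats = torch.randn(2, 8, 768 * 2)
actions = torch.randn(2, 8, 7)
_, _, vision_ids, action_ids = model(frame_feats, actions)
print(vision_ids.shape, action_ids.shape)  # torch.Size([2, 8]) torch.Size([2, 8])
```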

Key Features

  • Unified Representation: Maps high-dimensional images and continuous robot actions into a single discrete vocabulary.
  • Cross-Embodiment: Trained on diverse data (including RoboMIND 2.0) to support multiple robot embodiments (e.g., Tien Kung, Franka, UR-5e).
  • Foundation for VLA: Provides the essential tokenization layer for the Stage 2 Transformer-based policy training.

🛠 Usage

To use these weights, please clone the official XR-1 GitHub Repository and follow the installation instructions.
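As a quick sanity check, the hedged snippet below fetches this checkpoint from the Hub and lists the tensors it contains. It is for inspection only and makes no assumptions about shard filenames or key layout; the supported loading path is the one documented in the official repository.

```python
import glob
import os

from huggingface_hub import snapshot_download
from safetensors import safe_open

# Download this model repository locally (cached by huggingface_hub).
local_dir = snapshot_download("X-Humanoid/XR-1-Stage1-UVMC")

# Enumerate whatever .safetensors files are present and print their contents.
for path in sorted(glob.glob(os.path.join(local_dir, "*.safetensors"))):
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in f.keys():
            tensor = f.get_tensor(key)
            print(f"{os.path.basename(path)}: {key} {tuple(tensor.shape)} {tensor.dtype}")
```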

πŸ“ Citation

If you find this model or the XR-1 framework useful in your research, please cite our work:

@article{fan2025xr,
  title={XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations},
  author={Fan, Shichao and Wu, Kun and Che, Zhengping and Wang, Xinhua and Wu, Di and Liao, Fei and Liu, Ning and Zhang, Yixue and Zhao, Zhen and Xu, Zhiyuan and others},
  journal={arXiv preprint arXiv:2411.02776},
  year={2025}
}

📜 License

This project is licensed under the MIT License.


Contact: For questions, please open an issue on our GitHub or contact us at [email protected].

Discussions

If you are interested in XR-1, you are welcome to join our WeChat group for discussions.
