Improve model card metadata and add paper links

#1 opened by nielsr (HF Staff)

Files changed (1): README.md (+38 -14)

README.md CHANGED
@@ -1,24 +1,48 @@
 ---
-library_name: <World Model>
-tags:
-- Text-to-Video
-- Image-to-Video
-- Diffusion Video Model
-- World Model
 license: apache-2.0
+pipeline_tag: image-to-video
+tags:
+- Text-to-Video
+- Image-to-Video
+- Diffusion Video Model
+- World Model
 ---
 
-# Yume: An Interactive World Generation Model
+# Yume-1.5: A Text-Controlled Interactive World Generation Model
 
-This is a preview version of the Yume model, an interactive world generation model, presented in the paper
-[Yume: An Interactive World Generation Model](https://huggingface.co/papers/2507.17744) and [Yume-1.5: A Text-Controlled Interactive World Generation Model](
-https://arxiv.org/abs/2512.22096).
+Yume-1.5 is a framework designed to generate realistic, interactive, and continuous worlds from a single image or text prompt. It supports keyboard-based exploration of the generated environments by integrating context compression with real-time streaming acceleration.
 
-Yume aims to create an interactive, realistic, and dynamic world from an input image, allowing exploration and control.
+- [**Paper (Yume-1.5)**](https://huggingface.co/papers/2512.22096)
+- [**Paper (Yume-1.0)**](https://huggingface.co/papers/2507.17744)
+- [**Project Page**](https://stdstu12.github.io/YUME-Project/)
+- [**GitHub Repository**](https://github.com/stdstu12/YUME)
 
-Project Page: [https://stdstu12.github.io/YUME-Project/](https://stdstu12.github.io/YUME-Project/)\
-GitHub Repository: [https://github.com/stdstu12/YUME](https://github.com/stdstu12/YUME)
+## Features
+- **Long-video generation**: Unified context compression with linear attention.
+- **Real-time acceleration**: Powered by bidirectional attention distillation.
+- **Text-controlled events**: Triggers specific world events via text prompts.
 
 ## Usage
 
-For detailed instructions and full inference scripts, please refer to the [GitHub repository](https://github.com/stdstu12/YUME).
+For detailed installation and setup instructions, please refer to the [GitHub repository](https://github.com/stdstu12/YUME).
+
+### Inference Example
+To perform image-to-video generation using the provided scripts:
+
+```bash
+# Generate videos from the images in the specified directory
+bash scripts/inference/sample_jpg.sh --jpg_dir="./jpg" --caption_path="./caption.txt"
+```
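The sampling script reads input images from `--jpg_dir` and captions from `--caption_path`. A minimal sketch of preparing those inputs before running it; the two paths come from the arguments above, while the caption text and the one-prompt-per-file layout are illustrative assumptions, not documented behavior:

```shell
# Prepare the inputs that sample_jpg.sh expects (paths match the arguments above):
#   ./jpg         - directory holding the input .jpg frames
#   ./caption.txt - text prompt(s) guiding generation (format assumed)
mkdir -p jpg
printf 'A quiet street after rain, camera moving forward.\n' > caption.txt
```

Copy one or more `.jpg` images into `./jpg` before invoking the sampling script.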
36
+
37
+ ## Citation
38
+
39
+ If you use Yume for your research, please cite the following:
40
+
41
+ ```bibtex
42
+ @article{mao2025yume,
43
+ title={Yume: An Interactive World Generation Model},
44
+ author={Mao, Xiaofeng and Lin, Shaoheng and Li, Zhen and Li, Chuanhao and Peng, Wenshuo and He, Tong and Pang, Jiangmiao and Chi, Mingmin and Qiao, Yu and Zhang, Kaipeng},
45
+ journal={arXiv preprint arXiv:2507.17744},
46
+ year={2025}
47
+ }
48
+ ```