Shalfunnn committed (verified)
Commit bd76a4d · 1 Parent(s): 621bf20

Update README.md

Files changed (1): README.md (+27 -4)
README.md CHANGED
@@ -1,20 +1,43 @@
-# WALL-OSS: Igniting VLMs toward the Embodied Space
+# WALL-OSS
 
 <div align="left">
 
-[![Paper](https://img.shields.io/badge/Paper-PDF-EA1B22?style=for-the-badge&logo=adobeacrobatreader&logoColor=fff)](https://x2robot.cn-wlcb.ufileos.com/wall_oss.pdf)
+<p align="center">
+<img src="assets/logo.png" width="600"/>
+<p>
+
+<div align="center">
+
+[![Paper](https://img.shields.io/badge/📄%20Paper-PDF-EA1B22?style=for-the-badge&logo=adobeacrobatreader&logoColor=fff)](https://x2robot.cn-wlcb.ufileos.com/wall_oss.pdf)
+&nbsp;&nbsp;
 [![Hugging Face](https://img.shields.io/badge/Hugging%20Face-x--square--robot-FFB000?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/x-square-robot)
+&nbsp;&nbsp;
 [![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=fff)](https://github.com/X-Square-Robot/wall-x)
+&nbsp;&nbsp;
 [![Project Page](https://img.shields.io/badge/Project-1E90FF?style=for-the-badge&logo=google-chrome&logoColor=fff)](https://x2robot.com/en/research/68bc2cde8497d7f238dde690)
 
 </div>
 
-## 🤖 Model Description
+</div>
+
+## <a href="https://x2robot.cn-wlcb.ufileos.com/wall_oss.pdf" target="_blank"><strong>WALL-OSS: Igniting VLMs toward the Embodied Space</strong></a>
 
 We introduce **WALL-OSS**, an end-to-end embodied foundation model that leverages large-scale multimodal pretraining to achieve (1) embodiment-aware vision--language understanding, (2) strong language--action association, and (3) robust manipulation capability.
 Our approach employs a tightly coupled architecture and multi-strategies training curriculum that enables Unified Cross-Level CoT—seamlessly unifying instruction reasoning, subgoal decomposition, and fine-grained action synthesis within a single differentiable framework.
 Our results show that WALL-OSS attains high success on complex long-horizon manipulations, demonstrates strong instruction-following capabilities, complex understanding and reasoning, and outperforms strong baselines, thereby providing a reliable and scalable path from VLMs to embodied foundation models.
 
+## 🎬 Video Demos
+
+<div align="center">
+<video width="80%" controls>
+<source src="https://x2robot.com/api/videos/file/wall-oss_top_720p-1.mp4" type="video/mp4">
+Your browser does not support the video tag.
+</video>
+<p><strong>WALL-OSS in Action: Demonstrating advanced manipulation capabilities and embodied AI performance</strong></p>
+</div>
+
+
+
 ## 🚀 Quick Start
 
 ### Installation
@@ -41,7 +64,7 @@ import torch
 from wall_x.model.qwen2_5_based.modeling_qwen2_5_vl_act import Qwen2_5_VLMoEForAction
 
 # Load the model
-model_path = "X-Square-Robot/wall-oss-flow" # or your local path
+model_path = "X-Square-Robot/wall-oss-fast" # or your local path
 model = Qwen2_5_VLMoEForAction.from_pretrained(model_path)
 model.eval()
 
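For readers following the Quick Start hunk above, a minimal end-to-end sketch of how the updated snippet could be used is shown below. It assumes the `wall_x` package from the linked wall-x GitHub repository is installed; only `Qwen2_5_VLMoEForAction.from_pretrained`, `model.eval()`, and the two checkpoint IDs come from the diff itself, while the helper function, device placement, and variant selection are illustrative additions.

```python
import torch
from wall_x.model.qwen2_5_based.modeling_qwen2_5_vl_act import Qwen2_5_VLMoEForAction

# Checkpoint IDs referenced by the README before and after this commit;
# either Hub ID can be replaced with a local path.
MODEL_IDS = {
    "flow": "X-Square-Robot/wall-oss-flow",
    "fast": "X-Square-Robot/wall-oss-fast",  # the ID this commit points the example at
}


def load_wall_oss(variant: str = "fast", device: str | None = None):
    """Load a WALL-OSS checkpoint for inference (sketch following the README snippet)."""
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model = Qwen2_5_VLMoEForAction.from_pretrained(MODEL_IDS[variant])
    model.to(device)
    model.eval()  # switch to inference mode
    return model


if __name__ == "__main__":
    model = load_wall_oss("fast")
    print(f"Loaded {type(model).__name__} on {next(model.parameters()).device}")
```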