Update README.md
README.md (CHANGED)

@@ -1,20 +1,43 @@
-# WALL-OSS

<div align="left">

[](https://huggingface.co/x-square-robot)
[](https://github.com/X-Square-Robot/wall-x)
[](https://x2robot.com/en/research/68bc2cde8497d7f238dde690)

</div>

We introduce **WALL-OSS**, an end-to-end embodied foundation model that leverages large-scale multimodal pretraining to achieve (1) embodiment-aware vision-language understanding, (2) strong language-action association, and (3) robust manipulation capability.
Our approach employs a tightly coupled architecture and a multi-strategy training curriculum that enable Unified Cross-Level CoT, seamlessly unifying instruction reasoning, subgoal decomposition, and fine-grained action synthesis within a single differentiable framework.
Our results show that WALL-OSS attains high success rates on complex long-horizon manipulation tasks, demonstrates strong instruction-following together with complex understanding and reasoning, and outperforms strong baselines, thereby providing a reliable and scalable path from VLMs to embodied foundation models.

## 🚀 Quick Start

### Installation

@@ -41,7 +64,7 @@ import torch

from wall_x.model.qwen2_5_based.modeling_qwen2_5_vl_act import Qwen2_5_VLMoEForAction

# Load the model
-model_path = "X-Square-Robot/wall-oss-
model = Qwen2_5_VLMoEForAction.from_pretrained(model_path)
model.eval()

+# WALL-OSS

<div align="left">

+<p align="center">
+    <img src="assets/logo.png" width="600"/>
+</p>
+
+<div align="center">
+
+[](https://x2robot.cn-wlcb.ufileos.com/wall_oss.pdf)
+
[](https://huggingface.co/x-square-robot)
+
[](https://github.com/X-Square-Robot/wall-x)
+
[](https://x2robot.com/en/research/68bc2cde8497d7f238dde690)

</div>

+</div>
+
+## <a href="https://x2robot.cn-wlcb.ufileos.com/wall_oss.pdf" target="_blank"><strong>WALL-OSS: Igniting VLMs toward the Embodied Space</strong></a>

We introduce **WALL-OSS**, an end-to-end embodied foundation model that leverages large-scale multimodal pretraining to achieve (1) embodiment-aware vision-language understanding, (2) strong language-action association, and (3) robust manipulation capability.
Our approach employs a tightly coupled architecture and a multi-strategy training curriculum that enable Unified Cross-Level CoT, seamlessly unifying instruction reasoning, subgoal decomposition, and fine-grained action synthesis within a single differentiable framework.
Our results show that WALL-OSS attains high success rates on complex long-horizon manipulation tasks, demonstrates strong instruction-following together with complex understanding and reasoning, and outperforms strong baselines, thereby providing a reliable and scalable path from VLMs to embodied foundation models.
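
To make the "cross-level" idea concrete, the sketch below shows one way the three levels could be written down in code. It is purely illustrative: the class, field names, and types are hypothetical and are not the WALL-OSS API; they only mirror the decomposition described above (instruction-level reasoning, subgoal decomposition, and a fine-grained action chunk).

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical illustration of the three levels unified by Cross-Level CoT.
# These names and types are NOT the WALL-OSS API; they only mirror the
# reasoning -> subgoals -> actions decomposition described in the text above.
@dataclass
class CrossLevelOutput:
    reasoning: str                                             # instruction-level reasoning in natural language
    subgoals: List[str] = field(default_factory=list)          # intermediate subgoal decomposition
    actions: List[List[float]] = field(default_factory=list)   # fine-grained action chunk (e.g., joint targets)

example = CrossLevelOutput(
    reasoning="The mug is behind the box, so the box must be moved first.",
    subgoals=["move the box aside", "grasp the mug", "place the mug on the tray"],
    actions=[[0.12, -0.34, 0.56, 0.0, 0.0, 1.0, 0.04]],  # one illustrative low-level command
)
print(example.subgoals)
```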

+## 🎬 Video Demos
+
+<div align="center">
+<video width="80%" controls>
+<source src="https://x2robot.com/api/videos/file/wall-oss_top_720p-1.mp4" type="video/mp4">
+Your browser does not support the video tag.
+</video>
+<p><strong>WALL-OSS in Action: Demonstrating advanced manipulation capabilities and embodied AI performance</strong></p>
+</div>
+
## 🚀 Quick Start

### Installation

from wall_x.model.qwen2_5_based.modeling_qwen2_5_vl_act import Qwen2_5_VLMoEForAction

# Load the model
+model_path = "X-Square-Robot/wall-oss-fast" # or your local path
model = Qwen2_5_VLMoEForAction.from_pretrained(model_path)
model.eval()
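
The snippet above stops after `model.eval()`. As a minimal follow-up sketch, assuming the class behaves like a standard `transformers` model (i.e. `from_pretrained` returns a `torch.nn.Module`), you can move the model to a GPU and sanity-check the load; the actual inference inputs and action decoding are model-specific and are not shown here.

```python
import torch

from wall_x.model.qwen2_5_based.modeling_qwen2_5_vl_act import Qwen2_5_VLMoEForAction

# Assumed from the snippet above: Hugging Face repo id or a local checkout of the weights.
model_path = "X-Square-Robot/wall-oss-fast"
model = Qwen2_5_VLMoEForAction.from_pretrained(model_path)
model.eval()

# Standard torch.nn.Module behavior: pick a device and move the model to it.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Sanity check: report the parameter count (no forward pass, so nothing
# model-specific is assumed here).
num_params = sum(p.numel() for p in model.parameters())
print(f"Loaded {type(model).__name__} with {num_params / 1e9:.2f}B parameters on {device}")

# Run actual inference under torch.no_grad(); the expected input format and
# action decoding are defined by the wall_x repository, not guessed at here.
```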