Wall-OSS-0.5 open-weights VLA hits 80%+ zero-shot task progress on 4 of 17 real robot tasks

r/Singularity May 31, 2026

⚡ Open-source robot model clears 80% task progress on 4 of 17 real robot tasks without any fine-tuning.

Deep Dive

X-Square-Robot released Wall-OSS-0.5, an open-weights Vision-Language-Action (VLA) model, and evaluated its pretrained checkpoint directly on real robot tasks without any fine-tuning. Every clip in the demo reel carries an "Autonomous w/o Fine-Tuning" watermark and shows tasks such as opening a pot lid, dropping fruit, covering blocks, sorting by color, and putting drinks into specified containers. Zero-shot, the checkpoint reaches >80% task progress on 4 of the 17 real robot tasks, including a deformable rope-tightening task that was not part of the pretraining set. Task-progress curves recorded during pretraining show held-out tasks improving alongside seen tasks, a measurement the embodied AI community has been asking for.
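
Results here are reported as "task progress" rather than binary success, and the article does not define the metric. The sketch below assumes task progress is the fraction of a task's stages completed in one rollout, averaged over rollouts, and then counts tasks that clear the >80% bar. The function names, task names, and all numbers are hypothetical and illustrative only; they are not figures from the Wall-OSS-0.5 release.

```python
from statistics import mean


def task_progress(stages_completed: int, total_stages: int) -> float:
    """Assumed definition: fraction of a task's stages completed in one rollout."""
    if total_stages <= 0:
        raise ValueError("total_stages must be positive")
    return stages_completed / total_stages


def mean_progress(rollouts: list[tuple[int, int]]) -> float:
    """Average per-rollout progress; each rollout is (stages_completed, total_stages)."""
    return mean(task_progress(done, total) for done, total in rollouts)


def count_above(progress_by_task: dict[str, float], threshold: float = 0.8) -> int:
    """Count tasks whose mean progress is strictly above the threshold (>80% by default)."""
    return sum(1 for p in progress_by_task.values() if p > threshold)


# Hypothetical rollouts for two made-up tasks, for illustration only.
example = {
    "task_a": mean_progress([(3, 3), (3, 3), (2, 3)]),  # mostly completed
    "task_b": mean_progress([(0, 4), (1, 4)]),          # barely started
}

print(f"{count_above(example)} of {len(example)} tasks above 80% task progress")
```

Under this assumed definition, a rollout that finishes two of three stages still earns partial credit, which is why a progress metric can separate near misses from total failures in a way a plain success rate cannot.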

Unlike typical humanoid demos, which only show results after task-specific tuning, the Wall-OSS-0.5 evaluation is designed to measure the raw capability of the pretrained checkpoint. The model appears to retain general image and language ability while gaining embodied grounding, avoiding the narrowing often seen in robot policies. However, harder tasks such as towel folding, charger insertion, and table setting remain near zero task progress without fine-tuning, so pretraining alone is not a silver bullet. The real test will be whether outside groups can run the checkpoint on their own robot arms and reproduce the same strengths and failures. The code, paper, and Hugging Face weights are publicly available for replication.

Key Points

4 of 17 real robot tasks reached >80% task progress zero-shot with the pretrained checkpoint.
These included a deformable rope-tightening task not seen during pretraining, suggesting generalization beyond the pretraining data.
The model appears to retain general vision and language capabilities while improving embodied grounding, avoiding policy narrowing.

Why It Matters

Open-source VLA with measurable pretraining performance could drive reproducibility and faster iteration in embodied AI research.
