Vision-only manipulation is hitting a wall
Posted on Apr 30, 2026 in Physical AI
3 min read
In 2016, I said something that went against where robotics was heading at the time: vision alone doesn’t work for grasping.
Not “it needs improvement.” Not “the tech isn’t there yet.” It doesn’t fit the problem.
Grasping is physical. Contact, force, friction. Vision can guide the approach. It can’t feel what happens next.
Back then, we saw it in the lab. Tactile vibration data predicted grasp failure with 83% accuracy and detected slip with 92% accuracy. Early results, but clear enough. The signals that matter don’t show up in images.
Ten years later, the rest of the field is running into the same limit.
Vision gets you close
Vision still matters. It handles detection, positioning, and planning. It gets the robot to the right place, lined up the right way.
It does that well, but manipulation doesn’t stop when the gripper reaches the object.
That’s where things break.
What happens at contact isn’t visible
Before contact, the robot is working off images.
After contact, it’s dealing with forces.
A bad grasp doesn’t start as a visual change. It shows up as a shift in force. Slip starts in the fingertips before anything moves enough to see. Too much pressure shows up in the wrist before the object deforms.
By the time a camera picks up a problem, it’s already happening.
Vision sees outcomes. Contact sensing measures interaction as it happens.
And the useful data lives right there, at the moment of contact.
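To make that concrete, here is a minimal sketch of how incipient slip can be flagged from a single fingertip force trace: strip out the slow grip force, then threshold the energy in a higher-frequency vibration band. The sample rate, frequency band, and threshold are illustrative placeholders, not values from any particular sensor.

```python
import numpy as np

def detect_incipient_slip(force_trace, fs=1000.0, band=(50.0, 400.0), threshold=0.02):
    """Flag incipient slip from high-frequency vibration in one fingertip's normal force.

    force_trace : 1-D array of force samples (N), e.g. a 100 ms window
    fs          : sample rate in Hz (placeholder)
    band        : vibration band associated with micro-slip, in Hz (placeholder)
    threshold   : band-energy level treated as slip (needs per-sensor tuning)
    """
    # Remove the slowly varying grip force so only the vibration component remains.
    vibration = force_trace - np.mean(force_trace)

    # Measure energy in the slip-related frequency band.
    spectrum = np.fft.rfft(vibration)
    freqs = np.fft.rfftfreq(len(vibration), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_energy = np.sum(np.abs(spectrum[in_band]) ** 2) / len(vibration)

    return band_energy > threshold
```

A camera watching that same 100 ms window would typically show nothing, because the object hasn’t visibly moved yet.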
The evidence is already there
This isn’t a theory anymore.
Tactile-driven policies beat vision-only ones on tasks that involve force. Benchmarks like ManiSkill-ViTac show better performance when you combine vision with tactile input, especially in insertion and assembly. Models like π0, OpenVLA, and Octo depend on synchronized inputs from multiple sensors. Remove force or tactile data, and performance drops.
No one is replacing vision. They’re adding what’s missing.
The strongest systems today combine vision, proprioception, force, and touch into a single model.
That’s what moves performance.
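As a rough illustration of what “a single model” means here, the sketch below fuses a camera frame, joint positions, a wrist wrench, and fingertip taxel readings into one action head. The architecture, layer sizes, and input shapes are placeholders for illustration, not a reconstruction of π0, OpenVLA, or Octo.

```python
import torch
import torch.nn as nn

class MultimodalPolicy(nn.Module):
    """Toy policy that fuses vision, proprioception, force/torque, and touch."""

    def __init__(self, action_dim=7):
        super().__init__()
        # Tiny CNN standing in for a real vision backbone.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proprio = nn.Linear(7, 32)       # joint positions
        self.wrench = nn.Linear(6, 32)        # wrist force/torque
        self.tactile = nn.Linear(2 * 16, 32)  # two fingertips x 16 taxels
        self.head = nn.Sequential(
            nn.Linear(4 * 32, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, image, joints, wrench, tactile):
        # Each modality is embedded separately, then concatenated into one vector.
        feats = torch.cat([
            self.vision(image),
            torch.relu(self.proprio(joints)),
            torch.relu(self.wrench(wrench)),
            torch.relu(self.tactile(tactile)),
        ], dim=-1)
        return self.head(feats)

# Example forward pass with random tensors of the assumed shapes.
policy = MultimodalPolicy()
action = policy(
    image=torch.randn(1, 3, 96, 96),
    joints=torch.randn(1, 7),
    wrench=torch.randn(1, 6),
    tactile=torch.randn(1, 32),
)
```

The point of the sketch isn’t the architecture. It’s that the force and tactile inputs are first-class channels, not an afterthought bolted onto a vision model.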
Vision has already given most of what it can
Vision still carries a lot of the system. But it doesn’t solve the hard part.
Physical AI improves with more data, but not all data contributes equally. Force and tactile signals have an outsized impact on how well a system handles real contact.
Most datasets still lean heavily on vision and joint data.
So you see the same pattern over and over. Robots reach the right position, then struggle with insertion, assembly, and anything that depends on compliance.
The missing information is physical.
Tactile data hasn’t scaled yet
Collecting good contact data hasn’t been easy. You need instrumented end effectors, reliable force and tactile sensors, tight synchronization, and consistent formats.
That’s a hardware problem as much as a modelling one.
Until recently, the infrastructure wasn’t there.
Now it is.
The bottleneck is how fast teams can deploy it and start collecting data.
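For the “consistent formats” part, most of the work is agreeing on what one synchronized timestep looks like. Here is a minimal sketch; the field names, shapes, and sensor layout are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ContactRichStep:
    """One synchronized timestep of a contact-rich manipulation log."""
    timestamp_ns: int             # single shared clock for every sensor
    rgb: np.ndarray               # (H, W, 3) camera frame
    joint_positions: np.ndarray   # (7,) arm configuration, radians
    wrench: np.ndarray            # (6,) wrist force/torque: Fx, Fy, Fz, Tx, Ty, Tz
    tactile: np.ndarray           # (2, 16) per-fingertip taxel pressures
    gripper_opening: float        # distance between fingertips, metres
```

The hard part isn’t the schema. It’s keeping that shared clock honest across sensors that run at very different rates.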
Closing the loop
What started as a claim in 2016 is now showing up everywhere.
Robots that only see will keep hitting the same limits. Robots that can feel will start to close the gap.
Vision stays. It’s not going anywhere.
But it won’t carry manipulation on its own. The shift comes from adding the signals that matter at the point of contact.
At Robotiq, our tactile sensors are built to capture those signals directly at the gripper, so robots see and feel what they’re doing.