AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction
New surgical AI framework predicts exactly where tools should interact with tissue, cutting prediction error by 66%.
A research team from Johns Hopkins University and Shanghai Jiao Tong University has introduced AffordTissue, a breakthrough AI framework designed to bring surgical automation closer to clinical reality. The system addresses a critical gap in current surgical AI: existing models can mimic dexterous control, but they cannot reliably predict where instruments should interact with tissue surfaces. AffordTissue closes this gap by generating dense heatmaps that predict tool-action specific affordance regions, in effect showing surgeons and robotic systems exactly where on tissue each surgical action should occur.
The framework combines three key components: a temporal vision encoder that captures tool motion and tissue dynamics across multiple viewpoints, language conditioning that enables generalization across diverse instrument-action pairs, and a DiT-style decoder that generates precise affordance predictions. The researchers established the first tissue affordance benchmark by curating and annotating 15,638 video clips from 103 cholecystectomy procedures, covering six unique tool-action pairs built from four instruments (hook, grasper, scissors, clipper) and their associated actions, including dissection, grasping, clipping, and cutting.
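To make the pipeline concrete, here is a minimal PyTorch sketch of such an architecture. Every module size, the token-to-heatmap scheme, and the simple linear head standing in for the paper's DiT-style decoder are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class AffordanceHeatmapSketch(nn.Module):
    """Illustrative AffordTissue-style model: temporal vision encoding,
    language conditioning via cross-attention, and a dense heatmap head.
    All sizes are placeholders; a plain linear upsampling head stands in
    for the paper's DiT-style decoder."""

    def __init__(self, dim=256, vocab_size=32000):
        super().__init__()
        # Per-frame patch embedding (stand-in for a pretrained ViT backbone).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        # Temporal encoder mixes patch tokens across the frame window.
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Language conditioning: embed the instruction, attend from vision tokens.
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        # Heatmap head: one 16x16 patch of logits per vision token.
        self.head = nn.Linear(dim, 16 * 16)

    def forward(self, frames, text_ids):
        # frames: (B, T, 3, H, W) with H, W divisible by 16
        # text_ids: (B, L) token ids for an instruction like "hook dissect"
        B, T, _, H, W = frames.shape
        x = self.patch_embed(frames.flatten(0, 1))     # (B*T, dim, H/16, W/16)
        hp, wp = x.shape[-2:]
        x = x.flatten(2).transpose(1, 2)               # (B*T, patches, dim)
        x = x.reshape(B, T * hp * wp, -1)              # concat tokens over time
        x = self.temporal(x)                           # temporal mixing
        txt = self.text_embed(text_ids)                # (B, L, dim)
        x, _ = self.cross_attn(x, txt, txt)            # condition on language
        x = x.reshape(B, T, hp * wp, -1).mean(dim=1)   # pool over frames
        logits = self.head(x)                          # (B, patches, 256)
        heatmap = logits.reshape(B, hp, wp, 16, 16)
        heatmap = heatmap.permute(0, 1, 3, 2, 4).reshape(B, H, W)
        return torch.sigmoid(heatmap)                  # dense affordance map
```

Cross-attention from pooled spatio-temporal vision tokens onto the instruction embedding is one common way to inject language conditioning; the paper's exact mechanism may differ.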
Experiments demonstrate substantial improvements over existing approaches, with AffordTissue achieving an Average Symmetric Surface Distance (ASSD) of 20.6 pixels compared to 60.2 pixels for the Molmo-VLM baseline, a 66% reduction in prediction error. The result shows that task-specific architectures can outperform large-scale foundation models for dense surgical affordance prediction. The system's explicit spatial reasoning provides clear guidance for safe surgical automation, potentially enabling early safe-stop mechanisms when instruments deviate outside predicted safe zones.
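For readers unfamiliar with the metric, ASSD averages the distances between the boundaries of the predicted and ground-truth regions in both directions. Below is a minimal NumPy/SciPy sketch of the standard formulation, not the authors' evaluation code, and it assumes the heatmap has already been thresholded into a binary mask:

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def assd(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Average Symmetric Surface Distance in pixels between two binary masks.
    Assumes both masks are non-empty."""
    def surface(mask):
        # Boundary pixels: the mask minus its morphological erosion.
        mask = mask.astype(bool)
        return mask & ~binary_erosion(mask)

    sp, sg = surface(pred_mask), surface(gt_mask)
    # distance_transform_edt gives each pixel's distance to the nearest
    # boundary pixel of the *other* mask; index with our own boundary.
    d_pred_to_gt = distance_transform_edt(~sg)[sp]
    d_gt_to_pred = distance_transform_edt(~sp)[sg]
    return (d_pred_to_gt.sum() + d_gt_to_pred.sum()) / (
        len(d_pred_to_gt) + len(d_gt_to_pred))
```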
- Predicts surgical tool interaction zones with 20.6px average surface error, roughly a third of Molmo-VLM's 60.2px
- Trained on 15,638 annotated video clips from 103 gallbladder removal procedures
- Combines temporal vision encoding, language conditioning, and DiT-style decoding for precise heatmaps
Why It Matters
Enables safer surgical automation by predicting exact tissue interaction zones, potentially reducing errors in robotic-assisted procedures.
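As an illustration of how such a safe-stop could be wired up, the hypothetical check below halts motion when a tracked tool tip leaves the predicted region; the 0.5 threshold and the external tool-tip tracker are assumptions, not components described in the paper:

```python
import numpy as np

def should_safe_stop(heatmap: np.ndarray, tool_tip_xy, threshold: float = 0.5) -> bool:
    """Hypothetical early safe-stop: True if the tracked tool tip lies
    outside the predicted affordance region.
    heatmap: (H, W) affordance probabilities from the model
    tool_tip_xy: (x, y) pixel coordinates from an external tool tracker"""
    x, y = int(tool_tip_xy[0]), int(tool_tip_xy[1])
    in_bounds = 0 <= y < heatmap.shape[0] and 0 <= x < heatmap.shape[1]
    # Binarize the heatmap at an assumed probability threshold.
    return not (in_bounds and heatmap[y, x] >= threshold)
```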