Belief2-Attention boosts vision models with dual-component attention

arXiv cs.CV June 02, 2026

⚡ New mechanism uses both perpendicular and projected signals for richer token correlation.

Deep Dive

In a new arXiv preprint, researcher Guoqiang Zhang introduces Belief2-Attention, a refined attention mechanism for vision tasks that builds on the earlier Belief-Attention framework. The original Belief-Attention performed an orthogonal projection of the softmax-weighted summation of V vectors onto the original V vectors, using the perpendicular component as a residual signal. Zhang's ablation study reveals that the projected component also carries significant token correlation information that was previously discarded.
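
The decomposition described above can be written out as a short worked equation. This is a sketch in our own notation, not the paper's: it assumes the projection is taken per token, onto that token's own value vector v_i, with α_ij denoting the softmax attention weights. The summary does not spell out these details, so treat the symbols as illustrative.

```latex
% Assumed notation: o_i is the softmax-weighted sum of value vectors for token i.
\begin{aligned}
o_i &= \sum_j \alpha_{ij}\, v_j, \\
o_i^{\parallel} &= \frac{\langle o_i, v_i \rangle}{\langle v_i, v_i \rangle}\, v_i
  && \text{(projected component)}, \\
o_i^{\perp} &= o_i - o_i^{\parallel}
  && \text{(perpendicular component, so } \langle o_i^{\perp}, v_i \rangle = 0\text{)}.
\end{aligned}
```

Under this reading, the original Belief-Attention keeps only the perpendicular term as its residual signal and drops the projected term, which is the information the ablation study found to be useful.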

Belief2-Attention addresses this by using both components. The projected component passes through an activation function and a linear mapping before being merged back into the token representation, which effectively turns the projected pathway into a two-layer feedforward network embedded within the attention block itself. The mechanism also adds an inner-product matrix ZZ^T to the standard QK^T to capture richer pairwise token relationships. Zhang demonstrates mathematically that Belief2-Attention is more expressive than standard attention.
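
To make the data flow concrete, here is a minimal single-head NumPy sketch of the ideas described above. It is not the author's implementation, and several choices are assumptions the summary does not confirm: the projection is taken per token onto that token's own value vector, Z is a separate linear map of the input (the hypothetical weight `Wz`), the combined scores are scaled by the square root of the head dimension, the activation is ReLU, and the two branches are merged by simple addition. The function name `belief2_attention_sketch` and the weight `Wp` are likewise hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def belief2_attention_sketch(X, Wq, Wk, Wv, Wz, Wp, eps=1e-8):
    """Single-head sketch. X: (n, d_model); Wq/Wk/Wv/Wz: (d_model, d_head); Wp: (d_head, d_head)."""
    Q, K, V, Z = X @ Wq, X @ Wk, X @ Wv, X @ Wz

    # Scores combine the usual QK^T with the extra inner-product matrix ZZ^T.
    # ASSUMPTION: the sum is scaled by sqrt(d_head) as in standard attention.
    d_head = Q.shape[-1]
    scores = (Q @ K.T + Z @ Z.T) / np.sqrt(d_head)
    A = softmax(scores, axis=-1)

    # Softmax-weighted summation of the value vectors.
    O = A @ V

    # ASSUMPTION: orthogonal projection of each o_i onto its own v_i.
    coeff = (O * V).sum(axis=-1, keepdims=True) / ((V * V).sum(axis=-1, keepdims=True) + eps)
    proj = coeff * V   # projected component (parallel to v_i)
    perp = O - proj    # perpendicular component (the original residual signal)

    # Projected branch: activation followed by a linear mapping.
    # ASSUMPTION: ReLU activation, and merging the branches by addition.
    proj_branch = np.maximum(proj, 0.0) @ Wp
    out = perp + proj_branch
    return out, proj, perp, V
```

The function returns the intermediate components alongside the output so that the decomposition can be inspected, for example by checking that each row of `perp` is numerically orthogonal to the matching row of `V`.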

The proposed method was empirically validated on image classification and segmentation benchmarks, showing consistent improvements over both standard attention and the original Belief-Attention. While specific performance numbers are not detailed in the abstract, the paper claims effectiveness across these core vision tasks. This work points toward a more complete utilization of attention outputs, potentially reducing information loss in transformer-based vision models.

Key Points

Belief2-Attention uses both perpendicular and projected components from softmax-weighted V vectors, not just the residual signal.
The projected component is processed via an activation function and linear mapping, forming a two-layer FFN inside the attention block.
A new inner-product matrix ZZ^T is added to QK^T to capture richer token correlations; the method was tested on image classification and segmentation.

Why It Matters

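More expressive attention could translate into higher accuracy in vision applications such as autonomous driving and medical imaging, although the preprint itself reports results on image classification and segmentation benchmarks.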
