Claude, Gemini, GPT-5.4 show provider-specific cooperation biases in game theory study

arXiv cs.MA May 29, 2026

⚡ New models tested: Gemini 2.5 Flash reaches up to 77% aggressive equilibria; GPT-5.4 Mini reaches 70% cooperative equilibria.

Deep Dive

Researchers at Institución Universitaria Colegio Mayor del Cauca tested whether next-generation LLM agents inherit cooperative biases from earlier models. Using the Iterated Prisoner's Dilemma (IPD) framework across three prompting styles (Default, Prose, Self-Refine) and four population compositions, they evaluated Claude Sonnet 4.6, Gemini 2.5 Flash, Gemini 3.1 Pro, and GPT-5.4 Mini. The study found that cooperative bias persists across providers: nine of twelve model-prompt combinations favored cooperative equilibria in balanced, noiseless conditions. Cross-provider divergence is nonetheless substantial. Gemini 2.5 Flash reached up to 77% aggressive equilibria under biased conditions, while GPT-5.4 Mini reached 70% cooperative equilibria under Self-Refine prompting.
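For readers unfamiliar with the IPD, the following is a minimal sketch of the game with optional action noise. It is not the study's code: the payoff values are the canonical Axelrod ones (assumed here, since the summary does not give the paper's matrix), the two strategies are textbook stand-ins for LLM agents, and `play_ipd`, `noise`, and `seed` are hypothetical names chosen for illustration.

```python
import random

# Canonical Axelrod payoffs (an assumption; the study's matrix is not stated above).
# Keys are (move_a, move_b); values are (payoff_a, payoff_b).
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # A is exploited
    ("D", "C"): (5, 0),  # A exploits B
    ("D", "D"): (1, 1),  # mutual defection
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's last observed move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """Defect regardless of history."""
    return "D"


def flip(action):
    """Return the opposite action."""
    return "D" if action == "C" else "C"


def play_ipd(strategy_a, strategy_b, rounds=10, noise=0.0, seed=0):
    """Play an iterated game; each executed move is flipped with probability `noise`."""
    rng = random.Random(seed)
    seen_by_a, seen_by_b = [], []  # opponent moves as each player observed them
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        # Noise corrupts the executed action, not the intention.
        if rng.random() < noise:
            move_a = flip(move_a)
        if rng.random() < noise:
            move_b = flip(move_b)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return score_a, score_b


if __name__ == "__main__":
    print(play_ipd(tit_for_tat, tit_for_tat))    # noiseless: sustained cooperation
    print(play_ipd(tit_for_tat, always_defect))  # noiseless: exploited once, then mutual defection
    print(play_ipd(tit_for_tat, tit_for_tat, rounds=200, noise=0.05))
```

In this sketch a single flipped move between two tit-for-tat players can trigger a chain of retaliation, which is one common reason noise is treated as a separate experimental condition in IPD studies.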

Evidence for parity in aggressive capability was partial. Self-Refine improved competitive behavior in all models, with Claude Sonnet 4.6 achieving the highest Individual Competitive Disposition (ICD) score at 0.913, but Default and Prose prompts showed no systematic narrowing of the gap. Noise robustness improved directionally (Claude Sonnet 4.6: ~6 percentage points of sensitivity vs. 13 pp for Claude 3.5 Sonnet), but the improvement was not statistically significant given sampling error. The key takeaway: provider identity, not model generation, is the strongest predictor of equilibrium outcomes. Noise remains a universal challenge regardless of model size or vintage.
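To see how a 7 pp gap can fail a significance test, here is a rough sketch using a pooled two-proportion z-test. It rests on loud assumptions: each sensitivity figure is treated as a simple proportion of runs, the run counts (100 and 1000 per model) are hypothetical and do not come from the paper, and the paper's actual statistical procedure is not described in this summary. The function name `two_proportion_z_test` is likewise illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-sided z-test for the difference between two proportions."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 13% vs 6% observed over 100 runs each.
z_small, p_small = two_proportion_z_test(13, 100, 6, 100)

# Same rates over a hypothetical 1000 runs each.
z_large, p_large = two_proportion_z_test(130, 1000, 60, 1000)

if __name__ == "__main__":
    print(f"n=100 per group:  z={z_small:.2f}, p={p_small:.3f}")
    print(f"n=1000 per group: z={z_large:.2f}, p={p_large:.2g}")
```

Under these assumed counts the same gap is not significant at the 0.05 level with 100 runs per group but is with 1000, which is consistent with reading the reported improvement as directional rather than established.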

Key Points

Cooperative bias persists: 9 of 12 model-prompt combos favored cooperation in balanced noiseless conditions.
Provider divergence: Gemini 2.5 Flash reached up to 77% aggressive equilibria under biased conditions; GPT-5.4 Mini reached 70% cooperative equilibria with Self-Refine.
Provider identity matters more than model generation; noise robustness gains not statistically significant.

Why It Matters

LLM behavior varies drastically by provider; deploying agents in multi-agent systems requires provider-specific trust and strategy tuning.
