Werewolf leaderboard: GPT-5 is the best at bluffing and manipulating the other AIs in Werewolf. (Foaster Labs)
Summary
- GPT-5 towers over the field, showcasing its prowess in social intelligence. But the pack is closing in: Gemini 2.5 Pro, Kimi-K2, and Qwen3 display impressive strategic depth.
- Villagers must filter claims without paranoia, punish contradictions, and avoid tunnel-vision mis-eliminations. Top defenders like GPT-5 and Gemini 2.5 Pro keep the table anchored to facts.
- As model capabilities rise, we see behavioral jumps, not smooth curves. Thresholds in scale and training recipes unlock new levels of sophistication, moving models from chaotic mayoral races to instrumental, multi-day planning.