AI Assurance

Capability doesn't predict responsibility

An open, reproducible index of responsible-AI benchmarks across seventeen frontier models, and what it says about the assumption that stronger models are safer.

More capable does not reliably mean more responsible. I built Raidex to test the assumption that a stronger model is better on every axis, responsibility included, and the data did not support it. The gap shows up across seventeen frontier models, and it holds even within a single lab's own lineup, where a newer, more

Vishnu Vettrivel

6 hours ago · 8 min read