A red team study on CBRN capabilities among frontier models
Models featured: Anthropic, OpenAI, Meta, Cohere, and Mistral
First systematic evaluation reveals alarming vulnerabilities in leading AI models' CBRN safety measures
Our comprehensive red team study evaluated 10 frontier AI models against a novel dataset covering Chemical, Biological, Radiological, and Nuclear (CBRN) domains. The findings expose critical safety gaps that pose immediate risks to global security.
Shocking Discoveries:
- Safety mechanisms are fundamentally brittle - persona-based attacks achieve an 81.7% success rate, versus 38.2% for direct queries
- Extreme performance disparity across the industry - attack success rates range from 18.9% to 84.3% across leading models
- Alarmingly high direct query success - some models provide dangerous CBRN information 83% of the time when directly asked
- Enhancement query catastrophe - 8 out of 10 models show >70% attack success rates, reaching 92.9% in worst cases
- Clear industry leaders and laggards identified through a rigorous methodology grounded in the NIST AI Risk Management Framework
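The attack success rates cited above can be illustrated with a minimal sketch of how such a metric is typically computed, assuming each attack attempt is labeled with a binary outcome (the function name and sample data below are hypothetical, not the study's actual implementation or data):

```python
def attack_success_rate(outcomes):
    """Fraction of attempts where the model produced disallowed content.

    outcomes: list of booleans, True if the attack succeeded.
    """
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)

# Illustrative example: 3 successful attacks out of 5 attempts
results = [True, False, True, True, False]
print(f"ASR: {attack_success_rate(results):.1%}")  # ASR: 60.0%
```

In practice, per-model rates like the 18.9%-84.3% spread reported above would be computed by aggregating such binary judgments over the full evaluation dataset for each model and attack strategy.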