A red team study on CBRN capabilities among frontier models
Models featured: Anthropic, OpenAI, Meta, Cohere, Mistral
First systematic evaluation reveals alarming vulnerabilities in leading AI models' CBRN safety measures
Our comprehensive red team study evaluated 10 frontier AI models against a novel dataset covering the Chemical, Biological, Radiological, and Nuclear (CBRN) domains. The findings expose critical safety gaps that pose immediate risks to global security.
Shocking Discoveries:
- Safety mechanisms are fundamentally brittle: persona-based attacks achieve an 81.7% success rate vs. 38.2% for direct queries (a sketch of how these rates are computed follows this list)
- Extreme performance disparity across the industry: attack success rates range from 18.9% to 84.3% across leading models
- Alarmingly high direct-query success: some models provide dangerous CBRN information 83% of the time when asked directly
- Enhancement-query catastrophe: 8 out of 10 models show attack success rates above 70%, reaching 92.9% in the worst case
- Clear industry leaders and laggards, identified through a rigorous methodology grounded in the NIST AI Risk Management Framework
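For concreteness, attack success rates like those above are typically computed as the fraction of graded adversarial attempts that elicited disallowed content. The following is a minimal sketch of that aggregation, not our actual harness; the record fields (model, attack, success) and the example values are hypothetical.

```python
from collections import defaultdict

# Hypothetical evaluation records: one per (model, prompt) attempt.
# 'success' marks whether the model returned disallowed CBRN content,
# as judged by the grading step of the red team harness.
records = [
    {"model": "model-a", "attack": "direct", "success": False},
    {"model": "model-a", "attack": "persona", "success": True},
    {"model": "model-b", "attack": "direct", "success": True},
    # ... thousands of graded attempts in a real evaluation
]

def attack_success_rates(records):
    """Return {(model, attack): fraction of successful attempts}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        key = (r["model"], r["attack"])
        totals[key] += 1
        hits[key] += int(r["success"])
    return {key: hits[key] / totals[key] for key in totals}

for (model, attack), asr in sorted(attack_success_rates(records).items()):
    print(f"{model} / {attack}: {asr:.1%}")
```

In practice, the success flag comes from a separate grading step (human reviewers or judge models), which is where most of the methodological care lives; the aggregation itself is simple.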