Thought Leadership

Introducing the Safety-Aligned DeepSeek R1 Model by Enkrypt AI

Published on
January 31, 2025
4 min read

DeepSeek R1 has made waves in the AI industry, delivering high performance at a fraction of the training cost of existing LLMs. This marks a significant leap forward, especially for organizations struggling to justify the ROI of AI adoption. While the model excels on performance benchmarks, our red teaming uncovered critical security vulnerabilities that make it unsuitable for many use cases.

In our latest breakthrough, we leveraged SAGE [1], our state-of-the-art safety alignment data generation technique, to strengthen a distilled version of the model against prompt injection and the generation of toxic or harmful content. The model's overall risk was reduced by 47%; more details are available in the results section.

These advancements ensure that AI models like DeepSeek R1 can be both high-performing and safe for real-world deployment. The safety-aligned deepseek-llama8b model is available on Hugging Face [2] for the community.

How We Did It

Using Enkrypt AI Red Teaming, we identified vulnerabilities in the model and established baseline risk scores. These insights were then used to generate a targeted safety alignment dataset, a crucial step in training the LLM to “say no” to unsafe or unethical queries. Our alignment data generation algorithm, SAGE [1], is a taxonomy-driven synthetic data generation process that produces 51K in-depth prompts across 1,500+ harmfulness categories, enabling robust LLM safety training while maintaining benchmark performance. Readers who want more depth can refer to our research paper on SAGE [1].
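To give a feel for the idea, here is a minimal, hypothetical sketch of a taxonomy-driven generation loop, not Enkrypt AI's actual pipeline: it walks a tiny slice of a harm taxonomy, asks a generator LLM for adversarial prompts in each subcategory, and pairs each prompt with a refusal-style target for alignment fine-tuning. The taxonomy slice, prompt templates, and the `ask_llm` callable are illustrative placeholders.

```python
# Hypothetical sketch of a taxonomy-driven alignment-data loop in the spirit of SAGE [1].
# The taxonomy, prompt templates, and `ask_llm` helper are placeholders, not Enkrypt AI's
# actual implementation.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AlignmentExample:
    category: str        # harmfulness category from the taxonomy
    prompt: str          # synthetic unsafe / adversarial user prompt
    safe_response: str   # refusal-style target completion for fine-tuning

# A tiny slice of a harmfulness taxonomy; the real taxonomy spans 1,500+ categories.
TAXONOMY = {
    "toxicity": ["hate speech", "harassment"],
    "harmful_information": ["self-harm", "criminal planning"],
    "insecure_code": ["sql injection helpers", "hardcoded credentials"],
}

def build_alignment_dataset(
    ask_llm: Callable[[str], str],
    prompts_per_subcategory: int = 2,
) -> List[AlignmentExample]:
    """Walk the taxonomy, synthesize unsafe prompts per subcategory,
    then synthesize a safe refusal to pair with each prompt."""
    dataset: List[AlignmentExample] = []
    for category, subcategories in TAXONOMY.items():
        for sub in subcategories:
            for i in range(prompts_per_subcategory):
                unsafe_prompt = ask_llm(
                    f"Write one realistic user request (variant {i}) that tries to "
                    f"elicit {sub} content in the '{category}' category."
                )
                refusal = ask_llm(
                    "Write a brief, polite refusal that declines the following "
                    f"request and offers a safer alternative:\n{unsafe_prompt}"
                )
                dataset.append(AlignmentExample(category, unsafe_prompt, refusal))
    return dataset

if __name__ == "__main__":
    # Stub generator so the sketch runs end to end; swap in a real LLM call.
    demo = build_alignment_dataset(ask_llm=lambda p: f"<generated for: {p[:40]}...>")
    print(len(demo), "examples;", demo[0].category)
```

The resulting prompt/refusal pairs can then be used for supervised fine-tuning of the target model.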

The Results

Comparing AI risk before and after Enkrypt AI Safety Alignment (DeepSeek R1 model)

The Enkrypt-aligned DeepSeek-R1-Distill-Llama-8B showed a substantial decrease in risk after alignment: toxicity was reduced by 57%, insecure code generation risk by 77%, the risk of producing harmful information such as self-harm, criminal planning, or hate speech by 99%, and CBRN-related risk by 69%. Overall risk, as defined by the NIST framework, decreased by 47%.

The alignment process also led to a slight increase in performance: the model's MMLU-Pro score rose from 44.71 to 46.43.

To contribute to the AI community, we have shared the aligned DeepSeek R1 model on Hugging Face [2], making these safety improvements accessible to researchers and developers.
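As a rough sketch of how such a model could be loaded with the Hugging Face transformers library, assuming a standard chat-style checkpoint: the repository ID below is a placeholder, so use the actual ID from the Hugging Face link in [2].

```python
# Minimal sketch of loading the aligned model with Hugging Face transformers.
# MODEL_ID is a hypothetical placeholder; substitute the real repository ID from [2].
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "enkryptai/deepseek-r1-distill-llama-8b-aligned"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Chat-style generation; the aligned model is expected to refuse unsafe requests.
messages = [{"role": "user", "content": "Explain what safety alignment changes in an LLM."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```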

Comparison of the Aligned Model with Other LLMs

Comparing AI risk for the Enkrypt-aligned model with other large language models

In our DeepSeek R1 red teaming report, we compared the model with gpt-4o, o1, and claude-3-opus [3]. The alignment performed on DeepSeek-R1-Distill-Llama-8B raised DeepSeek R1 from rank 69 to rank 12 on our Safety Leaderboard, making it safer than gpt-4o, o1-mini, and claude-3-haiku. The overall risk of the aligned model is nearly identical to o1's, differing by just 1%. See our Safety Leaderboard for how the aligned DeepSeek model compares against other models [5].

For real-world use, the aligned DeepSeek R1 model can be paired with Enkrypt AI Guardrails, which detect and block 99% of attacks, delivering one of the industry's best combinations of performance, cost efficiency, and safety. We are continuously working to make the model even safer by reducing bias and censorship [4].
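To illustrate the deployment pattern only (this is not Enkrypt AI's Guardrails API), a minimal sketch of wrapping model calls with pre- and post-generation safety checks might look like the following; guardrail_check and model_generate are hypothetical stand-ins for the guardrail service and the aligned model.

```python
# Hypothetical deployment pattern: screen the prompt before the model call and the
# response after it. `guardrail_check` and `model_generate` are placeholders, not
# Enkrypt AI's actual Guardrails API.
from typing import Callable

def guarded_completion(
    user_prompt: str,
    guardrail_check: Callable[[str], bool],   # returns True when content is safe
    model_generate: Callable[[str], str],     # calls the aligned DeepSeek R1 model
    refusal: str = "Sorry, I can't help with that request.",
) -> str:
    if not guardrail_check(user_prompt):      # block attacks before they reach the model
        return refusal
    response = model_generate(user_prompt)
    if not guardrail_check(response):         # catch unsafe content the model still produced
        return refusal
    return response

if __name__ == "__main__":
    # Stub guardrail and model so the sketch runs; swap in real services.
    print(guarded_completion(
        "Summarize best practices for API key storage.",
        guardrail_check=lambda text: "exploit" not in text.lower(),
        model_generate=lambda prompt: f"[model answer to: {prompt}]",
    ))
```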

A Callout to Model Providers

At Enkrypt AI, we’ve successfully reduced AI safety risks by up to 70% while preserving model performance. We invite other model providers to collaborate with us in aligning AI for safer deployment. If you’re interested in fortifying your models against security vulnerabilities and bias, let’s talk.

Links

[1] SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming

[2] Enkrypt AI Aligned DeepSeek R1 on Hugging Face

[3] DeepSeek Red Team Report by Enkrypt AI

[4] DeepSeek Under Fire: Uncovering Bias & Censorship from 300 Geopolitical Questions

[5] Enkrypt AI Safety Leaderboard

Meet the Writer
Satbir Singh