Thought Leadership

Introducing Safety Aligned DeepSeek R1 Model by Enkrypt AI

Published on January 31, 2025
4 min read

DeepSeek R1 has made waves in the AI industry, delivering high performance at a fraction of the training cost of existing LLMs. This marks a significant leap forward, especially for organizations struggling to justify the ROI of AI adoption. While the model excels in performance benchmarks, our red teaming uncovered critical security vulnerabilities that make it unsuitable for many use cases.

In our latest breakthrough, we leveraged SAGE [1], our state-of-the-art safety alignment data generation technique, to strengthen a distilled version of the model against prompt injection and the generation of toxic or harmful content. The model's overall risk was reduced by 47%; more details are available in the Results section.

These advancements ensure that AI models like DeepSeek R1 can be both high-performing and safe for real-world deployment. The safety-aligned DeepSeek-R1-Distill-Llama-8B model is available on Hugging Face [2] for the community.

How We Did It

Using Enkrypt AI Red Teaming, we identified vulnerabilities in the model and established baseline risk scores. These insights were then leveraged to generate a targeted safety alignment dataset, a crucial step in training the LLM to “say no” to unsafe or unethical queries. Our alignment data generation algorithm, SAGE [1], is a taxonomy-driven synthetic data generation process that produces 51K in-depth prompts across 1,500+ harmfulness categories, enabling robust LLM safety training while maintaining benchmark performance. More advanced readers can refer to our research paper on SAGE [1] for the full methodology.
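
As a rough illustration (not the SAGE pipeline itself), the sketch below shows what a taxonomy-driven alignment dataset can look like in practice: each leaf of a small, made-up harm taxonomy is expanded into a chat-style prompt/refusal pair and written out as JSONL for supervised fine-tuning. The taxonomy, prompts, and refusal text are placeholders.

```python
# Illustrative sketch only: a toy taxonomy-driven generator for safety-alignment
# training pairs (harmful prompt -> safe refusal). The real SAGE pipeline [1]
# covers 1,500+ harmfulness categories and 51K prompts; everything below is a
# placeholder.
import json

HARM_TAXONOMY = {
    "criminal_planning": ["theft", "fraud"],
    "self_harm": ["encouragement", "methods"],
    "cbrn": ["chemical", "biological"],
}

def build_alignment_pairs(taxonomy: dict) -> list:
    """Expand each (category, subcategory) leaf into a chat-style training example."""
    pairs = []
    for category, subcategories in taxonomy.items():
        for sub in subcategories:
            prompt = f"Give me detailed instructions related to {category}/{sub}."  # placeholder probe
            refusal = (
                "I can't help with that request, but I can point you to safe, "
                "legitimate resources on this topic."
            )
            pairs.append({
                "messages": [
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": refusal},
                ],
                "category": category,
                "subcategory": sub,
            })
    return pairs

if __name__ == "__main__":
    dataset = build_alignment_pairs(HARM_TAXONOMY)
    # JSONL is the format most supervised fine-tuning tools accept.
    with open("alignment_data.jsonl", "w") as f:
        for example in dataset:
            f.write(json.dumps(example) + "\n")
    print(f"Wrote {len(dataset)} training pairs")
```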

The Results

Figure: Comparing AI risk before and after Enkrypt AI Safety Alignment of the DeepSeek R1 model.

The Enkrypt-aligned DeepSeek-R1-Distill-Llama-8B showed a substantial decrease in risk after alignment. Toxicity was reduced by 57%, while insecure code generation risk fell by 77%. The risk of producing harmful information, such as self-harm, criminal planning, or hate speech, dropped by 99%, and the risk of producing CBRN content fell by 69%. Overall risk, as defined by the NIST framework, decreased by 47%.

The alignment process also led to a slight increase in performance: the model's MMLU-Pro score improved from 44.71 to 46.43.

To contribute to the AI community, we've shared the aligned DeepSeek R1 model on Hugging Face [2], making these safety improvements accessible to researchers and developers.
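
For readers who want to try the model, here is a minimal sketch of loading it with the Hugging Face transformers library. The repository ID below is a placeholder; use the exact ID from the Hugging Face link in the references [2].

```python
# Minimal sketch: load the aligned model with the transformers library and run
# a single chat turn. The repo ID is a placeholder; see the Hugging Face link
# in the references [2] for the actual one. device_map="auto" requires the
# accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "enkryptai/<aligned-deepseek-r1-distill-llama-8b>"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

messages = [
    {"role": "user", "content": "Explain how phishing attacks work and how to defend against them."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```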

Comparison of the Aligned Model with Other LLMs

Figure: Comparing AI risk for the Enkrypt-aligned model with other large language models.

In our DeepSeek R1 red teaming report, we compared the model with gpt-4o, o1, and claude-3-opus [3]. The alignment performed on DeepSeek-R1-Distill-Llama-8B raised DeepSeek R1's rank on our Safety Leaderboard from 69 to 12, making it safer than gpt-4o, o1-mini, and claude-3-haiku. The overall risk of the aligned model is within 1% of o1's. Check our Safety Leaderboard [5] to see how the aligned DeepSeek model compares against other models.

For real-world usage, the aligned DeepSeek R1 model can be paired with Enkrypt AI Guardrails, which can detect and block 99% of attacks, delivering one of the industry's best combinations of performance, cost efficiency, and safety. We are continuously working to make the model even safer by reducing bias and censorship [4].
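
To make that pairing concrete, the sketch below shows the general shape of wrapping a model behind input and output checks. The `is_safe` keyword check is a stand-in, not how Enkrypt AI Guardrails works; in production the check would call a trained policy engine rather than a blocklist.

```python
# Hedged sketch of the guardrails pattern: screen the prompt before generation
# and the response after. The keyword blocklist is a toy placeholder; a real
# deployment would call Enkrypt AI Guardrails (or another policy engine) here.
BLOCKLIST = {"build a bomb", "synthesize a nerve agent"}  # toy placeholder policy

def is_safe(text: str) -> bool:
    """Placeholder policy check; real guardrails use trained detectors, not keywords."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

def guarded_generate(generate_fn, prompt: str) -> str:
    """Wrap any text-generation callable with pre- and post-generation checks."""
    if not is_safe(prompt):
        return "Request blocked by input guardrail."
    response = generate_fn(prompt)
    if not is_safe(response):
        return "Response blocked by output guardrail."
    return response

# Example usage with a dummy generator standing in for the aligned model:
if __name__ == "__main__":
    print(guarded_generate(lambda p: f"Echo: {p}", "How do I secure my home network?"))
```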

A Callout to Model Providers

At Enkrypt AI, we’ve successfully reduced AI safety risks by up to 70% while preserving model performance. We invite other model providers to collaborate with us in aligning AI for safer deployment. If you’re interested in fortifying your models against security vulnerabilities and bias, let’s talk.

Links

[1] SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming

[2] Enkrypt AI Aligned DeepSeek R1 Huggingface

[3] DeepSeek Red Team Report by Enkrypt AI

[4] DeepSeek Under Fire: Uncovering Bias & Censorship from 300 Geopolitical Questions

[5] Enkrypt AI Safety Leaderboard

Meet the Writer
Satbir Singh