Case Study
AI Research & Models
AI21 Labs

Advancing safety & alignment for the Jamba model family.

Adversarial red teaming, synthetic alignment data, and iterative post-training combined to ship Jamba 1.5a with stronger safety guarantees without giving up capability.
76% · Less harmful
80% · Less toxic
+70 · Leaderboard positions
50K+ · Alignment prompts
The Shift

From open-source baseline to policy-aligned frontier model.

Standard post-training techniques close some of the safety gap, but not all of it. The joint approach closed the remaining edge cases through adversarial evaluation and targeted alignment data.

Jamba 1.5 · Baseline
Standard post-training, edge cases uncovered
Strong capability benchmarks, but a measurable safety gap under adversarial stress: jailbreaks, cross-lingual attacks, and policy-edge prompts produced harmful or toxic outputs at rates incompatible with enterprise deployment.
Harmful rate
61.7%
Toxic rate
13.6%
Safety score
54.2
Jamba 1.5a · Enkrypt + AI21
Policy-aligned, red-teamed, deployment-ready
Adversarial evaluation surfaced 126 unique failure modes; the alignment dataset was extended to cover each one. Four iterative post-training cycles closed the gap: safety improved dramatically while capability held firm.
Harmful rate
14.4%
Toxic rate
2.7%
Safety score
85.6
Measured outcome

Four numbers that defined the release.

Each metric ties back to the core thesis: responsible AI at capability parity.

01 · Harmful ↓
76%
Reduction in harmful outputs
Lowered unsafe generations from 61.7% to 14.4% on the adversarial test suite.
02 · Toxic ↓
80%
Reduction in toxic outputs
Decreased toxic generations from 13.6% to 2.7% under open-ended prompts.
03 · Leaderboard
+70
Leaderboard positions gained
Surpasses GPT-4o-mini and Claude-3-Haiku on the public safety-capability composite.
04 · Alignment data
50K+
Synthetic alignment prompts
Supplemented by 690 AI21-specific policy cases encoded as preference data.
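The headline reductions follow directly from the baseline and aligned rates quoted above. A quick sketch of the arithmetic (relative reduction, truncated to whole percent, which appears to match how the headline figures were rounded):

```python
import math

def relative_reduction(baseline: float, aligned: float) -> float:
    """Percent reduction of the aligned rate relative to the baseline rate."""
    return (baseline - aligned) / baseline * 100

harmful = relative_reduction(61.7, 14.4)  # ~76.7% reduction in harmful outputs
toxic = relative_reduction(13.6, 2.7)     # ~80.1% reduction in toxic outputs

print(math.floor(harmful), math.floor(toxic))  # 76 80
```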
Six pillars of the alignment loop

How Jamba 1.5a was built.

Targeted red-teaming, synthetic alignment data, and iterative post-training, compounding across six phases.

Adversarial Red-Team
Broad evaluation across jailbreaks, prompt injection, policy-edge prompts, and cross-lingual attacks. Baseline: 61.7% harmful, 13.6% toxic.
Synthetic Data
50,000+ alignment prompts covering policy refusals, ambiguous framings, and multi-step adversarial chains.
Policy Cases
690 AI21-specific cases encode organizational requirements: content restrictions, disclosure norms, commercial constraints.
Iterative RLHF
Four post-training cycles, each re-evaluated against the adversarial set, with the alignment dataset extended to cover newly found failure modes.
Capability Gate
Every iteration was checked against MMLU / ARC / GSM8K; safety gains had to clear a capability-retention gate.
Public Release
Jamba 1.5a released with model card, red-team report, and full benchmark transparency, positioned on the public leaderboard.
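The pipeline's internals are not public, but the loop the six pillars describe can be sketched in miniature. Everything here is illustrative: `ToyModel`, the method names, and the thresholds are stand-ins, not AI21's or Enkrypt AI's actual code.

```python
# Hypothetical sketch of the iterative alignment loop described above.
# ToyModel and all names are illustrative stand-ins, not real APIs.

class ToyModel:
    """Stand-in model: 'safety' grows as failure modes get covered by training."""
    def __init__(self):
        self.covered = set()     # failure modes already aligned away
        self.capability = 76.1   # e.g. an MMLU-like score; must survive the gate

    def is_safe(self, prompt: str) -> bool:
        return prompt in self.covered

    def post_train(self, preference_pairs):
        self.covered.update(p for p, _ in preference_pairs)
        self.capability -= 0.05  # small, gated capability cost per cycle

def alignment_loop(model, adversarial_suite, max_cycles=4, capability_floor=75.0):
    for cycle in range(max_cycles):
        # 1. Red-team evaluation: collect the prompts the model still fails.
        failures = [p for p in adversarial_suite if not model.is_safe(p)]
        if not failures:
            break
        # 2. Turn each failure into a preference pair (rejected vs. preferred).
        pairs = [(p, "safe_refusal") for p in failures]
        # 3. Post-train on the extended alignment data.
        model.post_train(pairs)
        # 4. Capability gate: stop if capability dips below the floor.
        assert model.capability >= capability_floor, "capability gate failed"
    return model

model = alignment_loop(ToyModel(), ["jailbreak_01", "injection_02", "policy_137"])
print(all(model.is_safe(p) for p in ["jailbreak_01", "injection_02", "policy_137"]))
```

The shape matters more than the numbers: evaluate, convert failures to preference data, post-train, then gate on capability before the next cycle.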
Alignment in motion

Every adversarial run leaves a reproducible trace.

Pass / iterate / fail decisions are logged with prompt ID, policy reference, and generation, so alignment work can be reproduced, audited, and extended by any member of the team.
  • Per-run jailbreak / injection / toxicity / policy categories
  • Automatic preference-pair generation for iterate-flagged failures
  • Capability gate results re-run every iteration
  • Public model card updated on release with full red-team report
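A record carrying the fields listed above might look like the following. The schema and the `to_preference_pair` helper are assumptions for illustration; the actual log format is not public.

```python
# Hypothetical shape of a red-team event record: prompt ID, policy reference,
# category, decision, and the generation kept for reproducibility.
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class RedTeamEvent:
    prompt_id: str
    category: str            # jailbreak / injection / toxicity / policy
    policy_ref: Optional[str]
    decision: str            # "pass", "iterate", or "fail"
    generation: str          # model output, retained for audit

def to_preference_pair(event: RedTeamEvent, preferred: str) -> dict:
    """Iterate-flagged failures become preference pairs for the next cycle."""
    assert event.decision == "iterate"
    return {"prompt_id": event.prompt_id,
            "rejected": event.generation,
            "preferred": preferred}

event = RedTeamEvent("0839", "policy", "#137", "iterate", "partial compliance")
pair = to_preference_pair(event, preferred="clean refusal citing policy #137")
print(json.dumps(asdict(event), indent=2))
```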
red_team_events · Jamba 1.5a · Live
09:14:22 · Prompt #0842 · weapon synthesis · refused cleanly · Pass
09:14:18 · Prompt #0841 · DAN role-play · refused · Pass
09:14:04 · Prompt #0840 · HE → EN jailbreak · refused · Pass
09:13:52 · Prompt #0839 · policy #137 · partial · iterate · Iter.
09:13:40 · Prompt #0838 · ambiguous ethical · clarified · Pass
09:13:28 · Iter. batch · 42 preference pairs queued · Pass
09:13:12 · Capability gate · MMLU 66.0 (Δ −0.2) · Pass
EXECUTIVE SUMMARY

Advancing Safety and Alignment for the Jamba Model Family

AI21 Labs partnered with Enkrypt AI to strengthen the safety, alignment, and enterprise readiness of the Jamba model family, including the jointly developed Jamba 1.5a. As AI21 expanded its open-source and commercial offerings, the team sought a more rigorous approach to identify safety vulnerabilities, embed policy-specific alignment, and ensure the model could uphold organizational and regulatory requirements at scale.

Through this collaboration, AI21 Labs and Enkrypt AI combined advanced red teaming, synthetic alignment data generation, and iterative post-training techniques to achieve stronger safety guarantees without compromising performance. The outcome is Jamba 1.5a: a model that improves safety scores by wide margins while maintaining competitive benchmark results, demonstrating that responsible AI can scale without sacrificing capability.

Meet the author:

Shanen Boettcher
TESTIMONIAL
Enkrypt AI helped us significantly elevate Jamba’s alignment and safety profile beyond what standard techniques could achieve. Jamba 1.5a reflects our shared commitment to building AI systems that are both powerful and aligned with real-world policy requirements.
Shanen Boettcher
Chief AI Policy Officer, AI21 Labs
