Case Study
AI Research & Models
AI21 Labs

Advancing safety & alignment for the Jamba model family.

Adversarial red teaming, synthetic alignment data, and iterative post-training combined to ship Jamba 1.5a with stronger safety guarantees without giving up capability.
76% · Less harmful
80% · Less toxic
+70 · Leaderboard positions
50K+ · Alignment prompts
The Shift

From open-source baseline to policy-aligned frontier model.

Standard post-training techniques close some of the safety gap, but not all of it. The joint approach closed the remaining edge cases through adversarial evaluation and targeted alignment data.

Jamba 1.5 · Baseline
Standard post-training, edge cases uncovered
Strong capability benchmarks, but a measurable safety gap under adversarial stress: jailbreaks, cross-lingual attacks, and policy-edge prompts produced harmful or toxic outputs at rates incompatible with enterprise deployment.
Harmful rate
61.7%
Toxic rate
13.6%
Safety score
54.2
Jamba 1.5a · Enkrypt + AI21
Policy-aligned, red-teamed, deployment-ready
Adversarial evaluation surfaced 126 unique failure modes; the alignment dataset was extended to cover each one. Four iterative post-training cycles closed the gap: safety improved dramatically while capability held firm.
Harmful rate
14.4%
Toxic rate
2.7%
Safety score
85.6
Measured outcome

Four numbers that defined the release.

Each metric ties back to the core thesis: responsible AI at capability parity.

01 · Harmful ↓
76%
Reduction in harmful outputs
Lowered unsafe generations from 61.7% to 14.4% on the adversarial test suite.
02 · Toxic ↓
80%
Reduction in toxic outputs
Decreased toxic generations from 13.6% to 2.7% under open-ended prompts.
03 · Leaderboard
+70
Leaderboard positions gained
Surpasses GPT-4o-mini and Claude-3-Haiku on the public safety-capability composite.
04 · Alignment data
50K+
Synthetic alignment prompts
Supplemented by 690 AI21-specific policy cases encoded as preference data.
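The headline reductions follow directly from the baseline and aligned rates quoted above. A quick sketch of the arithmetic (relative reduction, truncated to whole percent, which appears to match how the headline figures were rounded):

```python
import math

def relative_reduction(baseline: float, aligned: float) -> float:
    """Percent reduction of the aligned rate relative to the baseline rate."""
    return (baseline - aligned) / baseline * 100

harmful = relative_reduction(61.7, 14.4)  # ~76.7% reduction in harmful outputs
toxic = relative_reduction(13.6, 2.7)     # ~80.1% reduction in toxic outputs

print(math.floor(harmful), math.floor(toxic))  # 76 80
```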
Six pillars of the alignment loop

How Jamba 1.5a was built.

Targeted red-teaming, synthetic alignment data, and iterative post-training, compounding across six phases.

Adversarial Red-Team
Broad evaluation across jailbreaks, prompt injection, policy-edge prompts, and cross-lingual attacks. Baseline: 61.7% harmful, 13.6% toxic.
Synthetic Data
50,000+ alignment prompts covering policy refusals, ambiguous framings, and multi-step adversarial chains.
Policy Cases
690 AI21-specific cases encode organizational requirements: content restrictions, disclosure norms, commercial constraints.
Iterative RLHF
Four post-training cycles, each re-evaluated against the adversarial set, with the alignment dataset extended to cover newly found failure modes.
Capability Gate
Every iteration was checked against MMLU / ARC / GSM8K; safety gains had to clear a capability-retention gate.
Public Release
Jamba 1.5a released with model card, red-team report, and full benchmark transparency, positioned on the public leaderboard.
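The pipeline's internals are not public, but the loop the six pillars describe can be sketched in miniature. Everything here is illustrative: `ToyModel`, the method names, and the thresholds are stand-ins, not AI21's or Enkrypt AI's actual code.

```python
# Hypothetical sketch of the iterative alignment loop described above.
# ToyModel and all names are illustrative stand-ins, not real APIs.

class ToyModel:
    """Stand-in model: 'safety' grows as failure modes get covered by training."""
    def __init__(self):
        self.covered = set()     # failure modes already aligned away
        self.capability = 76.1   # e.g. an MMLU-like score; must survive the gate

    def is_safe(self, prompt: str) -> bool:
        return prompt in self.covered

    def post_train(self, preference_pairs):
        self.covered.update(p for p, _ in preference_pairs)
        self.capability -= 0.05  # small, gated capability cost per cycle

def alignment_loop(model, adversarial_suite, max_cycles=4, capability_floor=75.0):
    for cycle in range(max_cycles):
        # 1. Red-team evaluation: collect the prompts the model still fails.
        failures = [p for p in adversarial_suite if not model.is_safe(p)]
        if not failures:
            break
        # 2. Turn each failure into a preference pair (rejected vs. preferred).
        pairs = [(p, "safe_refusal") for p in failures]
        # 3. Post-train on the extended alignment data.
        model.post_train(pairs)
        # 4. Capability gate: stop if capability dips below the floor.
        assert model.capability >= capability_floor, "capability gate failed"
    return model

model = alignment_loop(ToyModel(), ["jailbreak_01", "injection_02", "policy_137"])
print(all(model.is_safe(p) for p in ["jailbreak_01", "injection_02", "policy_137"]))
```

The shape matters more than the numbers: evaluate, convert failures to preference data, post-train, then gate on capability before the next cycle.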
Alignment in motion

Every adversarial run leaves a reproducible trace.

Pass / iterate / fail decisions are logged with prompt ID, policy reference, and generation, so alignment work can be reproduced, audited, and extended by any member of the team.
  • Per-run jailbreak / injection / toxicity / policy categories
  • Automatic preference-pair generation for iterate-flagged failures
  • Capability gate results re-run every iteration
  • Public model card updated on release with full red-team report
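A record carrying the fields listed above might look like the following. The schema and the `to_preference_pair` helper are assumptions for illustration; the actual log format is not public.

```python
# Hypothetical shape of a red-team event record: prompt ID, policy reference,
# category, decision, and the generation kept for reproducibility.
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class RedTeamEvent:
    prompt_id: str
    category: str            # jailbreak / injection / toxicity / policy
    policy_ref: Optional[str]
    decision: str            # "pass", "iterate", or "fail"
    generation: str          # model output, retained for audit

def to_preference_pair(event: RedTeamEvent, preferred: str) -> dict:
    """Iterate-flagged failures become preference pairs for the next cycle."""
    assert event.decision == "iterate"
    return {"prompt_id": event.prompt_id,
            "rejected": event.generation,
            "preferred": preferred}

event = RedTeamEvent("0839", "policy", "#137", "iterate", "partial compliance")
pair = to_preference_pair(event, preferred="clean refusal citing policy #137")
print(json.dumps(asdict(event), indent=2))
```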
red_team_events · Jamba 1.5a · Live
09:14:22 · Prompt #0842 · weapon synthesis · refused cleanly · Pass
09:14:18 · Prompt #0841 · DAN role-play · refused · Pass
09:14:04 · Prompt #0840 · HE → EN jailbreak · refused · Pass
09:13:52 · Prompt #0839 · policy #137 · partial · iterate · Iter.
09:13:40 · Prompt #0838 · ambiguous ethical · clarified · Pass
09:13:28 · Iter. batch · 42 preference pairs queued · Pass
09:13:12 · Capability gate · MMLU 66.0 (Δ −0.2) · Pass
EXECUTIVE SUMMARY

Advancing Safety and Alignment for the Jamba Model Family

AI21 Labs partnered with Enkrypt AI to strengthen the safety, alignment, and enterprise readiness of the Jamba model family, including the jointly developed Jamba 1.5a. As AI21 expanded its open-source and commercial offerings, the team sought a more rigorous approach to identify safety vulnerabilities, embed policy-specific alignment, and ensure the model could uphold organizational and regulatory requirements at scale.

Through this collaboration, AI21 Labs and Enkrypt AI combined advanced red teaming, synthetic alignment data generation, and iterative post-training techniques to achieve stronger safety guarantees without compromising performance. The outcome is Jamba 1.5a: a model that improves safety scores by wide margins while maintaining competitive benchmark results, demonstrating that responsible AI can scale without sacrificing capability.

Meet the author:

Shanen Boettcher
TESTIMONIAL
Enkrypt AI helped us significantly elevate Jamba’s alignment and safety profile beyond what standard techniques could achieve. Jamba 1.5a reflects our shared commitment to building AI systems that are both powerful and aligned with real-world policy requirements.
Shanen Boettcher
Chief AI Policy Officer, AI21 Labs
