Publications

No Free Lunch with Guardrails
Benchmarks show that stronger guardrails improve safety but can reduce usability. The paper proposes a framework for balancing this trade-off, supporting practical, secure LLM deployment.
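As a rough illustration of the kind of trade-off studied here (not the paper's actual framework), the hypothetical sketch below scores a guardrail by how many harmful prompts it blocks versus how many benign prompts it still allows; the function name and weighting are assumptions for illustration only.

```python
# Hypothetical illustration of the safety/usability trade-off a guardrail
# introduces; the scoring scheme is an assumption, not the paper's framework.

def evaluate_guardrail(blocked_harmful, allowed_benign, alpha=0.5):
    """Score a guardrail from its block/allow decisions.

    blocked_harmful: list of bools, True if a harmful prompt was blocked.
    allowed_benign:  list of bools, True if a benign prompt was allowed.
    alpha: assumed weight balancing safety against usability.
    """
    safety = sum(blocked_harmful) / len(blocked_harmful)
    usability = sum(allowed_benign) / len(allowed_benign)
    combined = alpha * safety + (1 - alpha) * usability
    return {"safety": safety, "usability": usability, "combined": combined}


# A stricter guardrail blocks more harmful prompts but also over-refuses
# benign ones, so its combined score need not improve.
strict = evaluate_guardrail([True] * 95 + [False] * 5, [True] * 70 + [False] * 30)
lenient = evaluate_guardrail([True] * 70 + [False] * 30, [True] * 95 + [False] * 5)
print(strict, lenient)
```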

Investigating Implicit Bias in LLMs
A study of 50+ models reveals that bias persists, and sometimes worsens, in newer models. The work calls for standardized benchmarks to prevent discrimination in real-world AI use.

VERA: Validation & Enhancement for RAG
VERA improves Retrieval-Augmented Generation by refining both the retrieved context and the generated output, reducing hallucinations and improving response quality across open-source and commercial models.
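A minimal sketch of the validate-then-refine idea described above, not VERA's actual implementation: `call_llm`, the prompt wording, and the three-step structure are placeholders assumed for illustration.

```python
# Sketch of a validate-and-refine RAG wrapper in the spirit of VERA; not the
# paper's method. `call_llm` stands in for any open-source or commercial model.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion call here")


def answer_with_refinement(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n".join(retrieved_chunks)
    # Step 1 (assumed): keep only retrieved passages relevant to the question.
    refined_context = call_llm(
        "Keep only the passages relevant to the question.\n"
        f"Question: {question}\nPassages:\n{context}"
    )
    # Step 2: draft an answer grounded in the refined context.
    draft = call_llm(
        f"Answer using only this context:\n{refined_context}\n\nQuestion: {question}"
    )
    # Step 3 (assumed): check the draft against the context and revise
    # any unsupported claims before returning the final answer.
    return call_llm(
        f"Context:\n{refined_context}\n\nDraft answer:\n{draft}\n\n"
        "Remove or correct any claim not supported by the context, "
        "then return the revised answer."
    )
```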

Fine-Tuning, Quantization & Safety
Fine-tuning increases jailbreak vulnerability, while quantization has mixed effects on safety. Our analysis underscores the importance of strong guardrails in deployment.

SAGE-RT: Synthetic Red Teaming
SAGE enables scalable synthetic red-teaming across 1,500+ harmfulness categories, achieving a 100% jailbreak success rate against GPT-4o and GPT-3.5 in key scenarios.
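To make the category-driven generation concrete, here is a hypothetical sketch of expanding a harmfulness taxonomy into synthetic red-team prompts; the example categories, prompt template, and `call_llm` placeholder are assumptions, not SAGE's actual pipeline.

```python
# Hypothetical sketch of category-driven synthetic red-teaming in the spirit
# of SAGE-RT; taxonomy, template, and generator call are illustrative only.

EXAMPLE_CATEGORIES = ["malware authoring", "phishing", "self-harm encouragement"]


def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in the prompt-generator model here")


def generate_red_team_prompts(categories, per_category=3):
    """Expand each harmfulness category into synthetic adversarial test prompts."""
    suite = {}
    for category in categories:
        suite[category] = [
            call_llm(
                f"Write adversarial test prompt #{i + 1} probing the category: "
                f"{category}. Return only the prompt."
            )
            for i in range(per_category)
        ]
    return suite
```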