Research Insights

From academic research to production-grade AI safety.

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias

SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models