Saachi Jain

CV (last updated March 2023): here
Google Scholar: here
Github: @scoutsaachi
Twitter: @saachi_jain_

About

I lead the safety training team at OpenAI! Our team trains the models that we ship to be both safe and reliable. Our work spans fundamental research on teaching the model to reason about safety, improving reliability and calibration, agentic safety, and model deception.

I did my PhD at MIT, advised by Aleksander Mądry, focusing on understanding how modern deep learning methods represent data and on identifying the biases that they create. In the past, I've worked on problems in graph theory and the analysis of time series data.

My research was supported by the Two Sigma Diversity PhD Fellowship and the Apple Fellowship. I received my BS/MS in Computer Science from Stanford, where I worked with Jure Leskovec and Matei Zaharia. Before starting at MIT, I was a Computer Vision Scientist with Tesla Autopilot.

Selected Papers

From hard refusals to safe-completions: toward output-centric safety training || Blog
Yuan Yuan, Tina Sriskandarajah, Anna-Luisa Brakman, Alec Helyar, Alex Beutel, Andrea Vallone, Saachi Jain

Deliberative Alignment: Reasoning Enables Safer Language Models || Blog
Melody Y. Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel Dias, Andrea Vallone, Hongyu Ren, Jason Wei, Hyung Won Chung, Sam Toyer, Johannes Heidecke, Alex Beutel, Amelia Glaese

Improving subgroup robustness via data selection || Blog
Saachi Jain*, Kimia Hamidieh*, Kristian Georgiev*, Andrew Ilyas, Marzyeh Ghassemi, Aleksander Madry
Conference on Neural Information Processing Systems (NeurIPS) 2025

Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation || Blog
Joshua Vendrow*, Saachi Jain*, Logan Engstrom, Aleksander Madry.
Data-centric Machine Learning Research (DMLR) Workshop at ICML 2023

Distilling Model Failures as Directions in Latent Space || Blog
Saachi Jain*, Hannah Lawrence*, Ankur Moitra, Aleksander Madry.
International Conference on Learning Representations (ICLR) 2023, Spotlight (Top 25%)

A Data-Based Perspective on Transfer Learning || Blog
Saachi Jain*, Hadi Salman*, Alaa Khaddaj*, Eric Wong, Sung Min Park, Aleksander Madry.
Conference on Computer Vision and Pattern Recognition (CVPR) 2023

When does Bias Transfer in Transfer Learning? || Blog
Hadi Salman*, Saachi Jain*, Andrew Ilyas*, Logan Engstrom*, Eric Wong, Aleksander Madry.
2022

Missingness Bias in Model Debugging || Blog
Saachi Jain*, Hadi Salman*, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry.
International Conference on Learning Representations (ICLR) 2022

Certified Patch Robustness via Smoothed Vision Transformers || Blog
Hadi Salman*, Saachi Jain*, Eric Wong*, Aleksander Madry.
Conference on Computer Vision and Pattern Recognition (CVPR) 2022

Combining Diverse Feature Priors || Blog
Saachi Jain*, Dimitris Tsipras*, Aleksander Madry
International Conference on Machine Learning (ICML) 2022

A Mechanism for Producing Aligned Latent Spaces with Autoencoders
Saachi Jain*, Adityanarayanan Radhakrishnan*, Caroline Uhler (2021)

Spectral Lower Bounds on the I/O Complexity of Computation Graphs
Saachi Jain and Matei Zaharia
Symposium on Parallelism in Algorithms and Architectures (SPAA) 2020

MASA: Motif-Aware State Assignment in Noisy Time Series Data
Saachi Jain, David Hallac, Rok Sosic, and Jure Leskovec
Workshop on Mining and Learning from Time Series (MiLeTS) at SIGKDD 2019

Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, and Jason Weston
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2019