Quick Overview: This is a presentation on Representation Engineering, hosted by the Regional Asia Group: Improving Alignment & Robustness w/ Circuit Breakers.

Andy Zou: Top-Down Interpretability - Detailed Overview & Context

Abstract: With widespread use of machine learning, there have been serious societal consequences from using black box models ... This talk was recorded at NDC AI in Oslo, Norway. Attend the next NDC ...

Related Videos

Andy Zou – Top-Down Interpretability for AI Safety [Alignment Workshop]
AI safety: Universal and Transferable Attacks on Aligned Language Models
Representation Engineering
Andy Zou - Universal and Transferable Adversarial Attacks on Aligned Language Models (project page)
Improving Alignment & Robustness w/ Circuit Breakers: Andy Zou
Universal Jailbreaks with Zico Kolter, Andy Zou, and Asher Trockman
Interpretability vs. Explainability in Machine Learning
Between the Layers - Interpreting Large Language Models - Michelle Frost - NDC AI 2025
[MERL Seminar Series Spring 2025] Red Teaming AI Agents in-the-wild: Revealing Deployment Vulnerabilities