Quick Overview: This is a presentation on Representation Engineering. Join the Regional Asia Group as they host Improving Alignment & Robustness w/ Circuit Breakers:
Andy Zou: Top-Down Interpretability - Detailed Overview & Context
Abstract: With the widespread use of machine learning, there have been serious societal consequences from using black-box models ...
This talk was recorded at NDC AI in Oslo, Norway.
[MERL Seminar Series Spring 2025] Red Teaming AI Agents in-the-wild: Revealing Deployment Vulnerabilities