AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

Year
2023
Type(s)
Author(s)
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, and Tong Sun
Source
In Workshop on Socially Responsible Language Modelling Research (SoLaR), NeurIPS 2023
BibTeX