AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models

Year
2024
Type(s)
Author(s)
Zhu, Sicheng, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, and Tong Sun.
Source
First Conference on Language Modeling (COLM), 2024.
BibTeX