AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

Year

2023

Type(s)

Conference articles

Author(s)

Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, and Tong Sun

Source

In Workshop on Socially Responsible Language Modelling Research (SoLaR), NeurIPS 2023
