AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment Attacks

Year
2026
Author(s)
Pankayaraj Pathmanathan and Udari Madhushani Sehwag and Michael-Andrei Panaitescu-Liess and Cho-Yu Jason Chiang and Furong Huang
Source
In AAAI 2026 AI Alignment Track (AAAI), Oral, 2026
Url
https://arxiv.org/abs/2410.11283