Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs

Year
2026
Author(s)
James Beetham, Souradip Chakraborty, Mengdi Wang, Furong Huang, Amrit Singh Bedi, and Mubarak Shah
Source
In 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2026
Url
https://aclanthology.org/2026.eacl-long.360.pdf