Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs

Year

2026

Type(s)

Conference articles

Author(s)

James Beetham, Souradip Chakraborty, Mengdi Wang, Furong Huang, Amrit Singh Bedi, and Mubarak Shah

Source

In 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2026
