Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps Away

Year
2026
Type(s)
Author(s)
Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Furong Huang, Dinesh Manocha, and Amrit Singh Bedi
Source
In Forty-third International Conference on Machine Learning (ICML), 2026
URL
https://arxiv.org/abs/2602.11096