Teach a Reward Model to Correct Itself: Reward Guided Adversarial Failure Discovery for Robust Reward Modeling

Year

2026

Type(s)

Conference articles

Author(s)

Pankayaraj Pathmanathan and Furong Huang

Source

In Main Conference, The 64th Annual Meeting of the Association for Computational Linguistics (ACL), Oral, 2026
