Furong Huang
Associate Professor @ University of Maryland
Teach a Reward Model to Correct Itself: Reward Guided Adversarial Failure Discovery for Robust Reward Modeling
Year
2026
Type(s)
Conference articles
Author(s)
Pankayaraj Pathmanathan and Furong Huang
Source
In Main Conference, The 64th Annual Meeting of the Association for Computational Linguistics (ACL), Oral, 2026
Url
https://arxiv.org/abs/2507.06419
BibTeX
@inproceedings{pathmanathan2026teachrewardmodel,
  title = {{Teach a Reward Model to Correct Itself: Reward Guided Adversarial Failure Discovery for Robust Reward Modeling}},
  author = {Pankayaraj Pathmanathan and Furong Huang},
  booktitle = {Main Conference, The 64th Annual Meeting of the Association for Computational Linguistics (ACL), Oral, 2026},
  year = {2026},
  url = {https://arxiv.org/abs/2507.06419},
  note = {arXiv, Code},