Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities

Year

2025

Type(s)

Conference articles

Author(s)

Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai, Yarin Gal, Furong Huang, and Dylan Hadfield-Menell

Source

In Transactions on Machine Learning Research (TMLR), 2025

Url

https://arxiv.org/abs/2502.05209

Furong Huang

Associate Professor @ University of Maryland
