PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach

Year
2026
Type(s)
Author(s)
Udari Madhushani Sehwag, Shayan Shabihi, Alex McAvoy, Vikash Sehwag, Yuancheng Xu, Dalton Towers, and Furong Huang
Source
In The Fourteenth International Conference on Learning Representations (ICLR), 2026
Url
https://arxiv.org/abs/2511.20703