Is poisoning a real threat to DPO? Maybe more so than you think

Year
2025
Type(s)
Author(s)
Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, and Furong Huang
Source
In AAAI 2025 AI Alignment Track (AAAI), 2025
Url
https://arxiv.org/abs/2406.12091
BibTeX