Is poisoning a real threat to LLM alignment? Maybe more so than you think

Year
2024
Type(s)
Author(s)
Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang
Source
ICML 2024 Workshop on Models of Human Feedback for AI Alignment, 2024
BibTeX