Is poisoning a real threat to LLM alignment? Maybe more so than you think

Year

2024

Type(s)

Conference articles

Author(s)

Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, and Furong Huang

Source

In ICML 2024 Workshop on Models of Human Feedback for AI Alignment, 2024
