GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment
In The Thirteenth International Conference on Learning Representations (ICLR), 2025
Yuancheng Xu and Udari Madhushani Sehwag and Alec Koppel and Sicheng Zhu and Bang An and Furong Huang and Sumitra Ganesh
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security
In BuildingTrust Workshop, ICLR 2025
Zikui Cai and Shayan Shabihi and Bang An and Zora Che and Brian R. Bartoldson and Bhavya Kailkhura and Tom Goldstein and Furong Huang
CSRec: Rethinking Sequential Recommendation from A Causal Perspective
In The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Xiaoyu Liu and Jiaxin Yuan and Yuhang Zhou and Jingling Li and Furong Huang and Wei Ai
Is poisoning a real threat to DPO? Maybe more so than you think
In AAAI 2025 AI Alignment Track (AAAI), 2025
Pankayaraj Pathmanathan and Souradip Chakraborty and Xiangyu Liu and Yongyuan Liang and Furong Huang
Is poisoning a real threat to LLM alignment? Maybe more so than you think
In Workshop on Models of Human Feedback for AI Alignment, ICML 2024
Pankayaraj Pathmanathan and Souradip Chakraborty and Xiangyu Liu and Yongyuan Liang and Furong Huang
SAIL: Self-improving Efficient Online Alignment of Large Language Models
In Workshop on Theoretical Foundations of Foundation Models, ICML 2024
Mucong Ding and Souradip Chakraborty and Vibhu Agrawal and Zora Che and Alec Koppel and Mengdi Wang and Amrit Bedi and Furong Huang
Progressively Efficient Communication
In Intrinsically Motivated Open-ended Learning (IMOL) Workshop, NeurIPS 2023
Khanh Nguyen and Ruijie Zheng and Hal Daume III and Furong Huang and Karthik Narasimhan
AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models
In Workshop on Socially Responsible Language Modelling Research (SoLaR), NeurIPS 2023
Sicheng Zhu and Ruiyi Zhang and Bang An and Gang Wu and Joe Barrow and Zichao Wang and Furong Huang and Ani Nenkova and Tong Sun
Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis
In BuildingTrust Workshop, ICLR 2025
Jeffrey Yang Fan Chiang and Seungjae Lee and Jia-Bin Huang and Furong Huang and Yizheng Chen
Balancing Label Imbalance in Federated Environments Using Only Mixup and Artificially-Labeled Noise
In 4th International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), 2024
Kyle Sang and Tahseen Rabbani and Furong Huang
