Publication Type: Conference articles

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment

In The Thirteenth International Conference on Learning Representations (ICLR), 2025
Yuancheng Xu and Udari Madhushani Sehwag and Alec Koppel and Sicheng Zhu and Bang An and Furong Huang and Sumitra Ganesh

AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security

In BuildingTrust Workshop, ICLR 2025
Zikui Cai and Shayan Shabihi and Bang An and Zora Che and Brian R. Bartoldson and Bhavya Kailkhura and Tom Goldstein and Furong Huang

CSRec: Rethinking Sequential Recommendation from A Causal Perspective

In The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Xiaoyu Liu and Jiaxin Yuan and Yuhang Zhou and Jingling Li and Furong Huang and Wei Ai

Is poisoning a real threat to DPO? Maybe more so than you think

In AAAI 2025 AI Alignment Track (AAAI), 2025
Pankayaraj Pathmanathan and Souradip Chakraborty and Xiangyu Liu and Yongyuan Liang and Furong Huang

Is poisoning a real threat to LLM alignment? Maybe more so than you think

In ICML 2024 Workshop on Models of Human Feedback for AI Alignment, ICML 2024
Pankayaraj Pathmanathan and Souradip Chakraborty and Xiangyu Liu and Yongyuan Liang and Furong Huang

SAIL: Self-improving Efficient Online Alignment of Large Language Models

In ICML 2024 Workshop on Theoretical Foundations of Foundation Models, ICML 2024
Mucong Ding and Souradip Chakraborty and Vibhu Agrawal and Zora Che and Alec Koppel and Mengdi Wang and Amrit Bedi and Furong Huang

Progressively Efficient Communication

In Intrinsically Motivated Open-ended Learning (IMOL) Workshop, NeurIPS 2023
Khanh Nguyen and Ruijie Zheng and Hal Daumé III and Furong Huang and Karthik Narasimhan

AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

In Workshop on Socially Responsible Language Modelling Research (SoLaR), NeurIPS 2023
Sicheng Zhu and Ruiyi Zhang and Bang An and Gang Wu and Joe Barrow and Zichao Wang and Furong Huang and Ani Nenkova and Tong Sun

Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis

In BuildingTrust Workshop, ICLR 2025
Jeffrey Yang Fan Chiang and Seungjae Lee and Jia-Bin Huang and Furong Huang and Yizheng Chen

Balancing Label Imbalance in Federated Environments Using Only Mixup and Artificially-Labeled Noise

In 4th International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), 2024
Kyle Sang and Tahseen Rabbani and Furong Huang