Comparison of our proposed poisoning algorithm and the random poisoning method.
We train three A2C agents with identical hyperparameters on Hopper-v2: one without poisoning, and two under different poisoning methods that share the same attack budget and power. The videos show how each trained agent performs in the test phase.
*Note: the goal of the agent in Hopper is to hop forward as fast as possible.*
(1) Baseline: the original, unpoisoned agent.
(2) Agent under the random poisoning attack with $\epsilon=0.1$, $C/K=0.3$.
(3) Agent under our proposed VA2C-P attack with $\epsilon=0.1$, $C/K=0.3$.
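
To make the two attack parameters concrete, here is a minimal sketch of what a random poisoning baseline could look like. It is not the code used for the videos. It assumes that $K$ is the number of training iterations, that $C$ is the number of iterations the attacker may poison, that the attacker perturbs the rewards of an iteration, and that $\epsilon$ bounds the L2 norm of that perturbation. The helper names `choose_poisoned_iterations` and `random_poison_rewards` are hypothetical.

```python
import numpy as np


def choose_poisoned_iterations(K, budget_ratio, rng):
    """Pick C = budget_ratio * K iterations uniformly at random (assumed scheme)."""
    C = int(round(budget_ratio * K))
    return set(rng.choice(K, size=C, replace=False).tolist())


def random_poison_rewards(rewards, epsilon, rng):
    """Add a random perturbation with L2 norm epsilon (assumed norm and target)."""
    noise = rng.standard_normal(rewards.shape)
    norm = np.linalg.norm(noise)
    if norm == 0.0:
        return rewards.copy()
    return rewards + epsilon * noise / norm


rng = np.random.default_rng(0)
K, budget_ratio, epsilon = 10, 0.3, 0.1
poisoned = choose_poisoned_iterations(K, budget_ratio, rng)

for k in range(K):
    rewards = np.ones(5)  # stand-in for the rewards collected in iteration k
    if k in poisoned:
        rewards = random_poison_rewards(rewards, epsilon, rng)
    # an A2C update on the (possibly poisoned) rewards would go here
```

Under these assumptions, a random attacker spends its budget on arbitrary iterations and in arbitrary directions, which is the behavior agent (2) is meant to illustrate.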
Paper link: https://arxiv.org/abs/2009.00774
GitHub link: https://github.com/umd-huang-lab/poison-rl