We propose a new algorithm for MARL under a multi-agent predictive state representation model, where we incorporate a dynamic interaction graph; we provide the theoretical guarantees of our model and run various experiments to support our algorithm.
This paper proposes a new algorithm for learning the optimal policies under a novel multi-agent predictive state representation reinforcement learning model. Compared to the state-of-the-art methods, the most striking feature of our approach is the introduction of a dynamic interaction graph to the model, which allows us to represent each agent’s predictive state by considering the behaviors of its “neighborhood” agents. Methodologically, we develop an online algorithm that simultaneously learns the predictive state representation and agent policies. Theoretically, we provide an upper bound of the L2-norm of the learned predictive state representation. Empirically, to demonstrate the efficacy of the proposed method, we provide thorough numerical results on both a MAMuJoCo robotic learning experiment and a multi-agent particle learning environment.
We propose a new algorithm for MARL under a multi-agent predictive state representation model, where we incorporate a dynamic interaction graph; we provide the theoretical guarantees of our model and run various experiments to support our algorithm.