Finding Effective Security Strategies through Reinforcement Learning and Self-Play

17 Sep 2020 · Kim Hammar, Rolf Stadler ·

We present a method to automatically find security strategies for the use case of intrusion prevention. Following this method, we model the interaction between an attacker and a defender as a Markov game and let attack and defense strategies evolve through reinforcement learning and self-play without human intervention. Using a simple infrastructure configuration, we demonstrate that effective security strategies can emerge from self-play. This shows that self-play, which has been applied in other domains with great success, can be effective in the context of network security. Inspection of the converged policies show that the emerged policies reflect common-sense knowledge and are similar to strategies of humans. Moreover, we address known challenges of reinforcement learning in this domain and present an approach that uses function approximation, an opponent pool, and an autoregressive policy representation. Through evaluations we show that our method is superior to two baseline methods but that policy convergence in self-play remains a challenge.

PDF Abstract