Learning Implicit Communication Strategies for the Purpose of Illicit Collusion
Abstract
Winner-take-all dynamics are prevalent throughout the human and natural worlds. Such dynamics promote competition between agents as they secure available resources for themselves and undermine their opponents. However, this competition is mitigated by the limited capacity of any single agent to amass resources on its own. Thus, agents collude with one another to ensure that one of their own can win and distribute the spoils amongst the co-conspirators. Such collusion can be difficult to execute successfully, and explicit collusion is often prohibited. Competitors may therefore attempt to collude by encoding signals in their publicly observable behavior. Such collusion occurs in economics, where colluding agents establish cartels; in politics, via the spoils system; and in biology, where symbiotic systems can be more efficient than organisms acting alone. However, it is not known by what mechanisms agents can establish such implicit collusion strategies. In this work, we investigate these mechanisms by extending the iterated prisoner's dilemma (IPD) into a winner-take-all (WTA) framework, formalizing a WTA IPD strategy as a Markov decision process, and using classical and deep reinforcement learning to discover collusion strategies in WTA IPD. We then analyze the learned strategies to understand how agents can develop signaling mechanisms over restricted communication channels.