Artificially Intelligent Players Invent Nonverbal “Languages” to Win Card Games

Machines are becoming more collaborative, both with humans and one another. Soon, we may have self-driving cars that negotiate rights-of-way and robots to assist nurses with home care. But first, they’ll need to learn to communicate, and not just through spoken language. Humans say a lot with their actions. Tapping the brakes both slows you and signals potential trouble ahead. Crossing your arms both protects you and signals reticence.

To teach artificial intelligence (AI) to communicate, researchers have turned to card games. While AI long ago bested humans at chess, Go, some forms of poker, and many video games, the games of bridge and Hanabi offer special challenges. Players must cooperate without a clear way to share information (such as by saying, “Hey, play this card!”). Researchers working on both games have recently developed AIs that invent their own implicit codes to coordinate their moves.

In bridge, there are four players, divided into two teams. Before anyone plays a card, players take turns bidding on a “contract.” A bid indicates you think your team can win a certain number of tricks, with a chosen suit as trump. Over the years, bridge players have developed ways to place bids that also tell their partners what’s in their own hand. For example, one might bid “two clubs,” even without any clubs, to indicate a lot of face cards. Through such coded bids, teams can have rudimentary conversations.

Researchers at University College London recently posted a paper to the pre-print server arXiv in which pairs of AIs use machine learning to perform a simplified version of bridge bidding. In their system, called “Policy-Belief-Iteration” (P-BIT), each AI player has two neural networks. One network learns to infer the partner’s hand based on what’s been bid. The other learns to make appropriate bids based on the partner’s inferred hand and the player’s own hand.
During training, AI players are rewarded for making bids that improve the accuracy of their partner’s inferences about the AI player’s own hand. After 1.5 million practice hands, the AI pair had developed conventions of its own, for instance bidding three of a suit as a way to signal that six of a suit might be an ideal contract. The players bested baseline AI players that didn’t use communication or model their partner’s beliefs. Jun Wang, a computer scientist at University College London and an author of the paper, says the players can’t yet compare with humans but that he finds the initial results “very encouraging.”

A newer card game called Hanabi poses similar challenges of communication. In this cooperative solitaire-like game, two to five players each hold four or five cards, each with a color and a number, and take turns placing them on colored piles in the correct order. But they can’t see their own cards, only those of their partners. On each turn, they can play a card, discard one, or give a hint to another player. They can’t tell another player which card to play, though; they can only say which cards in the partner’s hand are a particular color or number. Indicating the color of a particular card in your partner’s hand might in some cases signal that it should be played next. Information can come not just from the explicit clue itself (the color of a card) but implicitly from why that clue was chosen instead of others.

Recently a team from DeepMind Technologies and the University of Oxford posted a paper on arXiv describing a machine-learning system for the two-player version of Hanabi. Their “Bayesian Action Decoder” (BAD) also uses neural networks and has each AI player try to infer the beliefs of its partner. To avoid the infinite recursion of thinking about what your partner is thinking about what you’re thinking and so on, the system creates a set of “public beliefs” external to the two players.
These represent all the openly available information about the state of the game and previous actions, and what that might say about all the hands without actually looking at any of them. Then a “public agent” uses a neural network to convert these beliefs into instructions for what each player should do, for any possible combination of cards in their partner’s hand. Each player then acts on the public agent’s guidance combined with its own observations.

Even though these AI players weren’t specifically rewarded for communication, signaling emerged as a side effect. For instance, pointing out red or yellow cards meant the AI partner should play the most recently drawn card. The researchers calculated that 40 percent of the information shared through hints was implicit. Such coded communication helped their system play nearly perfectly, scoring an average of 24 out of 25 points, beating the best previous bot by about a point. (For humans, scoring more than 20 points is good, even with the unfair advantage of body language.)

Both of these systems, P-BIT for bridge and BAD for Hanabi, rely on giving AI something like humans’ “theory of mind,” or awareness of others’ beliefs and intentions. Such reasoning is ubiquitous in human interaction. If you ask your friend about his marriage and he changes the subject to the Mets, that says something not just about the Mets but also about his marriage. For AIs to handle interaction with humans or each other efficiently and gracefully, they’ll need to understand implicit signaling, and card games are one way to get there. A startup called NukkAI is focused on building better AI for bridge and is raising millions of dollars with the expectation that it will eventually apply its technology to real-world problems.
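
The idea that a Hanabi hint carries information beyond its literal content can be made concrete with a small worked example. The sketch below is a toy illustration only, not the BAD system or anything taken from the DeepMind paper. It assumes a single hidden card that is either playable or not, a made-up prior, and a made-up hinting policy that both players know. Under those assumptions, the listener applies Bayes' rule to estimate how likely its card is to be playable after hearing a given kind of hint. The names decode_hint, prior, and policy, and all of the probabilities, are hypothetical.

```python
from fractions import Fraction


def decode_hint(prior, policy, hint):
    """Bayes' rule: P(card | hint) is proportional to P(hint | card) * P(card)."""
    unnormalized = {
        card: prior[card] * policy[card].get(hint, Fraction(0))
        for card in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("hint has zero probability under the assumed policy")
    return {card: weight / total for card, weight in unnormalized.items()}


# Illustrative numbers only; none of these come from either paper.
# Prior belief about the listener's own (unseen) card.
prior = {"playable": Fraction(1, 4), "not_playable": Fraction(3, 4)}

# Assumed shared convention: how often the partner gives each kind of hint,
# depending on what the partner can see in the listener's hand.
policy = {
    "playable": {"color_hint": Fraction(9, 10), "number_hint": Fraction(1, 10)},
    "not_playable": {"color_hint": Fraction(1, 5), "number_hint": Fraction(4, 5)},
}

posterior = decode_hint(prior, policy, "color_hint")
print(posterior["playable"])  # 3/5
```

With these assumed numbers, a color hint raises the listener's belief that its card is playable from 1 in 4 to 3 in 5, even though the hint says nothing explicit about playability. A number hint lowers it to 1 in 25.
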
In a recent paper, DeepMind called Hanabi “a new frontier for AI research” and offered an open-source test bed. While DeepMind’s BAD system develops conventions by playing with the same partner repeatedly, their “frontier” paper notes that a more advanced task would be to reason on the fly about the behavior of an unfamiliar player. They tested a few systems in this scenario, and none averaged more than four points.

Julia Proft, a computer scientist at Cornell University who works on robots and communication (and recently found that AI Hanabi players are more likely to be judged human when they use implicit signals), also emphasizes the importance of on-the-fly reasoning. “What they did is very cool,” she says of the papers on learned conventions, but she adds that spontaneous inference from context is “what the interesting problem is.” Unfortunately, says Jakob Foerster, a computer scientist at the University of Oxford and coauthor of both DeepMind papers, “I don’t think we even have credible methods to start thinking about what’s required for that. We’re pretty far out, to be honest.”