For decades, researchers have been pitting artificial intelligence (AI) against the top game players in the world. The heads-up no-limit Texas hold’em variant of poker may be the final frontier in the battle of man vs. machine over games. And it may be about to fall.
In 1997, IBM chess computer Deep Blue defeated world chess champion Garry Kasparov. In 2011, IBM Watson defeated Ken Jennings and Brad Rutter, the two winningest Jeopardy! players in that game show’s history. In 2016, Google DeepMind’s AlphaGo defeated South Korean professional Go player Lee Sedol, considered one of the best players in the world.
But as games go, heads-up no-limit Texas hold’em is an entirely different beast. Unlike the others, it is a game of imperfect information: the players know only some of the cards in play, and they can bluff and use other ploys to mislead their opponents. Tuomas Sandholm, computer science professor at Carnegie Mellon University, says the game features 10^161 information sets, significantly more than the number of atoms in the universe. Limit hold’em, which restricts bets and raises to a pre-determined amount, has 10^13 information sets.
“For a given game size, incomplete information games are much harder to solve than complete information games,” Sandholm says. “In complete information games, it’s basically decomposable. You can solve what’s best to do just by looking at the end game. But if I’m in an end game where I have four aces, I can’t just bet aggressively. And I can’t just bet weakly when I have a weak hand. That would be too transparent. You have to balance across the subgames and therefore the problem is not decomposable.”
Humans masters of the incomplete and misleading
In practical terms, humans encounter situations in which they must make decisions on incomplete and misleading information all the time. An AI capable of making good decisions based on such information has real-world applications in areas like negotiations, finance, military strategy, cybersecurity and even medicine. Sandholm notes he just received funding for a project to use AI to steer the adaptation and evolution of the immune system to better treat cancers and autoimmune diseases.
But to get there, AI needs to surpass humans’ ability to solve imperfect information games.
“When it comes to these strategic situations, you don’t want to use an AI that’s dumber than you,” Sandholm says. “That would make you worse off. You want an AI that’s stronger than you. In negotiations, I don’t want to delegate that to an AI that’s worse than I am. It’s the same for military strategy and cybersecurity. You don’t want it to be worse than what we can do manually. It has to put together better strategies than any human.”
Nearly two years ago, Claudico, an AI developed by Sandholm and his Ph.D. student Noam Brown, took on four of the top heads-up, no-limit Texas hold’em players in the world: Dong Kim, Jason Les, Bjorn Li and Doug Polk. From April 24 to May 8, 2015, Claudico played 20,000 hands against each player (80,000 hands total) for a $100,000 prize (donated by Microsoft Research and Rivers Casino). When the dust cleared, Polk, Kim and Li had more chips than Claudico, while Les trailed.
While the humans won in absolute terms, Sandholm says the result was a statistical tie — the participants didn’t play enough hands to make a significant statistical determination.
A win (at least a tie) for humans
“The humans won; as a group, they won,” he says. “But we couldn’t get a statistically significant result on that. Even playing against those absolute, top players, it was a statistical tie. Certainly the AI was not better.”
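To see why tens of thousands of hands can still leave a "statistical tie," consider a rough back-of-the-envelope sketch. It assumes hands are independent and that a normal approximation applies, which real poker only loosely satisfies. The function names and every number passed to them are hypothetical illustrations, not figures from the Claudico match.

```python
import math


def hands_needed(edge_per_hand, sd_per_hand, z=1.96):
    """Hands required before a true per-hand edge would clear a z-test.

    edge_per_hand and sd_per_hand are in the same unit (e.g. chips).
    z=1.96 corresponds to roughly 95% two-sided confidence.
    """
    if edge_per_hand == 0:
        raise ValueError("a zero edge can never be separated from luck")
    return math.ceil((z * sd_per_hand / abs(edge_per_hand)) ** 2)


def z_score(total_won, hands, sd_per_hand):
    """How many standard errors the observed total is away from zero."""
    return total_won / (sd_per_hand * math.sqrt(hands))


if __name__ == "__main__":
    # Hypothetical: a 1-chip average edge with a 10-chip swing per hand.
    print(hands_needed(edge_per_hand=1.0, sd_per_hand=10.0))
    print(z_score(total_won=200.0, hands=400, sd_per_hand=10.0))
```

The required sample grows with the square of the ratio between per-hand swings and per-hand edge, so a small edge in a high-variance game can demand a very large number of hands.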
Determined to do better, Sandholm and Brown started over from scratch. In 2016, they developed a new poker-playing AI named Tartanian. They entered a “dumbed down” version of Tartanian, Baby Tartanian 8, in the 2016 Annual Computer Poker Competition and won both the Total Bankroll and Bankroll Instant Run-off categories. Sandholm explains Baby Tartanian 8 was “dumbed down” in that it had to be limited to the amount of memory allowed by the competition.
Then, in February 2016, Sandholm and Brown started over again, with an eye toward a second match-up between their AI and top heads-up, no-limit pros. They created Libratus. As with their earlier AIs, they didn’t write a strategy for Libratus. Instead, they wrote the algorithm that Libratus uses to compute its strategy.
For instance, Sandholm says, Libratus includes a new and faster equilibrium-finding method that identifies unpromising paths and starts to ignore them over time. It also has access to the Pittsburgh Supercomputing Center’s Bridges supercomputer to perform live endgame-solving computations.
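The article does not spell out Libratus’s actual method, so the following is only a toy analogy: regret matching, a standard textbook technique for approximating an equilibrium, applied to rock-paper-scissors with an extra, clearly bad "fold" action. The game, the payoff table and the function names are all assumptions made for illustration. It shows the general idea of computing a strategy rather than writing one by hand, and how an action that keeps performing badly ends up with no weight.

```python
# Toy game (an assumption for illustration): rock-paper-scissors plus a
# dominated "fold" action that loses to everything. PAYOFF[a][b] is the
# payoff to a player choosing a against an opponent choosing b.
ACTIONS = ["rock", "paper", "scissors", "fold"]
PAYOFF = [
    [0, -1, 1, 1],
    [1, 0, -1, 1],
    [-1, 1, 0, 1],
    [-1, -1, -1, 0],
]


def current_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def train(iterations):
    """Self-play regret matching; returns the average strategy.

    Because the toy game is symmetric, both players share one strategy.
    """
    n = len(ACTIONS)
    regrets = [0.0] * n
    strategy_sum = [0.0] * n
    for _ in range(iterations):
        strategy = current_strategy(regrets)
        # Expected payoff of each pure action against the current strategy.
        values = [
            sum(PAYOFF[a][b] * strategy[b] for b in range(n)) for a in range(n)
        ]
        expected = sum(strategy[a] * values[a] for a in range(n))
        for a in range(n):
            regrets[a] += values[a] - expected
            strategy_sum[a] += strategy[a]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]


if __name__ == "__main__":
    for name, prob in zip(ACTIONS, train(1000)):
        print(f"{name}: {prob:.4f}")
```

In this sketch "fold" accumulates negative regret after the first iteration and is never chosen again, while the remaining weight is spread evenly over rock, paper and scissors. A pruning method of the kind Sandholm describes would go further and stop evaluating such paths at all, which this toy version does not do.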
The new contest, “Brains vs. Artificial Intelligence: Upping the Ante,” is currently ongoing at Rivers Casino in Pittsburgh. Pros Jason Les, Dong Kim, Daniel McAulay and Jimmy Chou are competing for a $200,000 prize this time around. The contest, which started on Jan. 11, will span 20 days, and the pros will play a collective total of 120,000 hands against Libratus.
As in the previous match, the contest will use duplicate matches to minimize the role of luck.
Reducing the variable
“We try to reduce the variance in this game by not allowing the computer to be lucky or the human to be lucky,” Sandholm says. “We pair the players up.”
For instance, if Jason Les and Dong Kim are paired, and Les gets a certain set of cards against the computer in a particular hand, the computer will get that same set of cards in a hand against Kim, and vice versa.
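A small simulation can illustrate why this pairing helps. The model below is a deliberate simplification with made-up parameters: each hand’s result is a fixed skill edge, plus luck from the cards, plus noise from how the hand is played. In the duplicate setup, the second human receives the cards the computer held in the first hand, so the card luck is mirrored and cancels in the humans’ combined total. The function name and all numbers are hypothetical.

```python
import random
import statistics


def simulate(num_deals, edge, card_luck_sd, play_noise_sd, duplicate, rng):
    """Return the human team's combined result for each pair of hands."""
    results = []
    for _ in range(num_deals):
        luck_a = rng.gauss(0, card_luck_sd)
        # Duplicate: the second human gets the cards the AI had, so the
        # card luck is the mirror image. Otherwise it is drawn afresh.
        luck_b = -luck_a if duplicate else rng.gauss(0, card_luck_sd)
        hand_a = edge + luck_a + rng.gauss(0, play_noise_sd)
        hand_b = edge + luck_b + rng.gauss(0, play_noise_sd)
        results.append(hand_a + hand_b)
    return results


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    independent = simulate(5000, 0.1, 10.0, 3.0, False, rng)
    paired = simulate(5000, 0.1, 10.0, 3.0, True, rng)
    print("variance, independent deals:", statistics.pvariance(independent))
    print("variance, duplicate deals:  ", statistics.pvariance(paired))
```

Under these assumptions the variance of the paired results comes only from how the hands are played, so the same number of hands says more about skill.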
In addition, to combat fatigue, the humans are allowed to take breaks whenever they want to, and can take as long as they like to play a hand.
“The humans are very careful about their sleep hygiene and their eating and all of that,” Sandholm says. “These are real professional athletes. They have been playing better now than they were in the beginning. Their play has improved during the match so far.”
With seven days played as of this writing, Libratus is ahead in chips against all four players, and is cumulatively ahead by more than $470,000. Rivers Casino is streaming the match live via Twitch for the duration.