The AI System That Beat Poker Professionals

An artificial intelligence program developed by Carnegie Mellon University in collaboration with Facebook AI has defeated leading professionals in six-player no-limit Texas hold'em poker, the world's most popular form of poker. The AI, called Pluribus, defeated poker professional Darren Elias, who holds the record for most World Poker Tour titles, and Chris "Jesus" Ferguson, winner of six World Series of Poker events. Each pro separately played 5,000 hands of poker against five copies of Pluribus.

"Pluribus achieved superhuman performance at multi-player poker, which is a recognized milestone in artificial intelligence and in game theory that has been open for decades," said Tuomas Sandholm, a professor of Computer Science, who developed Pluribus  “The ability to beat five other players in such a complicated game opens up new opportunities to use AI to solve a wide variety of real-world problems."

Pluribus' algorithms built some surprising features into its strategy. For instance, most human players avoid "donk betting" -- that is, ending one round with a call but then starting the next round with a bet. It's seen as a weak move that usually doesn't make strategic sense. But Pluribus placed donk bets far more often than the professionals it defeated. A major factor in Pluribus' success was its ability to mix up its strategy. Humans try to do the same thing, but for them it is a matter of execution: doing this in a perfectly random way, and doing so consistently, is something most people just can't do.

Poker poses a bigger challenge than perfect-information games such as chess and Go because it is an incomplete-information game: players can't be certain which cards are in play, and opponents can and will bluff. That makes it both a tougher AI challenge and more relevant to many real-world problems involving multiple parties and missing information.

All of the AIs that displayed superhuman skills at two-player games did so by approximating what's called a Nash equilibrium. In a two-player game, this is a pair of strategies (one per player) where neither player can benefit from changing strategy as long as the other player's strategy remains the same. Although the AI's strategy guarantees only a result no worse than a tie, the AI emerges victorious if its opponent makes miscalculations and therefore fails to maintain the equilibrium. However, in a game with more than two players, playing a Nash equilibrium can be a losing strategy. So Pluribus dispenses with theoretical guarantees of success and develops strategies that nevertheless enable it to consistently outplay its opponents.
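The "no worse than a tie" guarantee is easiest to see in a much smaller game than poker. The sketch below uses rock-paper-scissors, a two-player zero-sum game whose equilibrium is to play each action one third of the time. The function names and the "rock_heavy" strategy are illustrative assumptions, not anything taken from Pluribus.

```python
# Payoff to "us" in rock-paper-scissors (a two-player zero-sum game).
# Rows are our action, columns the opponent's; order: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def expected_payoff(ours, theirs):
    """Expected payoff to us when both sides play mixed strategies."""
    return sum(
        ours[i] * theirs[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def worst_case(ours):
    """Lowest payoff that any single fixed opponent reply can hold us to."""
    pure_replies = [[1.0 if k == j else 0.0 for k in range(3)] for j in range(3)]
    return min(expected_payoff(ours, reply) for reply in pure_replies)


uniform = [1 / 3, 1 / 3, 1 / 3]   # the equilibrium strategy
rock_heavy = [0.5, 0.25, 0.25]    # hypothetical predictable strategy

print("uniform worst case:", worst_case(uniform))
print("rock-heavy worst case:", worst_case(rock_heavy))
```

The uniform strategy cannot be pushed below zero by any reply, while the rock-heavy strategy loses a quarter of a unit per game to an opponent who always plays paper. In this toy game the equilibrium only breaks even; the point is the guarantee, which holds for two players and, as the article notes, does not carry over to games with more.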

Pluribus first computes a "blueprint" strategy by playing against copies of itself, which is sufficient for the first round of betting. From that point on, Pluribus does a more detailed search of possible moves. It looks ahead several moves as it does so, but does not look ahead all the way to the end of the game, which would be computationally prohibitive. While limited-lookahead search is a standard approach in perfect-information games, it is extremely challenging in imperfect-information games, and making it work there was the main breakthrough that enabled Pluribus to achieve superhuman multi-player poker. In essence, Pluribus solves a limited-lookahead subgame inside an imperfect-information game.
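To make "limited lookahead" concrete, here is a minimal sketch of depth-limited search in a perfect-information game, the setting where the article says the technique is standard. The game is an assumption chosen for brevity: players alternately take one to three stones from a pile, and whoever takes the last stone wins. This is not Pluribus' algorithm, and the imperfect-information version is considerably harder.

```python
def evaluate(stones):
    """Hypothetical heuristic used at the depth limit: 0 means 'unknown'."""
    return 0


def negamax(stones, depth):
    """Value for the player to move: +1 win, -1 loss, 0 unknown at this depth."""
    if stones == 0:
        return -1  # the previous player took the last stone, so the mover lost
    if depth == 0:
        return evaluate(stones)  # stop looking ahead and fall back on the heuristic
    return max(
        -negamax(stones - take, depth - 1)
        for take in (1, 2, 3)
        if take <= stones
    )


def best_move(stones, depth):
    """Pick the move whose resulting position is worst for the opponent."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: -negamax(stones - take, depth - 1))


print(negamax(5, 1))    # too shallow to see the outcome
print(negamax(5, 3))    # deep enough to find the forced win
print(best_move(5, 3))  # number of stones to take
```

With a one-move lookahead the search cannot tell whether five stones is a good position; with three moves it finds the forced win by taking one stone and leaving four. The quality of the answer depends on how far the search looks and on what it assumes at the cutoff.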

Within each subgame, the AI considers five possible continuation strategies that each opponent and itself might adopt for the rest of the game. The number of possible strategies is much larger, but the researchers found that only five are needed at each subgame to compute a strong, balanced overall strategy. In addition, Pluribus seeks to be unpredictable. For instance, betting would make sense if the AI held the best possible hand, but if the AI bets only when it has the best hand, opponents will quickly catch on. So Pluribus calculates how it would act with every possible hand it could hold and then computes a strategy that is balanced across all of those possibilities.
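A toy calculation shows why betting only with the best hand is exploitable and what "balanced" can mean. The model below is a simplifying assumption, not Pluribus' computation: a bettor on the final round holds either an unbeatable hand or a pure bluff, and the opponent can only call or fold. The function names are hypothetical.

```python
def call_ev(bluff_fraction, pot, bet):
    """Opponent's expected profit from calling a bet in the toy model.

    When the bettor is bluffing, the caller wins the pot plus the bet;
    otherwise the caller loses the amount of the call.
    """
    return bluff_fraction * (pot + bet) - (1 - bluff_fraction) * bet


def balanced_bluff_fraction(pot, bet):
    """Share of bets that must be bluffs to leave the caller indifferent.

    Obtained by setting call_ev to zero and solving for bluff_fraction.
    """
    return bet / (pot + 2 * bet)


pot, bet = 100, 100
balanced = balanced_bluff_fraction(pot, bet)

print("never bluffing:", call_ev(0.0, pot, bet))
print("balanced mix:", balanced, call_ev(balanced, pot, bet))
print("bluffing half the time:", call_ev(0.5, pot, bet))
```

If the bettor never bluffs, calling always loses money, so the opponent simply folds and the strong hands earn nothing extra. If the bettor bluffs too often, calling becomes profitable. For a pot-sized bet the indifference point is one bluff for every two value bets, which leaves the opponent with no profitable adjustment in this simplified setting.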

An AI player for Texas hold'em could pose a serious threat to the online gaming industry. Even so, building an algorithm that copes with some of the hardest conditions an AI system can face, namely imperfect information and multiple players who may behave irrationally, is a huge milestone in the development and integration of more sophisticated AI systems that do more than just automate monotonous processes.

Preesha Gehlot