Roughly a year ago to the day, Google researchers announced that their artificial intelligence, AlphaGo, had mastered the ancient game of Go. At the time, Discover wrote that one game still gave computers fits: poker.
Late Monday night, a computer program designed by two Carnegie Mellon University researchers beat four of the world’s top no-limit Texas Hold’em poker players. The algorithm, named Libratus by its creators, collected more than $1.5 million in chips after a marathon 20-day tournament in Pittsburgh. The victory comes only two years after the same researchers’ algorithm failed to beat human players.
In the past few decades, computer scientists’ algorithms have surpassed human prowess in checkers, chess, Scrabble, Jeopardy! and Go—our biological dominance in recreational pastimes is dwindling. But board games are played with a finite set of moves, the rules are clear-cut and an opponent’s strategy unfolds in plain view on the board. Computers are well suited to sorting through these logical, rules-based games and making optimal choices.
Poker was seen as a stronghold for human minds because it relies heavily on “imperfect information”—we don’t know which cards our opponents hold, and the number of possible moves is so large it defies brute-force calculation. In addition, divining an opponent’s next move relies heavily on psychology. The best players have refined bluffing into an art form, but computers don’t fare well when asked to intuit how humans will react.
These hurdles proved no match for the improved algorithm designed by Tuomas Sandholm and Noam Brown. While they haven’t yet released the specific details of their program, it seems they relied on the well-worn tactic of “training”: Libratus ran trillions of simulated poker games, refining its strategy through trial and error until it converged on a winning approach. That process let the AI pick up the nuances of bluffing and calling on its own and learn from its mistakes.
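Sandholm and Brown hadn’t detailed Libratus’s internals, but the trial-and-error self-play described above is the core idea behind regret minimization, a family of methods long used in research poker bots. As a toy illustration (my own sketch, not Libratus’s actual code), here are two “regret-matching” players repeatedly facing off at rock-paper-scissors; by tracking how much better each unplayed action would have done, their average strategies drift toward the game’s unexploitable equilibrium of one-third each.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive
    regret, falling back to uniform when nothing has positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total > 0:
        return [p / total for p in pos]
    return [1.0 / ACTIONS] * ACTIONS

def self_play(iterations=100_000, seed=1):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    strategy_sums = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strats]
        for p in range(2):
            mine, theirs = moves[p], moves[1 - p]
            for a in range(ACTIONS):
                # Regret: how much better action a would have done than
                # the action actually played this round.
                regrets[p][a] += payoff(a, theirs) - payoff(mine, theirs)
                strategy_sums[p][a] += strats[p][a]
    # It is the *average* strategy over all iterations that converges.
    total = sum(strategy_sums[0])
    return [s / total for s in strategy_sums[0]]

avg = self_play()
```

After 100,000 simulated games the average strategy lands close to (1/3, 1/3, 1/3), the unexploitable play for rock-paper-scissors. Poker bots apply the same principle over vastly larger game trees with hidden cards.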
“The best AI’s ability to do strategic reasoning with imperfect information has now surpassed that of the best humans,” Sandholm said in a statement.
Sandholm says that Libratus would review each day’s play every night and patch the three most problematic holes in its strategy. When play resumed the next day, the human players were forced to try new tactics to trick the machine. The poker pros met every night as well to compare notes, but their efforts couldn’t match the processing power of the Pittsburgh Supercomputing Center’s Bridges computer, which drew on the equivalent of 3,300 laptops’ worth of computing power.
Libratus seemed to favor large, risky bets, which initially made the human players balk. They soon learned it was best to try to defeat the AI early in a hand, when the most cards are still unseen and uncertainty is greatest. As more cards were flipped and more decisions made, the computer could further refine its choices.
The algorithm isn’t limited to poker, either. While this version of the program was trained specifically on the rules of Texas Hold’em, it was written broadly enough that it could conceivably learn to master any situation involving imperfect information, such as negotiations, military strategy and medical planning.
Libratus isn’t quite ready for the World Poker Tour yet. The version of the game it played was heads-up, pitting the program against a single opponent at a time, unlike most tournaments. Games with more players compound the number of variables in play, making it significantly harder for a computer to choose the best course of action.
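A rough back-of-the-envelope calculation (my own illustration, not from the researchers) shows why each extra player hurts: every additional opponent multiplies the number of hidden hole-card combinations the program must reason over.

```python
from math import comb

# Distinct two-card starting hands dealt from a 52-card deck.
STARTING_HANDS = comb(52, 2)  # 1326

def opponent_hand_combinations(opponents, my_cards=2):
    """Number of ways to deal two hidden hole cards to each of
    `opponents` players, after removing my own `my_cards` from the deck."""
    remaining = 52 - my_cards
    total = 1
    for _ in range(opponents):
        total *= comb(remaining, 2)
        remaining -= 2
    return total

heads_up = opponent_hand_combinations(1)   # 1225 possible hidden hands
full_ring = opponent_hand_combinations(8)  # roughly 4 x 10**23 combinations
```

Heads-up, the program faces 1,225 possible hidden hands; at a nine-player table, the hidden deals number on the order of 10^23 before a single bet is even made.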
So when it comes to joining the poker table with Libratus, heed the immortal words of Kenny Rogers: Know when to walk away. Know when to run.