Remember AlphaGo? You know, the artificial intelligence that in 2016 soundly defeated the finest players humanity could muster in the ancient Chinese strategy game of Go, thus forcing us to relinquish the last vestige of board game superiority that flesh and blood held over machines?
Remember that?
Well, here’s something to chew on: Google’s AI research arm DeepMind, the same benevolent creator that spawned AlphaGo, has already rendered that gluteus maximus-spanking version obsolete. In a study published Wednesday in the journal Nature, researchers describe a swifter, leaner, autodidact AI that defeated AlphaGo 100 games to zero. Zilch. Nada. Nothing.
Appropriately, this new AI prodigy is named AlphaGo Zero, and its secret to superiority is truly fascinating.
Perhaps we should have seen this coming. After all, AlphaGo’s prowess depended on the expertise of humans in the first place. Its artificial neural network was trained on a vast library of games played by human masters. AlphaGo analyzed those games, move-by-move, and then played itself in simulations over and over again, hyper-optimizing moves each turn based on its store of human knowledge about the game. AlphaGo took what it learned from humans and did it better.
AlphaGo Zero is different. Researchers didn’t feed its neural network any data from past games played by humans. The AI started from scratch with an entirely blank slate, its imagination confined only to the rules of the game. AlphaGo Zero began its training by making utterly random moves in simulated games against itself, learning a little more from each outcome, and improving its neural network each time.
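To make the idea of learning from nothing but the rules a little more concrete, here is a toy sketch. It is emphatically not DeepMind’s method: the game is a tiny version of Nim (take one to three stones, whoever takes the last stone wins) standing in for Go, a plain lookup table stands in for the neural network, and there is no tree search at all. Every name in it (`legal_moves`, `train_self_play`, `best_move`) and every setting (number of games, exploration rate, learning rate) is a hypothetical choice made for illustration. What it does share with the description above is the loop: start with no knowledge, play against yourself, and nudge your estimates after each game’s outcome.

```python
import random


def legal_moves(stones):
    """The rules are the only built-in knowledge: take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def train_self_play(num_games=20000, max_stones=12,
                    epsilon=0.2, lr=0.1, seed=0):
    """Learn move values purely from self-play outcomes (toy illustration).

    q maps (stones, move) to an estimate of the final result for the
    player making that move: +1 for a win, -1 for a loss. It starts
    empty, so early play is effectively random.
    """
    rng = random.Random(seed)
    q = {}
    for _ in range(num_games):
        stones = rng.randint(1, max_stones)  # assumed: random starting pile
        history = []
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # occasional exploratory move
            else:
                best = max(q.get((stones, m), 0.0) for m in moves)
                ties = [m for m in moves if q.get((stones, m), 0.0) == best]
                move = rng.choice(ties)  # break ties at random
            history.append((stones, move))
            stones -= move
        # Whoever made the last move won. Walk backwards through the game,
        # flipping the sign of the outcome each step because the two
        # sides alternate turns.
        outcome = 1.0
        for key in reversed(history):
            old = q.get(key, 0.0)
            q[key] = old + lr * (outcome - old)
            outcome = -outcome
    return q


def best_move(q, stones):
    """Pick the move with the highest learned value."""
    return max(legal_moves(stones), key=lambda m: q.get((stones, m), 0.0))


if __name__ == "__main__":
    learned = train_self_play()
    for pile in range(1, 13):
        print(pile, "stones -> take", best_move(learned, pile))
```

In this toy game the known winning strategy is to leave your opponent a multiple of four stones, and nothing in the code says so; if the sketch works as intended, that habit emerges from self-play alone. Go is astronomically larger, which is why the real system needs a neural network and search rather than a lookup table.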
It carried on like this for three days, during which it generated 4.9 million games of self-play, running 1,600 simulations to search each move. Just 36 hours in, AlphaGo Zero was ready to knock its predecessor off the top of the mountain. For comparison, the AlphaGo version that beat Lee Sedol, one of the world’s best human players, required several months of training and relied on far more hardware to get the job done.
After defeating Sedol, DeepMind continued to improve AlphaGo over several iterations. Earlier this year, AlphaGo Master won 60 straight online games against top professional Go players. AlphaGo Zero surpassed AlphaGo Master after 21 days of training. After 40 days, AlphaGo Zero was arguably the best thing to ever play Go.
The fact that the human-guided AlphaGo that defeated Sedol couldn’t muster a single win against the self-taught AlphaGo Zero led researchers to some rather mind-blowing, and perhaps spine-chilling, conclusions. In their study, they write:
“This suggests that AlphaGo Zero may be learning a strategy that is qualitatively different to human play…AlphaGo Zero discovered a remarkable level of Go knowledge during its self-play training process. This included not only fundamental elements of human Go knowledge, but also non-standard strategies beyond the scope of traditional Go knowledge.”
Over thousands of years, hundreds of generations, and countless games and books published about those games, humanity amassed its knowledge of Go. And the masters reached their level only by standing on the shoulders of the many who came before them. The game has a rich history, and there’s a reason it still captures the imagination of people today.
AlphaGo Zero, through random play and reinforcement learning, not only mastered the game of Go, but also reinvented it. All in less than two months.
For an artificial intelligence researcher, building an AI with general knowledge would be akin to landing on Mars: there are no limits to what an AI like that could do. Human beings possess general knowledge. We use the same biological hardware and software to drive a car, solve a math problem, write poetry, catch a baseball and play the game of Go. We can also solve problems where the solution is nebulous, there are no “winners” and the rules to guide us don’t exist. How does a person win in poetry?
AlphaGo Zero is another step toward a kind of general knowledge. It formed its own strategies and optimized an outcome without studying prior examples. Sure, the behaviors that emerged here are novel, and perhaps unprecedented. But the game of Go represents a confined problem with rules and a clear definition of when the game ends, albeit one with a mind-numbing number of game variations. An algorithm like AlphaGo Zero has the potential to teach itself and perform at superhuman levels in rule-based tasks where an outcome is known: investing, insurance claims, medical diagnosis.
But can it play Go, write a novel, drive a car and pick out the best tomato from the produce section? Not yet, but it’s a step closer.