Blog: OpenAI Bot Crushes Dota 2 Champions, and This Is Just the Beginning

Machines, like humans, learn best when they’re beaten

AI bots have been part of games for a long time. In several very popular games they have already beaten human players, champions included, but those were simpler games, played one on one. Not so long ago, though, the bots reached something more complex: Dota 2, a 5 vs 5 team game with demanding gameplay, intricate strategies, and matches where it is often unclear until the last moment who will win. I have followed Dota for a long time, through the updates, the tournaments, and every stage of the game's development, and I saw these bots, built by the startup OpenAI, when they first appeared and could only play 1 on 1. Those games were usually staged as show matches during tournaments. More recently OpenAI moved on to a full five-bot team, and that team went on to beat the champions of The International 8.

Fundamentally, video games offer challenges that board games like Go just don't. In a video game most of the information is hidden from the players, so an AI cannot see the whole picture and simply pick the best move. There is also more information to process and a huge number of possible moves. The average match contains about 80,000 individual frames, and each hero draws on a space of some 170,000 possible actions. Taken together, the heroes on the map have roughly 10,000 moves open to them on each frame, and the game state runs to more than 20,000 dimensions. Each of those heroes, of which there are over 100, can also pick up or purchase hundreds of in-game items. Processing all this data so that games can be played at a faster-than-life pace is a huge challenge. To train its algorithms, OpenAI had to corral a massive amount of processing power: some 256 GPUs and 128,000 CPU cores. It's more complex than it sounds.

AI bot view from Battlefield — Photo from OpenAI

To create these bots, OpenAI turned to deep reinforcement learning, a branch of reinforcement learning, which is one of the three basic machine learning paradigms alongside supervised and unsupervised learning. The idea is very simple, yet it can lead to very complex behavior. Everything works by trial and error: the bots are placed in a virtual environment where they learn how to achieve their goals. Programmers define what are called reward functions, which award the bots points for things like killing an enemy, destroying a tower, assisting a teammate, increasing their net worth, landing last hits, and so on, and then they leave the bots to play against themselves over and over again.
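To make the idea of a reward function a little more concrete, here is a minimal sketch in Python. It simply adds up points for the kinds of events listed above. The event names, the weights, and the function shaped_reward are all made up for illustration; they are not OpenAI's actual values or code.

```python
from typing import Mapping

# Hypothetical point values, chosen only for illustration.
# These are NOT the weights OpenAI used.
REWARD_WEIGHTS = {
    "kills": 1.0,
    "assists": 0.5,
    "last_hits": 0.25,
    "towers_destroyed": 2.0,
    "net_worth_gain": 0.005,
}


def shaped_reward(events: Mapping[str, float]) -> float:
    """Return the weighted sum of the events a bot scored during one step.

    Events missing from the mapping count as zero, and events without a
    weight are ignored.
    """
    return sum(
        weight * events.get(name, 0.0)
        for name, weight in REWARD_WEIGHTS.items()
    )


if __name__ == "__main__":
    # One kill and four last hits in this step.
    print(shaped_reward({"kills": 1, "last_hits": 4}))
```

The bot never sees these rules spelled out. It only receives the resulting number after each step, and over many games it drifts toward whatever behavior makes that number larger.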

OpenAI's training framework, Rapid, consists of two parts: a set of rollout workers that each run a copy of Dota 2 and an LSTM network, and optimizer nodes that perform synchronous gradient descent across a fleet of graphics cards. As the rollout workers gather experience, they send it to the optimizer nodes, and another set of workers compares the trained LSTM networks (the agents) to reference agents. To improve itself, OpenAI Five plays 180 years' worth of games every day, 80% against its current self and 20% against past versions, on 256 Nvidia Tesla P100 graphics cards and 128,000 processor cores on Google Cloud Platform.
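The 80/20 split between the current bot and its past selves can be pictured as a small sampling rule. The sketch below is only an illustration of that split: the function pick_opponent, its parameters, and the uniform choice among past versions are my assumptions, not a description of how Rapid actually selects opponents.

```python
import random
from typing import List, Optional


def pick_opponent(
    current: str,
    past_versions: List[str],
    rng: Optional[random.Random] = None,
    self_play_fraction: float = 0.8,
) -> str:
    """Choose the opponent for the next training game.

    With probability `self_play_fraction` the bot plays its current self;
    otherwise it plays a past version picked uniformly at random
    (the uniform pick is an assumption made for this sketch).
    """
    rng = rng or random.Random()
    if not past_versions or rng.random() < self_play_fraction:
        return current
    return rng.choice(past_versions)


if __name__ == "__main__":
    rng = random.Random(0)
    games = [pick_opponent("v100", ["v10", "v50", "v90"], rng) for _ in range(10_000)]
    print(games.count("v100") / len(games))  # close to 0.8
```

Playing against older versions as well as the current one is a common way to keep a self-play agent from forgetting how to beat strategies it has already moved past.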

From the evening of April 18th through the 21st, anybody with an internet connection had the chance to play against OpenAI's bot, the same one that defeated the world champion team. The results were unsurprising, to say the least: it obliterated the competition, winning 7,215 competitive games and boasting a 99.4% victory rate overall. It lost only 42 competitive games over the whole weekend. It is worth noting that there are still things these bots do not know how to do, such as stacking jungle creep camps, and yet they already beat real people by that kind of margin.
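The 99.4% figure follows directly from the two numbers above, assuming that every competitive game ended in either a win or a loss for the bot. A quick check in Python (the helper win_rate is just an illustrative name):

```python
def win_rate(wins: int, losses: int) -> float:
    """Fraction of decided games that were won."""
    return wins / (wins + losses)


if __name__ == "__main__":
    rate = win_rate(7215, 42)   # 7,215 wins out of 7,257 games
    print(f"{rate:.1%}")        # prints 99.4%
```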

These bots show that there are things machines can do better than people, much better; they learn far faster, for example. Still, there are qualities in which they remain inferior to us. Used properly, they can make our lives much brighter and better. Our main threat will always be humanity itself, because these bots are created by our own hands. So let us create only what is beautiful and make our planet better, or at least restore it to its earlier state, when nature did not suffer so much from us.

