Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
After defeating pretty much every highly ranked professional player in the game of Go, Google
DeepMind has now ventured into the realm of chess.
They recently challenged not the best humans, no-no-no, that was long ago.
They challenged Stockfish, the best computer chess engine in existence, in quite possibly
the most exciting chess-related event since Kasparov's matches against Deep Blue.
I will note that I was told by DeepMind that this is the preliminary version of the paper,
so now we shall have an initial look, and perhaps make a part 2 video with the newer
results when the final paper drops.
AlphaZero is based on a neural network and reinforcement learning and is trained entirely
through self-play after being given the rules of the game.
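Since this self-play loop is the heart of the method, here is a minimal Python sketch of its overall structure. To be clear, this is my own heavily simplified illustration, not DeepMind's code: the stub functions stand in for the real pieces, where search would be a neural-network-guided Monte Carlo tree search and train a gradient descent step on the policy and value losses.

```python
import random

# Sketch of the AlphaZero-style self-play loop (illustration only, not
# DeepMind's code): play games against yourself, record what happened,
# and fit the network to those recorded outcomes. The game here is a
# meaningless 3-move toy so the skeleton actually runs.

def legal_moves(position):          # the rules of the game, given up front
    return ["a", "b"] if position < 3 else []

def play(position, move):           # apply a move to get the next position
    return position + 1

def search(network, position):      # stand-in for network-guided MCTS
    return random.choice(legal_moves(position))

def game_outcome(position):         # +1 win, 0 draw, -1 loss for player one
    return random.choice([-1, 0, 1])

def self_play_game(network):
    """Play one game against yourself; return (position, move, outcome) examples."""
    position, history = 0, []
    while legal_moves(position):
        move = search(network, position)
        history.append((position, move))
        position = play(position, move)
    z = game_outcome(position)
    return [(pos, move, z) for pos, move in history]

def train(network, examples):       # in the real system: gradient descent
    return network

network = None
for iteration in range(10):
    examples = self_play_game(network)
    network = train(network, examples)
```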
It is not to be confused with AlphaGo Zero, which played Go.
It is also noted that this is not simply AlphaGo Zero applied to chess.
This is a new variant of the algorithm.
The differences include:
- one, the rules of chess are asymmetric: for instance, pawns only move forward, and castling
works differently on the kingside and queenside. This means that the symmetries that
neural network-based techniques exploit so well in Go are less useful here.
- two, the algorithm not only has to predict a binary win or loss probability when given
a move; draws are also a possibility, and they have to be taken into consideration (see
the sketch after this list).
Sometimes a draw is the best we can do, actually.
There are many more changes compared to the previous incarnation of the algorithm; please
make sure to have a look at the paper for details.
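To make the point about draws concrete, here is a tiny sketch, again my own illustration rather than anything from the paper's code: instead of a binary win probability, the value being learned is the expected game outcome on a scale where a loss is -1, a draw is 0, and a win is +1.

```python
# Illustration only (not DeepMind's code): with draws in the picture, the
# value head predicts an expected outcome in [-1, 1] rather than a binary
# win probability.
LOSS, DRAW, WIN = -1.0, 0.0, 1.0

def value_target(result):
    """Map a finished game's result to the training target for the value head."""
    return {"loss": LOSS, "draw": DRAW, "win": WIN}[result]

def expected_outcome(p_win, p_draw, p_loss):
    """Expected outcome of a position; the draw term contributes zero."""
    return p_win * WIN + p_draw * DRAW + p_loss * LOSS

print(expected_outcome(0.25, 0.70, 0.05))  # 0.2: slightly better than a draw
```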
Before we start with the results and more details, a word on Elo ratings for perspective.
The Elo rating is a number that measures the relative skill level of a player.
Currently, the human player with the highest Elo rating, Magnus Carlsen, is hovering around
2800.
This man played chess blindfolded against 10 opponents simultaneously in Vienna a couple
of years ago and won most of these games.
That's how good he is.
And Stockfish is one of the best current chess engines, with an Elo rating over 3300.
A difference of 500 Elo points means that if it were to play against Magnus Carlsen,
it would be expected to score at least 95 points out of 100.
Though it is noted that rating rules suggest a hard cutoff at around a 400-point difference.
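To see where those numbers come from, here is a quick Python sketch of the standard Elo expected-score formula, with the ratings mentioned above plugged in:

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B.

    Returns a value between 0 and 1, where a draw counts as half a point.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Stockfish (3300+) against Magnus Carlsen (~2800), a 500-point gap:
print(expected_score(3300, 2800))  # ~0.947, about 95 points out of 100

# The hard cutoff mentioned above caps the difference used in the formula
# at 400 points, so the predicted expectation never exceeds this:
print(expected_score(3200, 2800))  # ~0.909 for a 400-point gap
```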
The two algorithms then played each other.
AlphaZero versus Stockfish.
They were both given 60 seconds of thinking time per move, which is considered to be plenty,
given that both algorithms take around 10 seconds at most per move.
And here are the results.
AlphaZero was able to outperform Stockfish after about 4 hours of learning from scratch.
They played 100 games: AlphaZero won 28 times, drew 72 times, and never lost to Stockfish.
Holy mother of papers, do you hear that?
Stockfish is already unfathomably powerful compared to even the best human prodigies,
and AlphaZero basically crushed it after four hours of self-play.
And it was run on similar hardware to AlphaGo Zero: a single machine with 4 Tensor Processing
Units.
This is hardly commodity hardware, but given the trajectory of the improvements we've seen
lately, it might very well be in a couple of years.
Note that Stockfish does not use machine learning; it is a handcrafted algorithm.
People like to refer to computer opponents in computer games as AI, but such an opponent
is not doing any sort of learning.
So, you know what the best part is?
AlphaZero is a much more general algorithm that can also play Shogi, also referred to as
Japanese chess, at an extremely high level.
And this is one of the most interesting points - AlphaZero would be highly useful even
if it were slightly weaker than Stockfish, because it is built on more general learning
algorithms that can be reused for other tasks without investing significant human effort.
But in fact, it is more general, and it also crushes Stockfish.
With every paper from DeepMind, the algorithm becomes better AND more and more general.
I can tell you, this is very, very rarely the case.
Total insanity.
Two more interesting tidbits about the paper: one, all the domain knowledge the algorithm
is given is stated precisely for clarity.
Two, one might think that as computers and processing power improve over time, all
we have to do is add more brute force to the algorithm and just evaluate more positions.
If you think this is the case, have a look at this - it is noted that AlphaZero was able
to reliably defeat Stockfish WHILE evaluating roughly a thousand times fewer positions per second.
Maybe we could call this the AI equivalent of intuition: in other words, being able to
identify a small number of promising moves and focus on them.
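For a sense of scale, the preliminary paper reports roughly 80 thousand positions searched per second for AlphaZero against roughly 70 million for Stockfish. A quick back-of-the-envelope calculation in Python:

```python
# Positions searched per second, as reported in the preliminary paper.
alphazero_pps = 80_000
stockfish_pps = 70_000_000

thinking_time = 60  # seconds of thinking per move in the match
print(f"AlphaZero: ~{alphazero_pps * thinking_time:,} positions per move")  # ~4,800,000
print(f"Stockfish: ~{stockfish_pps * thinking_time:,} positions per move")  # ~4,200,000,000
print(f"Ratio: ~{stockfish_pps / alphazero_pps:.0f}x")                      # ~875x
```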
Chills run down my spine as I read this paper.
Being a researcher is the best job in the world.
And we are even being paid for this.
Unreal.
This is a hot paper; there is a lot of discussion out there on this, and lots of chess
experts are analyzing and trying to make sense of the games.
I had a ton of fun reading and watching through some of these. As always, Two Minute Papers
encourages you to explore and read more, and the video description is full of useful materials.
You will find videos with some really cool analysis from Grandmaster Daniel King, International
Master Daniel Rensch, and the YouTube channel ChessNetwork.
All quality materials.
And, if you have enjoyed this episode and you think that 8 of these videos a month is
worth a few dollars, please throw a coin our way on Patreon, or, if you favor cryptocurrencies
instead, you can throw Bitcoin or Ethereum our way.
Your support has been amazing as always, and thanks so much for sticking with us through
thick and thin, even in times when weird Patreon decisions happen.
Luckily, this last one has been reverted.
I am honored to have supporters like you Fellow Scholars.
Thanks for watching and for your generous support, and I'll see you next time!