Mind boggling machine learning results from AlphaZero

In summary, the conversation discusses a groundbreaking achievement in machine learning: a self-learning algorithm that mastered chess within 24 hours, surpassing every existing chess program despite running on slower hardware. The algorithm also showed impressive results in learning other games such as Go and Shogi. The conversation touches on the potential impact of this advance in AI and the debate over whether AI can surpass human intelligence; the recent success of machine learning in games like Go and chess challenges the belief that AI's impact on society will be minimal. The conversation ends with a discussion of a possible "Moore's law" for AI.
  • #36
mfb said:
Which other games do you expect next?
...
It will be interesting to see if the AI can be adapted to work with a broader range of problems.

Would they be anything like the fun and games you get on a stock market or currency exchange?

Could the AI ever win a game if the game were rigged in the interests of parties other than the AI, and the AI were not allowed to tally the score or operate off its own internal calculations because that would be illegal?
 
  • #37
mfb said:
Which other games do you expect next?
Chess and Go have no randomness and no hidden information - all players always know the full state of the game. Many card games have hidden information (cards not shown to everyone) and randomness, for example. There is a poker AI beating humans, but that is a different topic.

Earlier in the year, an AI beat champion players in a limited DOTA game:
https://www.google.co.uk/amp/s/arst...t-takes-on-the-pros-at-dota-2-and-wins/?amp=1

The game doesn’t have the perfect information that chess/go have (you can only see other players in your vicinity). It will be interesting to see if this AI can graduate to play proper games. At the moment it’s limited to playing as one specific character, against one specific character in solo matches (rather than the usual mixed 5v5).
 
  • #38
I gather this was done with some form of a RNN.

What's funny about this business is that the general algorithms and structures have been known for well over 30 years. Some tweaking is needed for the particular game, of course, but why has it taken this long for results like this to suddenly show up? It seems like a completely obvious thing to try with chess, so I doubt this team was the first to try this approach.

Like everyone else, I started dabbling with ML about 5-6 years ago for a Kaggle competition, but it seems like almost all the big results have been occurring recently.

So is it a question of computation power and storage capacity? That seems partially true, but also pretty odd. Certainly different games have vastly different state counts, so you would expect results to be more spread out in time.

On the other hand, could it be that there are inflection points within the search strategies, where say past a certain number of (layers, iterations, etc) convergence properties are substantially altered?
 
  • #39
Very interesting topic. I want to share my two cents; please don't take me as ignorant, as I would really like to know much more about this than I probably do at the moment. Correct me as necessary, but I fail to see the intellect part in this AI. It is definitely artificial, and it has strong capabilities in some areas, that's for sure, but is it intelligence?
The way I see it, intelligence is not maximum capability in specific logic and strategy tasks (that's essentially a computer). I see intellect as the capability to learn a fixed rule set, see the problem within that rule set, and come up with a solution that is totally different, not within the rule set, and not even within all the possible outcomes of the rules. If we are talking about finding a solution to a specific problem within a fixed set of rules, isn't that just a matter of time? Chess, for example, has fixed rules and a fixed number of possible moves and outcomes, and I assume the reason a human can't beat a computer, and lately AlphaZero, is that the computer is X times faster in its capability to process all possible strategies following every move made by either itself or its opponent.
Other than this, the fact that seems so novel about this news is that AlphaZero learned to play only from the rules of the games (Go, chess), so isn't this also a purely deterministic solution? Knowing the possible moves and the rules that govern them, it is only a matter of time and trial and error to come up with all the possible outcomes, both winning and losing ones. I can imagine how such an approach and device could help solve mathematical and scientific problems, which is great, if one already knows the necessary inputs, or at least some of them.

An example of intellect comes to mind: the situation Albert Einstein was in when he conceived the theory of relativity. He had no physical examples of the theory and no way of proving it with experiments back in the 1900s, but it proved to be correct.
Now, could AI come up with a correct explanation or an unknown physical law that would explain some of the mysteries in science, like the inside of a black hole or dark matter, if it were given only a partial set of rules? After all, we arguably don't know all of the laws and rules of this universe at the moment.
To me, chess, and even learning chess, is different in this regard: you already know the full picture, so it becomes a matter of time, processing approach, and power to figure out the winning strategy. But how does one figure out something that is not known and cannot be explained or arrived at with the existing laws and rules?

Pretty much all of physics history was learning the unknown while experimenting based on what we knew so far, i.e., trial and error or educated guesses. So if we were to build a real AI based on our current level of knowledge and understanding of the universe, could such an AI find answers to the very things we don't know so far, and if so, from what inputs and by what means? I apologize if this is a bit off topic; I'm just curious.
 
  • Like
Likes GTOM
  • #40
Delta² said:
No, I am not saying that they lie about how they trained the program, but I don't know if they got some aid from GMs in developing the program's source code. Conventional chess program developers often cooperate with GMs, and that is reflected in the source code (mainly the evaluation function) of a conventional chess program. I thought the AlphaZero developers may have done the same thing. (In simple words, a GM can tell a programmer how he or she thinks when playing chess, and the programmer can somehow incorporate this into the source code of the program.)

I don't think you understand how these "deep learning" machines work. They are very different from conventional chess playing machines. There is no "evaluation function" programmed into the machine. It builds up its own evaluation of the best move in the course of training. The only value judgement programmed in is the value of winning or losing, which is stated in the paper: -1 for a loss, 0 for a draw, +1 for a win. In the course of playing several hundred thousand games, the synaptic weights of the neural net are adjusted by the machine itself to increase the probability of winning. The only chess specific information programmed in is the size of the board, how each piece can move, and what constitutes a win/loss/draw.
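The point about the only built-in value judgement can be sketched in a few lines. This is an illustrative fragment under my own naming, not DeepMind's code:

```python
# Hypothetical sketch of the only value judgement AlphaZero is given:
# a terminal game outcome, scored from the current player's perspective.

def terminal_score(result: str) -> int:
    """Map a finished game to the reward stated in the paper:
    -1 for a loss, 0 for a draw, +1 for a win."""
    return {"loss": -1, "draw": 0, "win": 1}[result]

# Every other evaluation is learned: during self-play training, the network's
# weights are nudged so that its predicted value for positions along a game
# converges toward this terminal score.
assert terminal_score("win") == 1
assert terminal_score("draw") == 0
```

Everything beyond this mapping (which moves are good, which positions are promising) emerges from training rather than being programmed in.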
 
  • Like
Likes PeroK
  • #41
Haelfix said:
I gather this was done with some form of a RNN.
What's funny about this business is that the general algorithms and structures have been known for well over 30 years. Some tweaking is needed for the particular game of course, but why has it taken this long for results like this to all of a sudden show up. It just seems like a completely obvious thing to try with chess, so I doubt this team were the first to try this approach.

My understanding is that the recent explosion in successful applications of neural networks is due to improvements in methods for adjusting the synaptic weights. The use of "deep" neural nets, which have many hidden layers of neurons between the input and the output, was prohibitive in the past because there was no known method to adjust the weights in a reasonable length of time. New techniques, in particular the use of restricted Boltzmann machines for layer-wise pre-training, provided improved algorithms for training such networks.
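The basic weight-adjustment idea can be shown with a toy example. This is a one-weight gradient-descent sketch of my own, not anything from the AlphaZero work; deep networks do the same thing across millions of weights:

```python
# Toy illustration of weight adjustment by gradient descent: a single
# "synaptic weight" w is nudged so the neuron's output w*x fits a target.

def train(w, x, target, lr=0.1, steps=100):
    for _ in range(steps):
        y = w * x                     # forward pass: one linear neuron
        grad = 2 * (y - target) * x   # gradient of squared error (y - target)**2
        w -= lr * grad                # move the weight against the gradient
    return w

w = train(w=0.0, x=1.0, target=0.5)
# w converges toward 0.5, so that w*x matches the target
```

The difficulty with deep nets was never this update rule itself, but making it work through many stacked layers at practical speed.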
 
  • #42
I think that's definitely partially true; there have been some algorithmic changes. However, even very simple convolutional neural networks (and other feedforward NNs) with plain backpropagation using gradient descent are now being used extremely successfully in applications like facial recognition. My laptop PC with a GPU card is able to achieve accuracy that was unheard of even 10 years ago. So it just seems surprising that everything seems to be happening at once.
 
  • #43
I think some distinctions in the field called AI are worth making:

1) There is a long track record of success in neural network training - where people provide training data and guide the training (to varying extents). AlphaGo Master that beat Lee Sedol and (with further refinement and training) Ke Jie (who is generally considered the strongest living Go player), was a result in this category. It was remarkable, but only in the sense that people had tried this with Go without any success comparable to this, and the team itself expected this achievement to take much longer (perhaps 10 years, according to some team members). These techniques have been used for both closed, complete information problems, as well as a number of incomplete information problems or partly open problems.

2) Machine (self) learning is what is explored in this new project, which has minimal precedent (that I am familiar with). This means having a neural network train itself with no human-provided data or guidance. The technology developed by the AlphaZero team is at present fundamentally limited to problems with finitely many possible actions, finite rules for generating them, and a score that can be represented as a real number whose expectation can be maximized. (Note: for chess and Shogi the score values were -1, 0, 1, and for Go they were 0, 1, but the framework was explicitly designed to allow scores like 2.567 if there were a problem with such characteristics.) It also seems required that the sequence of actions before a scoring can occur not be too long (for practical reasons of computation limits, even given the large computational power available during self-training). There are no other limitations or specializations. This necessitated an artificial rule added to chess for the self-training phase (beyond the 50-move rule and threefold repetition, which in principle terminate any game in finite time). It is still possible (especially with machines) to have 1000-move games without terminating under any of the official rules (49 moves, capture or pawn move, 49 moves, capture or pawn move, etc.). The group was concerned about these rat holes eating up too much processing time, so they added a rule that games over some threshold length were scored as draws (the paper does not specify what value they chose for this cutoff). This strikes me as a risky but presumably necessary step due to system limitations. They specifically did NOT want to address this by adding adjudicating rules, because those would have to involve chess knowledge. Particularly intriguing to me, looking at black vs. white results, is that AlphaZero seems to have evolved a meta-rule on its own: play it safe with black, and take more risks with white. This is the practice of the majority of top grandmasters.

3) It seems that except possibly for the core neural network itself, huge changes and breakthroughs would be needed to apply their (self learning, with no training data) system to incomplete information, open type problem areas. Further, there is no sense in which it is an AI. This is not pejorative. The whole AI field is named after a hypothetical future goal which no existing project is really directly working on (because no one knows how, effectively). It is silly to judge AlphaZero against this goal, because that is not remotely what it was trying to achieve.
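The draw-cutoff rule described in point 2 can be sketched as follows. This is a hypothetical illustration; the function name is mine, and the cutoff value of 512 plies is an invented placeholder, since the paper does not state the value actually used:

```python
# Hypothetical sketch of the extra self-play scoring rule: games exceeding
# some length threshold are simply declared draws rather than played out.
MAX_PLIES = 512  # invented placeholder; the paper does not give the cutoff

def score_selfplay_game(moves, natural_result):
    """Score a self-play training game.

    moves: the list of half-moves played so far.
    natural_result: +1 / 0 / -1 if the game ended under the normal rules,
    or None if it was still running when the cutoff was reached.
    """
    if len(moves) > MAX_PLIES:
        return 0  # declared a draw to keep training games short
    return natural_result
```

The design trade-off is exactly as described above: this technically defines a different game than chess, but keeps pathological marathon games from consuming the training budget.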
 
  • Like
Likes QuantumQuest and anorlunda
  • #44
I found some additional info: Stockfish 8 was allowed to use only up to 1 GB of RAM for its hash table, and that, together with the 1-minute-per-move time control imposed, to some extent ruins the effective use of 64 cores by Stockfish 8.

How much RAM and disk storage was AlphaZero using? I couldn't find info on that; it could have been hundreds of GB to store all that neural network synaptic weight info...
 
  • #45
While I find this quite interesting, it also seems quite "unstructured". What I mean is that I assume there are no constraints on repetition, and I wonder how replaying the same series of identical moves affects the weighting of "good play" when there are random repetitions. I would think having some type of iteration scheme that allowed it to play through all possible games would give it the power to determine the best possible move out of all options, but I don't know if that is outside the range of what is possible in finite time.
PAllen said:
It seems that except possibly for the core neural network itself, huge changes and breakthroughs would be needed to apply their (self learning, with no training data) system to incomplete information, open type problem areas
I often wonder if this type of system could be applied to mathematics: give it the basic rules of math and score it according to deriving known mathematical results. That seems pretty simple to a layman like me, but perhaps I'm missing something obvious, as I'm not much of a mathematician.
 
  • #46
jerromyjon said:
While I find this quite interesting, it also seems quite "unstructured". What I mean is that I assume there are no constraints on repetition, and I wonder how replaying the same series of identical moves affects the weighting of "good play" when there are random repetitions. I would think having some type of iteration scheme that allowed it to play through all possible games would give it the power to determine the best possible move out of all options, but I don't know if that is outside the range of what is possible in finite time.

For chess it certainly isn’t possible as the number of legitimate games is often compared to the number of atoms in the universe.
 
  • Like
Likes jerromyjon
  • #47
Ryan_m_b said:
For chess it certainly isn’t possible as the number of legitimate games is often compared to the number of atoms in the universe.
Many many times greater than the number of atoms in the observable universe.
 
  • Like
Likes jerromyjon
  • #48
PAllen said:
Many many times greater than the number of atoms in the observable universe.
Okay, now I agree with the title of the post, "mind boggling"!
 
  • #49
jerromyjon said:
While I find this quite interesting, it also seems quite "unstructured". What I mean is that I assume there are no constraints on repetition, and I wonder how replaying the same series of identical moves affects the weighting of "good play" when there are random repetitions. I would think having some type of iteration scheme that allowed it to play through all possible games would give it the power to determine the best possible move out of all options, but I don't know if that is outside the range of what is possible in finite time.

In addition to the fact that there are a huge number of possible games, so there is no way to iterate through all of the possibilities, note that it wasn't playing "random repetitions". It was playing against itself, so as it learned and got better it was playing against a stronger and stronger opponent. So the games it was learning from were far from randomly selected.
 
  • Like
Likes Ryan_m_b and jerromyjon
  • #50
phyzguy said:
So the games it was learning from were far from randomly selected.
Very good point, I didn't think about it that way. Thanks.
 
  • #51
PAllen said:
Many many times greater than the number of atoms in the observable universe.

PAllen said:
The group was concerned with these rats holes eating up too much processing time, so they added a rule that games over some threshold length were scored as draws (the paper does not specify what value they chose for this cutoff).

Limiting the number of moves before declaring a draw would reduce the number of possible games. But I think that is beside the point. These neural nets don't memorize specific games; they remember gain factors (weights) in their neural nets. I think neural nets are fascinating because they are almost the antithesis of logic.
 
  • #52
anorlunda said:
Limiting the number of moves before declaring a draw would reduce the number of possible games. But I think that is beside the point. These neural nets don't memorize specific games; they remember gain factors (weights) in their neural nets. I think neural nets are fascinating because they are almost the antithesis of logic.

How so? Not disagreeing, that’s an interesting statement I’d like to hear more of.
 
  • #53
The number of possible games exceeds the number of atoms in the observable universe even if you stop games after 50 moves of each side - not an unusual length of actual games.
With Go you exceed that number after less than 20 moves per side.

Apart from the Chess endgame (where computers can calculate all relevant moves) the games are too complex to check every option. The algorithms have to decide which options to explore in more detail, and which options to discard. This is not an easy task - sometimes sacrificing the queen leads to a big advantage several moves later, for example.
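One standard way engines decide "which options to explore in more detail" is the UCB1 rule from Monte Carlo tree search; AlphaZero uses a related variant (PUCT) that also weighs in the network's move probabilities. A hedged sketch of the plain UCB1 rule:

```python
import math

# Sketch of the UCB1 selection rule used in Monte Carlo tree search.
# Each candidate move gets a score balancing how well it has done so far
# (exploitation) against how rarely it has been tried (exploration);
# the search descends into the child with the highest score.

def ucb1(total_value, visits, parent_visits, c=1.4):
    if visits == 0:
        return float("inf")  # always try an unvisited move at least once
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore
```

This is why a queen sacrifice can still be found: a move with a poor immediate evaluation keeps a nonzero exploration bonus, so the search occasionally revisits it and can discover the advantage several moves later.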
 
  • #54
mfb said:
The number of possible games exceeds the number of atoms in the observable universe even if you stop games after 50 moves of each side - not an unusual length of actual games.
With Go you exceed that number after less than 20 moves per side.

Apart from the Chess endgame (where computers can calculate all relevant moves) the games are too complex to check every option. The algorithms have to decide which options to explore in more detail, and which options to discard. This is not an easy task - sometimes sacrificing the queen leads to a big advantage several moves later, for example.
I would guess they cut off after something like 500 or 1000 moves, because such games are vanishingly rare even between computers. Games of length 200 or more occur even in human tournaments. I agree the cutoff has nothing to do with minimizing the set of all possible games, and everything to do with maximizing the number of self-play games that can be completed in a given training period without losing much value.
 
  • #55
Who is “they”? No one calculates hundreds of steps in advance. You cannot. Even 10 moves gives way too many options to explore all of them.

Stockfish evaluated 70 million positions per second. In a minute, that gives you about 4 billion positions. At ~20 possible moves per player, that would allow a full-width search just 7 half-moves deep. AlphaZero, with its 80,000 positions per second, could only look about 5 half-moves ahead. Even amateurs will explore the most promising moves in more detail - and so do the chess engines.
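The depth figures above are easy to reproduce; here is a quick sanity check of the arithmetic (variable names are mine):

```python
import math

# Sanity-checking the full-width search-depth arithmetic.
stockfish_pps = 70_000_000              # positions evaluated per second
positions = stockfish_pps * 60          # ~4.2e9 positions in one minute
branching = 20                          # ~20 legal moves per position
depth = math.log(positions, branching)  # full-width plies affordable
# depth comes out near 7.4 half-moves; repeating the calculation with
# AlphaZero's ~80,000 positions/s (math.log(80_000 * 60, 20)) gives ~5.1.
```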
 
  • #56
mfb said:
Who is “they”? No one calculates hundreds of steps in advance. You cannot. Even 10 moves gives way too many options to explore all of them.

Stockfish evaluated 70 million positions per second. In a minute, that gives you about 4 billion positions. At ~20 possible moves per player, that would allow a full-width search just 7 half-moves deep. AlphaZero, with its 80,000 positions per second, could only look about 5 half-moves ahead. Even amateurs will explore the most promising moves in more detail - and so do the chess engines.
"They" refers to the paper's authors. They describe a need, for the case of chess, during self-play training, to add a rule that games longer than some threshold be declared draws. They don't state what cutoff they used. Note that even though the 50-move rule and threefold repetition already ensure that any chess game terminates in finite time, the theoretical worst case is on the order of 5000 moves. My belief is that their fear was that pursuing such implausible games to their natural conclusion before scoring would reduce the number of self-training games completed in a reasonable time, and that little would be lost by training on this modified chess variant.
 
  • #57
Even with 2 move options per side, 50 moves for each side would give you 2^100 ≈ 10^30 possible games, way too many to explore all of them. You need the rule to prevent exploring the value of endless loops. The main point stays the same: the algorithms don't do brute force. They explore only a tiny part of the possible moves, and the strategy for which part to examine is the interesting part.
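The lower bound is easy to verify directly, assuming nothing beyond the stated branching factor of 2 per half-move:

```python
# Verifying the bound: 2 choices per half-move over 100 half-moves
# (50 moves per side) already dwarfs anything enumerable.
games = 2 ** 100
print(f"{games:.3e}")   # about 1.268e+30 possible game continuations
assert games > 10 ** 30
```

Real chess averages roughly 20 to 35 legal moves per position, so the true count is astronomically larger still.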
 
  • #58
mfb said:
Even with 2 move options per side, 50 moves for each side would give you 2^100 ≈ 10^30 possible games, way too many to explore all of them. You need the rule to prevent exploring the value of endless loops. The main point stays the same: the algorithms don't do brute force. They explore only a tiny part of the possible moves, and the strategy for which part to examine is the interesting part.
I think you continue to misunderstand me. There is a self-training phase, the result of which then plays games against other opponents. The cutoff only applies to self-training and is wholly irrelevant to look-ahead for move making during self-training. It applies to scoring a self-played game. Scoring occurs after a game ends. They want to keep game length in the hundreds, not thousands. So they add an extra rule that terminates games that go on too long and calls them draws. This makes it a different game than chess, which has no such rule. But they correctly guessed that good play of this game is indistinguishable in practice from regular chess.

I have tried to express this as clearly as I can, several times, but you keep responding to something unrelated.
 
  • #59
GTOM said:
It is true that even a supercomputer can't brute-force a game of chess, let alone Go. But actually only a small subset of the game space is needed to find a winning strategy.
One optimization method is backward induction: start from the end, and see how different moves can alter the outcome.
In Go, once you realize that you can rotate or mirror the board and get the same outcome, you can drastically reduce the complexity of the search tree. I guess that with proper similarity functions (two states may not be exactly the same, but similar enough to follow the same strategy) search time can be drastically reduced too.

Since the new AlphaGo was developed AFTER the experiments with the first one, not out of the blue with a new super learning algorithm, I find it hard to imagine that they didn't rely on that kind of knowledge. Programmers are kind of lazy; they don't write something from scratch when they can copy-paste, use function libraries, existing network structures, etc.
Please read the paper. They discuss exactly this issue, and state that while such techniques were used in the prior Go specific variants, for AlphaZero they explicitly removed all such specializations, and ensured that NO game oriented optimizations or heuristics were present in the initial state of the Neural Network.

Please stop making suppositions when there is a paper that has a whole 'methods' section answering such issues.
 
  • #60
256bits said:
Well, there is the link "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", PDF, under heading A new paradigm ( third way down ).
If I read Table S3 correctly, it took 44 million training games to learn chess, and 21 million to learn Go and Shogi, to become the best at winning.
Humans hardly play that many games with their slow processing grey matter.
Getting back to this interesting point, it occurs to me that since AlphaZero is learning from scratch, a more appropriate comparison would be with the total number of games of chess played by all serious chess players through history. Every chess player learns from the play of the current generation of strong players, who learned from the play of those before, etc. Thus, the comparable human neural net is not one person but the collection of all serious players from the advent of chess, and even reasonably close predecessor games.

My guess is that this would still not total 44 million, but I have no data on this. It would certainly be less disparate than looking at the games of just one human player.
 
  • Like
Likes Ryan_m_b
  • #61
PAllen said:
Getting back to this interesting point, it occurs to me that since AlphaZero is learning from scratch, a more appropriate comparison would be with the total number of games of chess played by all serious chess players through history. Every chess player learns from the play of the current generation of strong players, who learned from the play of those before, etc. Thus, the comparable human neural net is not one person but the collection of all serious players from the advent of chess, and even reasonably close predecessor games.

My guess is that this would still not total 44 million, but I have no data on this. It would certainly be less disparate than looking at the games of just one human player.
Ok, here is a data point:

https://shop.chessbase.com/en/products/mega_database_2017

Almost 7 million games that we have a record of. Thus, to an order of magnitude, a claim could be made that AlphaZero was as effective at mastering chess as the collective net of human chess players.
 
  • #62
PAllen said:
Getting back to this interesting point, it occurs to me that since AlphaZero is learning from scratch, a more appropriate comparison would be with the total number of games of chess played by all serious chess players through history. Every chess player learns from the play of the current generation of strong players, who learned from the play of those before, etc. Thus, the comparable human neural net is not one person but the collection of all serious players from the advent of chess, and even reasonably close predecessor games.

My guess is that this would still not total 44 million, but I have no data on this. It would certainly be less disparate than looking at the games of just one human player.

I finally got a chance to read the arXiv report, which is fascinating. My question is: is there some way to 'peek under the hood' to see the process by which AlphaZero optimized the move probabilities based on the Monte Carlo tree search? And if, during the process of selecting and optimizing the parameters and value estimates, it arrived at an overall strategic process that is measurably distinct from 'human' approaches to play, could AlphaZero pass a 'chess version' of the Turing test?
 
  • #63
Andy Resnick said:
I finally got a chance to read the arXiv report, which is fascinating. My question is: is there some way to 'peek under the hood' to see the process by which AlphaZero optimized the move probabilities based on the Monte Carlo tree search? And if, during the process of selecting and optimizing the parameters and value estimates, it arrived at an overall strategic process that is measurably distinct from 'human' approaches to play, could AlphaZero pass a 'chess version' of the Turing test?
Yes, it is very worth reading the whole paper.

I can’t think of any way to formulate a chess Turing test that would clearly distinguish the self trained AlphaZero from any of the top current engines. For example, both would handle novel chess problems well, using their very different techniques.
 
  • Like
Likes Andy Resnick
  • #64
GM Amanov provides insightful commentary.

 
  • Like
Likes GTOM and Ryan_m_b
  • #65
sysprog said:
GM Amanov provides insightful commentary.



It looks like what I said about simpler brute force and error avoidance applies more to Stockfish (which is enough to achieve a draw in the majority of cases) than to the new program. It is certainly interesting what the differences are: does it drop useless combinations faster, or is it more likely to start out in the right direction?
 
  • #66
There is a big push for "Explainable AI", driven by the uses in medicine, law and other places. Here is a New York Times article on the issues. So people are working on being able to ask how these deep neural networks make their decisions. If they make progress, it would be interesting to apply to AlphaZero, to see if we can gain insights on how it chooses the correct move.
 
  • #67
GTOM said:
It looks like what I said about simpler brute force and error avoidance applies more to Stockfish (which is enough to achieve a draw in the majority of cases) than to the new program. It is certainly interesting what the differences are: does it drop useless combinations faster, or is it more likely to start out in the right direction?
Amanov mentions in the video that the majority of the games have not yet been released. That renders tenuous, at best, any suggested comprehensive interpretation of the results; a proper study of what emerges from the program's implementation should presumably have unfettered access to every game.
 
  • #68
phyzguy said:
There is a big push for "Explainable AI", driven by the uses in medicine, law and other places. Here is a New York Times article on the issues. So people are working on being able to ask how these deep neural networks make their decisions. If they make progress, it would be interesting to apply to AlphaZero, to see if we can gain insights on how it chooses the correct move.

This is a huge issue. A bit off topic, but there are a number of uses of AI where we would definitely like to know the reason for a behavior, a result, or a recommendation from the AI. See https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/
 
  • #69
phyzguy said:
There is a big push for "Explainable AI", driven by the uses in medicine, law and other places. Here is a New York Times article on the issues. So people are working on being able to ask how these deep neural networks make their decisions. If they make progress, it would be interesting to apply to AlphaZero, to see if we can gain insights on how it chooses the correct move.

I mentioned this earlier in the thread, but the discussion on 'Explainable AI' has been a topic of mainstream debate recently because of this talk at NIPS:



Eminent ML expert and director of AI research at Facebook, Prof. Yann LeCun, disagreed (though not explicitly with the claim that there needs to be more rigor in ML), and many have contributed their opinions.
 
  • Like
Likes 256bits
  • #70
This thread has been extensively cleaned up to remove off-topic posts based on the misconception that AlphaZero used traditional gaming strategies rather than neural networks.
 
  • Like
Likes phyzguy
