
AlphaGo is essentially built on the work that IBM did on TD-Gammon (a reinforcement learning backgammon player) in the 90s.

Pretty much the same thing happened with TD-Gammon: it played unconventional moves, and in the longer term humans ended up adopting some of its tactics once they understood how they played out. It wouldn't be surprising to see the same happen with Go.



From my understanding, computers have also had this effect on chess. The play styles of younger champions have evolved to the point where unpredictability is actually part of the strategy. I'm not a chess expert by any means, but this quote by Viswanathan Anand (former World Chess Champion) describes it.

  “Top competitors who once relied on particular styles of play are now forced to mix up their strategies, for fear that powerful analysis engines will be used to reveal fatal weaknesses in favoured openings....Anything unusual that you can produce has quadruple, quintuple the value, precisely because your opponent is likely to do the predictable stuff, which is on a computer” [1]
[1] http://www.businessinsider.com/anand-on-how-computers-have-c...


>powerful analysis engines will be used to reveal fatal weaknesses in favoured openings...

Anand isn't really talking about strategy here, he's just talking about choice of opening. Players with narrow opening repertoires, like Fischer, have always been easier to prepare for than players who play a wide variety of openings.

As far as actual changes to strategy, the most obvious one is that computers tend to value material more highly than humans. So a computer will take a risky pawn if it looks sound, while a human will see that taking the pawn is very complicated and prefer a simpler move.


Computers and the internet have changed chess in several ways:

(1) Online game databases have made it easier for players to track developments in opening theory and prepare to play specific opponents

(2) Chess engines add to this: they can be used to search for antidotes to complicated opening systems

(3) Young players have greater access to high-quality sparring partners - either engines or fellow humans on online servers.

This has led to the best players becoming younger, and to players playing more varied and less 'sharp' openings.


Reading the paper, it doesn't at all sound like AlphaGo uses anything that TD-Gammon used.

It uses MCTS, which is unlike minimax. It doesn't use temporal difference learning, although they say that the policy somewhat resembles TD.

That doesn't sound like 'essentially built on'; it sounds more like 'slightly influenced by'.
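To make the MCTS-vs-minimax distinction concrete: minimax exhaustively evaluates the game tree, while Monte Carlo rollouts (the statistical idea MCTS is built on) estimate a position's value by averaging random playouts. This is just a toy sketch on a trivial take-away game (take 1 or 2 stones; whoever takes the last stone wins), not anything from AlphaGo itself; all names are illustrative.

```python
import random

def minimax(stones):
    """Exact result: True if the player to move wins with perfect play."""
    if stones == 0:
        return False  # previous player took the last stone and already won
    return any(not minimax(stones - take) for take in (1, 2) if take <= stones)

def rollout_value(stones, n=2000, seed=0):
    """Estimate the player-to-move's win rate from uniformly random playouts."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        s, turn = stones, 0          # turn 0 is the player we are evaluating
        winner = None
        while s > 0:
            s -= rng.choice((1, 2)) if s >= 2 else 1
            if s == 0:
                winner = turn        # this player took the last stone
                break
            turn ^= 1
        wins += (winner == 0)
    return wins / n
```

With 3 stones the mover loses under perfect play (`minimax(3)` is False), yet random rollouts still give it a nonzero win rate; MCTS proper sharpens these rollout statistics by biasing them toward promising moves.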


You're missing the forest for the trees.

Tesauro's work on TD-Gammon was pioneering at the high level, i.e. combining reinforcement learning + self-play + neural networks.
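That recipe can be sketched in a few lines: an agent generates its own episodes and updates a value function with temporal-difference learning. TD-Gammon did this with a neural network over backgammon positions; this toy version (my own illustration, not Tesauro's code) substitutes a 5-state random walk and a lookup table to keep it self-contained.

```python
import random

def train(episodes=5000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    V = [0.0, 0.5, 0.5, 0.5, 1.0]   # states 0 and 4 are terminal (lose/win)
    for _ in range(episodes):
        state = 2                    # every episode starts in the middle
        while state not in (0, 4):
            nxt = state + rng.choice((-1, 1))
            # TD(0): nudge V[state] toward the bootstrapped estimate V[nxt]
            V[state] += alpha * (V[nxt] - V[state])
            state = nxt
    return V                         # true values are 0.25, 0.5, 0.75
```

Swap the table for a network and the random walk for self-play games, and you have the high-level shape of TD-Gammon's training loop.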


> AlphaGo is essentially built on the work that IBM did on TD-Gammon (a reinforcement learning backgammon player) in the 90s.

Citation needed.


And you'll find it in the AlphaGo paper. It's not a contentious claim.


Citation still needed.


He just gave you a citation: "the AlphaGo paper".


This one, I assume? http://www.nature.com/nature/journal/v529/n7587/full/nature1...

Looks like citation 46 is the relevant one here.



