AlphaGo Zero teaches itself and easily beats the previous AlphaGo
    A self-taught computer has become the world’s best player of Go, the fiendishly complex board game, without any input from human experts.


    DeepMind, Google’s artificial intelligence subsidiary in London, announced the milestone in AI less than two years after the highly publicised unveiling of AlphaGo, the first machine to beat human champions at the ancient Asian game. Details are published in the scientific journal Nature.


    Previous versions of AlphaGo learned initially by analysing thousands of games between excellent human players to discover winning moves. The new development, called AlphaGo Zero, dispenses with this human expertise and starts just by knowing the rules and objective of the game.

    “It learns to play simply by playing games against itself, starting from completely random play,” said Demis Hassabis, DeepMind chief executive. “In doing so, it quickly surpassed human level of play and defeated the previously published version of AlphaGo by 100 games to zero.”

    His colleague David Silver, AlphaGo project leader, added: “By not using human data in any fashion, we can create knowledge by itself from a blank slate.” Within a few days, the computer had not only learned Go from scratch but surpassed thousands of years of accumulated human wisdom about the game.

    The team developed a new form of “reinforcement learning” to create AlphaGo Zero, combining search-based simulations of future moves with a neural network that decides which moves give the highest probability of winning. The network is constantly updated over millions of training games, producing a slightly superior system each time.
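
    The self-play idea can be illustrated on a much smaller game. The sketch below is not DeepMind's method: it assumes a toy game of Nim (players alternately take one to three stones, and whoever takes the last stone wins), replaces the neural network with a plain table of observed win rates, and replaces the search with a greedy lookup plus occasional random moves. The function name self_play_train and all of its parameters are hypothetical, chosen only for this illustration.

```python
import random


def self_play_train(pile=10, max_take=3, games=20000, epsilon=0.2, seed=0):
    """Learn Nim (taking the last stone wins) purely from self-play.

    wins[s][a] / plays[s][a] is a tabular stand-in for a network's estimate
    that taking `a` stones from a pile of `s` leads to a win.
    """
    rng = random.Random(seed)
    wins = {s: {a: 0 for a in range(1, min(max_take, s) + 1)}
            for s in range(1, pile + 1)}
    plays = {s: {a: 0 for a in wins[s]} for s in wins}

    def best_move(s):
        # Highest estimated win rate; untried moves are scored as 0.5.
        return max(
            wins[s],
            key=lambda a: wins[s][a] / plays[s][a] if plays[s][a] else 0.5,
        )

    for _ in range(games):
        s, history = pile, []
        while s > 0:
            # Both sides share one policy: mostly greedy, sometimes random.
            if rng.random() < epsilon:
                a = rng.choice(list(wins[s]))
            else:
                a = best_move(s)
            history.append((s, a))
            s -= a
        # Whoever moved last won; walking backwards, outcomes alternate.
        won = True
        for s_i, a_i in reversed(history):
            plays[s_i][a_i] += 1
            wins[s_i][a_i] += won
            won = not won
    return best_move


policy = self_play_train()
print({s: policy(s) for s in (5, 6, 7)})
```

    Starting from a table with no knowledge, the learned policy should settle on the known winning strategy for this game, which is to leave the opponent a pile that is a multiple of four. The real system differs in scale and in kind, since Go has far too many positions for a table.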

    Although Go is in one sense extremely complex, with far more possible board positions than there are atoms in the universe, in another sense it is simple because it is a “game of perfect information”: chance plays no part, unlike in games with cards or dice, and the state of play is defined entirely by the position of stones on the board.


    The aim of the game is to surround more territory than your opponent. Its perfect-information structure makes Go particularly well suited to the computer simulations on which AlphaGo depends. DeepMind is now looking for real-life problems that can be structured in a similar way, so that the technology can be applied to them.


    Mr Hassabis identified predicting the shape of protein molecules — an important issue in drug discovery — as a promising candidate. Other likely scientific applications include designing new materials and climate modelling.




