
LightZero

Image source: opendilab:lightzero

Project Description

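  • I contributed to this project during my internship at SenseTime.

  • It is a sub-project of OpenDILab.

  • It studies RL methods that combine Monte Carlo tree search with deep reinforcement learning.

  • It aims to reproduce a range of state-of-the-art methods, from AlphaZero to the MuZero family.

  • For more information, see github_link and the paper.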

My Contribution

  • Reproduced the MuZero algorithm, which extends AlphaZero-style tree search to environments with unknown transition dynamics by planning with a learned model.

  • Implemented Sampled MuZero, an extension of MuZero that enables learning in domains with arbitrarily complex action spaces by planning over sampled actions.

  • Reproduced Stochastic MuZero, which incorporates the stochastic nature of the environment into the tree search process.
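
As a rough illustration of the search step these methods share, the sketch below scores child nodes with the pUCT rule described in the MuZero paper and builds priors from sampled actions in the spirit of Sampled MuZero. It is a minimal sketch under stated assumptions, not LightZero's implementation: the function names and the dictionary layout are hypothetical, the constants c1 = 1.25 and c2 = 19652 are the values reported in the MuZero paper, and actions are assumed to be sampled from the policy itself, so each sampled action's prior is simply its empirical frequency.

```python
import math
import random
from collections import Counter


def puct_score(q, prior, parent_visits, child_visits, c1=1.25, c2=19652.0):
    """pUCT score of one child: value estimate plus a prior-weighted exploration bonus."""
    exploration = c1 + math.log((parent_visits + c2 + 1.0) / c2)
    bonus = prior * math.sqrt(parent_visits) / (1 + child_visits) * exploration
    return q + bonus


def sample_actions(policy, k, rng):
    """Draw k actions from `policy` (action -> probability) with replacement.

    Returns the empirical frequency of each distinct sampled action.
    Assumption: the proposal distribution is the policy itself, so these
    frequencies serve directly as search priors.
    """
    actions = list(policy)
    weights = [policy[a] for a in actions]
    draws = rng.choices(actions, weights=weights, k=k)
    counts = Counter(draws)
    return {a: n / k for a, n in counts.items()}


def select_action(children, parent_visits):
    """Pick the child with the highest pUCT score.

    `children` maps action -> {"q": float, "prior": float, "visits": int}
    (a hypothetical layout chosen for this sketch).
    """
    return max(
        children,
        key=lambda a: puct_score(
            children[a]["q"], children[a]["prior"], parent_visits, children[a]["visits"]
        ),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    priors = sample_actions({"left": 0.5, "right": 0.3, "stay": 0.2}, k=8, rng=rng)
    children = {a: {"q": 0.0, "prior": p, "visits": 0} for a, p in priors.items()}
    print(select_action(children, parent_visits=1))
```

Only the sampled actions become children of the node, which is what keeps the search tractable when the full action space is too large to enumerate.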

Algorithm Framework

  • MuZero

muzero_frame.jpg

  • Sampled MuZero

sampled framework.jpg

  • Stochastic MuZero

stochastic framework2.jpg

Experimental Result

  • MuZero

mz_figure.jpg

  • Sampled MuZero

sez_figure.jpg

  • Stochastic MuZero

2048_figure.jpg