
LightZero

Image source: opendilab:lightzero
Project Description
-
I contributed to this project during my internship at SenseTime.
-
It is a subproject of OpenDILab.
-
The project studies RL methods that combine Monte Carlo tree search with deep reinforcement learning.
-
It aims to reproduce a range of state-of-the-art methods, from AlphaZero to the MuZero family.
-
For more information, see github_link and paper.
My Contribution
-
Reproduced the MuZero algorithm, which extends the applicability of tree-search techniques to environments with unknown transition dynamics.
-
Implemented the Sampled MuZero method, an extension of MuZero that enables learning in domains with arbitrarily complex action spaces by planning over sampled actions.
-
Reproduced the Stochastic MuZero method, which incorporates the stochastic nature of the environment into the tree search process.
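To make "planning over sampled actions" concrete, here is a minimal sketch of the sampling step only. It is not LightZero's implementation. It assumes a small discrete policy given as a dictionary, and it assumes actions are drawn directly from that policy, in which case the empirical frequency of each sampled action can serve as its prior during search. The function name `sample_candidate_actions` and the example policy are hypothetical.

```python
import random
from collections import Counter


def sample_candidate_actions(policy, num_samples, rng):
    """Draw `num_samples` actions from `policy` (dict: action -> probability).

    Returns a dict mapping each distinct sampled action to its empirical
    frequency. Only these actions would be expanded in the search tree.
    Assumption: actions are sampled from the policy itself.
    """
    actions = list(policy)
    weights = [policy[a] for a in actions]
    # Sample with replacement, weighted by the policy probabilities.
    draws = rng.choices(actions, weights=weights, k=num_samples)
    counts = Counter(draws)
    return {action: count / num_samples for action, count in counts.items()}


# Hypothetical policy over four actions.
policy = {"left": 0.5, "right": 0.3, "jump": 0.15, "fire": 0.05}
candidates = sample_candidate_actions(policy, num_samples=8, rng=random.Random(0))
print(candidates)
```

The search then only ever considers the keys of `candidates`, so the cost per node depends on the number of samples rather than on the size of the full action space.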
Algorithm Framework
-
MuZero

-
Sampled MuZero

-
Stochastic MuZero

Experimental Results
-
MuZero

-
Sampled MuZero

-
Stochastic MuZero
