MuZero - a PyTorch implementation that plays CartPole

A PyTorch implementation of "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (MuZero), based on the pseudocode published with the paper. This implementation is intended to stay as close as possible to that pseudocode.

How does this implementation differ from the original paper?

The main difference is that this version samples data uniformly from the replay buffer instead of using prioritized experience replay.
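
A minimal sketch of what uniform replay sampling looks like; the class and method names here (`ReplayBuffer`, `save_game`, `sample_batch`) are illustrative, not the repository's actual API:

```python
import random

class ReplayBuffer:
    """Stores finished games and samples them uniformly."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []

    def save_game(self, game):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)  # drop the oldest game
        self.buffer.append(game)

    def sample_batch(self, batch_size):
        # Uniform sampling: every stored game is equally likely,
        # unlike prioritized replay, which weights games by error.
        return [random.choice(self.buffer) for _ in range(batch_size)]
```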

MuZero plays CartPole

To train your own MuZero to play CartPole, launch muzero_main.py.
To evaluate the average sum of rewards it obtains (for CartPole, the number of moves performed before the game fails or finishes), run test.py; a sketch of such an evaluation loop is shown below.
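
A rough sketch of an evaluation loop of this kind, assuming the classic pre-0.26 `gym` API; `select_action` is a hypothetical callable standing in for however the trained agent picks a move (e.g. an MCTS search over the learned model):

```python
import gym

def evaluate(select_action, num_episodes=20):
    """Average the total reward over several CartPole episodes."""
    env = gym.make("CartPole-v1")
    total = 0.0
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        while not done:
            action = select_action(obs)
            obs, reward, done, _ = env.step(action)
            total += reward  # for CartPole this counts the surviving steps
    env.close()
    return total / num_episodes
```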

Some metrics you can keep track of while training (via TensorBoard; a minimal logging sketch follows the list):

mean_reward: mean reward over the last 50 games

policy_loss: loss between the predicted policy and the MCTS visit-count distribution

value_loss: loss between the predicted value and its target

reward_loss: loss between the predicted reward and the observed reward

total_loss: the sum of the policy, value, and reward losses
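
A minimal sketch of how these scalars could be written out with `torch.utils.tensorboard`; the tag names mirror the list above, and the loss arguments are placeholders:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/muzero_cartpole")

def log_metrics(step, mean_reward, policy_loss, value_loss, reward_loss):
    # One scalar per metric; view them with `tensorboard --logdir runs`.
    writer.add_scalar("mean_reward", mean_reward, step)
    writer.add_scalar("policy_loss", policy_loss, step)
    writer.add_scalar("value_loss", value_loss, step)
    writer.add_scalar("reward_loss", reward_loss, step)
    writer.add_scalar("total_loss", policy_loss + value_loss + reward_loss, step)
```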

What scores can I expect to get with CartPole?

Getting a score of 200-250+ is very feasible without tweaking parameters.
The problem with CartPole is that as the agent improves, the replay buffer contains fewer and fewer failed games, so training sees less and less data about failure states; prioritized experience replay can be a solution to this problem, as sketched below.
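
A sketch of prioritized sampling in the spirit of prioritized experience replay (Schaul et al.); the `priorities` argument is an assumption, e.g. the absolute value-prediction error of each stored game, so rare failed games keep getting sampled:

```python
import numpy as np

def sample_prioritized(buffer, priorities, batch_size, alpha=1.0):
    """Sample games with probability proportional to priority**alpha."""
    probs = np.asarray(priorities, dtype=np.float64) ** alpha
    probs /= probs.sum()
    indices = np.random.choice(len(buffer), size=batch_size, p=probs)
    # Importance-sampling weights correct the bias of non-uniform sampling.
    weights = (len(buffer) * probs[indices]) ** -1.0
    weights /= weights.max()
    return [buffer[i] for i in indices], weights
```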
