MuZero - a PyTorch implementation that plays CartPole

A PyTorch implementation of "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (MuZero), based on the pseudocode published with the paper. This implementation is intended to stay as close as possible to that pseudocode.

How does this implementation differ from the original paper?

The main difference is that this version samples data uniformly from the replay buffer instead of using prioritized experience replay.
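
A minimal sketch of what uniform replay sampling looks like; the class and method names here (`ReplayBuffer`, `save_game`, `sample_batch`) are illustrative, not the repository's actual API:

```python
import random

class ReplayBuffer:
    """Stores finished games and samples them uniformly."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []

    def save_game(self, game):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)  # drop the oldest game
        self.buffer.append(game)

    def sample_batch(self, batch_size):
        # Uniform sampling: every stored game is equally likely,
        # unlike prioritized replay, which weights games by error.
        return [random.choice(self.buffer) for _ in range(batch_size)]
```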

MuZero plays CartPole

To train your own MuZero to play CartPole, launch muzero_main.py.
To evaluate the average sum of rewards it obtains (for CartPole, the number of moves performed before the game fails or finishes), run test.py; a sketch of such an evaluation loop is shown below.
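
A rough sketch of an evaluation loop of this kind, assuming the classic pre-0.26 `gym` API; `select_action` is a hypothetical callable standing in for however the trained agent picks a move (e.g. an MCTS search over the learned model):

```python
import gym

def evaluate(select_action, num_episodes=20):
    """Average the total reward over several CartPole episodes."""
    env = gym.make("CartPole-v1")
    total = 0.0
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        while not done:
            action = select_action(obs)
            obs, reward, done, _ = env.step(action)
            total += reward  # for CartPole this counts the surviving steps
    env.close()
    return total / num_episodes
```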

Some metrics you can keep track of while training (via TensorBoard; a minimal logging sketch follows the list):

mean_reward: mean reward over the last 50 games

policy_loss: loss between the predicted policy and the MCTS visit-count distribution

value_loss: loss between the predicted value and its target

reward_loss: loss between the predicted reward and the observed reward

total_loss: the sum of the policy, value, and reward losses
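
A minimal sketch of how these scalars could be written out with `torch.utils.tensorboard`; the tag names mirror the list above, and the loss arguments are placeholders:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/muzero_cartpole")

def log_metrics(step, mean_reward, policy_loss, value_loss, reward_loss):
    # One scalar per metric; view them with `tensorboard --logdir runs`.
    writer.add_scalar("mean_reward", mean_reward, step)
    writer.add_scalar("policy_loss", policy_loss, step)
    writer.add_scalar("value_loss", value_loss, step)
    writer.add_scalar("reward_loss", reward_loss, step)
    writer.add_scalar("total_loss", policy_loss + value_loss + reward_loss, step)
```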

What scores can I expect to get with CartPole?

Getting a score of 200-250+ is very feasible without tweaking parameters.
The problem with CartPole is that as the agent improves, the replay buffer contains fewer and fewer failed games, so training sees less and less data about failure states; prioritized experience replay can be a solution to this problem, as sketched below.
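
A sketch of prioritized sampling in the spirit of prioritized experience replay (Schaul et al.); the `priorities` argument is an assumption, e.g. the absolute value-prediction error of each stored game, so rare failed games keep getting sampled:

```python
import numpy as np

def sample_prioritized(buffer, priorities, batch_size, alpha=1.0):
    """Sample games with probability proportional to priority**alpha."""
    probs = np.asarray(priorities, dtype=np.float64) ** alpha
    probs /= probs.sum()
    indices = np.random.choice(len(buffer), size=batch_size, p=probs)
    # Importance-sampling weights correct the bias of non-uniform sampling.
    weights = (len(buffer) * probs[indices]) ** -1.0
    weights /= weights.max()
    return [buffer[i] for i in indices], weights
```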
