MuZero - PyTorch implementation that plays CartPole

PyTorch implementation of "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (MuZero), based on the pseudocode released with the paper. This implementation is intended to stay as close as possible to that pseudocode.

How does this implementation differ from the original paper?

The main difference is that this version samples data uniformly from the replay buffer instead of using prioritized experience replay.
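
As a rough illustration (this is not the repository's actual code; the `ReplayBuffer` class and its method names are hypothetical), uniform replay sampling can look like this:

```python
import random

class ReplayBuffer:
    """Hypothetical sketch of a uniform replay buffer: it stores finished
    games and samples them with equal probability."""

    def __init__(self, capacity=500):
        self.capacity = capacity
        self.games = []

    def save_game(self, game):
        # Drop the oldest game once the buffer is full.
        if len(self.games) >= self.capacity:
            self.games.pop(0)
        self.games.append(game)

    def sample_batch(self, batch_size):
        # Uniform sampling: every stored game is equally likely to be picked,
        # unlike prioritized experience replay, which weights games by their error.
        return [random.choice(self.games) for _ in range(batch_size)]
```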

MuZero plays CartPole

To train your own MuZero to play CartPole, just launch `muzero_main.py`.
To evaluate the average sum of rewards it obtains (for CartPole, the number of moves performed before the game fails or finishes), run `test.py`.

Some metrics that can be tracked while training (using TensorBoard; a minimal logging sketch follows the list):

mean_reward: mean reward over the last 50 games

policy_loss: loss on the predicted policy (against the MCTS visit-count distribution)

value_loss: loss on the predicted value (against the value target built from the game)

reward_loss: loss on the predicted reward (against the observed reward)

total_loss: combined training loss (the sum of the losses above)
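
For reference, here is a minimal sketch of logging such scalars with PyTorch's `SummaryWriter`; the tag names, log directory, and `log_metrics` helper are assumptions for illustration, not necessarily what `muzero_main.py` does:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # writes event files to ./runs/ by default

def log_metrics(step, mean_reward, policy_loss, value_loss, reward_loss, total_loss):
    # View the curves with: tensorboard --logdir runs
    writer.add_scalar("mean_reward", mean_reward, step)
    writer.add_scalar("policy_loss", policy_loss, step)
    writer.add_scalar("value_loss", value_loss, step)
    writer.add_scalar("reward_loss", reward_loss, step)
    writer.add_scalar("total_loss", total_loss, step)
```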

What scores can I expect to get with CartPole?

Getting a score of 200-250+ is very feasible without tweaking parameters.
The problem with CartPole is that, as training progresses, the replay buffer contains fewer and fewer failed games; using prioritized experience replay can be a solution to this problem (see the sketch below).
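
As an illustration (not part of this repository; the function name and the choice of priority are assumptions), prioritized sampling could weight games by how poorly the network currently predicts their values, so rare failed games keep appearing in training batches:

```python
import random

def sample_prioritized(games, priorities, batch_size):
    """Hypothetical sketch: games with a larger priority (e.g. a larger
    value-prediction error) are sampled more often than games the network
    already predicts well."""
    total = sum(priorities)
    weights = [p / total for p in priorities]
    return random.choices(games, weights=weights, k=batch_size)
```

In the MuZero paper, the bias introduced by such non-uniform sampling is corrected with importance-sampling weights when scaling the loss.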
