Skip to content

ADGEfficiency/cem

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cross Entropy Method

Cross Entropy Method (CEM) is a gradient free optimization algorithm that fits parameters by iteratively resampling from an elite population.

The model learns only from a single scalar (total episode reward).

Pseudocode for the CEM algorithm:

for epoch in num_epochs:
  sampling a population from a distribution
  testing that population using the environment
  selecting the elites (judged by total episode reward)
  refitting the sampling distribution (to the elites)

CEM can be easily parallelized - this implementation runs batches across multiple processes using Python's multiprocessing, making it quick in wall time.

The total number of episodes run in an experiment is given by:

num_episodes = num_epochs * num_processes * batch_size

Use

Cartpole

$ python cem.py cartpole --num_process 6 --epochs 8 --batch_size 4096
Namespace(env='cartpole', num_process=6, epochs=8, batch_size=4096)
expt of 196608 total episodes
epoch 0 - 22.0 30.5 pop - 64.9 48.1 elites
epoch 1 - 33.3 37.4 pop - 92.9 46.2 elites
epoch 2 - 46.0 46.9 pop - 125.1 46.9 elites

Pendulum

$ python cem.py pendulum --num_process 6 --epochs 15 --batch_size 4096

Setup

The dependencies of this project are gym and matplotlib - numpy will come along with gym:

$ pip install -r requirements.txt

About

Parallelized Cross Entropy Method

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy