
Reinforcement Learning: An Introduction

Kotlin implementation of the algorithms, examples, and exercises from Sutton and Barto's Reinforcement Learning: An Introduction (2nd edition). The purpose of this project is to help you understand RL algorithms and experiment with them easily.

Inspired by ShangtongZhang/reinforcement-learning-an-introduction (Python) and idsc-frazzoli/subare (Java 8).

Features:

  • Algorithms and problems are separated, so you can experiment with various combinations of <algorithm, problem> or <algorithm, function approximator, problem> (see the sketch after this list).
  • The implementation stays very close to the pseudocode in the book, so reading the source code will help you understand the original algorithms.
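
The interface below is not taken from this repository; it is a minimal sketch, assuming a hypothetical Problem abstraction, of what that separation might look like in Kotlin:

```kotlin
// Hypothetical sketch only; the repository's actual types and names may differ.
interface Problem {
    val initialState: Int
    val actions: List<Int>
    fun step(state: Int, action: Int): Pair<Int, Double>   // returns (next state, reward)
    fun isTerminal(state: Int): Boolean
}

// An "algorithm" is then just a function from a Problem to learned state values,
// so any <algorithm, problem> pair can be combined without touching either side.
typealias Algorithm = (Problem) -> DoubleArray
```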

Implemented algorithms:

  • Model-based (Dynamic Programming)
  • Monte Carlo (episode backup)
  • Temporal Difference (one-step backup)
  • n-step Temporal Difference (unifies MC and TD)
  • Dyna (integrates planning, acting, and learning)
  • On-policy Prediction with Function Approximation
  • On-policy Control with Function Approximation
  • Off-policy Methods with Approximation
  • Eligibility Traces
  • Policy Gradient Methods
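
As a taste of how closely such code can follow the book's pseudocode, here is a hedged sketch (not code from this repository) of tabular value iteration on a tiny deterministic corridor:

```kotlin
import kotlin.math.abs

fun main() {
    val n = 5                                   // states 0..4; state 4 is terminal
    val gamma = 0.9
    val v = DoubleArray(n)                      // V(terminal) stays 0
    val actions = intArrayOf(-1, +1)            // move left or right
    var delta = Double.MAX_VALUE
    while (delta > 1e-6) {                      // sweep until the value function stops changing
        delta = 0.0
        for (s in 0 until n - 1) {
            val old = v[s]
            // V(s) <- max_a [ r + gamma * V(s') ], as in the value-iteration pseudocode
            v[s] = actions.maxOf { a ->
                val next = (s + a).coerceIn(0, n - 1)
                val reward = if (next == n - 1) 1.0 else 0.0   // +1 for reaching the goal
                reward + gamma * v[next]
            }
            delta = maxOf(delta, abs(old - v[s]))
        }
    }
    println(v.toList())                         // converges to [0.729, 0.81, 0.9, 1.0, 0.0]
}
```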

Implemented problems:

Build

Built with Maven
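
Assuming the standard Maven lifecycle (the usual convention, not a command documented here), `mvn compile` should build the project and `mvn test` should run the test cases described below.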

Test cases

Running the test cases reproduces the following figures from the book:

Figure 7.2

Figure 7.2: Performance of n-step TD methods as a function of α, for various values of n, on a 19-state random walk task
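
For reference, here is a hedged sketch (assumed, not the repository's code) of n-step TD prediction on this 19-state random walk, following the pseudocode of Chapter 7:

```kotlin
import kotlin.math.pow
import kotlin.random.Random

// 19-state random walk: states 1..19, terminals 0 and 20, start in state 10.
// Reward is -1 for exiting on the left, +1 for exiting on the right, 0 otherwise.
fun nStepTd(n: Int, alpha: Double, episodes: Int, gamma: Double = 1.0): DoubleArray {
    val v = DoubleArray(21)                               // V of both terminals stays 0
    repeat(episodes) {
        val states = mutableListOf(10)                    // states[t] = S_t
        val rewards = mutableListOf(0.0)                  // dummy R_0 so rewards[t] = R_t
        var T = Int.MAX_VALUE
        var t = 0
        while (true) {
            if (t < T) {
                val s = states[t]
                val next = if (Random.nextBoolean()) s + 1 else s - 1
                states.add(next)
                rewards.add(when (next) {
                    0 -> -1.0
                    20 -> 1.0
                    else -> 0.0
                })
                if (next == 0 || next == 20) T = t + 1
            }
            val tau = t - n + 1                           // time whose state value gets updated
            if (tau >= 0) {
                var g = 0.0
                for (i in tau + 1..minOf(tau + n, T)) g += gamma.pow(i - tau - 1) * rewards[i]
                if (tau + n < T) g += gamma.pow(n) * v[states[tau + n]]
                v[states[tau]] += alpha * (g - v[states[tau]])
            }
            if (tau == T - 1) break
            t++
        }
    }
    return v
}

fun main() {
    val v = nStepTd(n = 4, alpha = 0.4, episodes = 100)
    println(v.slice(1..19))                               // should approach the true values -0.9..0.9
}
```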


Figure 10.1

Figure 10.1: The Mountain Car task and the cost-to-go function learned during one run
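
For context, here is a hedged sketch of the Mountain Car dynamics as given in the book's Example 10.1 (the textbook description, not necessarily the repository's implementation):

```kotlin
import kotlin.math.cos

data class CarState(val position: Double, val velocity: Double)

// One time step of Mountain Car: action is -1 (full reverse), 0 (coast), or +1 (full throttle).
// Reward is -1 on every step until the car reaches the goal at position >= 0.5.
fun step(s: CarState, action: Int): Pair<CarState, Double> {
    var velocity = (s.velocity + 0.001 * action - 0.0025 * cos(3 * s.position))
        .coerceIn(-0.07, 0.07)
    val position = (s.position + velocity).coerceIn(-1.2, 0.5)
    if (position <= -1.2) velocity = 0.0        // hitting the left wall zeroes the velocity
    return CarState(position, velocity) to -1.0
}
```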


Figure 10.4

Figure 10.4: Effect of α and n on the early performance of n-step semi-gradient Sarsa with tile-coding function approximation on the Mountain Car task


Figure 12.3

Figure 12.3: 19-state Random walk results: Performance of the offline λ-return algorithm.


Figure 12.6

Figure 12.6: 19-state Random walk results: Performance of TD(λ).


Figure 12.8

Figure 12.8: 19-state Random walk results: Performance of online λ-return algorithms


Figure 12.10

Figure 12.10: Early performance on the Mountain Car task of Sarsa(λ) with replacing traces
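
For the trace mechanics behind this figure, here is a hedged, tabular sketch of one Sarsa(λ) step with replacing traces (the actual Mountain Car experiment uses linear function approximation with tile coding; this tabular form is only illustrative and not the repository's code):

```kotlin
// One step of tabular Sarsa(λ) with replacing traces.
// q and e are |S| x |A| tables of action values and eligibility traces.
fun sarsaLambdaStep(
    q: Array<DoubleArray>, e: Array<DoubleArray>,
    s: Int, a: Int, reward: Double, sNext: Int, aNext: Int, terminal: Boolean,
    alpha: Double, gamma: Double, lambda: Double
) {
    val target = if (terminal) reward else reward + gamma * q[sNext][aNext]
    val delta = target - q[s][a]
    e[s][a] = 1.0                              // replacing trace: reset to 1 instead of incrementing
    for (i in q.indices) for (j in q[i].indices) {
        q[i][j] += alpha * delta * e[i][j]     // every visited pair shares the TD error
        e[i][j] *= gamma * lambda              // decay all traces
    }
}
```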


Figure 12.11

Figure 12.11: Summary comparison of Sarsa(λ) algorithms on the Mountain Car task.
