
Reinforcement Learning: An Introduction

Kotlin implementation of the algorithms, examples, and exercises from Sutton and Barto's Reinforcement Learning: An Introduction (2nd edition). The purpose of this project is to help you understand RL algorithms and experiment with them easily.

Inspired by ShangtongZhang/reinforcement-learning-an-introduction (Python) and idsc-frazzoli/subare (Java 8).

Features:

  • Algorithms and problems are separated, so you can experiment with various combinations of <algorithm, problem> or <algorithm, function approximator, problem> (see the sketch after this list).
  • The implementation stays very close to the pseudocode in the book, so reading the source code will help you understand the original algorithms.
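
The interface below is not taken from this repository; it is a minimal sketch, assuming a hypothetical Problem abstraction, of what that separation might look like in Kotlin:

```kotlin
// Hypothetical sketch only; the repository's actual types and names may differ.
interface Problem {
    val initialState: Int
    val actions: List<Int>
    fun step(state: Int, action: Int): Pair<Int, Double>   // returns (next state, reward)
    fun isTerminal(state: Int): Boolean
}

// An "algorithm" is then just a function from a Problem to learned state values,
// so any <algorithm, problem> pair can be combined without touching either side.
typealias Algorithm = (Problem) -> DoubleArray
```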

Implemented algorithms:

  • Model-based (Dynamic Programming)
  • Monte Carlo (episode backup)
  • Temporal Difference (one-step backup)
  • n-step Temporal Difference (unifies MC and TD)
  • Dyna (integrates planning, acting, and learning)
  • On-policy Prediction with Function Approximation
  • On-policy Control with Function Approximation
  • Off-policy Methods with Approximation
  • Eligibility Traces
  • Policy Gradient Methods
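
As a taste of how closely such code can follow the book's pseudocode, here is a hedged sketch (not code from this repository) of tabular value iteration on a tiny deterministic corridor:

```kotlin
import kotlin.math.abs

fun main() {
    val n = 5                                   // states 0..4; state 4 is terminal
    val gamma = 0.9
    val v = DoubleArray(n)                      // V(terminal) stays 0
    val actions = intArrayOf(-1, +1)            // move left or right
    var delta = Double.MAX_VALUE
    while (delta > 1e-6) {                      // sweep until the value function stops changing
        delta = 0.0
        for (s in 0 until n - 1) {
            val old = v[s]
            // V(s) <- max_a [ r + gamma * V(s') ], as in the value-iteration pseudocode
            v[s] = actions.maxOf { a ->
                val next = (s + a).coerceIn(0, n - 1)
                val reward = if (next == n - 1) 1.0 else 0.0   // +1 for reaching the goal
                reward + gamma * v[next]
            }
            delta = maxOf(delta, abs(old - v[s]))
        }
    }
    println(v.toList())                         // converges to [0.729, 0.81, 0.9, 1.0, 0.0]
}
```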

Implemented problems:

Build

Built with Maven
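
Assuming the standard Maven lifecycle (the usual convention, not a command documented here), `mvn compile` should build the project and `mvn test` should run the test cases described below.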

Test cases

Running the test cases reproduces the following figures from the book:

Figure 7.2

Figure 7.2: Performance of n-step TD methods as a function of α, for various values of n, on a 19-state random walk task
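
For reference, here is a hedged sketch (assumed, not the repository's code) of n-step TD prediction on this 19-state random walk, following the pseudocode of Chapter 7:

```kotlin
import kotlin.math.pow
import kotlin.random.Random

// 19-state random walk: states 1..19, terminals 0 and 20, start in state 10.
// Reward is -1 for exiting on the left, +1 for exiting on the right, 0 otherwise.
fun nStepTd(n: Int, alpha: Double, episodes: Int, gamma: Double = 1.0): DoubleArray {
    val v = DoubleArray(21)                               // V of both terminals stays 0
    repeat(episodes) {
        val states = mutableListOf(10)                    // states[t] = S_t
        val rewards = mutableListOf(0.0)                  // dummy R_0 so rewards[t] = R_t
        var T = Int.MAX_VALUE
        var t = 0
        while (true) {
            if (t < T) {
                val s = states[t]
                val next = if (Random.nextBoolean()) s + 1 else s - 1
                states.add(next)
                rewards.add(when (next) {
                    0 -> -1.0
                    20 -> 1.0
                    else -> 0.0
                })
                if (next == 0 || next == 20) T = t + 1
            }
            val tau = t - n + 1                           // time whose state value gets updated
            if (tau >= 0) {
                var g = 0.0
                for (i in tau + 1..minOf(tau + n, T)) g += gamma.pow(i - tau - 1) * rewards[i]
                if (tau + n < T) g += gamma.pow(n) * v[states[tau + n]]
                v[states[tau]] += alpha * (g - v[states[tau]])
            }
            if (tau == T - 1) break
            t++
        }
    }
    return v
}

fun main() {
    val v = nStepTd(n = 4, alpha = 0.4, episodes = 100)
    println(v.slice(1..19))                               // should approach the true values -0.9..0.9
}
```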


Figure 10.1

Figure 10.1: The Mountain Car task and the cost-to-go function learned during one run
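
For context, here is a hedged sketch of the Mountain Car dynamics as given in the book's Example 10.1 (the textbook description, not necessarily the repository's implementation):

```kotlin
import kotlin.math.cos

data class CarState(val position: Double, val velocity: Double)

// One time step of Mountain Car: action is -1 (full reverse), 0 (coast), or +1 (full throttle).
// Reward is -1 on every step until the car reaches the goal at position >= 0.5.
fun step(s: CarState, action: Int): Pair<CarState, Double> {
    var velocity = (s.velocity + 0.001 * action - 0.0025 * cos(3 * s.position))
        .coerceIn(-0.07, 0.07)
    val position = (s.position + velocity).coerceIn(-1.2, 0.5)
    if (position <= -1.2) velocity = 0.0        // hitting the left wall zeroes the velocity
    return CarState(position, velocity) to -1.0
}
```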


Figure 10.4

Figure 10.4: Effect of α and n on the early performance of n-step semi-gradient Sarsa with tile-coding function approximation on the Mountain Car task


Figure 12.3

Figure 12.3: 19-state Random walk results: Performance of the offline λ-return algorithm.


Figure 12.6

Figure 12.6: 19-state Random walk results: Performance of TD(λ).


Figure 12.8

Figure 12.8: 19-state Random walk results: Performance of online λ-return algorithms


Figure 12.10

Figure 12.10: Early performance on the Mountain Car task of Sarsa(λ) with replacing traces
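
For the trace mechanics behind this figure, here is a hedged, tabular sketch of one Sarsa(λ) step with replacing traces (the actual Mountain Car experiment uses linear function approximation with tile coding; this tabular form is only illustrative and not the repository's code):

```kotlin
// One step of tabular Sarsa(λ) with replacing traces.
// q and e are |S| x |A| tables of action values and eligibility traces.
fun sarsaLambdaStep(
    q: Array<DoubleArray>, e: Array<DoubleArray>,
    s: Int, a: Int, reward: Double, sNext: Int, aNext: Int, terminal: Boolean,
    alpha: Double, gamma: Double, lambda: Double
) {
    val target = if (terminal) reward else reward + gamma * q[sNext][aNext]
    val delta = target - q[s][a]
    e[s][a] = 1.0                              // replacing trace: reset to 1 instead of incrementing
    for (i in q.indices) for (j in q[i].indices) {
        q[i][j] += alpha * delta * e[i][j]     // every visited pair shares the TD error
        e[i][j] *= gamma * lambda              // decay all traces
    }
}
```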


Figure 12.11

Figure 12.11: Summary comparison of Sarsa(λ) algorithms on the Mountain Car task.
