Projects for a reinforcement learning class
Contents:
Do-it-yourself implementation of value iteration and policy iteration
Do-it-yourself implementation of multi-armed bandit algorithms: UCB (upper confidence bound), greedy policy...
Do-it-yourself implementations of:
- On-Policy Reinforcement Learning with Parametric Policy
- Off-Policy Reinforcement Learning with Value Function Approximation
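
To give a flavor of the value iteration item listed above, here is a minimal sketch on a toy problem. It is not the code from these projects: the function name `value_iteration`, the array layout (`P` indexed as action, state, next state; `R` indexed as state, action) and the two-state MDP are all assumptions made for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Illustrative value iteration (hypothetical helper, not the project code).

    P: array of shape (A, S, S), P[a, s, t] = probability of s -> t under action a.
    R: array of shape (S, A), expected immediate reward for action a in state s.
    Returns the estimated optimal values and a greedy policy.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Toy MDP (assumed for illustration): action 0 stays, action 1 switches state.
# Staying in state 1 pays a reward of 1; everything else pays 0.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 0.0],  # state 0
    [1.0, 0.0],  # state 1
])

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

In this toy problem the greedy policy switches out of state 0 and then stays in state 1. Policy iteration would reach the same policy by alternating policy evaluation and greedy improvement instead of applying the max backup directly.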
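
For the multi-armed bandit item above, the following is a rough sketch of the classic UCB1 rule, assuming Bernoulli arms. The names `ucb1` and `make_bernoulli_bandit`, the exploration constant `c`, and the arm probabilities are hypothetical choices for this example and may differ from what the projects actually use.

```python
import math
import random


def ucb1(pull, n_arms, horizon, c=2.0):
    """Illustrative UCB1 loop (hypothetical helper, not the project code).

    pull: callable taking an arm index and returning a reward in [0, 1].
    Returns the number of pulls and the empirical mean reward per arm.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: play every arm once
            arm = t - 1
        else:
            # Pick the arm with the highest upper confidence bound
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


def make_bernoulli_bandit(probs, seed=0):
    """Build a pull function for Bernoulli arms (assumed test environment)."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < probs[arm] else 0.0


probs = [0.2, 0.5, 0.8]  # assumed success probabilities
counts, means = ucb1(make_bernoulli_bandit(probs), n_arms=len(probs), horizon=5000)
print(counts, means)
```

A purely greedy policy would drop the square-root bonus and always pick the arm with the highest empirical mean, which can lock onto a suboptimal arm after a few unlucky draws.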