Multi-Agent Systems and Strategic Decision Making: Module CS4760
Lecture 1: Introduction to Reinforcement Learning
Maria Chli
Based on the lecture series by D. Silver and by Fei-Fei Li, J. Johnson & S. Yeung, and on Sutton & Barto, 2nd ed. (2018)
Learning Outcomes
By the end of this lecture you should be able to:
• define a Reinforcement Learning system and outline its major components
• recognise decision-making problems that lend themselves to RL
• differentiate RL from other strands of Machine Learning
What is an Agent?
An agent is a computer system capable of flexible autonomous action in some environment in order to meet its design objectives. [Wooldridge and Jennings, 1995]
[Figure: the agent cycle of perception, decision and action]
What is Reinforcement Learning?
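[Figure: Reinforcement Learning as a loop between Environment and Agent. The environment sends an observation and a reward to the agent; the agent sends an action to the environment.]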
[Figure: RL at the intersection of several fields: Computer Science (Machine Learning), Engineering, Neuroscience, Mathematics (Operational Research), Psychology (Classical/Operant Conditioning) and Economics (Bounded Rationality)]
[Figure: Machine Learning divided into Supervised Learning, Unsupervised Learning and Reinforcement Learning]
Characteristics of RL
What makes RL different from other machine learning paradigms?
• There is no supervisor, only a reward signal
• Feedback may be delayed, not instantaneous
• Time really matters: the data are sequential, not i.i.d.
• The agent senses the environment, and its actions affect the environment
Reinforcement Learning
[Figure: the Environment-Agent interaction loop, built up over several slides]
• The environment presents the agent with state s_t
• The agent responds with action a_t
• The environment returns reward R_t
• The environment moves to the next state s_{t+1}
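The interaction loop between environment and agent can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the `Corridor` environment, the `random_agent` function and the `run_episode` helper are all hypothetical names, and the toy task (walk right along a row of cells to reach a goal) is an assumption chosen only to keep the example small.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Start a new episode in the leftmost cell and return that state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); walls clip the move."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.state, reward, done


def random_agent(state):
    """Placeholder agent that ignores the state and acts at random."""
    return random.choice([-1, 1])


def run_episode(env, agent, max_steps=1000):
    """Run one episode; return the list of (s_t, a_t, R_t, s_{t+1}) tuples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = agent(state)                        # agent chooses a_t given s_t
        next_state, reward, done = env.step(action)  # environment returns R_t and s_{t+1}
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory


if __name__ == "__main__":
    episode = run_episode(Corridor(), random_agent)
    print(len(episode), "steps, total reward", sum(r for _, _, r, _ in episode))
```

Note that the reward arrives only on the final step, so this toy task also illustrates delayed feedback: most actions receive a reward of zero even when they move the agent towards the goal.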
Examples: Cart pole problem
Objective: Balance a pole on top of a movable cart
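One way to recognise this as an RL problem is to note that the objective can be expressed as a reward signal. The sketch below is an assumption-laden illustration rather than the lecture's own formulation: the thresholds `MAX_ANGLE` and `MAX_POSITION` and the functions `has_failed` and `reward` are hypothetical, loosely following how cart-pole benchmarks are commonly set up (a reward of +1 for each time step the pole stays up).

```python
import math

# Assumed failure thresholds (illustrative values, not from the lecture).
MAX_ANGLE = math.radians(12)  # pole counts as fallen beyond this tilt (radians)
MAX_POSITION = 2.4            # cart counts as off the track beyond this distance


def has_failed(cart_position, pole_angle):
    """True once the cart has left the track or the pole has tipped too far."""
    return abs(cart_position) > MAX_POSITION or abs(pole_angle) > MAX_ANGLE


def reward(cart_position, pole_angle):
    """+1 for every time step the pole is still balanced, 0 once it has failed."""
    return 0.0 if has_failed(cart_position, pole_angle) else 1.0
```

Under this choice, maximising the total reward is the same as keeping the pole balanced for as many time steps as possible.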
Examples: Robot Locomotion
Objective: Make the robot move forward
Examples: Atari Games
Objective: Complete the game with the highest score
Examples: Go
Objective: Win the game!
Major Components of an RL System
1. Policy: the agent's behaviour function
The policy defines the learning agent's way of behaving at a given time.
2. Reward signal: defines the goal of an RL problem
The agent's objective is to maximize the total reward it receives in the long run.
3. Value function: how good each state and/or action is
Whereas rewards determine the immediate desirability of states, values indicate their long-term desirability.
4. Model: the agent's representation of the environment
E.g. given a state and an action, the model predicts the resultant next state and reward. Models are used for deciding on a course of action by considering possible future situations before they are actually experienced.
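The four components can be made concrete on a very small problem. The sketch below is illustrative only: the four-state corridor, the names `model`, `policy` and `evaluate`, and the discount factor `GAMMA` are all assumptions introduced here (the lecture has not defined discounting at this point). It shows a model predicting the next state and reward, a fixed policy, and a value function computed by repeatedly applying the model under that policy.

```python
# Hypothetical 4-state corridor: states 0..3, state 3 is the terminal goal.
STATES = [0, 1, 2, 3]
GOAL = 3
GAMMA = 0.9  # assumed discount factor: later rewards count slightly less


def model(state, action):
    """Model: predicts (next_state, reward) for a given state and action."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0  # reward signal: 1 on reaching the goal
    return next_state, reward


def policy(state):
    """Policy: maps each state to an action (here: always move right)."""
    return +1


def evaluate(policy, sweeps=50):
    """Value function: long-run discounted reward from each state under the policy."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s == GOAL:
                continue  # terminal state: no further reward, value stays 0
            next_state, reward = model(s, policy(s))
            values[s] = reward + GAMMA * values[next_state]
    return values


if __name__ == "__main__":
    print(evaluate(policy))
```

Stepping right from state 0 earns an immediate reward of 0, yet state 0 has a positive value (about 0.81 with the assumed discount), because the goal is reachable a few steps later. This is the distinction between immediate reward and long-term value.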
Learning Outcomes
By the end of this lecture you should be able to:
• define a Reinforcement Learning system and outline its major components
• recognise decision-making problems that lend themselves to RL
• differentiate RL from other strands of Machine Learning