Multi-Agent Systems and Strategic Decision Making: Module CS4760

1. Reinforcement learning involves an agent learning how to maximize rewards through interaction with an environment without an explicit teacher.
2. The key components of a reinforcement learning system are the policy, which defines the agent's behavior; the reward signal, which defines the goal; the value function, which estimates the long-term desirability of states and actions; and the environment model, which predicts the results of actions.
3. Reinforcement learning differs from other machine learning methods in that it involves sequential decision making where feedback may be delayed and actions have long-term consequences.


Module CS4760

Multi-Agent Systems and Strategic Decision Making

Lecture 1: Introduction to Reinforcement Learning
Maria Chli

Based on the lecture series by D. Silver and by Fei-Fei Li, J. Johnson & S. Yeung, and on Sutton & Barto, 2nd ed. (2018)

Learning Outcomes
By the end of this lecture you should be able to
• define a Reinforcement Learning System and outline its major components
• recognise decision-making problems that lend themselves to RL
• differentiate RL from other strands of Machine Learning

What is an Agent?
An agent is a computer system capable of flexible autonomous action in some environment in order to meet its design objectives. [Wooldridge and Jennings, 1995]

[Diagram: perception, decision, action]

What is Reinforcement Learning?

• We consider the problem of learning how to act, through experience and without an explicit teacher.
• An RL agent must interact with its world and from that learn how to maximize some cumulative reward over time.

Reinforcement Learning
[Diagram: the Environment sends an Observation and a Reward to the Agent; the Agent sends an Action to the Environment]

Goal: Learn how to take actions in order to maximise reward

What is Reinforcement Learning?
[Diagram: RL at the intersection of several fields, each paired with its related area]
• Computer Science: Machine Learning
• Engineering: Control
• Neuroscience: Reward System
• Psychology: Classical/Operant Conditioning
• Economics: Bounded Rationality
• Mathematics: Operational Research

The problem: decision making


Machine learning - branches
• Reinforcement Learning
• Unsupervised Learning
• Supervised Learning

Characteristics of RL
What makes RL different from other machine learning paradigms?
• There is no supervisor, only a reward signal
• Feedback may be delayed, not instantaneous
• Time really matters (sequential data, non-i.i.d.)
• Agent senses environment and its actions affect the environment
Reinforcement Learning

[Diagram: interaction loop between Environment and Agent, built up over several slides. At each time step t:]
• The environment presents state s_t to the agent
• The agent responds with action a_t
• The environment returns reward R_t and next state s_{t+1}

Goal: Learn how to take actions in order to maximise reward
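
The state, action, reward, next-state cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Corridor` environment, the `random_agent` function and the `run_episode` helper are hypothetical names invented here, not part of the lecture or of any library.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1 if done else 0  # reward only on reaching the goal
        return self.state, reward, done


def random_agent(state, rng):
    """Placeholder agent that ignores the state and acts at random."""
    return rng.choice([-1, 1])


def run_episode(env, agent, rng, max_steps=1000):
    state = env.reset()                              # initial state s_0
    total_reward = 0
    steps = 0
    for _ in range(max_steps):
        action = agent(state, rng)                   # agent picks a_t from s_t
        next_state, reward, done = env.step(action)  # environment returns s_{t+1}, R_t
        total_reward += reward
        state = next_state
        steps += 1
        if done:
            break
    return total_reward, steps


rng = random.Random(0)
print(run_episode(Corridor(), random_agent, rng))
```

A learning agent would replace `random_agent` with something that uses the observed rewards to improve its choices; the loop itself stays the same.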

Examples: Cart pole problem
Objective: Balance a pole on top of a movable cart

State: pole angle, angular speed, cart position, horizontal velocity
Action: horizontal force applied on the cart
Reward: 1 at each time step if the pole is upright
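
As a rough sketch of what this reward means in practice: with a reward of 1 per upright time step, the total reward of an episode is simply the number of steps the pole stayed up. The 12-degree threshold, the rule that the episode ends at the first fall, and the sample angles are all assumptions made for illustration, not values from the lecture.

```python
import math

# Hypothetical threshold for "upright"; not specified in the lecture
UPRIGHT_LIMIT = math.radians(12)


def reward(angle):
    """Reward of 1 while the pole angle (radians) is within the limit."""
    return 1 if abs(angle) < UPRIGHT_LIMIT else 0


def episode_return(angles):
    """Total reward over an episode, assuming it ends when the pole falls."""
    total = 0
    for angle in angles:
        r = reward(angle)
        if r == 0:
            break  # assumption: the episode terminates at the first fall
        total += r
    return total


# Made-up sequence of pole angles, one per time step
angles = [0.00, 0.05, -0.10, 0.15, 0.30, 0.02]
print(episode_return(angles))  # 4: the pole falls at the fifth step
```

Maximising this return is therefore the same as keeping the pole balanced for as long as possible.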

Examples: Robot Locomotion
Objective: Make the robot move forward

State: angle and position of the joints
Action: Torques applied on joints
Reward: 1 at each time step in which the robot is upright and moving forward

Examples: Atari Games
Objective: Complete the game with the highest score

State: Raw pixel inputs of the game state
Action: Game controls e.g. Left, Right, Up, Down
Reward: Score increase/decrease at each time step

Examples: Go
Objective: Win the game!

State: Position of all pieces
Action: Where to put the next piece down
Reward: 1 if win at the end of the game, 0 otherwise

Many more examples:
• Fly stunt manoeuvres in a helicopter
• Defeat the world champion at Backgammon
• Manage an investment portfolio
• Control a power station

Sequential Decision Making
• Goal: select actions to maximise total future reward
• Actions may have long term consequences
• Reward may be delayed
• It may be better to sacrifice immediate reward to gain more long-term reward. Examples:
– A financial investment (may take months to mature)
– Blocking opponent moves (might help winning chances many moves from now)
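
The trade-off between immediate and long-term reward can be made concrete with a small, made-up example. The two reward sequences below are hypothetical numbers chosen for illustration: one plan pays off immediately, the other costs something up front and pays off later.

```python
def total_reward(rewards):
    """Total future reward of a plan: the sum of its per-step rewards."""
    return sum(rewards)


# Hypothetical per-step rewards for two plans (not from the lecture)
plans = {
    "spend_now": [5, 0, 0, 0],   # immediate payoff, nothing afterwards
    "invest": [-2, 0, 0, 10],    # upfront cost, delayed payoff
}

# A myopic chooser looks only at the first reward
myopic_choice = max(plans, key=lambda name: plans[name][0])

# A far-sighted chooser compares total future reward
farsighted_choice = max(plans, key=lambda name: total_reward(plans[name]))

print(myopic_choice)      # spend_now
print(farsighted_choice)  # invest
```

The far-sighted chooser accepts a first-step reward of -2 because the plan's total of 8 exceeds the 5 offered by the immediate payoff.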

Major Components of an RL system
1. Policy: defines the agent's behaviour function
The learning agent's way of behaving at a given time
2. Reward signal: defines the goal of an RL problem
Agent's objective: maximize the total reward it receives in the long run
3. Value function: how good each state and/or action is
Whereas rewards determine the immediate desirability of states, values indicate their long-term desirability
4. Model: the agent's representation of the environment
E.g. given a state and action, the model predicts the resultant next state and reward. Models are used for deciding on a course of action by considering possible future situations before they are actually experienced.
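
To see how the four components fit together, here is a minimal sketch on a hypothetical four-state chain (states 0 to 3, goal at state 3). The chain, the function names and the fixed horizon are assumptions made for illustration; the value is computed as the plain total reward, matching the "total reward in the long run" wording above.

```python
GOAL = 3  # hypothetical chain of states 0 - 1 - 2 - 3


def model(state, action):
    """Model: predicts the next state and reward for a state-action pair."""
    next_state = min(max(state + action, 0), GOAL)
    # Reward signal: 1 for arriving at the goal, 0 otherwise
    reward = 1 if (next_state == GOAL and state != GOAL) else 0
    return next_state, reward


# Policy: the agent's behaviour, here a fixed mapping from state to action
go_right = {0: +1, 1: +1, 2: +1, 3: 0}
go_left = {0: -1, 1: -1, 2: -1, 3: 0}


def value(state, policy, horizon=10):
    """Value: total reward collected from `state` when following `policy`.

    The model is used to simulate future steps before they are experienced.
    """
    total = 0
    for _ in range(horizon):
        if state == GOAL:
            break
        state, reward = model(state, policy[state])
        total += reward
    return total


print({s: value(s, go_right) for s in range(GOAL + 1)})  # {0: 1, 1: 1, 2: 1, 3: 0}
print({s: value(s, go_left) for s in range(GOAL + 1)})   # {0: 0, 1: 0, 2: 0, 3: 0}
```

Under `go_right`, stepping from state 0 earns an immediate reward of 0, yet the value of state 0 is 1: the reward measures immediate desirability, the value measures long-term desirability.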
