10 ML Introduction to Reinforcement Learning
Content
Introduction
Simplified view of Reinforcement Learning
Different ML techniques
Applications of RL
RL Terminology
Categories of RL algorithms
Markov Decision Process
Definition
Markov Property
Reward, Goal, Episodes, Returns, Policy, Value Functions
RL Terminology

Agent: The artificial entity that is being trained to perform a task by learning from its own experience.

Environment: Everything outside the purview of the agent. The environment has its own internal dynamics, which are usually not visible to the agent.

State: The current situation of the environment (as observed by the agent), which forms the basis for the decisions taken by the agent.

Action: A choice made by the agent to change the state of the environment.

Reward: A scalar quantity emitted by the environment in response to the action.

Return: The cumulative sum of future rewards to be received by the agent.

Goal: Maximize the expected return.

Policy: Defines the agent's behavior. This can be viewed as a mapping from perceived states to actions to be taken in those states.

Value Function: Specifies what is good in the long run. The value of a state is the expected return an agent can expect to receive starting from that state.

Model: Something that mimics the behavior of the environment and allows inferences to be made about how the environment will behave.
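To see how these terms fit together, here is a minimal Python sketch of the agent-environment interaction loop; the corridor environment, its reward of 1 at the goal, and the random policy are all invented purely for illustration.

```python
import random

class Environment:
    """Toy corridor: positions 0..3, the agent starts at 0 and tries to reach 3."""

    def reset(self):
        self.position = 0
        return self.position                      # initial state

    def step(self, action):
        # Internal dynamics, normally hidden from the agent.
        self.position = min(3, max(0, self.position + action))
        done = self.position == 3                 # terminal state ends the episode
        reward = 1.0 if done else 0.0             # scalar reward from the environment
        return self.position, reward, done

class Agent:
    def act(self, state):
        # A (very naive) policy: pick a direction at random.
        return random.choice([-1, +1])

env, agent = Environment(), Agent()
state = env.reset()
episode_return, done = 0.0, False
while not done:                                   # one episode of agent-environment interaction
    action = agent.act(state)                     # agent chooses an action based on the state
    state, reward, done = env.step(action)        # environment returns next state and reward
    episode_return += reward                      # return = cumulative reward
print("Return for this episode:", episode_return)
```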
What is a Markov Decision Process?
A Markov Decision Process (MDP) is a formal mathematical framework used to define the interaction between the agent and its environment in terms of states, actions, and rewards.
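As an illustration, a small MDP can be written down explicitly as tables of transition probabilities and expected rewards; the two states, two actions, and all numbers below are assumed values, not taken from any particular problem.

```python
# A hypothetical two-state MDP written out as explicit tables (all numbers are made up).
states = ["idle", "working"]
actions = ["wait", "push"]

# P[s][a] -> list of (next_state, probability) pairs: the state transition function.
P = {
    "idle":    {"wait": [("idle", 1.0)],
                "push": [("working", 0.8), ("idle", 0.2)]},
    "working": {"wait": [("working", 0.6), ("idle", 0.4)],
                "push": [("working", 1.0)]},
}

# R[s][a] -> expected immediate reward for taking action a in state s.
R = {
    "idle":    {"wait": 0.0, "push": -0.1},
    "working": {"wait": 1.0, "push": 0.5},
}

gamma = 0.9  # discount factor

# Sanity check: transition probabilities for every (state, action) pair sum to 1.
for s in states:
    for a in actions:
        assert abs(sum(p for _, p in P[s][a]) - 1.0) < 1e-9
```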
Markov Property
A state is said to possess the Markov Property when it includes information about all aspects of the past agent-environment interaction that make a difference for the future (given the current state, the future is independent of past states, actions, and rewards).
(2) Can you think of an environment in which states do not have the Markov property?
Reward
The reward is a scalar quantity that forms the basis for evaluating the action taken by an agent.

Reward is a measure of the immediate benefit of taking a particular action.

The agent must be able to measure, frequently over its lifespan, how well it is performing.

If rewards are sparse, figuring out good actions can be difficult.
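The difference between sparse and dense rewards can be made concrete with a toy example; both reward functions below are hypothetical.

```python
GOAL = 10  # hypothetical target position on a number line

def sparse_reward(position):
    # Feedback only when the goal is reached exactly; most actions look equally bad.
    return 1.0 if position == GOAL else 0.0

def dense_reward(position):
    # Feedback at every step: getting closer to the goal yields a higher (less negative) reward.
    return -abs(GOAL - position)

for pos in [0, 5, 9, 10]:
    print(pos, sparse_reward(pos), dense_reward(pos))
```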
Goal
The goal of an RL agent is to maximize the expected return (cumulative rewards from
current state to final state).
Episodic Tasks
Agent-environment interaction breaks down naturally into subsequences known as
episodes
The agent's state is reset after reaching a terminal state.
Continuing Tasks
Interaction does not break down into sub-sequences (e.g. gas pipeline monitoring, heating
system monitoring)
Markov Decision Process Returns
For episodic tasks, if the agent expects to receive rewards from time $t$ until the final time step $T$, the return is defined as:

$G_t = R_{t+1} + R_{t+2} + R_{t+3} + \dots + R_T$

For continuing tasks, future rewards are discounted by the factor $\gamma$ ($0 \le \gamma \le 1$), and the return becomes:

$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$
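A quick numerical check of these definitions, using a made-up reward sequence and an assumed discount factor of 0.9:

```python
# Hypothetical reward sequence R_{t+1}, ..., R_T received after time t.
rewards = [0.0, 0.0, 1.0, 0.0, 2.0]
gamma = 0.9  # assumed discount factor

# Undiscounted episodic return: G_t = R_{t+1} + R_{t+2} + ... + R_T
episodic_return = sum(rewards)

# Discounted return: G_t = sum_k gamma^k * R_{t+k+1}
discounted_return = sum(gamma ** k * r for k, r in enumerate(rewards))

print(episodic_return)    # 3.0
print(discounted_return)  # 0.9**2 * 1.0 + 0.9**4 * 2.0 ≈ 2.1222
```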
Markov Decision Process | Policy
Policy

A policy $\pi$ defines the agent's behavior: it maps each state to the probability of selecting each available action, $\pi(a \mid s) = \Pr\{A_t = a \mid S_t = s\}$.

State-value Function

The state-value function $v_\pi(s)$ of a state $s$ under a policy $\pi$ is defined as the expected return when starting in $s$ and following $\pi$ thereafter:

$v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$

Action-value Function

The action-value function $q_\pi(s, a)$ of a state $s$ and action $a$ under a policy $\pi$ is defined as the expected return when starting in $s$, taking the action $a$ (which may not necessarily be the one predicted by $\pi$), and following $\pi$ thereafter:

$q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$
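Since both value functions are expectations over returns, they can be approximated by averaging sampled returns. The sketch below estimates $v_\pi(s)$ for a toy corridor task under a uniformly random policy; the environment, the policy, and the discount factor are all assumptions made for illustration.

```python
import random

def rollout(start_state, gamma=0.9, goal=3):
    """One episode in a toy corridor (positions 0..goal, reward 1 on reaching the goal)
    under a uniformly random policy pi; returns the discounted return G."""
    state, discount, G = start_state, 1.0, 0.0
    while state != goal:
        action = random.choice([-1, +1])          # random policy: pi(a|s) = 0.5
        state = min(goal, max(0, state + action))
        reward = 1.0 if state == goal else 0.0
        G += discount * reward                    # accumulate gamma^k * R_{t+k+1}
        discount *= gamma
    return G

def estimate_state_value(s, episodes=10_000):
    # Monte Carlo estimate of v_pi(s) = E_pi[ G_t | S_t = s ].
    return sum(rollout(s) for _ in range(episodes)) / episodes

for s in [0, 1, 2]:
    print(s, round(estimate_state_value(s), 3))
```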
MDP: Framework defining the agent-environment interaction. Expression: $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, where
- $\mathcal{S}$ is a finite set of states,
- $\mathcal{A}$ is a finite set of actions,
- $P$ is a state transition probability function,
- $R$ is a reward function,
- $\gamma$ is a discount factor, $0 \le \gamma \le 1$.

Markov Property: The current state includes all information about the past. Expression: $\Pr\{S_{t+1}, R_{t+1} \mid S_t, A_t\} = \Pr\{S_{t+1}, R_{t+1} \mid S_0, A_0, R_1, \dots, S_t, A_t\}$

Reward: Scalar quantity for evaluating the agent's action. Expression: $R_t \in \mathbb{R}$

Return: Discounted sum of future rewards. Expression: $G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$

Goal: Maximize the expected return at each time step. Expression: $\max \mathbb{E}[G_t]$
Policy: Mapping from states to probabilities of selecting each available action. Expression: $\pi(a \mid s) = \Pr\{A_t = a \mid S_t = s\}$

State-value Function: Expected return when starting in $s$ and following $\pi$ thereafter. Expression: $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$

Action-value Function: Expected return when starting in $s$, taking action $a$, and following $\pi$ thereafter. Expression: $q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$