Unit-4 MDP
Track
• MDP formulation,
• utility theory,
• utility functions,
• value iteration,
• policy iteration and
• partially observable MDPs.
Markov Decision process
A Markov decision process (MDP) is a stochastic decision-making process: a
mathematical framework for modeling the decisions of a dynamic system in
scenarios where outcomes are partly random and partly controlled by a
decision maker who makes sequential decisions over time.
• MDPs rely on variables such as the environment, agent’s actions, and
rewards to decide the system’s next optimal action. They are classified into
four types — finite, infinite, continuous, or discrete — depending on
various factors such as sets of actions, available states, and the decision-
making frequency.
• MDPs have been studied since the early 1950s. The name Markov refers to
the Russian mathematician Andrey Markov, who played a pivotal role in
shaping the theory of stochastic processes. In their early days, MDPs were
used to solve problems in inventory management and control, queueing
optimization, and routing. Today, MDPs find applications in optimization
problems solved via dynamic programming, robotics, automatic control,
economics, manufacturing, and more.
• In artificial intelligence, MDPs model sequential decision-making
scenarios with probabilistic dynamics. They are used to design intelligent
machines or agents that must operate for extended periods in an environment
where actions can yield uncertain results.
• MDP models are typically popular in two sub-areas of AI: probabilistic
planning and reinforcement learning (RL).
• Consider a hungry antelope in a wildlife sanctuary looking for food in its environment. It stumbles upon
a place with a mushroom on the right and a cauliflower on the left. If the antelope eats the mushroom,
it receives water as a reward. However, if it opts for the cauliflower, the nearby lion’s cage opens and
sets the lion free in the sanctuary. With time, the antelope learns to choose the mushroom,
as this choice yields a valuable reward.
• In the above MDP example, two important elements exist: the agent and the environment. The agent
here is the antelope, which acts as the decision-maker. The environment is the surroundings (the
wildlife sanctuary) in which the antelope resides. As the agent performs different actions, different
situations emerge; these situations are labeled as states. For example, when the antelope performs
the action of eating the mushroom, it receives the corresponding reward (water) and transitions to
another state. The agent (antelope) repeats this process over time and learns the optimal action in
each state.
• In the context of the MDP, we can say that the antelope learns the optimal action to perform (eat
the mushroom). It therefore avoids eating the cauliflower, as that choice produces an outcome that
harms its survival. The example illustrates how an MDP captures the dynamics of an RL problem.
• The MDP model operates using key elements: the agent, states, actions, rewards,
and the policy. The agent is the system responsible for making decisions and
performing actions. It operates in an environment, which defines the set of states
the agent can be in as it transitions from one state to another. The MDP specifies
how a given state and the agent's action determine the next state. The agent also
receives a reward that depends on the action it performs and the state it reaches
(the current state). The policy of the MDP model specifies the agent's next action
as a function of its current state.
• S: set of states (s ∈ S)
• A: set of actions (a ∈ A)
• P(St+1 | St, At): transition probabilities
• R(s): reward
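As a concrete and purely illustrative sketch, the Python snippet below shows one way to represent these elements as plain data structures and to sample a transition. Every state name, probability, and reward value here is an invented assumption, not part of the formal definition:

```python
import random

# Hypothetical MDP specification; all values below are invented for illustration.
S = ["hungry", "fed"]                       # states (s ∈ S)
A = ["eat_mushroom", "eat_cauliflower"]     # actions (a ∈ A)

# P[(s, a)] maps a (state, action) pair to the next-state distribution P(s' | s, a).
P = {
    ("hungry", "eat_mushroom"):    {"fed": 0.9, "hungry": 0.1},
    ("hungry", "eat_cauliflower"): {"fed": 0.2, "hungry": 0.8},
    ("fed",    "eat_mushroom"):    {"fed": 1.0},
    ("fed",    "eat_cauliflower"): {"fed": 0.5, "hungry": 0.5},
}

# R[s] is the reward received on reaching state s.
R = {"hungry": -1.0, "fed": 1.0}

def step(s, a):
    """Sample the next state from P(s' | s, a) and return (next_state, reward)."""
    next_states, probs = zip(*P[(s, a)].items())
    s_next = random.choices(next_states, weights=probs, k=1)[0]
    return s_next, R[s_next]

print(step("hungry", "eat_mushroom"))       # e.g. ('fed', 1.0)
```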
• [Figure: graphical representation of the MDP model]
• The MDP model uses the Markov Property, which states that the future can be determined
from the present state alone, because the present state encapsulates all the necessary
information from the past. The Markov Property can be expressed by this equation:

P[St+1 | St] = P[St+1 | S1, S2, S3, ..., St]

• According to this equation, the probability of the next state St+1 given only the present
state St is the same as the probability of the next state given all the previous states
(S1, S2, S3, ..., St). This implies that an MDP uses only the present/current state to
evaluate the next action, without any dependence on earlier states or actions.
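A tiny Python sketch of this property is shown below: the next-state distribution is looked up from the current state only, so any earlier history passed in is ignored. The two-state chain and its probabilities are made-up assumptions used only to illustrate the idea:

```python
import random

# Made-up two-state Markov chain; probabilities are illustrative only.
transition = {
    "s1": {"s1": 0.8, "s2": 0.2},
    "s2": {"s1": 0.4, "s2": 0.6},
}

def next_state(history):
    """P[St+1 | S1, ..., St] = P[St+1 | St]: only the last state matters."""
    current = history[-1]                    # the present state St
    states, probs = zip(*transition[current].items())
    return random.choices(states, weights=probs, k=1)[0]

# Both calls sample from the same distribution, because only the last state is used.
print(next_state(["s2", "s1"]))
print(next_state(["s1"]))
```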
• Consider a problem where we need to decide whether a tribe should hunt
deer in a nearby forest, with the goal of maximizing long-term returns.
Each deer hunted generates a fixed return. However, if the tribe hunts
beyond a certain limit, the yield next year will be lower. Hence, we need
to determine the optimal fraction of the deer population to hunt while
maximizing the return over the long run.
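Value iteration (covered later in this unit) is one way to solve such a problem once it is cast as an MDP. The sketch below is a deliberately tiny, hypothetical version of the deer-hunting decision: the two population states, the hunting actions, the transition probabilities, the rewards, and the discount factor are all invented assumptions, chosen only to show the mechanics:

```python
# Toy deer-hunting MDP solved with value iteration; all numbers are invented.
GAMMA = 0.9                         # discount factor for long-term returns

states  = ["low", "high"]           # deer population level next season
actions = ["hunt_little", "hunt_a_lot"]

# P[s][a] -> {next_state: probability}; heavy hunting tends to push the population low.
P = {
    "low":  {"hunt_little": {"high": 0.6, "low": 0.4},
             "hunt_a_lot":  {"high": 0.1, "low": 0.9}},
    "high": {"hunt_little": {"high": 0.9, "low": 0.1},
             "hunt_a_lot":  {"high": 0.3, "low": 0.7}},
}
# R[s][a] -> immediate return from this season's hunt.
R = {
    "low":  {"hunt_little": 1.0, "hunt_a_lot": 3.0},
    "high": {"hunt_little": 2.0, "hunt_a_lot": 6.0},
}

def value_iteration(theta=1e-6):
    """Compute V*(s) by repeatedly applying the Bellman optimality update."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a].items())
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(actions,
                     key=lambda a: R[s][a] + GAMMA *
                         sum(p * V[s2] for s2, p in P[s][a].items()))
              for s in states}
    return V, policy

V, policy = value_iteration()
print(V)        # optimal long-run value of each population state
print(policy)   # optimal hunting action in each state
```

With these made-up numbers, the point of the exercise is that the optimal policy balances this season's return against the risk of driving the population (and future returns) down, which is exactly the trade-off the problem statement describes.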