Hota-ML-ReinforcementLearning
29.11.2024
[Figure. Image source: https://highlandcanine.com/]
[Figure: UltraTech Cement Stock, 27th Nov 2024. Image source: www.nseindia.com]
[Figure: agent-environment loop with reward and state transitions St → St+1 → St+2]
Q-Learning Algorithm
• Q-learning is a model-free reinforcement learning (RL) algorithm used to
learn the optimal policy for a Markov Decision Process (MDP)
1. Initialize the Q-table
2. Select an action
3. Perform the action
4. Measure the reward
5. Update the Q-table:
   Q(s, a) ← Q(s, a) + α [R + γ max_a' Q(s', a') - Q(s, a)]
6. Repeat from step 2; after multiple episodes, a good Q-table is ready (a minimal code sketch follows)
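As a quick illustration, here is a minimal tabular sketch of this loop in Python. The environment interface (reset(), step(), is_terminal()) and the ε-greedy parameters are illustrative assumptions, not part of the slides.

import numpy as np

def q_learning(env, n_states, n_actions, episodes=100,
               alpha=0.5, gamma=0.9, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))           # 1. initialize the Q-table
    for _ in range(episodes):
        s = env.reset()                           # start a new episode
        while not env.is_terminal(s):
            if np.random.rand() < epsilon:        # 2. select an action:
                a = np.random.randint(n_actions)  #    explore randomly, or
            else:
                a = int(np.argmax(Q[s]))          #    exploit the current Q-table
            r, s_next = env.step(s, a)            # 3.-4. perform action, measure reward
            # 5. update the Q-table with the rule above
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q  # after multiple episodes, a good Q-table is ready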
An Example of Q-Learning
• Initializing the environment: States: {s0, s1, s2}, Actions: {a0, a1}, Rewards:
R(s0, a0) = -1, R(s0, a1) = +2, R(s1, a0) = +3, R(s1, a1) = +1, R(s2, any action) = 0
(terminal state).
• Transitions: T(s0, a0) → s1, T(s0, a1) → s2 (goal), T(s1, a0) → s2, T(s1, a1) → s0
• Episode 1:
• current state: s0, action chosen: a0 (randomly using exploration), reward: R(s0,
a0) = -1, next state: s1.
• Update Q(s0, a0) using Bellman’s equation:
• Q(s0, a0) ← 0 + 0.5 * [-1 + 0.9 * 0 - 0] = -0.5 (since Q(s1, a') = 0 initially; no knowledge of s1)
Ex. Continued…
Updated Q-values after 3 Episodes:
State    Action (a0)    Action (a1)
s0       -0.5           1.0
s1       1.5            0.0
s2       0.0            0.0
• Episode 2: From s1
• current state: s1, action chosen: a0, reward: R(s1, a0) = +3, next state: s2.
• Update Q(s1, a0) using Bellman’s equation:
  Q(s1, a0) ← Q(s1, a0) + α [R + γ max_a' Q(s2, a') - Q(s1, a0)]
• Q(s1, a0) ← 0 + 0.5 * [3 + 0.9 * 0 - 0] = 1.5
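These updates are easy to verify in code. The sketch below encodes the rewards and transitions defined earlier (with α = 0.5 and γ = 0.9, as in the worked numbers) and replays the updates; the third one, inferred from the table above, fills in Q(s0, a1).

import numpy as np

# Encode the example MDP: states s0=0, s1=1, s2=2; actions a0=0, a1=1
R = {(0, 0): -1, (0, 1): 2, (1, 0): 3, (1, 1): 1}  # rewards
T = {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 0}   # transitions; s2 is terminal

alpha, gamma = 0.5, 0.9
Q = np.zeros((3, 2))

def update(s, a):
    r, s_next = R[(s, a)], T[(s, a)]
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

update(0, 0)  # Episode 1: Q(s0,a0) = 0.5 * (-1 + 0.9*0 - 0) = -0.5
update(1, 0)  # Episode 2: Q(s1,a0) = 0.5 * ( 3 + 0.9*0 - 0) =  1.5
update(0, 1)  # Episode 3: Q(s0,a1) = 0.5 * ( 2 + 0.9*0 - 0) =  1.0
print(Q)      # matches the Q-table shown above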
• Alternatively, you may use an ANN to learn Q-values: Deep Q-Learning (DQN)
Optimal Solution using Q-Learning: Maze
import numpy as np
import matplotlib.pyplot as plt

# Maze parameters
maze = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 2],  # '2' is the diamond (goal state)
]
maze = np.array(maze)
Python Code Continued…
…
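The remainder of the slide's code is elided. One plausible continuation, building on the setup above and sketched under stated assumptions (four moves; cells marked '1' are walls that block movement; an assumed reward of -1 per step and +10 at the diamond; illustrative hyperparameters), trains a Q-table over the 5x5 grid:

# A sketch of a possible training loop for the maze (assumptions noted above)
n_rows, n_cols = maze.shape
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
Q = np.zeros((n_rows, n_cols, len(actions)))
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(500):
    r_pos, c_pos = 0, 0                       # assumed start: top-left corner
    while maze[r_pos, c_pos] != 2:            # run until the diamond is reached
        if np.random.rand() < epsilon:
            a = np.random.randint(len(actions))  # explore
        else:
            a = int(np.argmax(Q[r_pos, c_pos]))  # exploit
        dr, dc = actions[a]
        nr, nc = r_pos + dr, c_pos + dc
        # walls ('1') and the grid border leave the agent in place
        if not (0 <= nr < n_rows and 0 <= nc < n_cols) or maze[nr, nc] == 1:
            nr, nc = r_pos, c_pos
        reward = 10 if maze[nr, nc] == 2 else -1  # assumed reward scheme
        Q[r_pos, c_pos, a] += alpha * (
            reward + gamma * np.max(Q[nr, nc]) - Q[r_pos, c_pos, a])
        r_pos, c_pos = nr, nc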
Deep Q-Learning (DQN) for RL
• When the number of states and actions becomes very large, how do you scale?
• Solution: Combine Q-Learning and Deep Learning → Deep Q-Networks (DQN)
• Goal: Approximate a function Q(s, a; θ), where θ represents the trainable weights of the network
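As one concrete illustration of Q(s, a; θ) (a sketch, not the slides' implementation), a small PyTorch network can map a state vector to one Q-value per action; the layer sizes and dimensions below are arbitrary assumptions:

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action; θ = this network's weights."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)  # Q(s, a; θ) for every action a

q_net = QNetwork(state_dim=4, n_actions=2)
q_values = q_net(torch.zeros(1, 4))  # Q-values for a dummy state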