Chapter_1_Introduction_RL_Report_Kiran

Reinforcement Learning (RL) trains agents to make decisions through interaction with environments, focusing on maximizing cumulative rewards. The document discusses the Deep Q-Network (DQN) algorithm, an advanced method in RL that enhances classical Q-learning using deep neural networks, and outlines its key innovations like Experience Replay and Target Network. The implementation targets the CartPole balancing problem, utilizing DQN's capabilities to effectively manage the exploration-exploitation trade-off in a continuous state space.


1. Introduction

1.1 Overview of Reinforcement Learning


Reinforcement Learning (RL) is a paradigm within machine learning that focuses on
training an agent to make sequential decisions by interacting with an environment. Unlike
supervised learning, where ground-truth labels guide the learning process, RL agents learn
from the consequences of their actions through rewards or penalties. The learning goal is to
maximize the expected cumulative reward over time.

Formally, an RL setting is often modeled as a Markov Decision Process (MDP), which comprises:
- A set of states S
- A set of actions A
- A reward function R(s, a)
- A state transition function P(s'|s, a)
- A discount factor γ ∈ [0,1]

At each discrete time step t, the agent observes a state s_t, selects an action a_t, receives a
reward r_t, and transitions to a new state s_{t+1}. This cycle continues until a terminal state
is reached. The agent's behavior is governed by a policy π(a|s), which maps states to
actions.
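
To make this loop concrete, a minimal interaction sketch using the OpenAI Gym API is shown below. The environment name and the random action choice are placeholders; an actual agent would select actions from its learned policy π(a|s). Note that newer Gymnasium releases use slightly different reset() and step() signatures than the classic Gym API assumed here.

import gym

env = gym.make("CartPole-v1")
state = env.reset()              # classic Gym API: reset() returns the initial observation
done = False
episode_return = 0.0

while not done:
    action = env.action_space.sample()            # placeholder for the agent's policy pi(a|s)
    state, reward, done, info = env.step(action)  # observe r_t and transition to s_{t+1}
    episode_return += reward

print("Episode return:", episode_return)
env.close()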

Reinforcement Learning approaches can be broadly categorized as:
- Value-based methods (e.g., Q-learning)
- Policy-based methods (e.g., REINFORCE)
- Actor-Critic methods

Over time, classical algorithms like Q-learning have evolved into deep reinforcement
learning methods such as Deep Q-Networks (DQN), which use neural networks as function
approximators to handle high-dimensional input spaces. These advances have enabled RL to operate in complex environments where classical tabular methods are infeasible.

1.2 Key Algorithms in RL Used in This Implementation


The selected problem—CartPole balancing—is addressed using the Deep Q-Network (DQN)
algorithm, a foundational method in deep reinforcement learning. DQN is an extension of
the classical Q-learning algorithm, enhanced with deep neural networks to handle
continuous and high-dimensional state spaces.
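
For intuition, the kind of Q-network used for CartPole can be sketched as a small fully connected model that maps the 4-dimensional state to one Q-value per action. The PyTorch definition below is a hypothetical illustration with arbitrary layer widths; the actual implementation relies on the default policy network provided by Stable Baselines3.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a CartPole state vector to one Q-value per discrete action."""
    def __init__(self, state_dim=4, n_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),   # one output per action: Q(s, a; theta)
        )

    def forward(self, state):
        return self.net(state)              # shape: (batch_size, n_actions)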

Q-learning is a model-free, value-based RL algorithm. It seeks to learn the optimal action-value function Q*(s,a), which gives the maximum expected future reward achievable from a
given state-action pair under the optimal policy. The Q-function is updated iteratively using
the Bellman equation:
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) ]

where:
- α is the learning rate
- γ is the discount factor
- r_t is the reward at time t
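
A tabular version of this update rule is shown below as a minimal sketch; the table size, learning rate, and discount factor are illustrative values rather than settings from this implementation.

import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Apply one Bellman update to a tabular Q-function of shape (n_states, n_actions)."""
    # Bootstrap from the next state only if it is not terminal
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])   # move Q(s_t, a_t) toward the TD target
    return Q

# Toy example: 5 states, 2 actions
Q = np.zeros((5, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2, done=False)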

DQN enhances Q-learning by approximating the Q-function using a deep neural network
Q(s, a; θ), where θ are the trainable weights. Key innovations that stabilize training in DQN
include:

- Experience Replay: A buffer that stores previous transitions (s, a, r, s'), enabling random
sampling and breaking correlation between sequential data points.
- Target Network: A separate network Q' with fixed weights θ' used to compute the target
Q-value. It is periodically updated to match the main network.
- Epsilon-Greedy Policy: Introduces exploration by selecting a random action with
probability ε and the best-known action with probability 1 - ε.
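
The replay buffer and the epsilon-greedy rule can be sketched roughly as follows. This is a simplified illustration; Stable Baselines3 ships its own optimized implementations of both, and the capacity and batch size shown are arbitrary.

import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size store of (s, a, r, s', done) transitions, sampled uniformly at random."""
    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)   # random sampling breaks correlation
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done

def epsilon_greedy(q_values, epsilon):
    """Return a random action with probability epsilon, otherwise the greedy action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))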

The loss function minimized in DQN is:
L(θ) = E_{(s,a,r,s')} [(r + γ max_{a'} Q(s', a'; θ⁻) - Q(s, a; θ))²]

This formulation ensures that the network learns to approximate the optimal action-value
function over time.
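
A rough PyTorch sketch of this loss on one mini-batch is given below. The network architecture, tensor shapes, and update schedule are assumptions for illustration only and do not reflect the exact Stable Baselines3 internals.

import torch
import torch.nn as nn

def make_q_net():
    # Illustrative network for CartPole: 4 state dimensions, 2 actions
    return nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

q_net = make_q_net()                                  # online network, weights theta
target_net = make_q_net()                             # target network, weights theta_minus
target_net.load_state_dict(q_net.state_dict())

def dqn_loss(s, a, r, s_next, done, gamma=0.99):
    """Mean squared TD error: (r + gamma * max_a' Q(s', a'; theta_minus) - Q(s, a; theta))^2."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)     # Q(s, a; theta)
    with torch.no_grad():                                    # targets use the frozen weights
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next           # no bootstrapping at terminal states
    return nn.functional.mse_loss(q_sa, target)

# Periodically (e.g. every few thousand steps) the target network is refreshed:
# target_net.load_state_dict(q_net.state_dict())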

The CartPole environment from OpenAI Gym features:
- A continuous 4-dimensional state space
- A discrete 2-action space (left or right force)
- A reward signal of +1 for every time step the pole remains upright
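
These properties can be verified directly from the environment, as in the short check below (the exact bounds printed depend on the installed Gym version):

import gym

env = gym.make("CartPole-v1")
print(env.observation_space)   # Box with 4 dims: cart position, cart velocity, pole angle, pole angular velocity
print(env.action_space)        # Discrete(2): push the cart left or right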

Given the continuous nature of the state representation and the simplicity of the action
space, DQN is highly effective. It provides a clear and interpretable case study in balancing
the exploration-exploitation trade-off and understanding value approximation using neural
networks. Additionally, DQN allows for easy integration with the Stable Baselines3 library,
enabling fast and reproducible implementation with visual monitoring tools such as
TensorBoard.
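
A minimal Stable Baselines3 training script along these lines might look as follows. The hyperparameters and the log directory are illustrative placeholders rather than the exact settings used in this report.

from stable_baselines3 import DQN

# Passing the environment id as a string lets Stable Baselines3 create the env itself
model = DQN(
    "MlpPolicy",                 # small fully connected Q-network
    "CartPole-v1",
    learning_rate=1e-3,          # illustrative value
    buffer_size=50_000,          # experience replay capacity
    exploration_fraction=0.1,    # schedule for the epsilon-greedy exploration rate
    verbose=1,
    tensorboard_log="./dqn_cartpole_tb/",
)
model.learn(total_timesteps=50_000)
model.save("dqn_cartpole")

# Training curves can then be inspected with: tensorboard --logdir ./dqn_cartpole_tb/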
