Gradient Descent Slides


Gradient Descent

Dr Chiradeep Mukherjee,
Associate Professor,
Department of CST and CSIT,
University of Engineering & Management Kolkata
Gradient Descent
Gradient descent is the name for a generic class of computer algorithms that MINIMIZE a function.
These algorithms achieve this end by starting with initial parameter values and iteratively moving towards a set of parameter values that minimize some cost function or metric: that's the descent part.

The movement toward best-fit is achieved by taking the derivative of the variable or variables involved, towards the direction with the lowest (calculus-defined) gradient: that's the gradient part.

[Diagram: initial parameters feed an iterative process that moves toward the BEST-FIT parameters; the cost J(θ0, θ1) must be minimized.]
Gradient Descent: Graphical Representation
[Plot: the cost function J plotted against θ1, with θ0 fixed at 0, for hypothesis lines with θ1 = −0.5, 1, 1.5, 2, 2.5; the cost optimizes at θ0 = 0, θ1 = 1. Why is it a 2-D plot? Because θ0 is held at 0, J varies along the single remaining parameter θ1.]

Cost function:

J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)²

These steps are involved with a parameter named the LEARNING RATE, α.
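To make the cost function concrete, here is a minimal Python sketch (the helper name compute_cost and the toy dataset are my own choices, not from the slides) that evaluates J(θ0, θ1) for a few values of θ1 with θ0 fixed at 0, mirroring the plot above:

# J(theta0, theta1) = (1/(2m)) * sum((h(x_i) - y_i)^2)
def compute_cost(theta0, theta1, xs, ys):
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = theta0 + theta1 * x  # hypothesis h_theta(x) = theta0 + theta1 * x
        total += (h - y) ** 2
    return total / (2 * m)

# Toy dataset with y = x exactly, so the cost optimizes at theta0 = 0, theta1 = 1:
xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
for t1 in (-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(t1, compute_cost(0.0, t1, xs, ys))  # J is smallest (0) at t1 = 1.0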
Finding the UPDATED θ in Gradient Descent: the ALGORITHM

GRADIENT DESCENT ALGORITHM:

Repeat until convergence:
{
    θ0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)
    θ1 := θ1 − α · (∂/∂θ1) J(θ0, θ1)
}

Implementation in a PROGRAMMING LANGUAGE:

CORRECT (simultaneous update):
Loop…:
{
    temp0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)
    temp1 := θ1 − α · (∂/∂θ1) J(θ0, θ1)
    θ0 := temp0
    θ1 := temp1
}

INCORRECT (sequential update):
Loop…:
{
    temp0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)
    θ0 := temp0
    temp1 := θ1 − α · (∂/∂θ1) J(θ0, θ1)
    θ1 := temp1
}

The second version is incorrect because θ0 is overwritten before the derivative for θ1 is evaluated, so the θ1 update mixes old and new parameter values.
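A minimal Python sketch of the two update orders (the helper names grad0, grad1, step_correct, and step_incorrect are mine; the gradient bodies use the derivatives worked out on the next two slides):

def grad0(theta0, theta1, xs, ys):
    # dJ/dtheta0 = (1/m) * sum(h_theta(x_i) - y_i)
    m = len(xs)
    return sum((theta0 + theta1 * x) - y for x, y in zip(xs, ys)) / m

def grad1(theta0, theta1, xs, ys):
    # dJ/dtheta1 = (1/m) * sum((h_theta(x_i) - y_i) * x_i)
    m = len(xs)
    return sum(((theta0 + theta1 * x) - y) * x for x, y in zip(xs, ys)) / m

def step_correct(theta0, theta1, xs, ys, alpha):
    # CORRECT: both gradients are evaluated at the OLD (theta0, theta1)
    temp0 = theta0 - alpha * grad0(theta0, theta1, xs, ys)
    temp1 = theta1 - alpha * grad1(theta0, theta1, xs, ys)
    return temp0, temp1

def step_incorrect(theta0, theta1, xs, ys, alpha):
    # INCORRECT: grad1 sees the already-updated theta0
    theta0 = theta0 - alpha * grad0(theta0, theta1, xs, ys)
    theta1 = theta1 - alpha * grad1(theta0, theta1, xs, ys)
    return theta0, theta1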
Finding the UPDATED θ0 in Gradient Descent: the DERIVATION

COST FUNCTION:

J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)²

Calculation of (∂/∂θ0) J(θ0, θ1):

(∂/∂θ0) J(θ0, θ1) = (∂/∂θ0) [(1/2m) · Σ_{i=1}^{m} (θ0 + θ1·x_i − y_i)²]
Or, (∂/∂θ0) J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} 2·(θ0 + θ1·x_i − y_i)
                      = (1/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)

So, the formula for θ0 becomes:

θ0 := θ0 − (α/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)
Finding the UPDATED θ1 in Gradient Descent: the DERIVATION

COST FUNCTION:

J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)²

Calculation of (∂/∂θ1) J(θ0, θ1):

(∂/∂θ1) J(θ0, θ1) = (∂/∂θ1) [(1/2m) · Σ_{i=1}^{m} (θ0 + θ1·x_i − y_i)²]
Or, (∂/∂θ1) J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} 2·(θ0 + θ1·x_i − y_i)·x_i
                      = (1/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)·x_i

So, the formula for θ1 becomes:

θ1 := θ1 − (α/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)·x_i
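As a sanity check on these two derivatives, a short sketch can compare them with central finite differences of J; the dataset slice, starting point, and epsilon below are arbitrary choices of mine:

def J(theta0, theta1, xs, ys):
    m = len(xs)
    return sum(((theta0 + theta1 * x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [55.0, 60.0, 65.0], [316.0, 292.0, 268.0]
t0, t1, eps, m = 400.0, -1.0, 1e-6, 3

analytic0 = sum((t0 + t1 * x) - y for x, y in zip(xs, ys)) / m        # (1/m) sum(h - y)
analytic1 = sum(((t0 + t1 * x) - y) * x for x, y in zip(xs, ys)) / m  # (1/m) sum((h - y) x)
numeric0 = (J(t0 + eps, t1, xs, ys) - J(t0 - eps, t1, xs, ys)) / (2 * eps)
numeric1 = (J(t0, t1 + eps, xs, ys) - J(t0, t1 - eps, xs, ys)) / (2 * eps)
print(analytic0, numeric0)  # the two values in each pair should agree closely
print(analytic1, numeric1)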
Steps involved in Gradient Descent
Step 1: Consider a dataset involving pairs {x_i, y_i}.
Step 2: Consider the linear regression hypothesis: hθ(x) = θ0 + θ1·x.
Step 3: Fix a value of the LEARNING RATE, α (possibly < 1).
Step 4: Start with initial values of θ0 and θ1.
Step 5: Calculate hθ(x_i) = θ0 + θ1·x_i by using the values given in the dataset.
Step 6: Find hθ(x_i) − y_i for every data point.
Step 7: Calculate Σ (hθ(x_i) − y_i).
Step 8: Calculate the MSE, the ERROR, as (1/m) · Σ (hθ(x_i) − y_i)².
Step 9: Find (hθ(x_i) − y_i)·x_i by using the values given in the dataset.
Step 10: Calculate Σ (hθ(x_i) − y_i)·x_i.
Step 11: Update θ0 := θ0 − (α/m) · Σ (hθ(x_i) − y_i) and θ1 := θ1 − (α/m) · Σ (hθ(x_i) − y_i)·x_i.
Iterate through Steps 5 to 11 until the MSE becomes minimized (a sketch of the full loop follows).
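Assembled into code, Steps 1 to 11 might look like the following Python sketch (the stopping tolerance and the iteration cap are my assumptions; the slide only says to iterate until the MSE is minimized):

def gradient_descent(xs, ys, alpha, theta0=0.0, theta1=0.0,
                     max_iters=1000, tol=1e-9):
    m = len(xs)
    prev_mse = float("inf")
    mse = prev_mse
    for _ in range(max_iters):
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]  # Steps 5-6
        sum_e = sum(errors)                                           # Step 7
        mse = sum(e * e for e in errors) / m                          # Step 8
        sum_ex = sum(e * x for e, x in zip(errors, xs))               # Steps 9-10
        theta0 -= alpha * sum_e / m                                   # Step 11
        theta1 -= alpha * sum_ex / m
        if prev_mse - mse < tol:  # stop once the MSE stops improving
            break
        prev_mse = mse
    return theta0, theta1, mse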
Numerical Problem in Gradient Descent
Consider an electric vehicle whose speed versus range (in miles) data are given in the table. If the learning rate α = 0.0001, and the values of θ0 and θ1 are given as 400 and −1 respectively, then apply the gradient descent algorithm for the first two iterations to indicate the error optimization.

ITERATION 1 (θ0 = 400, θ1 = −1):

x_i | y_i | hθ(x_i) = θ0 + θ1·x_i | hθ(x_i) − y_i | (hθ(x_i) − y_i)·x_i
55 | 316 | 400 − 1·55 = 345 | 345 − 316 = 29 | 29·55 = 1595
60 | 292 | 400 − 1·60 = 340 | 340 − 292 = 48 | 48·60 = 2880
65 | 268 | 400 − 1·65 = 335 | 335 − 268 = 67 | 67·65 = 4355
70 | 246 | 400 − 1·70 = 330 | 330 − 246 = 84 | 84·70 = 5880
75 | 227 | 400 − 1·75 = 325 | 325 − 227 = 98 | 98·75 = 7350
80 | 207 | 400 − 1·80 = 320 | 320 − 207 = 113 | 113·80 = 9040
   |     |                  | Σ = 439 | Σ = 31100

Mean Squared Error or MSE: (29² + 48² + 67² + 84² + 98² + 113²)/6 = 37063/6 = 6177.17

Updated θ0 := θ0 − (α/m) · Σ (hθ(x_i) − y_i) = 400 − 0.0001·(439/6) = 400 − 0.007317 = 399.9926
Updated θ1 := θ1 − (α/m) · Σ (hθ(x_i) − y_i)·x_i = −1 − 0.0001·(31100/6) = −1 − 0.5183 = −1.5183

Contd…
ITERATION 2 (θ0 = 399.9926, θ1 = −1.5183):

x_i | y_i | hθ(x_i) = θ0 + θ1·x_i | hθ(x_i) − y_i | (hθ(x_i) − y_i)·x_i
55 | 316 | 399.9926 − 1.5183·55 = 316.4861 | 316.4861 − 316 = 0.4861 | 0.4861·55 = 26.7355
60 | 292 | 399.9926 − 1.5183·60 = 308.8946 | 308.8946 − 292 = 16.8946 | 16.8946·60 = 1013.676
65 | 268 | 399.9926 − 1.5183·65 = 301.3031 | 301.3031 − 268 = 33.3031 | 33.3031·65 = 2164.7015
70 | 246 | 399.9926 − 1.5183·70 = 293.7116 | 293.7116 − 246 = 47.7116 | 47.7116·70 = 3339.812
75 | 227 | 399.9926 − 1.5183·75 = 286.12 | 286.12 − 227 = 59.12 | 59.12·75 = 4434
80 | 207 | 399.9926 − 1.5183·80 = 278.5286 | 278.5286 − 207 = 71.5286 | 71.5286·80 = 5722.288
   |     |                                 | Σ = 229.044 | Σ = 16701.213

Mean Squared Error or MSE: (0.4861² + 16.8946² + 33.3031² + 47.7116² + 59.12² + 71.5286²)/6 = 12282.67/6 = 2047.112

Updated θ0 := 399.9926 − 0.0001·(229.044/6) = 399.9926 − 0.003817 = 399.988783
Updated θ1 := −1.5183 − 0.0001·(16701.213/6) = −1.5183 − 0.278354 = −1.796654

The MSE has dropped from 6177.17 in iteration 1 to 2047.112 in iteration 2, which is the error optimization the problem asks to demonstrate.
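The two iterations above can be reproduced with a short Python sketch (data copied from the table; the second-iteration numbers come out marginally different because the slide rounds θ1 to −1.5183 while the code keeps full precision):

xs = [55, 60, 65, 70, 75, 80]        # speed
ys = [316, 292, 268, 246, 227, 207]  # range in miles
alpha, t0, t1 = 0.0001, 400.0, -1.0
m = len(xs)

for it in (1, 2):
    errors = [(t0 + t1 * x) - y for x, y in zip(xs, ys)]
    print(f"iteration {it}: MSE = {sum(e * e for e in errors) / m:.2f}")
    t0 -= alpha * sum(errors) / m
    t1 -= alpha * sum(e * x for e, x in zip(errors, xs)) / m

# Prints MSE ~6177.17, then ~2046.9; theta ends near (399.98887, -1.79667).
print(t0, t1)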


Gradient Descent and #Iterations
Questions:
i) How to make sure gradient descent is working correctly?
ii) How to choose the learning rate, α?

Formula of Gradient Descent: θ0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)

[Plot: the minimum value of the cost function J(θ) against the number of iterations, decreasing past the 100, 200, and 300 iteration marks.]

CONCLUSION: So, if the plot shows a decreasing value of the cost function after every iteration, then GRADIENT DESCENT WORKS PROPERLY.

Here, the cost function is not going down much more beyond 300 to 400 iterations, so the run has effectively converged.
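A small Python sketch of this check: record J after every iteration and confirm it never increases (the toy dataset and the tolerance used to declare "not going down much more" are my assumptions):

def run_with_history(xs, ys, alpha, iters):
    t0, t1, m = 0.0, 0.0, len(xs)
    history = []
    for _ in range(iters):
        errors = [(t0 + t1 * x) - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / (2 * m))  # J(theta0, theta1)
        t0 -= alpha * sum(errors) / m
        t1 -= alpha * sum(e * x for e, x in zip(errors, xs)) / m
    return history

history = run_with_history([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], alpha=0.1, iters=1000)
assert all(b <= a for a, b in zip(history, history[1:])), "J increased: reduce alpha"
for i, (a, b) in enumerate(zip(history, history[1:])):
    if a - b < 1e-6:  # per-iteration improvement has become negligible
        print("little further progress after iteration", i)
        break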
Gradient Descent and Learning Rate

If the learning rate is TOO BIG, then gradient descent can OVERSHOOT the minimum: J(θ) may fail to decrease, or even grow, from one iteration to the next.

A TOO SMALL learning rate converges slowly: J(θ) decreases, but only a little per iteration.

[Plots: J(θ) against the number of iterations for a too-large learning rate (oscillating or rising curve) and a too-small learning rate (slowly falling curve).]

SUMMARY:
i) If α is too small: SLOW CONVERGENCE.
ii) If α is too large: J(θ) may not decrease on every iteration; it MAY NOT CONVERGE.
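To see both behaviors side by side, the same update can be run with different values of α on a toy dataset; the α values and the iteration count are my own choices for illustration:

xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
m = len(xs)

def final_cost(alpha, iters=100):
    t0, t1 = 0.0, 0.0
    for _ in range(iters):
        errors = [(t0 + t1 * x) - y for x, y in zip(xs, ys)]
        t0 -= alpha * sum(errors) / m
        t1 -= alpha * sum(e * x for e, x in zip(errors, xs)) / m
    errors = [(t0 + t1 * x) - y for x, y in zip(xs, ys)]
    return sum(e * e for e in errors) / (2 * m)

print(final_cost(0.001))  # too small: J falls only from ~2.33 to ~0.77
print(final_cost(0.1))    # reasonable: J ~0.001, near its minimum of 0
print(final_cost(0.5))    # too large: J explodes (~1e50), it overshoots and diverges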
