Gradient Descent Slides

Gradient Descent
Dr Chiradeep Mukherjee,
Associate Professor,
Department of CST and CSIT,
University of Engineering & Management Kolkata
Gradient Descent
Gradient descent is a name for a generic class of computer algorithms which
MINIMIZE a function.
GRADIENT DESCENT
The movement toward best-fit is achieved by taking

the derivative of the variable or variables involved,
These algorithms achieve this end by starting with initial parameter towards the direction with the lowest (calculus-
values and iteratively moving towards a set of parameter values that defined) gradient—that's the gradient part.
minimize some cost function or metric—that's the descent part.
Initial
A must be
J (Θ 0 ,Θ 1)
parameters
Process minimized
d
dΘ
BEST-FIT
Gradient Descent: Graphical Representation
=0, =-0.5 =0, =2.5
4
Our These steps are involved
3.5 with a parameter named,
optimizes at
3 =0, =1 LEARNING RATE, α
=0, =2
2.5 =0, =2
𝑚 2
1
J ( θ0 ,θ 1 )= ∑ ( hθ ( x i ) − y i )
2
1.5
2 𝑚 𝑖=1
1 WHY IT IS A
=0, =1.5 =0, =1.5 2-D PLOT?
0.5
Cost
Function
-0.5 0.5 1 1.5 2 2.5 3
θ 1
Finding the UPDATED in Gradient Descent-
the ALGORITHM
Implementation in PROGRAMMING
LANGUAGE:
GRADIENT DESCENT ALGORITHM: Loop…:
{
Repeat until convergence:
temp0 := θ0 -
{
temp1 := θ1 -
θ0 := θ0 - CORRECT
θ0 := temp0
θ1 := θ1 - θ1 := temp1
} } Loop…:
{
temp0 := θ0 -
θ0 := temp0
INCORRECT temp1 := θ1 -
θ1 := temp1
}
the DERIVATION
COST FUNCTION:
Calculation of :
So, the formula for θ0 becomes:
Or, () θ0 := θ0 -
θ0 := θ0 -
=()
=
the DERIVATION
COST FUNCTION:
Calculation of : So, the formula for θ0 becomes:

Or, () θ1 := θ1 -
=() θ1 := θ1 -
=
Steps involved in Gradient Descent
Fix a value of LEARNING Consider the linear
Consider a dataset involving
RATE, α regression hypothesis:
{xi,yi}
Step 1 Step 2 (possibly <1) hθ(x)=θ0+θ1x
Step 3
Calculate by using Calculate by using Start with an initial values

the values given in dataset the values given in dataset of
- Step 6 = θ0+θ1 Step 4
- = θ0+θ1
… …
Find
Step 5
Step 8 Calculate MSE, ERROR as
Step 7 Calculate Find
-)
- Step 9 Step 10
…
Update θ0 := θ0 - and
Iterate through θ1 := θ1 -
Step 5 to 11 until MSE becomes minimized
Step 11
Numerical Problem in Gradient Descent
Consider an electric vehicle whose speed versus range (in mile) data are given in the table. If learning rate,
α=0.0001, and values are given as 400 and -1, respectively, then apply gradient descent algorithm with first two
iterations to indicate the error optimization. θ 0 =400, θ 1 =−1
hθ()=θ0+θ1 hθ()- (hθ()-)
55 316 hθ()=400-1*55=345 hθ()-=345-316=29 (hθ()-) =29*55=1595
ITERATION 1
60 292 hθ()=400-1*60=340 hθ()-=340-292=48 (hθ()-) =48*60=2880

65 268 hθ()=400-1*65=335 hθ()-=335-268=67 (hθ()-) =67*65=4355
70 246 hθ()=400-1*70=330 hθ()-=330-246=84 (hθ()-) =84*70=5880
75 227 hθ()=400-1*75=325 hθ()-=325-227=98 (hθ()-) =98*75=7350
80 207 hθ()=400-1*80=320 hθ()-=320-207=113 (hθ()-) =113*80=9040
=439 =31100
Mean Squared Error or MSE: + + + + + )=37063/6= 6177.17
Updated θ0 := θ0 - =400-0.007317=399.9926
θ1 := θ1 - = -1 -
θ 1 =− Contd…
hθ()=θ0+θ1 hθ()- (hθ()-)
hθ()=399.9926-
55 316 hθ()-=316.4861-316=0.4861 (hθ()-) =0.4861*55=26.7355
1.5183*55=316.4861
ITERATION 2 θ 0 =399.9926,
hθ()=399.9926-
60 292 hθ()-=308.8946-292=16.8946 (hθ()-) =16.8946*60=1013.676
1.5183*60=308.8946
hθ()=399.9926-
65 268 hθ()-=301.3031-268=33.3031 (hθ()-) =33.3031*65=2164.7015
1.5183*65=301.3031
hθ()=399.9926-
70 246 hθ()-=293.7116-246=47.7116 (hθ()-) =47.7116*70=3339.812
1.5183*70=293.7116
hθ()=399.9926-
75 227 hθ()-=286.12-227=59.12 (hθ()-) =59.12*75=4434
1.5183*75=286.12
hθ()=399.9926-
80 207 hθ()-=278.5286-207=71.5286 (hθ()-) =71.5286*80=5722.288
1.5183*80=278.5286
=229.044 =16701.213
Updated θ0 := θ0 - =400-0.003817=399.996183
θ1 := θ1 - = -1.5183 -
Mean Squared Error or MSE: + + + + + )=37063/6= 2047.112

Gradient Descent and #Iterations
Questions: Formula of Gradient Descent:
i) How to make sure gradient descent is working correctly? θ0 := θ0 -
ii) How to choose learning rate, α?
100 for
=200
min 200 CONCLUSION: So, if the plot is showing the
300 decreased value of Cost Function after every
iteration, then GRADIENT DESCENT WORKS
PROPERLY.
Number of iterations
Here, the Cost Function is not going DOWN MUCH MORE beyond 300/400 iterations. Number of iterations
Gradient Descent and Learning Rate
J ( θ) U
If learning Rate is TOO BIG, then
gradient descent OVERSHOOT the
minimum. J ( θ) S Z
Number of iterations
TOO SMALL Y
LEARNING RATE converges
X
slowly. Number of iterations
J ( θ)
SUMMARY:
i) If α is too small: SLOW CONVERGENCE
ii) If α is too large: may not decrease on every iteration; MAY
Number of iterations NOT CONVERGE

Gradient Descent Slides

Uploaded by

Copyright:

Available Formats

Gradient Descent Slides

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Gradient Descent Slides

Uploaded by

Copyright:

Available Formats

Gradient Descent

The movement toward best-fit is achieved by taking

Calculation of : So, the formula for θ0 becomes:

Calculate by using Calculate by using Start with an initial values

60 292 hθ()=400-160=340 hθ()-=340-292=48 (hθ()-) =4860=2880

Mean Squared Error or MSE: + + + + + )=37063/6= 2047.112

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.