Gradient Descent Slides
Dr Chiradeep Mukherjee,
Associate Professor,
Department of CST and CSIT,
University of Engineering & Management Kolkata
Gradient Descent
Gradient descent is the name of a generic class of computer algorithms that
MINIMIZE a function.
GRADIENT DESCENT
Initial
A must be
J (Θ 0 ,Θ 1)
parameters
Process minimized
d
dΘ
BEST-FIT
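To make this idea concrete, below is a minimal sketch (not from the slides) of gradient descent minimizing a simple one-variable function f(θ) = (θ − 3)², whose minimum is at θ = 3; the function, learning rate and step count are illustrative choices.

```python
# Minimal gradient descent on f(theta) = (theta - 3)^2  (illustrative function).
def f(theta):
    return (theta - 3) ** 2

def df(theta):
    # Derivative of f: d/dtheta (theta - 3)^2 = 2 * (theta - 3)
    return 2 * (theta - 3)

theta = 0.0        # initial parameter
alpha = 0.1        # learning rate
for step in range(50):
    theta = theta - alpha * df(theta)   # move against the gradient

print(theta)       # approaches 3, the minimizer of f
```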
Gradient Descent: Graphical Representation
[Plot: the cost function J(θ0, θ1) plotted against θ1 with θ0 = 0, with points marked at θ1 = −0.5, 1, 1.5, 2 and 2.5; our hypothesis optimizes at θ0 = 0, θ1 = 1. The descent steps are governed by a parameter named the LEARNING RATE, α.]

Cost Function: J(θ0, θ1) = (1/(2m)) · Σ_{i=1}^{m} (hθ(x_i) − y_i)²

WHY IS IT A 2-D PLOT? Because θ0 is held at 0, the cost varies with θ1 alone.
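The cost values behind such a plot can be computed directly from the formula above. The sketch below assumes a tiny toy dataset with y = x, chosen so that the cost is smallest at θ0 = 0, θ1 = 1 as in the slide.

```python
# Cost function J(theta0, theta1) = (1/(2m)) * sum((h(x_i) - y_i)^2),
# evaluated along theta1 with theta0 fixed at 0, as in the slide's 2-D plot.
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Toy data with y = x, so the cost is minimized at theta0 = 0, theta1 = 1 (assumed example).
xs = [1, 2, 3]
ys = [1, 2, 3]

for theta1 in [-0.5, 0, 0.5, 1, 1.5, 2, 2.5]:
    print(theta1, cost(0, theta1, xs, ys))   # the smallest value appears at theta1 = 1
```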
Finding the UPDATED Parameters in Gradient Descent - the ALGORITHM
GRADIENT DESCENT ALGORITHM:

Repeat until convergence:
{
    θ0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)
    θ1 := θ1 − α · (∂/∂θ1) J(θ0, θ1)
}

Implementation in a PROGRAMMING LANGUAGE:

CORRECT (simultaneous update):
Loop until convergence:
{
    temp0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)
    temp1 := θ1 − α · (∂/∂θ1) J(θ0, θ1)
    θ0 := temp0
    θ1 := temp1
}

INCORRECT (θ0 is overwritten before the θ1 gradient is computed):
Loop until convergence:
{
    temp0 := θ0 − α · (∂/∂θ0) J(θ0, θ1)
    θ0 := temp0
    temp1 := θ1 − α · (∂/∂θ1) J(θ0, θ1)
    θ1 := temp1
}
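The difference between the two versions can be seen by running one iteration both ways. The sketch below assumes the squared-error cost from the earlier slide; the gradient helpers and toy data are illustrative.

```python
# One iteration of the theta0/theta1 update, done two ways, to show why the
# simultaneous (temp-variable) version is the correct one.

def grad_theta0(theta0, theta1, xs, ys):
    m = len(xs)
    return sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m

def grad_theta1(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m

xs, ys = [1, 2, 3], [2, 4, 6]   # illustrative data
alpha = 0.1

# CORRECT: both gradients are computed from the same (old) parameters.
theta0, theta1 = 0.0, 0.0
temp0 = theta0 - alpha * grad_theta0(theta0, theta1, xs, ys)
temp1 = theta1 - alpha * grad_theta1(theta0, theta1, xs, ys)
correct = (temp0, temp1)

# INCORRECT: theta0 is overwritten first, so the theta1 gradient
# is computed from a mixture of new and old parameters.
t0, t1 = 0.0, 0.0
t0 = t0 - alpha * grad_theta0(t0, t1, xs, ys)
t1 = t1 - alpha * grad_theta1(t0, t1, xs, ys)
incorrect = (t0, t1)

print(correct, incorrect)   # the two results differ
```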
Finding the UPDATED Parameters in Gradient Descent - the DERIVATION

COST FUNCTION: J(θ0, θ1) = (1/(2m)) · Σ_{i=1}^{m} (hθ(x_i) − y_i)²

Calculation of ∂J/∂θ0:
(∂/∂θ0) J(θ0, θ1) = (1/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)

So, the update formula for θ0 becomes:
θ0 := θ0 − α · (1/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)
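As a sanity check on this derivation, the analytic partial derivative can be compared with a finite-difference estimate of the cost function; the data and parameter values in the sketch below are illustrative.

```python
# Check the derived partial derivative dJ/dtheta0 = (1/m) * sum(h(x_i) - y_i)
# against a central finite-difference estimate of J.

def J(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def dJ_dtheta0(theta0, theta1, xs, ys):
    m = len(xs)
    return sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m

xs, ys = [1.0, 2.0, 3.0], [2.0, 3.0, 5.0]   # arbitrary illustrative data
theta0, theta1 = 0.5, 1.2
eps = 1e-6

numeric = (J(theta0 + eps, theta1, xs, ys) - J(theta0 - eps, theta1, xs, ys)) / (2 * eps)
analytic = dJ_dtheta0(theta0, theta1, xs, ys)
print(analytic, numeric)   # the two values agree closely
```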
Finding the UPDATED Parameters in Gradient Descent - the DERIVATION (continued)

COST FUNCTION: J(θ0, θ1) = (1/(2m)) · Σ_{i=1}^{m} (hθ(x_i) − y_i)²

Similarly, (∂/∂θ1) J(θ0, θ1) = (1/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i) · x_i, so the updates are:
θ0 := θ0 − α · (1/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)
θ1 := θ1 − α · (1/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i) · x_i
Iterate through Steps 5 to 11 until the MSE becomes minimized; a full loop is sketched below.
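Putting the two update rules together gives the complete loop. The sketch below is one possible implementation; the data, learning rate, tolerance and iteration cap are illustrative assumptions, and the loop stops once the cost stops improving.

```python
# Full batch gradient descent for h(x) = theta0 + theta1 * x,
# repeating the two update rules until the MSE stops decreasing.
def gradient_descent(xs, ys, alpha=0.01, max_iters=10000, tol=1e-9):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    prev_cost = float("inf")
    for _ in range(max_iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Simultaneous update of both parameters.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
        cost = sum(e ** 2 for e in errors) / (2 * m)
        if abs(prev_cost - cost) < tol:       # MSE has stopped improving
            break
        prev_cost = cost
    return theta0, theta1

# Illustrative data lying close to y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 9.0, 11.1]
print(gradient_descent(xs, ys))   # roughly (1.0, 2.0)
```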
Numerical Problem in Gradient Descent
Consider an electric vehicle whose speed versus range (in miles) data are given in the table below. If the learning rate is α = 0.0001 and the initial values are θ0 = 400 and θ1 = −1, apply the gradient descent algorithm for the first two iterations to show the error optimization.
ITERATION 1:

Speed x | Range y (miles) | hθ(x) = θ0 + θ1·x               | hθ(x) − y                | (hθ(x) − y)·x
55      | 316             | 400 − 1×55 = 345                | 345 − 316 = 29           | 29×55 = 1595
60      | 292             | 399.9926 − 1.5183×60 = 308.8946 | 308.8946 − 292 = 16.8946 | 16.8946×60 = 1013.676
65      | 268             | 399.9926 − 1.5183×65 = 301.3031 | 301.3031 − 268 = 33.3031 | 33.3031×65 = 2164.7015
70      | 246             | 399.9926 − 1.5183×70 = 293.7116 | 293.7116 − 246 = 47.7116 | 47.7116×70 = 3339.812
75      | 227             | 399.9926 − 1.5183×75 = 286.12   | 286.12 − 227 = 59.12     | 59.12×75 = 4434
80      | 207             | 399.9926 − 1.5183×80 = 278.5286 | 278.5286 − 207 = 71.5286 | 71.5286×80 = 5722.288

Σ(hθ(x_i) − y_i) = 229.044        Σ(hθ(x_i) − y_i)·x_i = 16701.213
Updated parameters:
θ0 := θ0 − α · (1/m) · Σ(hθ(x_i) − y_i) = 400 − 0.0001×(229.044/6) = 400 − 0.003817 = 399.996183
θ1 := θ1 − α · (1/m) · Σ(hθ(x_i) − y_i)·x_i = −1.5183 − 0.0001×(16701.213/6) = −1.5183 − 0.278354 = −1.796654
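The same arithmetic can be reproduced programmatically. The sketch below runs the first two iterations on the slide's speed/range data with α = 0.0001 and prints the error sums and updated parameters; small differences from the slide's rounded figures are expected.

```python
# Reproduce the EV speed-vs-range example: two iterations of gradient descent
# with alpha = 0.0001, starting from theta0 = 400, theta1 = -1.
speeds = [55, 60, 65, 70, 75, 80]
ranges = [316, 292, 268, 246, 227, 207]
m = len(speeds)

alpha = 0.0001
theta0, theta1 = 400.0, -1.0

for iteration in (1, 2):
    errors = [theta0 + theta1 * x - y for x, y in zip(speeds, ranges)]
    sum_err = sum(errors)
    sum_err_x = sum(e * x for e, x in zip(errors, speeds))
    # Simultaneous update of both parameters.
    theta0 -= alpha * sum_err / m
    theta1 -= alpha * sum_err_x / m
    print(iteration, round(sum_err, 4), round(sum_err_x, 4),
          round(theta0, 4), round(theta1, 4))
```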
[Plot: Cost Function J(θ) versus the number of iterations; the curve falls steeply at first and flattens out as the minimum is approached.]

CONCLUSION: If the plot shows a decreasing value of the Cost Function after every iteration, then GRADIENT DESCENT IS WORKING PROPERLY. Here, the Cost Function does not go down much further beyond 300 to 400 iterations.
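One way to produce such a convergence plot in practice is to record J(θ) after every iteration and confirm that it keeps decreasing; the data and settings in the sketch below are illustrative.

```python
# Record the cost after every iteration so a J(theta)-vs-iterations curve can be
# inspected: if the values keep decreasing, gradient descent is working properly.
def run_and_log(xs, ys, alpha, iters):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    history = []
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        theta0 -= alpha * sum(errors) / m
        theta1 -= alpha * sum(e * x for e, x in zip(errors, xs)) / m
        history.append(sum(e ** 2 for e in errors) / (2 * m))
    return history

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]              # illustrative data, y = 2x
history = run_and_log(xs, ys, alpha=0.02, iters=400)
print(history[0], history[99], history[399])              # the cost keeps shrinking
print(all(b <= a for a, b in zip(history, history[1:])))  # True if J never increases
```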
Gradient Descent and Learning Rate
[Plots: Cost Function J(θ) versus number of iterations for different learning rates.]

If the learning rate is TOO BIG, gradient descent can OVERSHOOT the minimum.
If the learning rate is TOO SMALL, gradient descent converges slowly.

SUMMARY:
i) If α is too small: SLOW CONVERGENCE.
ii) If α is too large: J(θ) may not decrease on every iteration; it MAY NOT CONVERGE.
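Both failure modes can be demonstrated on a toy problem. The sketch below minimizes f(θ) = θ² with three illustrative learning rates: one too large (divergence), one too small (slow convergence) and one moderate.

```python
# Effect of the learning rate alpha on gradient descent for f(theta) = theta^2.
def descend(alpha, steps=20):
    theta = 5.0
    for _ in range(steps):
        theta = theta - alpha * 2 * theta    # gradient of theta^2 is 2*theta
    return theta

print(descend(alpha=1.1))     # too big: |theta| grows, gradient descent diverges
print(descend(alpha=0.001))   # too small: barely moves from the start, converges slowly
print(descend(alpha=0.1))     # moderate: approaches the minimum at theta = 0
```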