OPTIMIZATION – MODULE IV
Dr. Ranjini. P. S, M.Sc., M. Phil, Ph. D, M. Tech in Data Science & Machine Learning,
Professor, Department of Artificial Intelligence & Data Science,
Don Bosco Institute of Bangalore.
BCS405C – OPTIMIZATION TECHNIQUE
To minimize $f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2$ by the steepest descent method, starting from $X_1 = (0, 0)$, we first compute the partial derivatives:

$$\frac{\partial f}{\partial x_1} = 1 + 4x_1 + 2x_2, \qquad \frac{\partial f}{\partial x_2} = -1 + 2x_1 + 2x_2$$

Gradient:

$$\nabla f = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{bmatrix} = \begin{bmatrix} 1 + 4x_1 + 2x_2 \\ -1 + 2x_1 + 2x_2 \end{bmatrix}, \qquad \nabla f(\text{at } X_1) = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
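To make the hand computations easy to check, here is a minimal Python (NumPy) sketch of this objective and its gradient. The names `f` and `grad_f` are just illustrative choices, not part of the original problem statement.

```python
import numpy as np

def f(x):
    # f(x1, x2) = x1 - x2 + 2*x1^2 + 2*x1*x2 + x2^2
    x1, x2 = x
    return x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2

def grad_f(x):
    # Gradient components derived above:
    # df/dx1 = 1 + 4*x1 + 2*x2,  df/dx2 = -1 + 2*x1 + 2*x2
    x1, x2 = x
    return np.array([1 + 4*x1 + 2*x2,
                     -1 + 2*x1 + 2*x2])

print(grad_f(np.array([0.0, 0.0])))   # [ 1. -1.]  = gradient at X1 = (0, 0)
```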
Iteration 1

First we find the direction: $S_1 = -\nabla f = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

$$X_1 + \lambda_1 S_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \lambda_1 \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -\lambda_1 \\ \lambda_1 \end{bmatrix}$$

Next, calculate the given function at this point:

$$f(-\lambda_1, \lambda_1) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2 = -2\lambda_1 + \lambda_1^2$$

Next, $\dfrac{\partial f}{\partial \lambda_1} = -2 + 2\lambda_1$; setting $\dfrac{\partial f}{\partial \lambda_1} = 0$ gives $-2 + 2\lambda_1 = 0 \Rightarrow \lambda_1 = 1$.
$$X_2 = X_1 + \lambda_1 S_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + 1\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

$$\nabla f = \begin{bmatrix} 1 + 4x_1 + 2x_2 \\ -1 + 2x_1 + 2x_2 \end{bmatrix} \text{ at } (-1, 1): \qquad \nabla f(\text{at } X_2) = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$

Since $\nabla f(\text{at } X_2) = \begin{bmatrix} -1 \\ -1 \end{bmatrix} \neq 0$, we repeat the iteration.
Iteration 2

$$S_2 = -\nabla f = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
$$X_2 + \lambda_2 S_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} + \lambda_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 + \lambda_2 \\ 1 + \lambda_2 \end{bmatrix}$$

Next, calculate the given function at this point:

$$f(-1 + \lambda_2,\ 1 + \lambda_2) = x_1 - x_2 + 2x_1^2 + 2x_1x_2 + x_2^2 = 5\lambda_2^2 - 2\lambda_2 - 1$$

Next, $\dfrac{\partial f}{\partial \lambda_2} = -2 + 10\lambda_2$; setting $\dfrac{\partial f}{\partial \lambda_2} = 0$ gives $\lambda_2 = 1/5$.

$$X_3 = X_2 + \lambda_2 S_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} + \frac{1}{5}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -0.8 \\ 1.2 \end{bmatrix}$$
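If you want to verify the expansion $f(-1 + \lambda_2, 1 + \lambda_2) = 5\lambda_2^2 - 2\lambda_2 - 1$ without doing the algebra by hand, a quick symbolic check (a sketch assuming SymPy is available) looks like this:

```python
import sympy as sp

lam = sp.symbols('lambda_2')
x1, x2 = -1 + lam, 1 + lam                      # the point X2 + lam*S2
f = x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2
print(sp.expand(f))                             # 5*lambda_2**2 - 2*lambda_2 - 1
print(sp.solve(sp.diff(f, lam), lam))           # [1/5]  -> optimal step length
```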
$$\nabla f = \begin{bmatrix} 1 + 4x_1 + 2x_2 \\ -1 + 2x_1 + 2x_2 \end{bmatrix} \text{ at } (-0.8, 1.2) = \begin{bmatrix} 0.2 \\ -0.2 \end{bmatrix} \neq 0,$$

so we repeat the iteration.
Iteration 3

$$S_3 = -\nabla f = \begin{bmatrix} -0.2 \\ 0.2 \end{bmatrix}$$

$$X_3 + \lambda_3 S_3 = \begin{bmatrix} -0.8 \\ 1.2 \end{bmatrix} + \lambda_3 \begin{bmatrix} -0.2 \\ 0.2 \end{bmatrix} = \begin{bmatrix} -0.8 - 0.2\lambda_3 \\ 1.2 + 0.2\lambda_3 \end{bmatrix}$$

Calculating the given function at this point:

$$f(-0.8 - 0.2\lambda_3,\ 1.2 + 0.2\lambda_3) = 0.04\lambda_3^2 - 0.08\lambda_3 - 1.2$$

$$\frac{\partial f}{\partial \lambda_3} = 0.08\lambda_3 - 0.08; \qquad \frac{\partial f}{\partial \lambda_3} = 0 \Rightarrow \lambda_3 = 1$$

$$X_4 = X_3 + \lambda_3 S_3 = \begin{bmatrix} -0.8 \\ 1.2 \end{bmatrix} + 1\begin{bmatrix} -0.2 \\ 0.2 \end{bmatrix} = \begin{bmatrix} -1 \\ 1.4 \end{bmatrix}$$
$$\nabla f = \begin{bmatrix} 1 + 4x_1 + 2x_2 \\ -1 + 2x_1 + 2x_2 \end{bmatrix} \text{ at } (-1, 1.4) = \begin{bmatrix} -0.2 \\ -0.2 \end{bmatrix} \neq 0,$$

so in principle we would repeat the iteration.
However, since the gradient components ($\pm 0.2$) are already close to $0$, let us stop the iteration here. If you want, you can carry out a few more iterations until the gradient is even closer to $0$.

We conclude that $(-1, 1.4)$ is (approximately) the minimum point. To find the minimum of $f$, calculate $f(-1, 1.4) = -1.24$.
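The whole procedure can be sketched in a few lines of Python. One assumption worth flagging: instead of expanding $f(\lambda)$ by hand at each step as above, the sketch uses the closed-form exact line-search step $\lambda = (g^T g)/(g^T A g)$, which is valid here because $f$ is quadratic with constant Hessian $A$; it reproduces $\lambda_1 = 1$, $\lambda_2 = 1/5$, $\lambda_3 = 1$ computed above.

```python
import numpy as np

A = np.array([[4.0, 2.0],            # Hessian of f (constant, f is quadratic)
              [2.0, 2.0]])

def grad_f(x):
    return np.array([1 + 4*x[0] + 2*x[1],
                     -1 + 2*x[0] + 2*x[1]])

x = np.array([0.0, 0.0])             # X1
for k in range(1, 10):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:     # stop when the gradient is (nearly) zero
        break
    lam = (g @ g) / (g @ A @ g)      # exact line search along S = -g
    x = x - lam * g                  # X_{k+1} = X_k + lam_k * S_k
    print(f"X{k+1} = {x}")
# X2 = [-1.  1.], X3 = [-0.8  1.2], X4 = [-1.   1.4], ... -> approaches (-1, 1.5)
```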
Newton-Raphson (NR) method
https://www.youtube.com/watch?v=SIqfDj1DTyM
https://www.youtube.com/watch?v=1z1sD202jbE
For the same function and starting point $X_1 = (0, 0)$, the Newton-Raphson iteration is $X_{i+1} = X_i - J_i^{-1} g_i$, where $J_i$ is the Hessian and $g_i$ the gradient at $X_i$. Here

$$J_1 = \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix}, \qquad J_1^{-1} = \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1 \end{bmatrix}$$

$$g_1 = \begin{bmatrix} 1 + 4x_1 + 2x_2 \\ -1 + 2x_1 + 2x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad \text{(substituting } x_1 = 0,\ x_2 = 0\text{)}$$

$$X_2 = X_1 - J_1^{-1} g_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 3/2 \end{bmatrix}$$
$$g_2 = \begin{bmatrix} 1 + 4x_1 + 2x_2 \\ -1 + 2x_1 + 2x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{(substituting } x_1 = -1,\ x_2 = 3/2\text{)}$$

As $g_2 = 0$, $X_2 = \begin{bmatrix} -1 \\ 3/2 \end{bmatrix}$ is the optimum point.
Hence $f\left(-1, \tfrac{3}{2}\right) = -\tfrac{5}{4}$ (by substituting $x_1 = -1$ and $x_2 = 3/2$ in the given function).
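As a check, here is a minimal Python sketch of the same Newton step (an illustration, using `numpy.linalg.solve` rather than forming $J^{-1}$ explicitly):

```python
import numpy as np

J = np.array([[4.0, 2.0],        # Hessian of f (constant for this quadratic)
              [2.0, 2.0]])

def grad_f(x):
    return np.array([1 + 4*x[0] + 2*x[1],
                     -1 + 2*x[0] + 2*x[1]])

x1 = np.array([0.0, 0.0])                    # X1
x2 = x1 - np.linalg.solve(J, grad_f(x1))     # X2 = X1 - J^{-1} g1
print(x2, grad_f(x2))                        # [-1.   1.5] [0. 0.]
```

Because $f$ is quadratic, a single Newton step lands exactly on the optimum, which is why $g_2 = 0$ immediately.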
Passing the entire data set through the model at once can cause an Out of Memory Error. To overcome this, we use Mini Batch Gradient Descent. It works as follows: we divide the data set into batches (for example, 100,000 records make 100 batches, each of size 1,000). Instead of passing the entire data set, we pass one Mini Batch to the model, update the weights, and then pass the next Mini Batch. By the time the model has seen the entire data set, we have already made a lot of progress by updating the weights 100 times. Passing Mini Batches reduces the time taken to process the data, and we will not face any Out of Memory Error. The only problem with Mini Batch Gradient Descent is that we train the model on small batches of data, so during any single update the model cannot recognize the patterns present in the other Mini Batches; it is only trained on that specific Mini Batch.
For example, if we take the entire data set, the weight updates move smoothly and approach the local minima; this is the behaviour when we train the model using the entire dataset. But if we take the mini batches, the path does not head straight to the local minima; it takes a zig-zag curve and eventually reaches the local minima.

[Figure: smooth descent curve when training on the entire data set vs. zig-zag path when training on mini batches.]
The weight updates do not follow the smooth parabolic curve; they take a zig-zag path until they reach the local minima. In fact, the path will not reach the exact point of the local minima, but it will get around it, close to it. This zig-zag path is what we call noise.

Imagine that our dataset contains only 1,000 records (not 100,000). In that case it is pointless to use Mini Batch Gradient Descent, because most of the time will be wasted following that zig-zag curve instead of travelling along a straight path. A minimal sketch of the batching scheme is given below.
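Here is a minimal Mini Batch Gradient Descent sketch in Python. The data set, model (plain linear regression with a mean-squared-error loss), learning rate, and batch size are all made-up illustrative choices; the point is the batching scheme from the example above: 100,000 records split into 100 batches of 1,000, with one weight update per batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100,000 records with 3 features (as in the example above).
X = rng.normal(size=(100_000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100_000)

w = np.zeros(3)                       # weights to learn
lr, batch_size = 0.1, 1_000           # 100 batches of 1,000 records each

for epoch in range(5):
    order = rng.permutation(len(X))   # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        b = order[start:start + batch_size]
        err = X[b] @ w - y[b]                 # predictions minus targets
        grad = 2 * X[b].T @ err / len(b)      # MSE gradient on this mini batch only
        w -= lr * grad                        # update: 100 updates per epoch

print(w)                              # close to [ 2.  -1.   0.5]
```

Each update uses the gradient of only one mini batch, which is why the descent path is noisy (zig-zag) rather than the smooth curve that full-batch gradient descent would trace.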