Calculus of Variations Demystified - by Naoki - Medium


Naoki

Oct 18, 2020 · 12 min read


Image by author

Calculus of Variations Demystified


How to derive the Euler-Lagrange equation

We use the calculus of variations to optimize functionals.

You read it right: functionals not functions.

But what are functionals? What does a functional really look like?

Moreover, there is this thing called the Euler-Lagrange equation.

What is it? How is it useful?


How do we derive such an equation?
If you have any of the above questions, you are in the right place.

I’ll demystify it for you.

The shortest path problem


Suppose we want to find the shortest path from point A to point B.

Image by author

If there is no obstacle, the answer should be the straight line from A to B.



Image by author

But for now, let’s forget the fact that we know the answer.

Instead, let’s frame the problem into an optimization problem and mathematically
show that the straight line is the answer.

Doing so lets us understand what functionals and variations are, as well as how to derive
the Euler-Lagrange equation to solve the optimization problem.

Before finding the shortest path, we need to be able to calculate the length of a path
from A to B.

As the famous Chinese proverb says, “a journey of a thousand miles begins with a single
step”.

So, let’s calculate the length of a small passage and integrate it from A to B.

As shown in the above diagram, a small passage can be approximated by a straight line.

So, the total length of a path can be calculated by the following integration:
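The integral here appeared as an image in the original; reconstructed from the surrounding text, it reads:

```
S = \int_{A}^{B} \sqrt{dx^2 + dy^2} = \int_{x_1}^{x_2} \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\, dx
```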

Since A and B are fixed points, let’s define them as A = (x1, y1) and B = (x2, y2) .


In the last step, y1 and y2 are dropped as they are determined by x1 and x2 .

So, we are integrating the square root from x1 to x2 .

Now we know how to calculate the length of a path from A to B.

We are going to find the y = f(x) that minimizes the path length S.

What is a functional?
We actually know that the y = f(x) is the straight line between A and B but let’s still
forget that for the time being.

Instead, we think of many potential solution lines from A to B.



Image by author

The blue line is y = blue_line(x) .

The red line is y = red_line(x) .

And so on.

But having a function name for each line is too tedious, so we call them collectively y = y(x) .

We think of y = y(x) as any line between A and B which could be the blue line, the red
line or else.
We can calculate the length of any line (given a particular y(x) ) using the formula S .

In other words, S takes an instance of y(x) and returns the length of the line.

So, S maps a function y(x) to a value.

We call S a functional which is a function of functions.

A functional is written with square brackets, like S[y] , which means the functional S
takes a function y and returns a value.

If we talk about functions of functions in such general terms, there could be too many kinds
of functionals, many of which may not be very useful.

Since functional optimization originates in physics, which deals with positions and
velocities, the following general form is very common and has many applications (not
only in physics but also in machine learning).
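The general form, which was shown as an image in the original, can be reconstructed from the description below as:

```
J[y] = \int_{x_1}^{x_2} L(x, y, y')\, dx
```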

J[y] is a functional that takes a function y.

J integrates L from x1 to x2 .

L is a function of x , y , and y' .

y' is the derivative of y in terms of x.

The above may look too abstract.

Let’s apply it to our shortest path problem.

For the shortest path problem, L is defined as follows:
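The definition (originally an image) follows from the path-length integral above:

```
L(x, y, y') = \sqrt{1 + (y')^2}
```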



We integrate L between x1 and x2 to get the length of the path.

In the shortest path problem, L does not depend on x or y but only on y' .

That is ok.

The point is that if we can solve the optimization problem for the general form of
functionals, we can solve many problems including the shortest path problem.

Functional variations
Before attempting to solve the general form of functionals, let’s think about how to
solve the shortest path problem.

Do we generate many random functions and evaluate the path length to find out the
shortest path?

Image by author

We can map each path to a path length value using the functional S.

The question is how to know whether we have actually found the shortest path.

We can take an alternative approach since we already know that the shortest path
exists.

We start with the shortest path and slightly modify it to see what kind of condition is
required for the shortest path.

Let’s call the shortest path function f(x) , the function that minimizes S.

Here, we write f(x) instead of y(x) .

The function f(x) solves our problem.

y(x) is a candidate function that may or may not solve our problem like the blue line
and the red line.

When we give y(x) or f(x) to S , we write like S[y] and S[f] respectively.

S[y] gives the path length following y(x) from A to B.

S[f] gives the path length following f(x) from A to B which is the shortest path
length.

If we add an arbitrary function η(x) (eta of x) to f(x) , the following relationship holds:
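The relationship (originally an image) is the same inequality restated a few paragraphs below:

```
S[f] \leq S[f + \epsilon\eta]
```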

The term ϵη is called a variation of the function f , which is denoted as follows:
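The notation here was an image in the original; the standard notation for a variation, consistent with the text, is:

```
\delta f = \epsilon\eta
```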

ϵ (epsilon) is a small number so that the variation is also small.

We analyze how the functional changes when we move ϵ towards zero.

To make it more concrete, let’s apply a variation to the shortest path.

In the below diagram, the blue straight line is the shortest path y = f(x) .

A purple line is an example of a variation ϵη added to f as in y = f(x) + ϵη(x) .



Image by author

S[f + ϵη] gives the length of the path given by the function f + ϵη .

Since S[f] is the shortest path length, the relationship S[f] <= S[f + ϵη] is true for
any variation.

When ϵ goes to zero, the purple line becomes the same as the blue line as the
variation vanishes.

Also, note that η(x) must be zero at the points A and B, since any varied path must still
pass through A and B. (Remember we are looking for the shortest line from A to B.)

So, η(x1) = 0 and η(x2) = 0 .

Why do we think about the variation?

The reason is that the variation makes the functional optimization into a function
optimization which we know how to solve.

If we look at S[f + ϵη] very closely, we can see that it depends only on ϵ.

f(x) is the function that gives the minimum path line which is fixed.

η(x) is an arbitrary function that is fixed for a particular variation.

Therefore, the only variable in S[f + ϵη] is ϵ.

In short, S[f + ϵη] is a function of ϵ.

So, let’s write it as S(ϵ) .

S(ϵ) is the function of ϵ that returns the path length for y = f + ϵη : the
solution function f plus a variation ϵη .

We just need to solve the minimization problem for the function S(ϵ) .

Conveniently, we also know that S attains its minimum when ϵ = 0.

So, the derivative of S with ϵ should be zero when ϵ = 0.
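The condition (originally an image) reads:

```
S'(0) = 0
```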

Here, the apostrophe ( ‘ ) denotes the derivative of S with respect to ϵ.

In case it’s not clear: an apostrophe is used for the derivative of a function with respect
to its only independent variable. Since ϵ is the only independent variable of S in this
context, the apostrophe here means the derivative of S with respect to ϵ.
Let’s go back to the general form to reiterate the points we’ve discussed so far and
derive the Euler-Lagrange equation to solve the optimization problem.
The Euler-Lagrange equation
The general form of functionals we are dealing with is as follows:
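The general form (originally an image) is the same one introduced earlier:

```
J[y] = \int_{x_1}^{x_2} L(x, y, y')\, dx
```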

We say the function f minimizes J.

As such, J[f] is the minimum value of the functional J.

We define y = f + ϵη to mean we add a variation to the solution function.

y should still satisfy the boundary conditions.

As such, η(x1) = 0 and η(x2) = 0 .

Since J[f] is the minimum (a constant) and η is an arbitrary function of x which is
fixed for a particular variation, we can define J[y] as a function of ϵ:
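The definition (originally an image) can be reconstructed from the text; the article writes this function of ϵ as φ:

```
\phi(\epsilon) = J[f + \epsilon\eta] = \int_{x_1}^{x_2} L(x, y, y')\, dx, \qquad y = f + \epsilon\eta
```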

We’ve successfully converted the general form of functionals into a function of ϵ.

This function of ϵ is at minimum when ϵ = 0.

In other words, the first derivative of it becomes zero at ϵ = 0.

So far, we reinforced the same idea from the shortest path problem for the general
form of functionals.

Now, we are going to solve it for the general form of functionals to derive the Euler-
Lagrange equation.

By substituting J[y] into the above equation:
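Substituting the integral form of J[y] into φ'(0) = 0 gives (reconstructed; originally an image):

```
\phi'(0) = \left[ \frac{d}{d\epsilon} \int_{x_1}^{x_2} L(x, y, y')\, dx \right]_{\epsilon=0}
         = \int_{x_1}^{x_2} \left[ \frac{dL}{d\epsilon} \right]_{\epsilon=0} dx = 0
```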

Let’s calculate the total derivative of L(x, y, y') by ϵ.
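By the chain rule (the equation was originally an image):

```
\frac{dL}{d\epsilon} = \frac{\partial L}{\partial x}\frac{dx}{d\epsilon}
                     + \frac{\partial L}{\partial y}\frac{dy}{d\epsilon}
                     + \frac{\partial L}{\partial y'}\frac{dy'}{d\epsilon}
```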

The first term is zero since x has no dependency on ϵ.

y = f + ϵη and y' = f' + ϵη' .

f , f' , η and η' have no dependency on ϵ.

So, the total derivative of L by ϵ is as follows:
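Since dy/dϵ = η and dy'/dϵ = η', the total derivative (originally an image) reduces to:

```
\frac{dL}{d\epsilon} = \frac{\partial L}{\partial y}\,\eta + \frac{\partial L}{\partial y'}\,\eta'
```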



When ϵ=0 , y = f and y' = f' :
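Evaluating at ϵ = 0 (reconstructed; originally an image):

```
\left[ \frac{dL}{d\epsilon} \right]_{\epsilon=0} = \frac{\partial L}{\partial f}\,\eta + \frac{\partial L}{\partial f'}\,\eta'
```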

Substituting this into the equation φ'(0) = 0:
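The substituted equation (originally an image) reads:

```
\phi'(0) = \int_{x_1}^{x_2} \left( \frac{\partial L}{\partial f}\,\eta + \frac{\partial L}{\partial f'}\,\eta' \right) dx = 0
```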

In the second term, we have η’ .

As η is an arbitrary function, we have no way to calculate η’ .

Let’s eliminate η' by applying integration by parts on the second term.
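Integration by parts on the second term gives (reconstructed; originally an image):

```
\int_{x_1}^{x_2} \frac{\partial L}{\partial f'}\,\eta'\, dx
  = \left[ \frac{\partial L}{\partial f'}\,\eta \right]_{x_1}^{x_2}
  - \int_{x_1}^{x_2} \eta\, \frac{d}{dx}\frac{\partial L}{\partial f'}\, dx
```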

The first term is zero since η(x1) = 0 and η(x2) = 0 .

Therefore,
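The resulting condition (originally an image) is:

```
\phi'(0) = \int_{x_1}^{x_2} \eta \left( \frac{\partial L}{\partial f} - \frac{d}{dx}\frac{\partial L}{\partial f'} \right) dx = 0
```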

As η(x) is an arbitrary function, the term in the brackets must be zero to satisfy the
equation φ'(0) = 0.
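That bracketed term being zero (originally shown as an image) is:

```
\frac{\partial L}{\partial f} - \frac{d}{dx}\frac{\partial L}{\partial f'} = 0
```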

This is called the Euler-Lagrange equation: the condition that the optimal function f
must satisfy for the general form of functionals.

Solving the shortest path problem


Let’s solve the shortest path problem using the Euler-Lagrange equation.
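Recall the functional and L for this problem (the equation was originally an image):

```
S[y] = \int_{x_1}^{x_2} \sqrt{1 + (y')^2}\, dx, \qquad L = \sqrt{1 + (y')^2}
```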

As mentioned before, y' is the derivative of y in terms of x.

Now, we say y = f(x) is the function that minimizes the path length from A to B.

So, we think of f instead of y in L.

This f must satisfy the Euler-Lagrange equation.

Let’s solve the Euler-Lagrange equation for the shortest path problem.
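With L = √(1 + (f')²), we have ∂L/∂f' = f'/√(1 + (f')²), so the Euler-Lagrange equation (originally an image) becomes:

```
\frac{\partial L}{\partial f} - \frac{d}{dx} \left( \frac{f'}{\sqrt{1 + (f')^2}} \right) = 0
```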

Since f does not appear in L , the first term is zero.

As such, the second term is also zero.

We integrate the above with respect to x:
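The integrated equation (originally an image) is:

```
\frac{f'}{\sqrt{1 + (f')^2}} = C
```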

C is a constant value.

This means f' is also a constant value.

It can be shown by taking the square of both sides and rearranging the equation:
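The rearrangement (originally an image) reads:

```
(f')^2 = C^2 \left( 1 + (f')^2 \right) \quad\Longrightarrow\quad (f')^2 = \frac{C^2}{1 - C^2} = \text{constant}
```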

Let f' = a and we get f(x) = ax + b which is a straight line.

Voilà!

If we put in the boundary conditions f(x1) = y1 and f(x2) = y2 , we get the values for a
and b.
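The result can also be sanity-checked numerically. The following sketch is not from the original article; the endpoints A = (0, 0), B = (1, 1), the grid size, and the variation η(x) = sin(πx) are arbitrary choices made for illustration:

```python
import math

# Numerical sanity check (illustrative; endpoints and eta are arbitrary choices).
# A = (0, 0), B = (1, 1), so the shortest path is f(x) = x with length sqrt(2).
N = 10_000
xs = [i / N for i in range(N + 1)]

def path_length(y):
    """Approximate S[y] by summing the lengths of small straight segments."""
    return sum(
        math.hypot(xs[i + 1] - xs[i], y[i + 1] - y[i])
        for i in range(N)
    )

straight = list(xs)                        # f(x) = x
eta = [math.sin(math.pi * x) for x in xs]  # arbitrary variation, eta(0) = eta(1) = 0

# S[f] <= S[f + eps * eta] for every variation amplitude we try.
for eps in (0.5, 0.1, 0.01):
    perturbed = [f + eps * e for f, e in zip(straight, eta)]
    assert path_length(straight) <= path_length(perturbed)

print(round(path_length(straight), 4))  # sqrt(2), about 1.4142
```

As ϵ shrinks, the perturbed length approaches the straight-line length from above, which is exactly the behavior the derivation above predicts.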

Last but not least…

The essential idea of the calculus of variations is to make a functional into a function of
ϵ by adding the variation ϵη to the optimal function f so that the problem of
functional optimization becomes a function optimization problem.

We derived the Euler-Lagrange equation for the general form of functionals which
must be satisfied for solution functions.

Mathematically speaking, the Euler-Lagrange equation is not a sufficient condition for
a functional minimum or maximum. It only tells us that the solution function makes the
functional stationary.

However, in many problems, we know the solution would give the functional minimum
or maximum.

In the shortest path problem, the path length has no upper bound, so we know the solution to
the Euler-Lagrange equation gives the minimum.

In case we want to check whether the solution function gives a maximum or a minimum, we
need to calculate the second derivative of φ(ϵ) with respect to ϵ at ϵ = 0, where y = f + ϵη .

Let’s do this for the shortest path problem:

The first derivative of S(ϵ) with ϵ:
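With y = f + ϵη, so y' = f' + ϵη', the first derivative (originally an image) is:

```
S'(\epsilon) = \int_{x_1}^{x_2} \frac{(f' + \epsilon\eta')\,\eta'}{\sqrt{1 + (f' + \epsilon\eta')^2}}\, dx
```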

The second derivative of S(ϵ) with ϵ:
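Differentiating once more (reconstructed; originally an image), the integrand is non-negative:

```
S''(\epsilon) = \int_{x_1}^{x_2} \frac{(\eta')^2}{\left( 1 + (f' + \epsilon\eta')^2 \right)^{3/2}}\, dx
```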



This is already a positive value, so we know the Euler-Lagrange equation for the
shortest path problem gives the minimum function.

In theory, we should check the value of the second derivative at ϵ=0 , because we are
talking about the local minimum/maximum around f within the range expressed by ϵ.

When ϵ=0 , y = f :
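At ϵ = 0 the second derivative (originally an image) becomes:

```
S''(0) = \int_{x_1}^{x_2} \frac{(\eta')^2}{\left( 1 + (f')^2 \right)^{3/2}}\, dx \geq 0
```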

All in all, the solution function (the straight line) given by the Euler-Lagrange equation
for the shortest path problem gives the minimum of the functional S.

Lastly, I assumed throughout this article that all the required derivatives exist. For
example, y(x) and η(x) must be differentiable with respect to x.

I hope you have a clearer idea about the calculus of variations now.

Related Articles

Normal Distribution Demystified: Understanding the Maximum Entropy Principle
Have you ever wondered why we often use the normal distribution?
How do we derive it anyway? Why do many probability…
kikaben.com
Lagrange Multiplier Demystified
Understanding why it works Have you ever wondered why we use the
Lagrange multiplier to solve constrained optimization…
kikaben.com

Constrained Optimization Demystified


Explained with a simple example Have you ever wondered what
constrained optimization problems are? Oftentimes, the word…
kikaben.com

References

Calculus of variations (Wikipedia): en.wikipedia.org

The Calculus of Variations (Bounded Rationality): bjlkeng.github.io

Introduction to Calculus of Variations (Faculty of Khan): https://youtu.be/6HeQc7CSkZs
