Geometric Optimal Control: Theory, Methods and Examples
By Heinz Schättler and Urszula Ledzewicz
About this ebook
Heinz Schättler is an Associate Professor at Washington University in St. Louis in the Department of Electrical and Systems Engineering. Urszula Ledzewicz is a Distinguished Research Professor at Southern Illinois University Edwardsville in the Department of Mathematics and Statistics.
Heinz Schättler and Urszula Ledzewicz, Geometric Optimal Control: Theory, Methods and Examples, Interdisciplinary Applied Mathematics, 2012. DOI 10.1007/978-1-4614-3834-2_1. © Springer Science+Business Media, LLC 2012
1. The Calculus of Variations: A Historical Perspective
Heinz Schättler¹ and Urszula Ledzewicz²
(1) Washington University, St. Louis, MO, USA
(2) Southern Illinois University, Edwardsville, IL, USA
Abstract
We begin with an introduction to the historical origin of optimal control theory, the calculus of variations. But it is not our intention to give a comprehensive treatment of this topic. Rather, we introduce the fundamental necessary and sufficient conditions for optimality by fully analyzing two of the cornerstone problems of the theory, the brachistochrone problem and the problem of determining surfaces of revolution with minimum surface area, so-called minimal surfaces. Our emphasis is on illustrating the methods and techniques required for getting complete solutions for these problems. More generally, we use the so-called fixed-endpoint problem, the problem of minimizing a functional over all differentiable curves that satisfy given boundary conditions, as a vehicle to introduce the classical results of the theory: (a) the Euler–Lagrange equation as the fundamental first-order necessary condition for optimality, (b) the Legendre and Jacobi conditions, both in the form of necessary and sufficient second-order conditions for local optimality, (c) the Weierstrass condition as additional necessary condition for optimality for so-called strong minima, and (d) its connection with field theory, the fundamental idea in any sufficiency theory. Throughout our presentation, we emphasize geometric constructions and a geometric interpretation of the conditions. For example, we present the connections between envelopes and conjugate points of a fold type and use these arguments to give a full solution for the minimum surfaces of revolution.
The classical ideas and concepts presented here will serve us both as an introduction to and motivation for the corresponding notions in optimal control theory to be discussed in subsequent chapters. Since geometric content is most easily visualized in the plane—and since the classical problems we are going to analyze are of this type—we restrict our introductory treatment here to one-dimensional problems. This mostly simplifies the notation, and only to a small extent the mathematics. We include a brief treatment of the multidimensional case in Sect. 2.3 as a corollary to the Pontryagin maximum principle from optimal control theory.
Chapter 1 is organized as follows: In Sect. 1.1 we introduce Johann Bernoulli’s brachistochrone problem, the very first problem in the calculus of variations, posed as a challenge to the mathematical community of its time in 1696. The fundamental first-order necessary condition for optimality, the Euler–Lagrange equation, will be developed in Sect. 1.2 and used to compute the extremals for the brachistochrone problem, the cycloids. In general, extremals are curves that satisfy the Euler–Lagrange equation. In Sect. 1.3, we formulate the problem of finding surfaces of minimum area of revolution for positive differentiable functions and show that the catenaries are the only extremals. We also include a detailed analysis of the mapping properties of the family of catenaries and their envelope, a rather technical mathematical argument, which, however, is essential to the understanding of the full solutions to this problem that will be given in Sect. 1.7. This requires that we develop second-order necessary conditions for optimality and the notion of conjugate points, which will be done in Sects. 1.4 and 1.5 and naturally leads to results about the local optimality of trajectories. A global solution of problems in the calculus of variations requires the notion of a field of extremals, which will be developed in Sect. 1.6 and leads to the Weierstrass condition. In Sects. 1.7 and 1.8, we then return to the problem of minimum surfaces of revolution and the brachistochrone problem, respectively, and give complete global solutions. This requires a further and nontrivial analysis of the geometric properties of the flows of extremals for these problems, which will be carried out in detail in these sections. In fact, neither of these classical problems can be analyzed directly with standard textbook results of the theory. For the problem of minimum surfaces of revolution, the reason is that a problem formulation within the class of positive continuously differentiable functions is not well-posed, and a second class of candidates for optimality, the so-called Goldschmidt extremals, which are only piecewise differentiable, rectifiable curves, needs to be taken into account. For the brachistochrone problem, the extremals, the cycloids, have singularities at their initial point that require special attention. We include complete and mathematically rigorous solutions for these two benchmark problems of the calculus of variations. All these arguments foreshadow similar constructions that will be carried out more generally in the solutions of optimal control problems in Chaps. 2 and 5. We close this introductory chapter with a brief discussion of the Hamilton–Jacobi equation in Sect. 1.9 and provide in Sect. 1.10 an outlook on how the conditions of the calculus of variations developed here as a whole foreshadowed the Pontryagin maximum principle of optimal control, the fundamental necessary conditions for optimality for an optimal control problem.
1.1 The Brachistochrone Problem
The origins of the subject, and in some sense of much of the further development of calculus as a whole, lie with the statement of the following problem, posed in 1696 by Johann Bernoulli:
[Brachistochrone]
Given two points A and B in the vertical plane, for a moving particle m, find the path AmB descending along which by its own gravity and beginning to be urged from the point A, it may in the shortest time reach B.
This problem, which sounds much better in Latin than in its English translation, is the so-called brachistochrone problem, named by combining the two Greek words for "shortest" (βράχιστος, brachistos) and "time" (χρόνος, chronos). Johann Bernoulli, who already had solved this problem with a rather ingenious ad hoc argument, challenged the scientific community of his time, or as he called it "the sharpest mathematical minds of the globe," to give solutions to this problem. In 1697, several solutions, including Johann’s own and solutions by his older brother Jakob, Newton, and others, also including a note by Leibniz, whose solution was similar to Johann’s, were published in the Acta Eruditorum. The solution is a cycloid, the locus that a point fixed on the circumference of a circle traverses as the circle rolls on a line. It is not a line itself, a rather implausible candidate Johann Bernoulli had warned against in his posting, nor a circle, the solution suggested by Galileo more than a hundred years earlier. In hindsight, it was the newly developed ideas of calculus, of which Galileo was deprived, and their relations with physics problems that made it possible to give these solutions. A common thread in all of them was the use of arguments (mathematics and physics still being the same discipline at that time) based on Fermat’s principle or the laws of refraction by Snell and Huygens. It was up to Euler, and even more so to his student Lagrange, to consider the problem as the minimum-time problem it is and give a solution based on variational ideas that eventually grew into the calculus of variations. We refer the reader to the article [245] by H. Sussmann and J. Willems for an excellent and detailed exposition of the historical context and the developments that led from the posing of this problem to the formulation of the maximum principle of optimal control, one of the main topics of this book.
In order to give a mathematical formulation, let A be the origin of a two-dimensional coordinate system and denote the horizontal coordinate by x and the vertical coordinate by y, but orient it downward (see Fig. 1.1). It is clear from conservation of energy that the terminal point B cannot lie higher than the initial point A and thus it needs to have coordinates B = (x₀, y₀) with y₀ nonnegative. Without loss of generality, we also assume that x₀ > 0, ignoring the trivial case of free fall (x₀ = 0). The objective is to minimize the time it takes for the particle to move from A to B along a curve Γ that connects A with B when the only force that acts upon the particle is gravity. It is implicitly assumed in the problem formulation that the particle descends without friction. It seems obvious that curves Γ that are not graphs of functions y : [0, x₀] → ℝ₊, x ↦ y(x), cannot be optimal. Mathematically, this can be proven, but it requires a different and more general setup for the problem than the classical one. Here, we wish to follow the classical argument and thus, a priori, restrict the problem formulation to curves Γ that are graphs of functions y.
Fig. 1.1 The brachistochrone problem
The time of descent along such a curve can easily be computed with some elementary facts from physics. The speed of the particle is the change of distance traveled in time,
$$v = \frac{ds}{dt},$$
and the total time can formally be computed as
$$T = \int dt = \int \frac{ds}{v}.$$
In our case, s represents the arc length of the graph of a function y : [0, x₀] → ℝ₊, x ↦ y(x), and thus
$$s(z) = \int_{0}^{z}\sqrt{1 + y'(x)^{2}}\,dx,\qquad 0 \leq z \leq x_{0},$$
which gives
$$T = \int_{0}^{x_{0}}\frac{\sqrt{1 + y'(x)^{2}}}{v(x)}\,dx.$$
We then need to express the velocity v as a function of x. In the absence of friction, the decrease in potential energy is accompanied by an equal increase in kinetic energy, i.e.,
$$mgy = \frac{1}{2}mv^{2},$$
and thus the velocity at the point (x, y(x)) is given by
$$v(x) = \sqrt{2gy(x)}.$$
Summarizing, the brachistochrone problem can therefore be formulated mathematically as the following minimization problem:
[Brachistochrone]
Find a function y : [0, x₀] → ℝ₊ with y(0) = 0 and y(x₀) = y₀
that minimizes the integral
$$I(y) = \int_{0}^{x_{0}}\sqrt{\frac{1 + y'(x)^{2}}{2gy(x)}}\,dx.$$
Note that we are dealing with a minimization problem over a set of functions; that is, the functions themselves are the variables of the problem. This raises some immediate questions, and at the very least, we should specify exactly what this "class of functions," the domain of the minimization problem, actually is. It turns out that this is not always an obvious choice, and it certainly is not in the case of the brachistochrone problem. For instance, we have the boundary condition y(0) = 0, and thus this is an improper integral. So we probably want to require that the integral converge. On the other hand, if the integral diverges to ∞, obviously this is not going to be an "optimal" choice, and if we allow for this, then why not keep these functions, since no harm will be done. Another obvious requirement is that y be differentiable, at least on the open interval (0, x₀). But do we need differentiability at the endpoints, from the right at 0 and from the left at x₀? This is not quite clear. In fact, all we need is that the integral remain finite. Indeed, the solutions to the problem, the cycloids, are functions whose derivative y′(x) diverges to ∞ as x decreases to 0. As this example shows, the choice of the class of functions over which to minimize a given functional is not always a simple issue.
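To make the functional concrete, the following is a minimal numerical sketch (not part of the original text) that evaluates the time of descent along two curves joining A = (0, 0) and B = (π, 2): the straight line and an arc of a cycloid, the curve named above as the solution. The function name `descent_time`, the endpoint, the value g = 9.81, and the midpoint rule are illustrative assumptions. Curves are given in parametric form so that the integrable singularity at y(0) = 0 never causes a division by zero.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def descent_time(y, dx, dy, s0, s1, n=2000):
    """Midpoint-rule approximation of T = integral of ds / v along a curve
    s -> (x(s), y(s)), with ds = sqrt(dx^2 + dy^2) ds and v = sqrt(2 g y).

    y, dx, dy are callables of the curve parameter s (y is measured downward).
    Midpoints avoid evaluating the integrand at the start, where y = 0.
    """
    step = (s1 - s0) / n
    total = 0.0
    for k in range(n):
        s = s0 + (k + 0.5) * step
        total += math.sqrt((dx(s) ** 2 + dy(s) ** 2) / (2.0 * G * y(s)))
    return total * step

X0, Y0 = math.pi, 2.0  # endpoint B (hypothetical choice)

# Straight line, parametrized as (X0 u^2, Y0 u^2) so the integrand stays bounded.
t_line = descent_time(
    y=lambda u: Y0 * u * u,
    dx=lambda u: 2.0 * X0 * u,
    dy=lambda u: 2.0 * Y0 * u,
    s0=0.0, s1=1.0,
)

# Cycloid x = r (th - sin th), y = r (1 - cos th); with r = 1 it reaches B at th = pi.
R = 1.0
t_cycloid = descent_time(
    y=lambda th: R * (1.0 - math.cos(th)),
    dx=lambda th: R * (1.0 - math.cos(th)),
    dy=lambda th: R * math.sin(th),
    s0=0.0, s1=math.pi,
)

print(f"line:    {t_line:.4f} s")
print(f"cycloid: {t_cycloid:.4f} s")
```

Under these assumptions the cycloid time agrees with π·sqrt(r/g) and is shorter than the straight-line time sqrt(2(x₀² + y₀²)/(g y₀)), even though both integrands are unbounded at x = 0 when written as functions of x.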
On the other hand, the choice of functions to minimize over is intimately connected with the important question of the existence of a solution. The theory of existence of solutions is well established, and there exist numerous classical sources on this topic (for example, see [70, 118, 260], to mention just a few textbooks). The techniques used in existence proofs are very different from the methods pursued in this book, and therefore we will not address the issue of existence of solutions. Instead, we simply proceed, assuming that solutions to the problems we consider exist within a reasonably nice class of functions (as will be the case for the problems we shall consider) and try to single out candidates for a minimum. Our interest in this text is to characterize such a minimizer through necessary conditions for optimality and provide sufficient conditions that allow us to conclude the local or even global optimality of a candidate found through application of these necessary conditions. For the brachistochrone problem, and, more generally, for problems in the calculus of variations, a natural approach is, as in classical calculus, to ask whether the objective I might be "differentiable" and then develop first- and second-order derivative tests. This is the approach of Lagrange, who formalized and generalized the main necessary condition for optimality derived earlier for the brachistochrone problem with geometric means by his teacher, Leonhard Euler, himself a student of Johann Bernoulli in Basel.
1.2 The Euler–Lagrange Equation
The brachistochrone problem is a special case of what commonly is called the simplest problem in the calculus of variations, and we use this problem to develop the fundamental results of the theory. As is customary, we denote the space of all continuous real-valued functions x : [a, b] → ℝ, t ↦ x(t), defined on the compact interval [a, b] by C([a, b]). We use the notation Cʳ([a, b]) for functions that are r-times continuously differentiable on the open interval (a, b) and have derivatives that extend continuously to the compact interval [a, b]. Furthermore, when describing intervals, we consistently use brackets to denote that the boundary point is included and parentheses to indicate that the boundary point is not part of the set. For example, (a, b] = {t ∈ ℝ : a < t ≤ b}, etc.
Definition 1.2.1.
We denote by $$\mathcal{X}$$ the Banach space C([a, b]) equipped with the supremum norm
$$\|x\|_{\mathbf{C}} = \|x\|_{\infty} = \max_{a\leq t\leq b}|x(t)|.$$
Convergence in the supremum norm is uniform convergence (see Appendix A). It is well known that C([a, b]) with the supremum norm is a Banach space, i.e., if $$\{x_{n}\}_{n\in \mathbb{N}}$$ is a sequence of continuous functions that is Cauchy in the supremum norm, then there exists a continuous function x such that xₙ converges to x uniformly on [a, b] (see Proposition A.2.2).
Definition 1.2.2.
We denote by $$\mathcal{Y}$$ the Banach space that is obtained by equipping C¹([a, b]) with the norm
$$\|x\|_{D} = \|x\|_{\infty} + \|\dot{x}\|_{\infty}.$$
It easily follows that convergence in the norm $$\|\cdot\|_{D}$$ is equivalent to uniform convergence of both the curves and their derivatives on the compact interval [a, b]. Figure 1.2 gives an example of a low-amplitude, high-frequency oscillation that lies in a small neighborhood of $$x \equiv 0$$ in $$\mathcal{X}$$, but not in $$\mathcal{Y}$$.
Fig. 1.2 A low-amplitude, high-frequency oscillation not close to $$x \equiv 0$$ in $$\mathcal{Y}$$
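The difference between the two norms can be illustrated numerically. The sketch below assumes the oscillation has the form x(t) = ε sin(nt) on [0, π] (the figure itself does not specify a formula) and approximates both norms by sampling on a grid; the helper names `sup_norm` and `d_norm` and the values of ε and n are hypothetical.

```python
import math

def sup_norm(f, a, b, samples=20001):
    """Approximate max |f(t)| on [a, b] by sampling on a uniform grid."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step)) for i in range(samples))

def d_norm(f, df, a, b, samples=20001):
    """Approximate ||f||_D = ||f||_inf + ||f'||_inf from f and its derivative df."""
    return sup_norm(f, a, b, samples) + sup_norm(df, a, b, samples)

eps, n = 0.01, 1000  # hypothetical amplitude and frequency

x = lambda t: eps * math.sin(n * t)
dx = lambda t: eps * n * math.cos(n * t)

norm_c = sup_norm(x, 0.0, math.pi)    # close to eps = 0.01
norm_d = d_norm(x, dx, 0.0, math.pi)  # close to eps + eps * n = 10.01
print(norm_c, norm_d)
```

The curve stays within 0.01 of zero in the supremum norm, while its distance from zero in the norm of $$\mathcal{Y}$$ is dominated by the derivative term εn.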
We can now formulate the simplest problem in the calculus of variations.
[CV]
Let L : ℝ × ℝ × ℝ → ℝ, (t, x, y) ↦ L(t, x, y), be an r-times continuously differentiable function, r ≥ 2. Among all functions x ∈ C¹([a, b]) that, for two given points A and B in ℝ, satisfy the boundary conditions x(a) = A and x(b) = B, find one that minimizes the functional
$$I[x] = \int_{a}^{b}L(t,x(t),\dot{x}(t))\,dt.$$
The integrand L is called the Lagrangian of the problem. Note that the derivative $$\dot{x}$$ of the function x takes the place of the argument y when the objective is evaluated, and it is therefore customary (although a bit confusing in the beginning) to simply denote this variable by $$\dot{x}$$ instead of y.
Definition 1.2.3 (Weak and strong minima).
A function x∗ ∈ C¹([a, b]) is called a weak local minimum if it minimizes the functional I over some neighborhood of x∗ in $$\mathcal{Y}$$; it is said to provide a strong local minimum if it provides a minimum of I over some neighborhood of x∗ in $$\mathcal{X}$$.
Note that strong local minimizers x∗ minimize the functional I over all functions x that have the property that they are close to the reference at every time t ∈ [a, b], whereas weak local minimizers are optimal only relative to those functions that in addition have their derivatives $$\dot{x}$$ close to the derivative of the reference $$\dot{x}_{\ast}$$ and thus minimize over a smaller collection of functions.
Example.
The function $$x \equiv 0$$ is a weak, but not a strong, local minimum for the functional
$$I[x] = \int_{0}^{\pi}x(t)^{2}\left(1 - \dot{x}(t)^{2}\right)dt,\quad x(0) = 0,\;x(\pi) = 0,$$
defined on C¹([0, π]). By inspection, the functional I[x] is nonnegative on the open unit ball $$B_{1}(0) = \{x \in \mathcal{Y} : \|x\|_{D} < 1\}$$, and thus $$x_{\ast} \equiv 0$$ is a weak local minimum. But clearly, if $$\|\dot{x}\|_{\infty}$$ becomes too large, the functional can be made negative. For ε > 0 and n ∈ ℕ simply consider the low-amplitude, high-frequency oscillations xₙ ∈ C¹([0, π]) given by xₙ(t) = ε sin(nt). Then we have
$$\begin{array}{rcl} I[x_{n}] & = & \epsilon^{2}\int_{0}^{\pi}\sin^{2}(nt)\left(1 - \epsilon^{2}n^{2}\cos^{2}(nt)\right)dt \\ & = & \frac{\epsilon^{2}}{n}\int_{0}^{n\pi}\sin^{2}(s)\left(1 - \epsilon^{2}n^{2}\cos^{2}(s)\right)ds \\ & = & \epsilon^{2}\int_{0}^{\pi}\sin^{2}(s)\,ds - \epsilon^{4}n^{2}\int_{0}^{\pi}\sin^{2}(s)\cos^{2}(s)\,ds \\ & = & \epsilon^{2}\int_{0}^{\pi}\sin^{2}(s)\,ds - \frac{\epsilon^{4}n^{2}}{4}\int_{0}^{\pi}\sin^{2}(2s)\,ds \\ & = & \epsilon^{2}\int_{0}^{\pi}\sin^{2}(s)\,ds - \frac{\epsilon^{4}n^{2}}{4}\int_{0}^{2\pi}\sin^{2}(r)\,\frac{dr}{2} \\ & = & \epsilon^{2}\left(1 - \frac{\epsilon^{2}n^{2}}{4}\right)\int_{0}^{\pi}\sin^{2}(s)\,ds, \end{array}$$
and thus I[xₙ] < 0 for $$n > \frac{2}{\epsilon}$$. In fact, I[xₙ] → −∞ as n → ∞, and this functional does not have a minimum. □
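The closed form obtained in this example can be checked numerically. The following sketch (an illustration, not part of the original text) evaluates I[xₙ] with a midpoint rule and compares it with ε²(1 − ε²n²/4)·π/2, using that the integral of sin² over [0, π] equals π/2. The function names and the choice ε = 0.1 are assumptions.

```python
import math

def functional(eps, n, samples=20000):
    """Midpoint-rule value of I[x] = int_0^pi x^2 (1 - x'^2) dt
    for x(t) = eps * sin(n t)."""
    step = math.pi / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * step
        x = eps * math.sin(n * t)
        dx = eps * n * math.cos(n * t)
        total += x * x * (1.0 - dx * dx)
    return total * step

def closed_form(eps, n):
    """eps^2 (1 - eps^2 n^2 / 4) * pi / 2, the last line of the computation above."""
    return eps ** 2 * (1.0 - eps ** 2 * n ** 2 / 4.0) * math.pi / 2.0

eps = 0.1  # sup-norm distance of x_n from 0; the sign changes at n = 2 / eps = 20
for n in (1, 10, 21, 100):
    print(n, functional(eps, n), closed_form(eps, n))
```

For this ε the values are positive for n < 20 and negative for n > 20, although every xₙ stays within distance ε of $$x \equiv 0$$ in the supremum norm.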
We now develop necessary conditions for a function x∗ to be a weak local minimizer. The fundamental necessary conditions for optimality follow from the well-known conditions for minimizing a function in calculus by perturbing the reference x∗ with a function h ∈ C¹([a, b]), $$h(a) = h(b) = 0$$. Let C₀¹([a, b]) denote this class of functions. Clearly, if x∗ is a local minimizer, then for ε small enough, the function x∗ + εh is also admissible for the minimization problem and lies in the neighborhood over which x∗ is minimal. Thus, given any h ∈ C₀¹([a, b]), for ε in some small neighborhood of 0, we have that
$$I[x_{\ast}] \leq I[x_{\ast} + \epsilon h],$$
and it thus is a necessary condition for minimality of x∗ that for all h ∈ C₀¹([a, b]), the first derivative of the function $$\varphi(\epsilon;h) = I[x_{\ast} + \epsilon h]$$ at ε = 0 vanish and that the second derivative be nonnegative:
$${\varphi }^{{\prime}}(0;h) ={ \frac{d} {d\epsilon }}_{\vert \epsilon =0}I[{x}_{{_\ast}} + \epsilon h] = 0$$(1.1)
and
$${\varphi }^{{\prime\prime}}(0;h) ={ \frac{{d}^{2}} {d{\epsilon }^{2}}}_{\vert \epsilon =0}I[{x}_{{_\ast}} + \epsilon h] \geq 0.$$(1.2)
The quantities in Eqs. (1.1) and (1.2) are called the first and second variation of the problem, respectively, and are customarily denoted by δI[x∗](h) and δ²I[x∗](h). Differentiating under the integral gives that
$$\delta I[x_{\ast}](h) = \int_{a}^{b}\left(L_{x}(t,x_{\ast},\dot{x}_{\ast})h + L_{\dot{x}}(t,x_{\ast},\dot{x}_{\ast})\dot{h}\right)dt$$(1.3)
with the curve x∗, its derivative $$\dot{x}_{\ast}$$, and the variation h all evaluated at t, and similarly,
$${ \delta }^{2}I[{x}_{ {_\ast}}](h) ={ \int \nolimits }_{a}^{b}(h,\dot{h})\left (\begin{array}{cc} {L}_{xx}(t,{x}_{{_\ast}},\dot{{x}}_{{_\ast}})&{L}_{x\dot{x}}(t,{x}_{{_\ast}},\dot{{x}}_{{_\ast}}) \\ {L}_{\dot{x}x}(t,{x}_{{_\ast}},\dot{{x}}_{{_\ast}})&{L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}},\dot{{x}}_{{_\ast}}) \end{array} \right )\left (\begin{array}{c} h\\ \dot{h} \end{array} \right )dt.$$(1.4)
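Formulas (1.3) and (1.4) can be sanity-checked against finite differences of ε ↦ I[x + εh]. The sketch below is only an illustration: the Lagrangian L = x²(1 − ẋ²) is borrowed from the example above, while the reference curve x(t) = sin t, the variation h(t) = t(π − t), the step size, and the midpoint rule are hypothetical choices. The formulas hold along any C¹ curve, so the reference need not be an extremal.

```python
import math

# Sample Lagrangian L(t, x, v) = x^2 (1 - v^2) and its partial derivatives;
# v stands for the slot occupied by x-dot. Chosen only for illustration.
def L(t, x, v):    return x * x * (1.0 - v * v)
def L_x(t, x, v):  return 2.0 * x * (1.0 - v * v)
def L_v(t, x, v):  return -2.0 * x * x * v
def L_xx(t, x, v): return 2.0 * (1.0 - v * v)
def L_xv(t, x, v): return -4.0 * x * v
def L_vv(t, x, v): return -2.0 * x * x

A, B, N = 0.0, math.pi, 4000

def integrate(f):
    """Midpoint rule for the integral of f over [A, B]."""
    step = (B - A) / N
    return step * sum(f(A + (k + 0.5) * step) for k in range(N))

# Reference curve x and variation h with h(A) = h(B) = 0 (hypothetical choices).
x, dx = math.sin, math.cos
h = lambda t: t * (math.pi - t)
dh = lambda t: math.pi - 2.0 * t

def I(eps):
    """phi(eps; h) = I[x + eps h]."""
    return integrate(lambda t: L(t, x(t) + eps * h(t), dx(t) + eps * dh(t)))

# Formulas (1.3) and (1.4) evaluated along x.
first = integrate(lambda t: L_x(t, x(t), dx(t)) * h(t) + L_v(t, x(t), dx(t)) * dh(t))
second = integrate(lambda t: L_xx(t, x(t), dx(t)) * h(t) ** 2
                   + 2.0 * L_xv(t, x(t), dx(t)) * h(t) * dh(t)
                   + L_vv(t, x(t), dx(t)) * dh(t) ** 2)

# Central finite differences of phi at eps = 0.
e = 1e-4
first_fd = (I(e) - I(-e)) / (2.0 * e)
second_fd = (I(e) - 2.0 * I(0.0) + I(-e)) / (e * e)
print(first, first_fd)
print(second, second_fd)
```

Both pairs should agree up to the truncation and rounding error of the finite differences.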
The following lemma is fundamental in the calculus of variations and makes it possible to eliminate the variational directions h from the necessary conditions.
Lemma 1.2.1.
Suppose α and β are continuous functions defined on a compact interval [a,b], α,β ∈ C([a,b]), with the property that
$${\int \nolimits }_{a}^{b}\alpha (t)h(t) + \beta (t)\dot{h}(t)dt = 0$$(1.5)
for all h ∈ C₀¹([a,b]). Then β is continuously differentiable, β ∈ C¹([a,b]), and
$$\dot{\beta}(t) = \alpha(t)\quad \mathrm{for\ all\ }t \in (a,b).$$
Proof.
If we define $$A(t) = \int_{a}^{t}\alpha(s)\,ds$$, then it follows from integration by parts that for all h ∈ C₀¹([a, b]) and any constant c ∈ ℝ, we have that
$$0 = \int_{a}^{b}\left(-A(t) + \beta(t) - c\right)\dot{h}(t)\,dt.$$
Choosing
$$c = \frac{1}{b - a}\int_{a}^{b}(\beta(t) - A(t))\,dt$$
and taking
$$h(t) = \int_{a}^{t}(\beta(s) - A(s) - c)\,ds,$$
it follows that h ∈ C₀¹([a, b]) and
$$\int_{a}^{b}\dot{h}(t)^{2}\,dt = 0.$$
Since $$\dot{h}$$ is continuous, it vanishes identically; hence h is constant, and h(a) = 0 implies $$h \equiv 0$$. Hence $$\beta(t) = A(t) + c$$, and therefore β is differentiable with derivative $$\dot{\beta}(t) = \alpha(t)$$. □
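The two steps that the proof compresses can be written out as follows (a sketch in the notation of the lemma). For h with h(a) = h(b) = 0, integration by parts and the fundamental theorem of calculus give
$$\int_{a}^{b}\alpha(t)h(t)\,dt = \Big[A(t)h(t)\Big]_{a}^{b} - \int_{a}^{b}A(t)\dot{h}(t)\,dt = -\int_{a}^{b}A(t)\dot{h}(t)\,dt,\qquad \int_{a}^{b}c\,\dot{h}(t)\,dt = c\,(h(b) - h(a)) = 0,$$
so that Eq. (1.5) becomes the identity with the integrand (−A + β − c)ḣ. For the particular h chosen in the proof, the definition of c yields
$$h(b) = \int_{a}^{b}(\beta(s) - A(s))\,ds - c\,(b - a) = 0,\qquad \dot{h}(t) = \beta(t) - A(t) - c,$$
and substituting this h into the identity gives
$$0 = \int_{a}^{b}(\beta(t) - A(t) - c)\,\dot{h}(t)\,dt = \int_{a}^{b}\dot{h}(t)^{2}\,dt.$$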
Applying this to Eq. (1.3), we obtain the Euler–Lagrange equation, the fundamental first-order necessary condition for optimality for problems in the calculus of variations.
Corollary 1.2.1 (Euler–Lagrange equation).
If x∗ is a weak minimum for the functional I, then the partial derivative $$\frac{\partial L}{\partial \dot{x}}$$ is differentiable along the curve $$t \mapsto (t,x_{\ast}(t),\dot{x}_{\ast}(t))$$ with derivative given by
$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}(t,x_{\ast}(t),\dot{x}_{\ast}(t))\right) = \frac{\partial L}{\partial x}(t,x_{\ast}(t),\dot{x}_{\ast}(t)).$$(1.6)
Admissible functions x : t ↦ x(t) that satisfy this differential equation are called extremals. Note that it is not required that L be twice differentiable for the Euler–Lagrange equation to hold. It is a part of the statement of the lemma that the composition $$t \mapsto L_{\dot{x}}(t,x_{\ast}(t),\dot{x}_{\ast}(t))$$ will always be differentiable in t (regardless of whether the second partial derivatives of L exist) and that its derivative is given by $$L_{x}(t,x_{\ast}(t),\dot{x}_{\ast}(t))$$.
If the Lagrangian L is twice differentiable, and if also the extremal x∗ under consideration lies in C², then we can differentiate Eq. (1.6) to obtain
$$L_{t\dot{x}} + L_{\dot{x}x}\dot{x}_{\ast} + L_{\dot{x}\dot{x}}\ddot{x}_{\ast} = L_{x}.$$
In this case, the Euler–Lagrange equation becomes a possibly highly nonlinear second-order ordinary differential equation. This equation, however, will be singular if the second derivative $$L_{\dot{x}\dot{x}}(t,x_{\ast}(t),\dot{x}_{\ast}(t))$$ vanishes at some times t ∈ (a, b). We shall see in Sect. 1.4 that it is a second-order necessary condition for optimality, the so-called Legendre condition, that this term be nonnegative, but generally, it need not be positive. If $$L_{\dot{x}\dot{x}}(t,x_{\ast}(t),\dot{x}_{\ast}(t))$$ is positive for all t ∈ [a, b], then the Euler–Lagrange equation is a regular second-order ordinary differential equation, and if we specify as initial conditions x(a) = A and $$\dot{x}(a) = p$$ for some parameter p, then a local solution x(t; p) exists near t = a. However, this solution need not necessarily exist on the full interval [a, b]. In addition, we are interested in the solution to the two-point boundary value problem with boundary condition x(b) = B. Thus there may be no solutions, the solution may be unique, or there may exist several, even infinitely many, solutions. The analysis of the Euler–Lagrange equation therefore generally is a challenging and nontrivial problem.
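The passage from the initial value problem with parameter p to the two-point boundary value problem can be sketched with a simple shooting method. Everything specific below is an assumption made for illustration: the Lagrangian L = (ẋ² + x²)/2, for which the Euler–Lagrange equation is ẍ = x and L_ẋẋ = 1 > 0, the interval [0, 1], the boundary values x(0) = 0 and x(1) = 1, the RK4 integrator, and the bisection bracket. For this linear example the solution is unique; in general, as noted above, there may be none or several.

```python
import math

def shoot(p, a=0.0, b=1.0, A=0.0, steps=200):
    """Integrate x'' = x (Euler-Lagrange equation of L = (x'^2 + x^2) / 2)
    from x(a) = A, x'(a) = p with classical RK4 and return x(b; p)."""
    f = lambda x, v: (v, x)  # first-order system: x' = v, v' = x
    step = (b - a) / steps
    x, v = A, p
    for _ in range(steps):
        k1 = f(x, v)
        k2 = f(x + 0.5 * step * k1[0], v + 0.5 * step * k1[1])
        k3 = f(x + 0.5 * step * k2[0], v + 0.5 * step * k2[1])
        k4 = f(x + step * k3[0], v + step * k3[1])
        x += step * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        v += step * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x

def solve_bvp_by_bisection(B, lo, hi, tol=1e-12):
    """Find p in [lo, hi] with x(b; p) = B, assuming x(b; lo) - B and
    x(b; hi) - B have opposite signs."""
    g_lo = shoot(lo) - B
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = shoot(mid) - B
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)

p_star = solve_bvp_by_bisection(B=1.0, lo=0.0, hi=2.0)
print(p_star, 1.0 / math.sinh(1.0))
```

Here the extremals through x(0) = 0 are x(t; p) = p sinh(t), so the computed slope can be compared with the exact value 1/sinh(1).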
The following result, known as the Hilbert differentiability theorem, often is useful for establishing a priori differentiability properties of extremals.
Proposition 1.2.1 (Hilbert differentiability theorem).
Let x ∗ : [a,b] → ℝ be an extremal with the property that
$${L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))\neq 0\qquad \mathrm{for\ all\quad }t \in [a,b].$$Then x ∗ has the same smoothness properties as the Lagrangian L for problem [CV] . That is, if L ∈ C r , then also x ∗ ∈ C r.
Proof.
For some constant c, the extremal x ∗ is a solution to the Euler–Lagrange equation in integrated form,
$$\frac{\partial L} {\partial \dot{x}} (t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) -{\int \nolimits }_{a}^{t}\frac{\partial L} {\partial x} (s,{x}_{{_\ast}}(s),\dot{{x}}_{{_\ast}}(s))ds - c = 0.$$If we define a function G(t, u) as
$$G(t,u) = \frac{\partial L} {\partial \dot{x}} (t,{x}_{{_\ast}}(t),u) -{\int \nolimits }_{a}^{t}\frac{\partial L} {\partial x} (s,{x}_{{_\ast}}(s),\dot{{x}}_{{_\ast}}(s))ds - c,$$then the equation G(t, u) = 0 has the solution $$u =\dot{ {x}}_{{_\ast}}(t)$$ . By the implicit function theorem (see Theorem A.3.1), this solution is locally unique and k times continuously differentiable if the function G(t, ⋅) is k times continuously differentiable and if the partial derivative
$$\frac{\partial G} {\partial u} (t,\dot{{x}}_{{_\ast}}(t)) = \frac{{\partial }^{2}L} {\partial \dot{{x}}^{2}} (t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))$$does not vanish. Hence x ∗ ∈ C r if L is r times continuously differentiable. □
For integrands L of the form $$L(x,\dot{x}) = {x}^{\alpha }\sqrt{1 +\dot{ {x}}^{2}}$$ with α ∈ ℝ, we have that
$${L}_{\dot{x}\dot{x}}(x,\dot{x}) = \frac{{x}^{\alpha }} {{\left (1 +\dot{ {x}}^{2}\right )}^{\frac{3} {2} }} > 0,$$and thus all extremals that lie in x > 0 are twice continuously differentiable. This applies, for example, to the brachistochrone problem with $$\alpha = -\frac{1} {2}$$ .
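A quick numerical sanity check of this formula can be sketched in Python by comparing the closed form with a central second difference in the velocity variable. The function names, the sample points, and the step size h are illustrative assumptions and not part of the text.

```python
import math

def lagrangian(x, v, alpha):
    """L(x, xdot) = x**alpha * sqrt(1 + xdot**2), defined for x > 0."""
    return x**alpha * math.sqrt(1.0 + v * v)

def l_vv_exact(x, v, alpha):
    """Closed-form second derivative of L with respect to xdot."""
    return x**alpha / (1.0 + v * v) ** 1.5

def l_vv_numeric(x, v, alpha, h=1e-4):
    """Central second difference in xdot (the step h is an arbitrary choice)."""
    return (lagrangian(x, v + h, alpha) - 2.0 * lagrangian(x, v, alpha)
            + lagrangian(x, v - h, alpha)) / (h * h)

# alpha = -1/2 is the brachistochrone integrand; alpha = 1 is a second sample.
for alpha in (-0.5, 1.0):
    for x, v in ((0.5, -2.0), (1.0, 0.0), (3.0, 1.5)):
        print(alpha, x, v, l_vv_exact(x, v, alpha), l_vv_numeric(x, v, alpha))
```

Both columns should agree to several digits and stay positive as long as x > 0, which is the hypothesis needed for the Hilbert differentiability theorem.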
Corollary 1.2.2 (First integral).
If x ∗ is a weak minimum for the functional I that is twice continuously differentiable and if the function L is time-invariant, then it follows that the function
$$t\mapsto L(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) -\dot{ {x}}_{{_\ast}}(t)\frac{\partial L} {\partial \dot{x}} (t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))$$(1.7)
is constant over the interval [a,b], i.e., the Euler–Lagrange equation has a first integral given by $$L -\dot{x}{L}_{\dot{x}} = \mathrm{const}.$$
Proof.
Since L does not explicitly depend on t, we have, omitting the arguments, that
$$\begin{array}{rcl} \frac{d} {dt}(L -\dot{ x}{L}_{\dot{x}})& =& {L}_{x}\dot{x} + {L}_{\dot{x}}\ddot{x} -\ddot{ x}{L}_{\dot{x}} -\dot{ x} \frac{d} {dt}\left ({L}_{\dot{x}}\right ) \\ & =& \dot{x}\left ({L}_{x} - \frac{d} {dt}\left ({L}_{\dot{x}}\right )\right ) \equiv 0. \end{array}$$□
Thus, for a time-invariant system, all twice continuously differentiable extremals need to lie in the level curves of the function $$L -\dot{ x}{L}_{\dot{x}}$$ . This often provides a quick path to finding extremals for a given problem, and we will use formula (1.7) to find extremals for the brachistochrone problem. Deleting the constant, in this case L is given by
$$L(x,\dot{x}) = \sqrt{\frac{1 +\dot{ {x}}^{2 } } {x}},$$and thus the first integral takes the form
$$\sqrt{\frac{1 +\dot{ {x}}^{2 } } {x}} -\dot{ x}\left (\frac{1} {2}\sqrt{ \frac{1} {x\left (1 +\dot{ {x}}^{2}\right )}}2\dot{x}\right ) = c,$$where c is some constant. Simplifying this expression yields
$$(1 +\dot{ {x}}^{2} -\dot{ {x}}^{2})\sqrt{ \frac{1} {x\left (1 +\dot{ {x}}^{2}\right )}} = c,$$or equivalently, and renaming the constant,
$$x\left (1 +\dot{ {x}}^{2}\right ) = 2\xi > 0.$$(1.8)
While it is possible to solve the resulting ordinary differential equation with standard methods (albeit a bit tedious), it is more elegant and quicker to introduce a reparameterization t = t(τ) of the time scale so that the derivative $$\dot{x}$$ becomes
$$\frac{dx} {dt} = -\tan \left (\frac{\tau } {2}\right ).$$(1.9)
For in this case, we then have
$$1 +\dot{x}^{2} = \frac{1}{\cos^{2}\left(\frac{\tau}{2}\right)}$$and thus
$$x = \frac{2\xi}{1 +\dot{x}^{2}} = 2\xi \cos^{2}\left(\frac{\tau}{2}\right) = \xi (1 +\cos \tau ).$$(1.10)
Furthermore,
$$\begin{array}{rcl} \frac{dt}{d\tau} = \frac{dt}{dx}\,\frac{dx}{d\tau} & = & -\frac{1}{\tan\left(\frac{\tau}{2}\right)}\,\xi(-\sin\tau) = \xi\,\frac{2\sin\left(\frac{\tau}{2}\right)\cos\left(\frac{\tau}{2}\right)}{\tan\left(\frac{\tau}{2}\right)} \\ & = & 2\xi\cos^{2}\left(\frac{\tau}{2}\right) = \xi(1 +\cos\tau), \end{array}$$which gives
$$t(\tau ) = \xi (\tau +\sin \tau ) + c.$$Since extremals start at the origin, x(a) = 0, we choose $$a = -\pi $$ , and then the constant is c = ξπ. Thus, overall, the required time reparameterization is given by
$$t(\tau ) = \xi (\pi + \tau +\sin \tau ).$$(1.11)
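Equations (1.10) and (1.11) can be checked numerically against the first integral (1.8). The following Python sketch, with hypothetical helper names and an arbitrary sample value of ξ, evaluates x(1 + ẋ²) along the parameterized curve and compares the slope dx/dt with the value −tan(τ/2) prescribed in Eq. (1.9).

```python
import math

def cycloid(tau, xi):
    """Point (t, x) given by Eqs. (1.11) and (1.10) for tau in (-pi, pi)."""
    return xi * (math.pi + tau + math.sin(tau)), xi * (1.0 + math.cos(tau))

def slope(tau):
    """dx/dt = (dx/dtau) / (dt/dtau) along the curve; the factor xi cancels."""
    return -math.sin(tau) / (1.0 + math.cos(tau))

def first_integral(tau, xi):
    """Left-hand side x * (1 + xdot**2) of Eq. (1.8)."""
    _, x = cycloid(tau, xi)
    return x * (1.0 + slope(tau) ** 2)

xi = 0.75  # arbitrary sample value, so that 2 * xi = 1.5
for tau in (-3.0, -1.0, 0.0, 2.0):
    print(tau, first_integral(tau, xi), slope(tau), -math.tan(tau / 2.0))
```

The second column should equal 2ξ for every τ, and the last two columns should coincide.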
Equations (1.10) and (1.11) represent, as already mentioned, a family of curves called cycloids. Geometrically, a cycloid is the locus that a point fixed on the circumference of a circle of radius ξ traverses as the circle rolls on the lower side of the line x = 0. Examples of such curves are drawn in Fig. 1.3 below. Note that although we represent the cycloids as curves in $${\mathbb{R}}_{+}^{2} =\{ (t,x) : t > 0,x > 0\}$$ , each cycloid is the graph of some function t↦x(t). We conclude this discussion of the extremals for the brachistochrone problem by showing that there exists one and only one cycloid in this family that passes through a given point (t, x) with t and x positive.
Theorem 1.2.1.
The family $$\mathcal{C}$$ of curves defined by Ξ,
$$\begin{array}{rcl} \Xi & : (-\pi,\pi ) \times (0,\infty ) \rightarrow {\mathbb{R}}_{+}^{2}, & \\ & \qquad (\tau,\xi )\mapsto (t(\tau ;\xi ),x(\tau ;\xi )) = (\xi (\pi + \tau +\sin \tau ),\xi (1 +\cos \tau )),&\end{array}$$(1.12)
covers $${\mathbb{R}}_{+}^{2} =\{ (t,x) : t > 0,x > 0\}$$ diffeomorphically; that is, the map Ξ is one-to-one, onto, and has a continuously differentiable inverse.
Proof.
(Outline) Define f : ( − π, π) → (0, ∞) by
$$f(\tau ) = \frac{1 +\cos \tau } {\pi + \tau +\sin \tau }.$$It is a matter of elementary calculus to verify that f is 1 : 1 and onto. (Establish that f has a simple pole at $$\tau = -\pi $$ , is strictly monotonically decreasing over the interval ( − π, π), and is continuous at π with value 0.) Thus, given (α, β) ∈ ℝ + ², there exists a unique τ ∈ ( − π, π) such that $$f(\tau ) = \frac{\beta } {\alpha }$$ ; the parameter ξ is then given by solving either Eqs. (1.10) or (1.11) for ξ. Furthermore, an explicit calculation verifies that the Jacobian determinant of the transformation is everywhere positive, and thus by the inverse function theorem (see Theorem A.3.1), Ξ has a continuously differentiable inverse as well. □
This verifies that the family of cycloids forms what later on will be called a central field on the region ℝ + ². We shall prove in Sect. 1.8 that these curves are indeed the optimal solutions. Note that in the original parameterization x = x(t) of the curves as functions of t, the derivative $$\dot{x}$$ converges to ∞ as t converges to 0 from the right (cf. Eq. (1.9)).
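The proof of Theorem 1.2.1 is constructive and translates directly into a small computation. The Python sketch below is only an illustration: it uses bisection on the strictly decreasing function f to find τ and then solves Eq. (1.11) for ξ. The names `cycloid_through`, `t1`, `x1` and the tolerance are assumptions made here; the target point is written (t1, x1) instead of (α, β). The sketch is not robust for extreme ratios x1/t1, where τ approaches −π and the denominator of f suffers from cancellation.

```python
import math

def f(tau):
    """f(tau) = (1 + cos tau) / (pi + tau + sin tau) on (-pi, pi)."""
    return (1.0 + math.cos(tau)) / (math.pi + tau + math.sin(tau))

def cycloid_through(t1, x1, tol=1e-13):
    """Return (tau, xi) with Xi(tau, xi) = (t1, x1), for t1 > 0 and x1 > 0."""
    target = x1 / t1
    lo, hi = -math.pi, math.pi          # f -> +infinity at lo, f -> 0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > target:             # f is decreasing: the root lies to the right
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    xi = t1 / (math.pi + tau + math.sin(tau))   # solve Eq. (1.11) for xi
    return tau, xi

tau, xi = cycloid_through(2.0, 1.0)
print(tau, xi, xi * (math.pi + tau + math.sin(tau)), xi * (1.0 + math.cos(tau)))
```

The last two printed values reproduce the prescribed point, which illustrates that exactly one member of the family passes through it.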
Fig. 1.3
The field of cycloids for the brachistochrone problem
1.3 Surfaces of Revolution of Minimum Area
A second classical example in the calculus of variations that dates back to Euler in the mid-eighteenth century is to find surfaces of revolution that have a minimum area, so-called minimal surfaces: Given t 1 > 0 and x 0 and x 1 positive numbers, among all curves that join two given points (0, x 0) and (t 1, x 1) and have positive values, find the one that generates a surface that has the smallest surface area of revolution when rotated around the t-axis (see Fig. 1.4).
Fig. 1.4
Surfaces of revolution
As in the brachistochrone problem, for starters, we restrict the curves to be graphs of differentiable functions, and we do not allow the functions to touch 0. There is no difficulty in finding the solutions experimentally: dip two circular loops into soapy water and remove them, holding them close to each other; the soap film that forms between the circles is the desired solution. Unfortunately, the mathematical problem is significantly harder. But then again, its solution offers important insights into calculus of variations problems as a whole, and in many ways, in the words of Gilbert Bliss, this example is "the most satisfactory illustration which we have of the principles of the general theory of the calculus of variations" [39, p. 85]. Mathematically, the problem can be formulated as follows:
[Minimal Surfaces]
Given t 1 > 0, let x 0 and x 1 be positive reals. Minimize the integral
$$I[x] = 2\pi {\int \nolimits }_{0}^{{t}_{1} }x\sqrt{1 +\dot{ {x}}^{2}}dt$$over all functions x ∈ C ¹([0, t 1]) that have positive values, x : [0, t 1] → (0, ∞), and satisfy the boundary conditions
$$x(0) = {x}_{0}\qquad \mathrm{and}\qquad x({t}_{1}) = {x}_{1}.$$The integrand L of the general problem formulation is given by
$$L(x,\dot{x}) = 2\pi x\sqrt{1 +\dot{ {x}}^{2}},$$and, as for the brachistochrone problem, it follows from the Hilbert differentiability theorem that extremals lie in C ². We thus again use the first integral $$L -\dot{ x}{L}_{\dot{x}} = c$$ with c a constant to find the extremals. Here
$$x\sqrt{1 +\dot{ {x}}^{2}} -\dot{ x}x \frac{\dot{x}} {\sqrt{1 +\dot{ {x}}^{2}}} = c,$$and simplifying this expression gives
$$\frac{x} {\sqrt{1 +\dot{ {x}}^{2}}}(1 +\dot{ {x}}^{2} -\dot{ {x}}^{2}) = c,$$or renaming the constant,
$$\frac{x} {\sqrt{1 +\dot{ {x}}^{2}}} = \beta.$$(1.13)
This equation is again most easily solved through a reparameterization of the time scale, t = t(τ). Here we want the derivative $$\dot{x}$$ to be
$$\frac{dx} {dt} =\sinh \tau,$$(1.14)
so that
$$1 +\dot{x}^{2} = \cosh^{2}\tau $$and thus
$$x(\tau ) = \beta \cosh \tau.$$This also implies
$$\frac{\mathit{dt}} {\mathit{d}\tau } = \frac{\mathit{dt}} {\mathit{dx}} \frac{\mathit{dx}} {\mathit{d}\tau } = \frac{1} {\sinh \tau }\beta \sinh \tau = \beta,$$and thus we simply have
$$t(\tau ) = \alpha + \beta \tau $$for some constant α. Hence, the general solutions to the Euler–Lagrange equation are given by the following family of catenaries:
$$x(t) = \beta \cosh \left (\frac{t - \alpha } {\beta } \right ),\qquad \alpha \in \mathbb{R},\qquad \beta > 0.$$(1.15)
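As a plausibility check of Eq. (1.15), the following Python sketch evaluates the first integral (1.13) along a catenary and estimates the residual of the Euler–Lagrange equation by a central difference. The helper names, the sample parameters, and the step size are assumptions for illustration; the constant factor 2π is dropped since it does not affect the extremals.

```python
import math

def catenary(t, a, b):
    """x(t) = b*cosh((t - a)/b) and its derivative sinh((t - a)/b), cf. Eq. (1.15)."""
    u = (t - a) / b
    return b * math.cosh(u), math.sinh(u)

def first_integral(t, a, b):
    """Left-hand side x / sqrt(1 + xdot**2) of Eq. (1.13)."""
    x, v = catenary(t, a, b)
    return x / math.sqrt(1.0 + v * v)

def euler_lagrange_residual(t, a, b, h=1e-5):
    """d/dt(L_xdot) - L_x for L = x*sqrt(1 + xdot**2); d/dt by central difference."""
    def l_v(s):
        x, v = catenary(s, a, b)
        return x * v / math.sqrt(1.0 + v * v)
    _, v = catenary(t, a, b)
    return (l_v(t + h) - l_v(t - h)) / (2.0 * h) - math.sqrt(1.0 + v * v)

a, b = -0.4, 1.7  # arbitrary sample parameters alpha and beta > 0
for t in (0.0, 0.3, 2.0):
    print(t, first_integral(t, a, b), euler_lagrange_residual(t, a, b))
```

The first integral should equal β at every time, and the residual should vanish up to discretization error.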
However, as illustrated in Fig. 1.5, the mapping properties of this family of extremals are drastically different from those of the cycloids for the brachistochrone problem. Now, given a fixed initial condition (0, x 0), it is no longer true that an extremal passes through every point (t 1, x 1) ∈ ℝ + ²: through some points there is no extremal at all, while through others there exist two. The theorem below summarizes the mapping properties of the family of catenaries. We call an open and connected subset of ℝ n a region.
Theorem 1.3.1.
There exists a differentiable, strictly monotonically increasing, and strictly convex function γ : [0,∞) → [0,∞), t↦γ(t), that satisfies
$$\gamma (0) = 0,\quad \dot{\gamma }(0+) = 0,\quad \mathrm{and}\quad \lim_{t\rightarrow +\infty }\gamma (t) = +\infty,\quad \lim_{t\rightarrow +\infty }\dot{\gamma }(t) = +\infty,$$with the property that if C denotes the graph of γ, then there exist exactly two catenaries that pass through any point (t 1 ,x 1 ) in the region R above the graph of γ, and there exists no catenary that passes through a point (t 1 ,x 1 ) in the region S below the graph of γ. Through any point (t 1 ,x 1 ) on the graph C itself there exists a unique catenary that passes through the point.
This material is classical, and we follow the presentation of Bliss [39, pp. 92–98] in our proof below. Since we shall return to this example numerous times for illustrative purposes, we also include some intricate details of the calculations to give the complete picture. The reader who is mostly interested in the theoretical developments may elect to skip this technical proof without loss of continuity, but some of the properties established here will be used later on.
Proof.
The initial condition x 0 imposes the relation
$${x}_{0} = \beta \cosh \left (-\frac{\alpha } {\beta }\right )$$and reduces the collection of catenaries defined by Eq. (1.15) to a 1-parameter family. Introduce a new parameter p, $$p = -\frac{\alpha } {\beta }$$ , so that we have
$$\beta = \frac{{x}_{0}} {\cosh p} \quad \mathrm{and}\quad \alpha = -p\beta = -\frac{p{x}_{0}} {\cosh p}.$$This gives the following parameterization of all extremals through the point (0, x 0) in terms of the parameter p:
$$x(t;p) = \frac{{x}_{0}} {\cosh p} \cosh \left (p + \frac{t} {{x}_{0}}\cosh p\right ),\quad p \in \mathbb{R},\quad t > 0.$$(1.16)
We fix the time t and compute the range of x(t; p) for p ∈ ℝ. As p → ± ∞, the term $$p + \frac{t} {{x}_{0}} \cosh p$$ diverges much faster to + ∞ than p does, and it follows from the properties of the hyperbolic cosine that
$${\lim }_{p\rightarrow \pm \infty }x(t;p) = +\infty.$$This also is readily verified using L’Hospital’s rule. Hence the real analytic function x(t; ⋅) has a global minimum at some point p ∗ = p ∗ (t) and
$$\frac{\partial x} {\partial p}(t;{p}_{{_\ast}}) = 0.$$The key technical computations are given in the lemma below, where, as is customary, we denote partial derivatives with respect to t by a dot.
Lemma 1.3.1.
If p ∗ is a stationary point, $$\frac{\partial x} {\partial p}(t;{p}_{{_\ast}}) = 0$$ , then
$$\dot{x}(t;{p}_{{_\ast}}) = \frac{\partial x} {\partial t} (t;{p}_{{_\ast}}) > 0\qquad \mathrm{and}\qquad \frac{{\partial }^{2}x} {\partial {p}^{2}} (t;{p}_{{_\ast}}) > 0.$$(1.17)
Proof.
In these computations, it is important to combine terms that arise properly to recognize sign relations that are not evident a priori. We have that
$$\dot{x}(t;p) =\sinh \left (p + \frac{t} {{x}_{0}}\cosh p\right ),$$and for later reference note that
$$\frac{x(t;p)} {\dot{x}(t;p)} = \frac{\cosh \left (p + \frac{t} {{x}_{0}} \cosh p\right )} {\sinh \left (p + \frac{t} {{x}_{0}} \cosh p\right )} \frac{{x}_{0}} {\cosh p}.$$(1.18)
The partial derivatives of x with respect to p are best expressed in terms of this quantity at times t and 0. We have that
$$\begin{array}{rcl} \frac{\partial x}{\partial p}(t;p) & = & \frac{x_{0}}{\cosh^{2}p}\left[\sinh\left(p + \frac{t}{x_{0}}\cosh p\right)\left(1 + \frac{t}{x_{0}}\sinh p\right)\cosh p - \cosh\left(p + \frac{t}{x_{0}}\cosh p\right)\sinh p\right] \\ & = & \frac{\sinh\left(p + \frac{t}{x_{0}}\cosh p\right)\sinh p}{\cosh p}\left[\frac{x_{0} + t\sinh p}{\sinh p} - \frac{\cosh\left(p + \frac{t}{x_{0}}\cosh p\right)}{\sinh\left(p + \frac{t}{x_{0}}\cosh p\right)}\,\frac{x_{0}}{\cosh p}\right] \\ & = & \frac{\dot{x}(t;p)\dot{x}(0;p)}{\cosh p}\left[t + \frac{x(0;p)}{\dot{x}(0;p)} - \frac{x(t;p)}{\dot{x}(t;p)}\right]. \end{array}$$(1.19)
This quantity has an interesting and useful geometric interpretation due to Lindelöf: For the moment, fix p and consider the catenary x(s, p) as a function of s and let ℓ 0 and ℓ t denote the tangent lines to the graph at the initial point (0, x 0) and at the point (t, x(t; p)), respectively, i.e.,
$$\ell_{0}(s) = {x}_{0} + s\dot{x}(0;p)\qquad \mathrm{and}\qquad \ell_{t}(s) = x(t;p) + (s - t)\dot{x}(t;p).$$These tangent lines intersect for
$$s = \frac{x(t;p) - {x}_{0} - t\dot{x}(t;p)} {\dot{x}(0;p) -\dot{ x}(t;p)},$$and the value at the intersection is given by
$$\begin{array}{rcl} \varkappa (t;p)& = {x}_{0} + \frac{x(t;p)-{x}_{0}-t\dot{x}(t;p)} {\dot{x}(0;p)-\dot{x}(t;p)} \dot{x}(0;p) & \\ & = \frac{\dot{x}(t;p)\dot{x}(0;p)} {\dot{x}(t;p)-\dot{x}(0;p)}\left [t -\frac{x(t;p)} {\dot{x}(t;p)} + {x}_{0}\left (\frac{\dot{x}(t;p)-\dot{x}(0;p)} {\dot{x}(t;p)\dot{x}(0;p)} + \frac{1} {\dot{x}(t;p)}\right )\right ]& \\ & = \frac{\dot{x}(t;p)\dot{x}(0;p)} {\dot{x}(t;p)-\dot{x}(0;p)}\left [t -\frac{x(t;p)} {\dot{x}(t;p)} + \frac{{x}_{0}} {\dot{x}(0;p)}\right ]. & \\ \end{array}$$We can therefore express the partial derivatives of x with respect to p as
$$\frac{\partial x} {\partial p}(t;p) = \varkappa (t;p)\frac{\dot{x}(t;p) -\dot{ x}(0;p)} {\cosh p}.$$(1.20)
Since the function t↦x(t; p) is strictly convex, it follows that $$\dot{x}(t;p) >\dot{ x}(0;p)$$ , and thus p ∗ is a stationary point of the function p↦x(t; p), now with t fixed, if and only if ϰ(t; p ∗ ) = 0, that is, if and only if the tangent line ℓ 0 to the catenary at the initial point (0, x 0) and the tangent line ℓ t through the point (t, x(t; p)) intersect on the horizontal, or time, axis (see Fig. 1.6). In particular, this is possible only if $$\dot{x}(t;{p}_{{_\ast}})$$ is positive. Geometrically, the point x(t, p ∗ ) always lies on the increasing portion of the catenary s↦x(s, p ∗ ). Furthermore, at any stationary point p ∗ , we have that
$$\frac{x(t;{p}_{{_\ast}})} {\dot{x}(t;{p}_{{_\ast}})} -\frac{x(0;{p}_{{_\ast}})} {\dot{x}(0;{p}_{{_\ast}})} = t > 0.$$(1.21)
We now compute the second partial derivative with respect to p, but evaluate it only at a stationary point p ∗ . Since the term in brackets in Eq. (1.19) vanishes, we get that
$$\frac{{\partial }^{2}x} {\partial {p}^{2}} (t;{p}_{{_\ast}}) = \frac{\dot{x}(t;{p}_{{_\ast}})\dot{x}(0;{p}_{{_\ast}})} {\cosh {p}_{{_\ast}}} \frac{\partial } {\partial p}{\left (\frac{x(0;p)} {\dot{x}(0;p)} -\frac{x(t;p)} {\dot{x}(t;p)}\right )}_{\vert {p}_{{_\ast}}}.$$Differentiating Eq. (1.18) gives
$$\begin{array}{rcl} \frac{\partial}{\partial p}\left(\frac{x(t;p)}{\dot{x}(t;p)}\right) & = & -\frac{x_{0} + t\sinh p}{\sinh^{2}\left(p + \frac{t}{x_{0}}\cosh p\right)\cosh p} - x_{0}\,\frac{\cosh\left(p + \frac{t}{x_{0}}\cosh p\right)}{\sinh\left(p + \frac{t}{x_{0}}\cosh p\right)}\,\frac{\sinh p}{\cosh^{2}p} \\ & = & -\frac{\left(x_{0} + t\sinh p\right)\cosh p + x_{0}\cosh\left(p + \frac{t}{x_{0}}\cosh p\right)\sinh\left(p + \frac{t}{x_{0}}\cosh p\right)\sinh p}{\sinh^{2}\left(p + \frac{t}{x_{0}}\cosh p\right)\cosh^{2}p}. \end{array}$$At a stationary point p ∗ , we have that (cf. the derivation of Eq. (1.19))
$${x}_{0} + t\sinh {p}_{{_\ast}} = {x}_{0}\frac{\cosh \left ({p}_{{_\ast}} + \frac{t} {{x}_{0}} \cosh {p}_{{_\ast}}\right )\sinh {p}_{{_\ast}}} {\sinh \left ({p}_{{_\ast}} + \frac{t} {{x}_{0}} \cosh {p}_{{_\ast}}\right )\cosh {p}_{{_\ast}}},$$(1.22)
and thus
$$\begin{array}{rcl} \frac{\partial}{\partial p}{\left(\frac{x(t;p)}{\dot{x}(t;p)}\right)}_{\vert p_{\ast}} & = & -\frac{x_{0}\cosh\left(p_{\ast} + \frac{t}{x_{0}}\cosh p_{\ast}\right)\sinh p_{\ast}}{\sinh^{3}\left(p_{\ast} + \frac{t}{x_{0}}\cosh p_{\ast}\right)\cosh^{2}p_{\ast}} - \frac{x_{0}\cosh\left(p_{\ast} + \frac{t}{x_{0}}\cosh p_{\ast}\right)\sinh p_{\ast}}{\sinh\left(p_{\ast} + \frac{t}{x_{0}}\cosh p_{\ast}\right)\cosh^{2}p_{\ast}} \\ & = & -x_{0}\,\frac{\cosh\left(p_{\ast} + \frac{t}{x_{0}}\cosh p_{\ast}\right)\left[1 + \sinh^{2}\left(p_{\ast} + \frac{t}{x_{0}}\cosh p_{\ast}\right)\right]\sinh p_{\ast}}{\sinh^{3}\left(p_{\ast} + \frac{t}{x_{0}}\cosh p_{\ast}\right)\cosh^{2}p_{\ast}} \\ & = & -x_{0}\,\frac{\cosh^{3}\left(p_{\ast} + \frac{t}{x_{0}}\cosh p_{\ast}\right)\sinh p_{\ast}}{\sinh^{3}\left(p_{\ast} + \frac{t}{x_{0}}\cosh p_{\ast}\right)\cosh^{2}p_{\ast}} \\ & = & -\frac{1}{x_{0}^{2}}{\left(\frac{x(t;p_{\ast})}{\dot{x}(t;p_{\ast})}\right)}^{3}\cosh p_{\ast}\sinh p_{\ast}. \end{array}$$Hence, overall,
$$\begin{array}{rcl} \frac{\partial^{2}x}{\partial p^{2}}(t;p_{\ast}) & = & \frac{\dot{x}(t;p_{\ast})\sinh p_{\ast}}{\cosh p_{\ast}}\,\frac{1}{x_{0}^{2}}\left[{\left(\frac{x(t;p_{\ast})}{\dot{x}(t;p_{\ast})}\right)}^{3} - {\left(\frac{x(0;p_{\ast})}{\dot{x}(0;p_{\ast})}\right)}^{3}\right]\cosh p_{\ast}\sinh p_{\ast} \\ & = & \frac{\dot{x}(t;p_{\ast})}{x_{0}^{2}}\left[{\left(\frac{x(t;p_{\ast})}{\dot{x}(t;p_{\ast})}\right)}^{3} - {\left(\frac{x(0;p_{\ast})}{\dot{x}(0;p_{\ast})}\right)}^{3}\right]\sinh^{2}p_{\ast}. \end{array}$$(1.23)
We already have shown above that $$\dot{x}(t;{p}_{{_\ast}}) > 0$$ , and it follows from Eq. (1.21) that
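$$\frac{x(t;{p}_{{_\ast}})} {\dot{x}(t;{p}_{{_\ast}})} > \frac{x(0;{p}_{{_\ast}})} {\dot{x}(0;{p}_{{_\ast}})}.$$Since $$\frac{\partial x} {\partial p}(t;0) = {x}_{0}\sinh \left (\frac{t} {{x}_{0}}\right ) > 0$$ by the first line of Eq. (1.19), p = 0 is not a stationary point; hence p ∗ ≠ 0, and thus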
$$\frac{{\partial }^{2}x} {\partial {p}^{2}} (t;{p}_{{_\ast}}) > 0.$$□
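Lindelöf's construction used in this proof lends itself to a numerical check. The Python sketch below is an illustration under stated assumptions: it locates a stationary point by bisection on the bracket p < 0 (consistent with the sign argument given later in the proof of the theorem), using the expression tanh(u)(1 + (t/x0) sinh p) − tanh p with u = p + (t/x0) cosh p, which is a positive multiple of ∂x/∂p obtained by dividing the first line of Eq. (1.19) by cosh u / cosh p. All function names are hypothetical.

```python
import math

def x_and_slope(t, p, x0):
    """Catenary value x(t; p) from Eq. (1.16) and its time derivative."""
    u = p + (t / x0) * math.cosh(p)
    return x0 * math.cosh(u) / math.cosh(p), math.sinh(u)

def stationarity(t, p, x0):
    """Positive multiple of dx/dp; written with tanh to avoid overflow."""
    u = p + (t / x0) * math.cosh(p)
    return math.tanh(u) * (1.0 + (t / x0) * math.sinh(p)) - math.tanh(p)

def p_star(t, x0, iters=200):
    """Stationary point of p -> x(t; p), bracketed in (lo, 0) and bisected."""
    lo, hi = -1.0, 0.0
    while stationarity(t, lo, x0) >= 0.0:   # extend the bracket to the left
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if stationarity(t, mid, x0) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tangent_intersection(t, p, x0):
    """Point (s, height) where the tangent lines at times 0 and t intersect."""
    x, v = x_and_slope(t, p, x0)
    v0 = math.sinh(p)
    s = (x - x0 - t * v) / (v0 - v)
    return s, x0 + s * v0

t, x0 = 1.0, 1.0                      # arbitrary sample values
p = p_star(t, x0)
x, v = x_and_slope(t, p, x0)
print(p, tangent_intersection(t, p, x0), x / v - x0 / math.sinh(p))
```

At the computed p ∗ the intersection height should be numerically zero, the slope at time t positive, and the last printed value equal to t, in agreement with Eq. (1.21).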
This lemma implies that every stationary point of the function p↦x(t; p) is a strict local minimum. But then this function x(t; ⋅) has a unique stationary point p ∗ = p ∗ (t), corresponding to its global minimum over ℝ: between two distinct strict local minima there would have to lie a local maximum, which is a stationary point but not a local minimum. Hence p ∗ : (0, ∞) → ℝ, t↦p ∗ (t), is well-defined, i.e., is a single-valued function. Furthermore, p ∗ is given by the solution of the equation
$$\frac{\partial x} {\partial p}(t;p) = 0.$$Since $$\frac{{\partial }^{2}x} {\partial {p}^{2}} (t;{p}_{{_\ast}}(t)) > 0$$ , it follows from the implicit function theorem that p ∗ (t) is continuously differentiable with derivative given by
$$\dot{{p}}_{{_\ast}}(t) = -\frac{\frac{{\partial }^{2}x} {\partial t\partial p}(t;{p}_{{_\ast}}(t))} {\frac{{\partial }^{2}x} {\partial {p}^{2}} (t;{p}_{{_\ast}}(t))}.$$
Fig. 1.5
The family of catenaries and its envelope
Fig. 1.6
Tangent lines to the catenary x( ⋅; p ∗ ) for a stationary p ∗
It is clear that for t fixed, x(t; p) is strictly decreasing in p for p < p ∗ (t) and strictly increasing for p > p ∗ (t) with limit ∞ as p → ± ∞. Hence, for any point (t 1, x 1) with t 1 > 0 and x 1 > 0, the equation x(t 1; p) = x 1 has exactly two solutions p 1 < p ∗ (t 1) < p 2 if x 1 > x(t 1; p ∗ (t 1)), the unique solution p ∗ (t 1) if x 1 = x(t 1; p ∗ (t 1)), and no solutions for x 1 < x(t 1; p ∗ (t 1)). The minimum value x(t 1; p ∗ (t 1)) is positive, and since
$$\frac{\partial x} {\partial p}(t;0) = {x}_{0}\sinh \left ( \frac{t} {{x}_{0}}\right ) > 0,$$p ∗ (t) is always negative.
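The count of solutions described above can be reproduced numerically. The following Python sketch is a minimal illustration, assuming moderate values of t 1 / x 0: it works with log x(t; p) so that the rapidly growing hyperbolic cosine does not overflow, locates p ∗ (t 1) from the sign of ∂x/∂p, and then bisects on either side of it. The function names and the bracketing strategy are choices made here, not part of the text.

```python
import math

def log_cosh(z):
    """log(cosh z), evaluated without overflow for large |z|."""
    z = abs(z)
    return z + math.log1p(math.exp(-2.0 * z)) - math.log(2.0)

def log_x(t, p, x0):
    """log of x(t; p) from Eq. (1.16)."""
    u = p + (t / x0) * math.cosh(p)
    return math.log(x0) + log_cosh(u) - log_cosh(p)

def stationarity(t, p, x0):
    """Positive multiple of dx/dp (first line of Eq. (1.19) over cosh(u)/cosh(p))."""
    u = p + (t / x0) * math.cosh(p)
    return math.tanh(u) * (1.0 + (t / x0) * math.sinh(p)) - math.tanh(p)

def bisect(g, lo, hi, iters=200):
    """Root of g on [lo, hi], assuming g(lo) < 0 < g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def p_star(t, x0):
    lo = -1.0
    while stationarity(t, lo, x0) >= 0.0:   # extend the bracket to the left
        lo *= 2.0
    return bisect(lambda p: stationarity(t, p, x0), lo, 0.0)

def catenaries_through(t1, x1, x0):
    """Parameters p with x(t1; p) = x1: two above the envelope, none below."""
    ps = p_star(t1, x0)
    g = lambda p: log_x(t1, p, x0) - math.log(x1)
    if g(ps) > 0.0:
        return []                 # x1 lies below the minimum value x(t1; p*)
    if g(ps) == 0.0:
        return [ps]               # tangency case, rarely hit in floating point
    lo, hi = ps - 1.0, ps + 1.0
    while g(lo) <= 0.0:
        lo -= 1.0
    while g(hi) <= 0.0:
        hi += 1.0
    p1 = bisect(lambda p: -g(p), lo, ps)   # g decreases on (lo, p*)
    p2 = bisect(g, ps, hi)                 # g increases on (p*, hi)
    return [p1, p2]

print(p_star(1.0, 1.0), catenaries_through(1.0, 1.0, 1.0))
print(catenaries_through(1.0, 0.3, 1.0))
```

For x 0 = 1 and t 1 = 1, the endpoint value x 1 = 1 lies above the minimum and yields two parameters on either side of p ∗ (t 1), while x 1 = 0.3 lies below it and yields none.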
Consequently, the curve
$$\gamma : [0,\infty ) \rightarrow [0,\infty ),\quad t\mapsto \gamma (t),$$is defined by
$$\gamma (t) = x(t;{p}_{{_\ast}}(t)).$$The derivatives of γ are then given by
$$\dot{\gamma }(t) = \frac{d\gamma } {dt} (t) = \frac{\partial x} {\partial t} (t;{p}_{{_\ast}}(t)) + \frac{\partial x} {\partial p}(t;{p}_{{_\ast}}(t))\dot{{p}}_{{_\ast}}(t) = \frac{\partial x} {\partial t} (t;{p}_{{_\ast}}(t)) > 0,$$so that γ is strictly increasing, and
$$\begin{array}{rcl} \ddot{\gamma }(t)& = \frac{{d}^{2}\gamma } {d{t}^{2}} (t) = \frac{{\partial }^{2}x} {\partial {t}^{2}} (t;{p}_{{_\ast}}(t)) + \frac{{\partial }^{2}x} {\partial p\partial t}(t;{p}_{{_\ast}}(t))\dot{{p}}_{{_\ast}}(t)& \\ & = \frac{{\partial }^{2}x} {\partial {t}^{2}} (t;{p}_{{_\ast}}(t)) -\frac{{\left ( \frac{{\partial }^{2}x} {\partial t\partial p}(t;{p}_{{_\ast}}(t))\right )}^{2}} { \frac{{\partial }^{2}x} {\partial {p}^{2}} (t;{p}_{{_\ast}}(t))}. & \\ \end{array}$$In general, we have that
$$\frac{\partial^{2}x}{\partial t^{2}}(t;p) = \cosh\left(p + \frac{t}{x_{0}}\cosh p\right)\frac{\cosh p}{x_{0}} = x(t;p)\,\frac{\cosh^{2}p}{x_{0}^{2}}$$and
$$\frac{{\partial }^{2}x} {\partial p\partial t}(t;p) =\cosh \left (p + \frac{t} {{x}_{0}}\cosh p\right )\left (1 + \frac{t} {{x}_{0}}\sinh p\right ).$$Using Eq. (1.22) to eliminate x 0 + t sinh p ∗ at the critical point p ∗ , we obtain
$$\begin{array}{rcl} \frac{{\partial }^{2}x} {\partial p\partial t}(t;{p}_{{_\ast}})& = \frac{{\cosh }^{2}\left ({p}_{ {_\ast}}+ \frac{t} {{x}_{0}} \cosh {p}_{{_\ast}}\right )} {\sinh \left ({p}_{{_\ast}}+ \frac{t} {{x}_{0}} \cosh {p}_{{_\ast}}\right )} \frac{\sinh {p}_{{_\ast}}} {\cosh {p}_{{_\ast}}} & \\ & = \frac{1} {{x}_{0}^{2}} \frac{x{(t;{p}_{{_\ast}})}^{2}} {\dot{x}(t;{p}_{{_\ast}})} \cosh {p}_{{_\ast}}\sinh {p}_{{_\ast}}.& \\ \end{array}$$Putting all this together, and using Eq. (1.23), we therefore have that
$$\begin{array}{rcl} \frac{d^{2}\gamma}{dt^{2}}(t) & = & x(t;p_{\ast})\,\frac{\cosh^{2}p_{\ast}}{x_{0}^{2}} - \frac{{\left(\frac{1}{x_{0}^{2}}\,\frac{x(t;p_{\ast})^{2}}{\dot{x}(t;p_{\ast})}\cosh p_{\ast}\sinh p_{\ast}\right)}^{2}}{\frac{\dot{x}(t;p_{\ast})}{x_{0}^{2}}\left[{\left(\frac{x(t;p_{\ast})}{\dot{x}(t;p_{\ast})}\right)}^{3} - {\left(\frac{x(0;p_{\ast})}{\dot{x}(0;p_{\ast})}\right)}^{3}\right]\sinh^{2}p_{\ast}} \\ & = & x(t;p_{\ast})\,\frac{\cosh^{2}p_{\ast}}{x_{0}^{2}}\left[1 - \frac{{\left(\frac{x(t;p_{\ast})}{\dot{x}(t;p_{\ast})}\right)}^{3}}{{\left(\frac{x(t;p_{\ast})}{\dot{x}(t;p_{\ast})}\right)}^{3} - {\left(\frac{x(0;p_{\ast})}{\dot{x}(0;p_{\ast})}\right)}^{3}}\right] \\ & = & -\frac{x(t;p_{\ast})}{x_{0}^{2}}\,\frac{{\left(\frac{x(0;p_{\ast})}{\dot{x}(0;p_{\ast})}\right)}^{3}}{{\left(\frac{x(t;p_{\ast})}{\dot{x}(t;p_{\ast})}\right)}^{3} - {\left(\frac{x(0;p_{\ast})}{\dot{x}(0;p_{\ast})}\right)}^{3}}\cosh^{2}p_{\ast}. \end{array}$$By Eq. (1.21), the denominator is positive. Since p ∗ < 0, it follows that $$\dot{x}(0;{p}_{{_\ast}}) =\sinh {p}_{{_\ast}} < 0$$ , and thus $$\ddot{\gamma }(t)$$ is positive. Hence the function γ is strictly convex over (0, ∞).
In particular, these properties imply that $${\lim }_{t\rightarrow +\infty }\gamma (t) = +\infty $$ . But also $$\dot{\gamma }(t) \rightarrow +\infty $$ as t → + ∞: The function $$\kappa (p) = \frac{p} {\cosh p}$$ is bounded over ℝ, and therefore,
$$\begin{array}{rcl}{ p}_{{_\ast}}(t) + \frac{t} {{x}_{0}}\cosh {p}_{{_\ast}}(t)& = \left (\frac{{p}_{{_\ast}}(t)} {\cosh {p}_{{_\ast}}(t)} + \frac{t} {{x}_{0}} \right )\cosh {p}_{{_\ast}}(t) & \\ & \geq \frac{{p}_{{_\ast}}(t)} {\cosh {p}_{{_\ast}}(t)} + \frac{t} {{x}_{0}} \rightarrow \infty \quad \mathrm{as\ }t \rightarrow \infty.& \\ \end{array}$$Thus, also
$$\dot{\gamma }(t) = \frac{\partial x} {\partial t} (t;{p}_{{_\ast}}(t)) =\sinh \left ({p}_{{_\ast}}(t) + \frac{t} {{x}_{0}}\cosh {p}_{{_\ast}}(t)\right ) \rightarrow \infty \quad \mathrm{as}\ t \rightarrow \infty.$$We conclude the proof with an analysis of the asymptotic properties of γ(t) as t → 0 + . An arbitrary catenary x(s; p) has its minimum when $$p + \frac{s} {{x}_{0}} \cosh p = 0$$ , and the minimum value is given by $$\frac{{x}_{0}} {\cosh p}$$ . For t > 0 small enough, there exists a solution $$\tilde{p} =\tilde{ p}(t)$$ to the equation
$$\frac{p} {\cosh p} = - \frac{t} {{x}_{0}}$$with the property that $$\tilde{p}(t) \rightarrow -\infty $$ as t → 0 + (see Fig. 1.7).
Fig. 1.7
Asymptotic properties of γ(t)
For any time t, we thus have that
$$\gamma (t) = \min_{p\in \mathbb{R}}x(t;p) \leq x(t;\tilde{p}(t)) = \frac{{x}_{0}} {\cosh \tilde{p}(t)},$$and consequently $${\lim }_{t\rightarrow 0+}\gamma (t) = 0$$ . Furthermore, since the graph of γ lies below the curve ς of minima,
$$\varsigma : t\mapsto \left (t, \frac{{x}_{0}} {\cosh \tilde{p}(t)}\right ) = \left (-\frac{{x}_{0}\tilde{p}(t)} {\cosh \tilde{p}(t)}, \frac{{x}_{0}} {\cosh \tilde{p}(t)}\right ),$$it also follows that there exists a time τ ∈ (0, t) at which the slope $$\dot{\gamma }(\tau )$$ must be smaller than the slope of the line connecting the origin with this minimum point on the catenary, i.e.,
$$\dot{\gamma }(\tau ) \leq \frac{1} {t} \frac{{x}_{0}} {\cosh \tilde{p}(t)} = - \frac{\cosh \tilde{p}(t)} {{x}_{0}\tilde{p}(t)} \frac{{x}_{0}} {\cosh \tilde{p}(t)} = - \frac{1} {\tilde{p}(t)} \rightarrow 0\quad \mathrm{as\ }t \rightarrow 0 +.$$Since $$\dot{\gamma }(t)$$ is strictly increasing, we therefore also have that $${\lim }_{t\rightarrow 0+}\dot{\gamma }(t) = 0$$ . This concludes the proof. □
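The properties of γ established in this proof can be explored numerically. The Python sketch below is an illustration only: it computes γ(t) as the minimum over p of x(t; p) by a ternary search on log x(t; p), which is legitimate because the function is strictly decreasing to the left of p ∗ (t) and strictly increasing to the right. The search bracket [−50, 0], the iteration count, and all names are assumptions made here; the bracket presumes that t / x 0 is not extremely small.

```python
import math

def log_cosh(z):
    """log(cosh z), evaluated without overflow for large |z|."""
    z = abs(z)
    return z + math.log1p(math.exp(-2.0 * z)) - math.log(2.0)

def log_x(t, p, x0):
    """log of x(t; p) from Eq. (1.16)."""
    u = p + (t / x0) * math.cosh(p)
    return math.log(x0) + log_cosh(u) - log_cosh(p)

def p_star(t, x0, lo=-50.0, hi=0.0, iters=200):
    """Minimizer of p -> x(t; p) by ternary search (assumes lo < p* < hi)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_x(t, m1, x0) < log_x(t, m2, x0):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def gamma(t, x0):
    """Envelope value gamma(t) = x(t; p*(t))."""
    return math.exp(log_x(t, p_star(t, x0), x0))

def gamma_dot(t, x0):
    """Slope of the envelope, equal to the time derivative of x at p*(t)."""
    p = p_star(t, x0)
    return math.sinh(p + (t / x0) * math.cosh(p))

for t in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(t, gamma(t, 1.0), gamma_dot(t, 1.0))
```

On this grid both printed columns should increase with t, in line with the monotonicity and convexity of γ stated in Theorem 1.3.1.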
For endpoints (t 1, x 1) in the region R above the curve C, there exist two solutions, and thus these geometric properties naturally raise the question as to which of the catenaries is optimal. The curve C : [0, ∞) → [0, ∞), t↦(t, γ(t)), is regular (i.e., the tangent vector $$\dot{\gamma }$$ is everywhere nonzero) and is everywhere tangent to exactly one member of the parameterized family of extremals, x : [0, ∞) → [0, ∞), t↦x(t; p) (with p < 0), without itself being a member of the family. In geometry such a curve is called an envelope (see Fig. 1.8). The catenary $$x(t;0) = {x}_{0}\cosh \left ( \frac{t} {{x}_{0}} \right )$$ for p = 0 is asymptotic to C at infinity, and for p > 0 the catenaries do not intersect the curve C. In fact, for each p < 0, there exists a unique time t = t c (p) > 0 when the corresponding catenary (that is, the graph of the curve) is tangent to the envelope C, and if we were to allow endpoints with t 1 < 0 as well, then for p > 0 the catenaries touch the curve that is obtained from C by reflection along the x-axis at time $${t}_{c}(p) = -{t}_{c}(-p) < 0$$ . The structure of the solutions for t 1 < 0 is symmetric to the one for t 1 > 0, and thus we consider only the latter scenario. By restricting the times along the catenaries appropriately, we can eliminate the overlaps. If we define
$$D =\{ (t,p) : p \in \mathbb{R},\,0 < t < \tau (p)\},$$
Fig. 1.8
The restricted family of catenaries with envelope C
where
$$\tau (p) =\, \left \{\begin{array}{@{}l@{\quad }l@{}} \infty \quad &\mathrm{if\ }p \geq 0,\\ {t}_{ c}(p)\quad &\mathrm{if\ }p < 0, \end{array} \right.$$then, as with the field of cycloids in the brachistochrone problem, the mapping
$$\begin{array}{rcl} \Xi & : D \rightarrow R, & \\ & \quad (t,p)\mapsto (t,x(t;p)) = \left (t, \frac{{x}_{0}} {\cosh p} \cosh \left (p + \frac{t} {{x}_{0}} \cosh p\right )\right ),&\end{array}$$(1.24)
is a diffeomorphism from D onto the region R above the envelope, and this parameterized family again defines a central field. In fact, we shall see that all trajectories in this field are strong local minima, while curves defined on an interval [0, t 1] with t 1 ≥ t c (p) are not even weak local minima. The time t c (p) is said to be conjugate to t = 0 along the extremal x = x( ⋅; p), and the point (t c (p), x(t c (p); p)) is the conjugate point to the initial point (0, x 0) along this extremal. The curve C, whose precise shape of course depends on the initial condition x 0, is also called the curve of conjugate points for the problem of minimal surfaces.
In the next sections, we develop the tools that allow us to prove these statements about optimality and, more generally, explain the significance of conjugate points for the local optimality of curves in the calculus of variations.
1.4 The Legendre and Jacobi Conditions
If x ∗ is a weak local minimum for problem [CV], then besides being an extremal, by Eq. (1.2), we also have for all functions
$$h \in {C}_{0}^{1}([a,b]) =\{ h \in {C}^{1}([a,b]) :\ h(a) = h(b) = 0\}$$that the second variation is nonnegative,
$${\delta }^{2}I[{x}_{ {_\ast}}](h) ={ \int \nolimits }_{a}^{b}{L}_{ xx}(t,{x}_{{_\ast}},\dot{{x}}_{{_\ast}}){h}^{2}+2{L}_{\dot{ x}x}(t,{x}_{{_\ast}},\dot{{x}}_{{_\ast}})\dot{h}h+{L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}},\dot{{x}}_{{_\ast}})\dot{{h}}^{2}dt \geq 0.$$This implies the following second-order necessary condition for optimality.
Theorem 1.4.1 (Legendre condition).
If x ∗ is a weak local minimum for problem [CV] (see Fig. 1.9), then
$${L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) \geq 0\qquad \mathrm{for\ all\quad }t \in [a,b].$$
Proof.
This condition should be clear intuitively. Choosing functions h ∈ C 0 ¹([a, b]) that keep the norm $${\left \Vert h\right \Vert }_{\infty }$$ small while having large derivatives $${\left \Vert \dot{h}\right \Vert }_{\infty }$$ , it follows that the term multiplying $$\dot{{h}}^{2}$$ will become dominant and thus needs to be nonnegative. We prove this by contradiction. Suppose there exists a time τ ∈ (a, b) at which $${L}_{\dot{x}\dot{x}}(\tau,{x}_{{_\ast}}(\tau ),\dot{{x}}_{{_\ast}}(\tau )) = -2\beta < 0$$ . Choose ε > 0 such that $${L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) < -\beta $$ for $$t \in [\tau - \epsilon,\tau + \epsilon ] \subset [a,b]$$ and pick the function
$$h(t) = \left \{\begin{array}{lll} \sin^{2}\left (\frac{\pi } {\epsilon } (t - \tau )\right )&\quad &\mathrm{for\ }\mid t - \tau \mid \leq \epsilon,\\ 0 &\quad &\mathrm{otherwise}. \end{array} \right.$$
Fig. 1.9
The variation for the proof of the Legendre condition
Then, since
$$\dot{h}(t) = 2\sin \left (\frac{\pi } {\epsilon } (t - \tau )\right )\cos \left (\frac{\pi } {\epsilon } (t - \tau )\right )\frac{\pi } {\epsilon } = \frac{\pi } {\epsilon }\sin \left (\frac{2\pi } {\epsilon } (t - \tau )\right ),$$we have h ∈ C 0 ¹([a, b]) and
$$\begin{array}{rcl}{ \delta }^{2}I[{x}_{{_\ast}}](h)& ={ \int \nolimits }_{\tau -\epsilon }^{\tau +\epsilon }{L}_{xx}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)){\sin }^{4}\left (\frac{\pi } {\epsilon } (t - \tau )\right )dt & \\ & \quad +{ \int \nolimits }_{\tau -\epsilon }^{\tau +\epsilon }2{L}_{\dot{x}x}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))\frac{\pi } {\epsilon }{\sin }^{2}\left (\frac{\pi } {\epsilon } (t - \tau )\right )\sin \left (\frac{2\pi } {\epsilon } (t - \tau )\right )dt& \\ & \quad +{ \int \nolimits }_{\tau -\epsilon }^{\tau +\epsilon }{L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))\frac{{\pi }^{2}} {{\epsilon }^{2}} {\sin }^{2}\left (\frac{2\pi } {\epsilon } (t - \tau )\right )dt. & \\ \end{array}$$If we now let M and N, respectively, be upper bounds on the absolute values of the continuous functions $${L}_{xx}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))$$ and $${L}_{\dot{x}x}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))$$ on the interval [a, b], then we have the contradiction that
$$\begin{array}{rcl}{ \delta }^{2}I[{x}_{{_\ast}}](h)& \leq 2\epsilon M + 4N\pi - \beta {\int \nolimits }_{\tau -\epsilon }^{\tau +\epsilon }\frac{{\pi }^{2}} {{\epsilon }^{2}} {\sin }^{2}\left (\frac{2\pi } {\epsilon } (t - \tau )\right )dt& \\ & = 2\epsilon M + 4N\pi - \beta \frac{\pi } {2\epsilon }{ \int \nolimits }_{-2\pi }^{2\pi }{\sin }^{2}\left (s\right )ds & \\ & = 2\epsilon M + 4N\pi - \beta \frac{{\pi }^{2}} {\epsilon } \rightarrow -\infty \qquad \mathrm{as\ }\epsilon \rightarrow 0 +. \end{array}$$Hence $${L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))$$ must be nonnegative on the open interval (a, b) and thus, by continuity, also on the compact interval [a, b]. □
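The estimate in this proof can be checked numerically. The following minimal Python sketch assumes, purely for illustration, constant coefficients L_xx ≡ M, L_ẋx ≡ N and L_ẋẋ ≡ −β and takes τ = 0; the function name `second_variation` and the sample values are our own choices and do not come from the text. It evaluates the second variation for the variation h(t) = sin²(πt∕ε) by a midpoint rule.

```python
import math

def second_variation(eps, M, N, beta, n=4000):
    """Midpoint-rule value of the second variation for h(t) = sin^2(pi * t / eps).

    Illustrative assumption: constant coefficients L_xx = M, L_xdot_x = N,
    L_xdot_xdot = -beta on [-eps, eps], with tau = 0.
    """
    width = 2.0 * eps / n
    total = 0.0
    for k in range(n):
        u = -eps + (k + 0.5) * width                      # u = t - tau
        h = math.sin(math.pi * u / eps) ** 2
        h_dot = (math.pi / eps) * math.sin(2.0 * math.pi * u / eps)
        total += (M * h * h + 2.0 * N * h_dot * h - beta * h_dot * h_dot) * width
    return total

# The values become more and more negative as eps shrinks.
for eps in (1.0, 0.1, 0.01):
    print(eps, second_variation(eps, M=5.0, N=3.0, beta=1.0))
```

Under these assumptions the mixed term integrates to zero and the exact value is 3εM∕4 − βπ²∕ε, so the term with the negative coefficient dominates for small ε, in line with the bound 2εM + 4Nπ − βπ²∕ε used in the proof.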
Definition 1.4.1 (Legendre conditions).
An extremal x ∗ satisfies the Legendre condition along [a, b] if the function $$t\mapsto {L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))$$ is nonnegative; it satisfies the strengthened Legendre condition if $${L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))$$ is positive on [a, b].
Legendre mistakenly believed that the strengthened version of this condition also would be sufficient for a local minimum. However, considering the geometric properties of the extremals for the minimum surfaces of revolution, this clearly cannot be true, since all extremals for this problem satisfy the strengthened Legendre condition. It was up to Jacobi to rectify Legendre’s argumentation and come up with the correct formulations, now known as the Jacobi conditions, which we develop next. For convenience, throughout the rest of this and the next section, we make the simplifying assumption that the function L is three times continuously differentiable, L ∈ C ³.
Let x ∗ be an extremal that satisfies the strengthened Legendre condition everywhere on [a, b]. Then it follows from Hilbert’s differentiability theorem, Proposition 1.2.1, that x ∗ ∈ C ², and we can integrate the mixed term in the second variation by parts to get
$${\int \nolimits }_{a}^{b}2{L}_{\dot{ x}x}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))\dot{h}(t)h(t)dt = -{\int \nolimits }_{a}^{b}\left ( \frac{d} {dt}\left [{L}_{\dot{x}x}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))\right ]\right )h{(t)}^{2}dt.$$This allows us to rewrite the second variation in the simpler form
$${\delta }^{2}I[{x}_{ {_\ast}}](h) ={ \int \nolimits }_{a}^{b}Q(t)h{(t)}^{2} + R(t)\dot{h}{(t)}^{2}dt,$$where
$$Q(t) = {L}_{xx}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) - \frac{d} {dt}\left ({L}_{\dot{x}x}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))\right )$$and
$$R(t) = {L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)).$$Under the assumption that L ∈ C ³, it follows that Q is continuous and R is continuously differentiable, Q ∈ C([a, b]) and R ∈ C ¹([a, b]).
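As a simple illustration (the integrand is our own choice for the purpose of the example and is not taken from the text), consider
$$L(t,x,\dot{x}) = \frac{1} {2}\left (\dot{{x}}^{2} - {x}^{2}\right ).$$Here $${L}_{xx} \equiv -1$$ , $${L}_{\dot{x}x} \equiv 0$$ , and $${L}_{\dot{x}\dot{x}} \equiv 1$$ , so that $$Q \equiv -1$$ and $$R \equiv 1$$ along every extremal, and the second variation is
$${\delta }^{2}I[{x}_{{_\ast}}](h) ={ \int \nolimits }_{a}^{b}\dot{h}{(t)}^{2} - h{(t)}^{2}dt.$$For $$h(t) =\sin \left (\pi \frac{t - a} {b - a}\right )$$ this gives
$${\delta }^{2}I[{x}_{{_\ast}}](h) = \frac{{\pi }^{2}} {2(b - a)} -\frac{b - a} {2},$$which is negative whenever b − a > π, although R is positive everywhere.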
Definition 1.4.2 (Quadratic form).
Given a symmetric bilinear form ℬ defined on a real vector space X, ℬ : X ×X → ℝ, (x, y)↦ℬ(x, y), the mapping $$\mathcal{Q} : X \rightarrow \mathbb{R}$$ , $$h\mapsto \mathcal{Q}(h) = \mathcal{B}(h,h)$$ , is called a quadratic form. The quadratic form $$\mathcal{Q}$$ is said to be positive semidefinite if $$\mathcal{Q}(h) \geq 0$$ for all h ∈ X; it is positive definite if it is positive semidefinite and $$\mathcal{Q}(h) = 0$$ holds only for the function $$h \equiv 0$$ . The kernel of a quadratic form, $$\ker \mathcal{Q}$$ , consists of all h ∈ X for which $$\mathcal{Q}(h) = 0$$ . On a normed space X, a quadratic form is said to be strictly positive definite if there exists a positive constant c such that for all h ∈ X,
$$\mathcal{Q}(h) \geq c{\left \Vert h\right \Vert }^{2}.$$We formulate necessary and sufficient conditions for a quadratic form $$\mathcal{Q} :$$ C 0 ¹([a, b]) → ℝ given by
$$\mathcal{Q}(h) ={ \int \nolimits }_{a}^{b}Q(t)h{(t)}^{2} + R(t)\dot{h}{(t)}^{2}dt$$(1.25)
with Q ∈ C([a, b]) and R ∈ C ¹([a, b]) to be positive semidefinite, respectively positive definite, on the space C 0 ¹([a, b]). These are classical results that can be found in most textbooks on the subject. Our presentation here follows the one in Gelfand and Fomin [105]. These positivity conditions then translate into second-order necessary and sufficient conditions for a weak local minimum for problem [CV]. Henceforth $$\mathcal{Q}$$ will always denote this quadratic form defined by Eq. (1.25). Theorem 1.4.1 immediately implies the following condition:
Corollary 1.4.1.
If the quadratic form $$\mathcal{Q}(h)$$ is positive semidefinite over C 0 ¹ ([a,b]), then R(t) is nonnegative on [a,b]. □
Assuming that R(t) is positive on [a, b], Legendre’s idea was to complete the square in the quadratic form $$\mathcal{Q}$$ and write $$\mathcal{Q}$$ as a sum of positive terms. For any differentiable function w ∈ C ¹([a, b]), we have that
$$0 ={ \int \nolimits }_{a}^{b} \frac{d} {dt}(w{h}^{2})dt ={ \int \nolimits }_{a}^{b}\dot{w}{h}^{2} + 2wh\dot{h}\,dt,$$and thus adding this term to the quadratic form gives
$$\begin{array}{rcl} \mathcal{Q}(h)& ={ \int \nolimits }_{a}^{b}(R\dot{{h}}^{2} + 2wh\dot{h} + (Q +\dot{ w}){h}^{2})dt & \\ & ={ \int \nolimits }_{a}^{b}R{\left (\dot{h} + \frac{w} {R}h\right )}^{2} + \left (\dot{w} + Q -\frac{{w}^{2}} {R} \right ){h}^{2}dt.& \\ \end{array}$$If we now choose w as a solution to the differential equation
$$\dot{w} = \frac{{w}^{2}} {R} - Q,$$(1.26)
and if this solution were to exist over the full interval [a, b], then we would have $$\mathcal{Q}(h) \geq 0$$ for all h ∈ C 0 ¹([a, b]) and $$\mathcal{Q}(h) = 0$$ if and only if $$\dot{h} + \frac{w} {R}h = 0$$ . But since h(a) = 0, this is possible only if $$h \equiv 0$$ . Thus, in this case the quadratic functional $$\mathcal{Q}(h)$$ would be positive definite. However, (1.26) is a Riccati differential equation, and while solutions exist locally, in general there is no guarantee that solutions exist over the full interval [a, b]. For example, for R = α and $$Q = -\frac{1} {\alpha }$$ we get $$\dot{w} = \frac{1} {\alpha }\left (1 + {w}^{2}\right )$$ , with general solution $$w(t) =\tan ( \frac{t} {\alpha })$$ , which exists only on intervals of length at most b − a < απ, which can be arbitrarily small. Riccati differential equations arise as differential equations satisfied by quotients of functions that themselves obey linear differential equations. As a consequence of this fact, solutions to a Riccati differential equation can be related to the solutions of a second-order linear differential equation by means of a classical change of variables. In the result below, this connection is used to give a characterization of the escape times of solutions to Riccati equations in terms of the existence of nonvanishing solutions to a corresponding second-order linear differential equation.
Proposition 1.4.1.
Suppose the function R is positive on the interval [a,b]. Then the Riccati differential equation
$$R(Q +\dot{ w}) = {w}^{2}$$has a solution over the full interval [a,b] if and only if there exists a solution to the second-order linear differential equation
$$\frac{d} {dt}\left (R\dot{y}\right ) = Qy$$(1.27)
that does not vanish over [a,b].
Proof.
If there exists a solution y to (1.27) that does not vanish on the interval [a, b], then the function $$w = -\frac{\dot{y}} {y}R$$ is well-defined on [a, b], and it is simply a matter of verification to show that w solves the Riccati equation:
$$\begin{array}{rcl} R(Q +\dot{ w})& = R\left (Q + \frac{-\frac{d} {dt}(\dot{y}R)y+\dot{y}R\dot{y}} {{y}^{2}} \right ) & \\ & = R\left (\frac{Q{y}^{2}-Q{y}^{2}+R\dot{{y}}^{2}} {{y}^{2}} \right ) ={ \left (\frac{R\dot{y}} {y} \right )}^{2} = {w}^{2}.& \\ \end{array}$$Conversely, if w is a solution to the Riccati equation (1.26) that exists on the full interval [a, b], let y be a nontrivial solution to the linear differential equation $$R\dot{y} + wy = 0$$ . Then $$R\dot{y}$$ is continuously differentiable and we have that
$$\frac{d} {dt}(R\dot{y}) = \frac{d} {dt}(-wy) = -w\dot{y} -\dot{ w}y = -w\left (-\frac{wy} {R} \right ) -\left (\frac{{w}^{2}} {R} - Q\right )y = Qy.$$Furthermore, as a nontrivial solution, y does not vanish. □
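The correspondence between the two equations can be illustrated on the example R = α, Q = −1∕α considered before the proposition. The following Python sketch is only an illustration under these assumptions (the helper names `rk4_step` and `riccati_via_linear`, the initial condition y(0) = 1, ẏ(0) = 0, and the step size are our own choices). It integrates the linear equation (1.27) with a classical Runge–Kutta scheme, forms w = −Rẏ∕y, and reports the first zero of y, which is the escape time of the corresponding Riccati solution.

```python
import math

def rk4_step(f, t, state, dt):
    """One classical Runge-Kutta step for a first-order system state' = f(t, state)."""
    k1 = f(t, state)
    k2 = f(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k1)])
    k3 = f(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k2)])
    k4 = f(t + dt, [s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def riccati_via_linear(alpha, t_end, dt=1e-4):
    """Integrate alpha * y'' = -y / alpha, y(0) = 1, y'(0) = 0, and form w = -alpha * y' / y.

    Returns (escape_time, samples): escape_time is the first zero of y
    (None if y stays positive up to t_end); samples holds (t, w(t)) pairs.
    """
    rhs = lambda t, s: [s[1], -s[0] / alpha ** 2]
    t, state, samples = 0.0, [1.0, 0.0], []
    while t < t_end:
        samples.append((t, -alpha * state[1] / state[0]))
        new_state = rk4_step(rhs, t, state, dt)
        if new_state[0] <= 0.0:
            # Linear interpolation for the zero of y inside this step.
            frac = state[0] / (state[0] - new_state[0])
            return t + frac * dt, samples
        t, state = t + dt, new_state
    return None, samples

alpha = 0.5
escape, samples = riccati_via_linear(alpha, t_end=2.0)
print(escape, alpha * math.pi / 2)   # the two numbers should nearly agree
```

With this initial condition y(t) = cos(t∕α) and w(t) = tan(t∕α), so the Riccati solution ceases to exist exactly where the solution of the linear equation vanishes.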
Note that if y is a solution to Eq. (1.27) that vanishes at some time c, then so is $${y}_{\alpha }(t) = \alpha y(t)$$ for any α ∈ ℝ, α≠0. In particular, to see whether there exist nonvanishing solutions, without loss of generality we may normalize the initial condition on the derivative so that $$\dot{y}(a) = 1$$ . This leads to the following definition.
Definition 1.4.3 (Conjugate points and Jacobi equation).
A time c ∈ (a, b] is said to be conjugate to a if the solution y to the initial value problem
$$\frac{d} {dt}(R\dot{y}) = Qy,\qquad y(a) = 0,\qquad \dot{y}(a) = 1,$$(1.28)
vanishes at c; in this case the point (c, x ∗ (c)) on the reference extremal is called a conjugate point to (a, x ∗ (a)). The equation
$$\frac{d} {dt}(R\dot{y}) = Qy$$(1.29)
is called the Jacobi equation.
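Conjugate times in the sense of this definition can be located numerically by integrating the initial value problem (1.28) and looking for the first sign change of y. The Python sketch below is a minimal illustration, not a robust solver: the function name `first_conjugate_time`, the fixed step count, and the test coefficients R ≡ 1, Q ≡ −1 are our own assumptions. It rewrites the Jacobi equation as the first-order system ẏ = v∕R, v̇ = Qy with v = Rẏ, so that the derivative of R is not needed.

```python
import math

def first_conjugate_time(R, Q, a, b, steps=20000):
    """First zero in (a, b] of the solution of (R y')' = Q y, y(a) = 0, y'(a) = 1.

    Uses the system y' = v / R, v' = Q y with v = R y' and classical RK4.
    Returns None when no sign change of y is detected on (a, b].
    """
    def rhs(t, y, v):
        return v / R(t), Q(t) * y

    dt = (b - a) / steps
    t, y, v = a, 0.0, R(a)          # v(a) = R(a) * y'(a) = R(a)
    for _ in range(steps):
        k1 = rhs(t, y, v)
        k2 = rhs(t + dt / 2, y + dt / 2 * k1[0], v + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, y + dt / 2 * k2[0], v + dt / 2 * k2[1])
        k4 = rhs(t + dt, y + dt * k3[0], v + dt * k3[1])
        y_new = y + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v_new = v + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if y > 0.0 and y_new <= 0.0:
            return t + dt * y / (y - y_new)   # linear interpolation of the zero
        t, y, v = t + dt, y_new, v_new
    return None

# Illustrative coefficients: y'' = -y, so y(t) = sin(t) and the first zero is at pi.
print(first_conjugate_time(lambda t: 1.0, lambda t: -1.0, 0.0, 4.0))
print(first_conjugate_time(lambda t: 1.0, lambda t: -1.0, 0.0, 3.0))  # None
```

Detecting a sign change suffices here because a nontrivial solution of the Jacobi equation cannot vanish together with its derivative, so all of its zeros are simple.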
We now show that the absence of conjugate points on the interval (a, b] is equivalent to the quadratic form $$\mathcal{Q}(h)$$ being positive definite.
Theorem 1.4.2.
Let x ∗ be an extremal for which the strengthened Legendre condition is satisfied on the interval [a,b], i.e.,
$$R(t) = {L}_{\dot{x}\dot{x}}(t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) > 0\qquad \mathrm{for\ all\quad }t \in [a,b].$$Then the quadratic functional
$$\mathcal{Q}(h) ={ \int \nolimits }_{a}^{b}R(t)\dot{h}{(t)}^{2} + Q(t)h{(t)}^{2}dt$$is positive definite for h ∈ C 0 ¹ ([a,b]) if and only if the interval (a,b] contains no time conjugate to a.
Proof.
Essentially, the sufficiency of this condition has already been shown: If there exists no time conjugate to a in (a, b], then, since the solution to the Jacobi equation depends continuously on the initial time, for ε > 0 sufficiently small, the solution to the initial value problem
$$\frac{d} {dt}(R\dot{y}) = Qy,\qquad y(a - \epsilon ) = 0,\qquad \dot{y}(a - \epsilon ) = 1,$$still exists and does not vanish on the interval (a − ε, b]. Hence, by Proposition 1.4.1, there exists a solution to the Riccati equation $$R(Q +\dot{ w}) = {w}^{2}$$ over the full interval [a, b] and thus
$$\mathcal{Q}(h) ={ \int \nolimits }_{a}^{b}R{\left (\dot{h} + \frac{w} {R}h\right )}^{2}dt \geq 0,$$with equality if and only if $$\dot{h} + \frac{w} {R}h = 0$$ , i.e., $$h \equiv 0$$ .
The condition about the nonexistence of conjugate times is also necessary for $$\mathcal{Q}(h)$$ to be positive definite: Consider the convex combination of the quadratic form $$\mathcal{Q}(h)$$ and the quadratic form given by $${\int \nolimits \nolimits }_{a}^{b}\dot{h}{(t)}^{2}dt$$ , i.e., set
$${\mathcal{Q}}_{s}(h) = s{\int \nolimits }_{a}^{b}\left (R(t)\dot{h}{(t)}^{2} + Q(t)h{(t)}^{2}\right )dt + (1 - s){\int \nolimits }_{a}^{b}\dot{h}{(t)}^{2}dt.$$(1.30)
Note that the quadratic form $${\mathcal{Q}}_{0}(h) ={ \int \nolimits \nolimits }_{a}^{b}\dot{h}{(t)}^{2}dt$$ is positive definite on C 0 ¹([a, b]). For if $${\mathcal{Q}}_{0}(h) = 0$$ , then it follows that h is constant, and thus $$h \equiv 0$$ by the boundary conditions. Since both quadratic forms $${\mathcal{Q}}_{1}(h)$$ and $${\mathcal{Q}}_{0}(h)$$ are positive definite on C 0 ¹([a, b]), it follows that the convex combination $${\mathcal{Q}}_{s}(h)$$ is positive definite for all s ∈ [0, 1].
Let y : [0, 1] ×[a, b] → ℝ denote the solutions to the corresponding Jacobi equation (1.29) given by
$$\frac{d} {dt}(\left [sR + (1 - s)\right ]\dot{y}) = sQy,\qquad y(s,a) = 0,\qquad \dot{y}(s,a) = 1.$$For s = 0, we have the trivial equation $$\ddot{y}=0$$ and thus $$y(0,t) = t - a$$ , i.e., there exists no time conjugate to a. We now show that this property is preserved along the convex combination, and thus also for s = 1 there does not exist a time conjugate to a in (a, b].
Fig. 1.10 The zero set $$\mathcal{Z}$$
Consider the zero set $$\mathcal{Z}$$ of y away from the trivial portion for t = a (see Fig. 1.10),
$$\mathcal{Z} =\{ (s,t) \in [0,1] \times (a,b] : y(s,t) = 0\}.$$Since $$\dot{y}(s,a) = 1$$ , there exists an ε > 0 such that we actually have
$$\mathcal{Z}\subset [0,1] \times [a + \epsilon,b].$$(For every s ∈ [0, 1] there exists a neighborhood $${U}_{s}$$ of (s, a) in [0, 1] ×[a, b] such that y(s, t) > 0 for $$(s,t) \in {U}_{s}$$ and t > a. By compactness we can choose a uniform bound ε > 0.) If $$({s}_{0},{t}_{0}) \in \mathcal{Z}$$ , then $$\dot{y}({s}_{0},{t}_{0})$$ cannot vanish, since otherwise $$y({s}_{0},\cdot )$$ vanishes identically in t as a solution to a second-order linear differential equation that vanishes with its derivative at $${t}_{0}$$ . But this is not possible, since $$\dot{y}({s}_{0},a) = 1$$ . Hence, by the implicit function theorem, the equation y(s, t) = 0 can locally be solved for t with a continuously differentiable function t = t(s) near any point $$({s}_{0},{t}_{0}) \in \mathcal{Z}$$ . Thus, if $$({s}_{0},{t}_{0}) \in \mathcal{Z}$$ , then it follows that there exists a curve $$C \subset \mathcal{Z}$$ that passes through $$({s}_{0},{t}_{0})$$ and can be described as the graph of some function ς : I → [a, b], s↦ς(s), defined on some maximal interval I ⊂ [0, 1]. But this interval I must be all of [0, 1]. For we always have ς(s) ≥ a + ε, and thus C cannot leave the set [0, 1] ×[a, b] through t = a. But it cannot escape through t = b either. For if for some $$\bar{s} \in [0,1]$$ we have $$y(\bar{s},b) = 0$$ , then the function h defined by $$h(t) = y(\bar{s},t)$$ lies in C 0 ¹([a, b]), and by integration by parts we have that
$$\begin{array}{rcl} & {\int \nolimits }_{a}^{b}\left [sR(t) + (1 - s)\right ]\dot{h}{(t)}^{2} + sQ(t)h{(t)}^{2}dt & \\ & \qquad ={ \int \nolimits }_{a}^{b}\left (-\frac{d} {dt}\left [\left (sR(t) + (1 - s)\right )\dot{h}(t)\right ] + sQ(t)h(t)\right )h(t)\,dt = 0.& \\ \end{array}$$Since $$\dot{h}(a) = 1$$ , $$h(\cdot ) = y(\bar{s},\cdot )$$ is not identically zero, contradicting the fact that $${\mathcal{Q}}_{s}(h)$$ is positive definite. But then the curve C must extend over the full interval I = [0, 1] with both ς(0) and ς(1) taking values in the open interval (a, b). But for s = 0 we have $$y(0,t) = t - a$$ , and thus $$\mathcal{Z}$$ cannot intersect the segment {0} ×[a + ε, b]. Contradiction. Thus the set $$\mathcal{Z}$$ is empty. □
For later reference, we state separately the following fact, which was shown in the proof of Theorem 1.4.2.
Corollary 1.4.2.
If h ∈ C 0 ¹ ([a,b]) is a solution to the Jacobi equation, then $$h \in \ker \mathcal{Q}$$ , i.e.,
$$\mathcal{Q}(h) ={ \int \nolimits }_{a}^{b}R(t)\dot{h}{(t)}^{2} + Q(t)h{(t)}^{2}dt = 0.$$□
This fact, coupled with the preceding proof, also allows us to give the desired second-order necessary condition for optimality of extremals:
Theorem 1.4.3.
If R(t) > 0 and the quadratic functional
$$\mathcal{Q}(h) ={ \int \nolimits }_{a}^{b}R(t)\dot{h}{(t)}^{2} + Q(t)h{(t)}^{2}dt$$is positive semidefinite over C 0 ¹ ([a,b]), then the open interval (a,b) contains no time conjugate to a.
Proof.
In this case, the convex combination $${\mathcal{Q}}_{s}(h)$$ defined in Eq. (1.30) is still positive definite for s ∈ [0, 1), and thus for these values there does not exist a time c conjugate to a in (a, b]. In particular, the set $$\mathcal{Z}$$ therefore does not intersect [0, 1) ×(a, b]. If there were to exist a time c ∈ (a, b) such that $$(1,c) \in \mathcal{Z}$$ , then, since $$\dot{y}(1,c)\neq 0$$ , there would exist a differentiable curve in $$\mathcal{Z}$$ that starts at the point (1, c) and takes values in (a, b). Specifically, there would exist a differentiable function ς defined over some interval [1 − ε, 1], ς : [1 − ε, 1] → (a, b), s↦ς(s), such that $$(s,\varsigma (s)) \in \mathcal{Z}$$ . Contradiction. Note, however, that if $$\mathcal{Q}(h)$$ is only positive semidefinite, then it is possible that b is conjugate to a and this gives rise to a conjugate point (b, x ∗ (b)). This happens if the Jacobi equation has a nontrivial solution that vanishes at t = b. □
We summarize the necessary conditions for a weak minimum:
Corollary 1.4.3 (Necessary conditions for a weak local minimum).
Suppose x ∗ : [a,b] → ℝ is a weak local minimum for problem [CV] . Then
1.
x ∗ is an extremal, i.e., satisfies the Euler–Lagrange equation
$$\frac{d} {dt}\left (\frac{\partial L} {\partial \dot{x}} (t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))\right ) = \frac{\partial L} {\partial x} (t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t));$$
2.
x ∗ satisfies the Legendre condition, i.e.,
$$\frac{{\partial }^{2}L} {\partial \dot{{x}}^{2}} (t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) \geq 0\qquad \mathrm{for\ all\quad }t \in [a,b];$$
3.
if x ∗ satisfies the strengthened Legendre condition, i.e.,
$$\frac{{\partial }^{2}L} {\partial \dot{{x}}^{2}} (t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) > 0\qquad \mathrm{for\ all\quad }t \in [a,b],$$then the open interval (a,b) contains no times conjugate to a.
Definition 1.4.4 (Jacobi condition).
An extremal x ∗ that satisfies the strengthened Legendre condition over the interval [a, b] satisfies the Jacobi condition if the open interval (a, b) contains no time conjugate to a; if the half-open interval (a, b] contains no time conjugate to a, the strengthened Jacobi condition is satisfied.
Theorem 1.4.4 (Sufficient conditions for a weak local minimum).
An extremal x ∗ : [a,b] → ℝ that satisfies the strengthened Legendre and Jacobi conditions is a weak local minimum for problem [CV].
Proof.
Let $$\mu {=\min }_{t\in [a,b]}R(t) > 0$$ and for α ∈ [0, μ) consider the quadratic form
$${\mathcal{Q}}_{\alpha }(h) ={ \int \nolimits }_{a}^{b}R(t)\dot{h}{(t)}^{2} + Q(t)h{(t)}^{2}dt - \alpha {\int \nolimits }_{a}^{b}\dot{h}{(t)}^{2}dt$$with corresponding Jacobi equation
$$\frac{d} {dt}\left [(R(t) - \alpha )\dot{y}(t)\right ] = Q(t)y(t),\qquad y(a) = 0,\qquad \dot{y}(a) = 1.$$The solutions $${y}_{\alpha }(t)$$ to this initial value problem depend continuously on the parameter α, and by assumption no conjugate time c to a exists in the interval (a, b] for α = 0. Hence it follows that the solution $${y}_{\alpha }(t)$$ does not vanish in the interval (a, b] for sufficiently small α > 0, and thus, by Theorem 1.4.2, the quadratic form $${\mathcal{Q}}_{\alpha }(h)$$ is positive definite on C 0 ¹([a, b]), i.e., for all h ∈ C 0 ¹([a, b]), h≠0, we have that
$${\int \nolimits }_{a}^{b}R(t)\dot{h}{(t)}^{2} + Q(t)h{(t)}^{2}dt > \alpha {\int \nolimits }_{a}^{b}\dot{h}{(t)}^{2}dt.$$This relation allows us to conclude that x ∗ is a weak local minimum: Let h ∈ C 0 ¹([a, b]) with small norm $${\left \Vert h\right \Vert }_{D} ={ \left \Vert h\right \Vert }_{\infty } +{ \left \Vert \dot{h}\right \Vert }_{\infty }$$ . Using Taylor’s theorem, the value I[x ∗ + h] of the objective can be expressed in the form
$$I[{x}_{{_\ast}} + h] = I[{x}_{{_\ast}}] + \delta I[{x}_{{_\ast}}](h) + \frac{1} {2}{\delta }^{2}I[{x}_{ {_\ast}}](h) + r({x}_{{_\ast}};h)$$with r(x ∗ ; h) denoting the remainder. Since we are assuming that L is three times continuously differentiable, the remainder can be expressed as a sum of bounded terms multiplying a cubic expression in h and $$\dot{h}$$ . This implies that r(x ∗ ; h) can be written in the form
$$r({x}_{{_\ast}};h) ={ \int \nolimits }_{a}^{b}\left (\xi (t)h{(t)}^{2} + \rho (t)\dot{h}{(t)}^{2}\right )dt$$and the terms ξ and ρ are of order $$O({\left \Vert h\right \Vert }_{D})$$ , i.e., can be bounded by $$C{\left \Vert h\right \Vert }_{D}$$ for some positive constant C. It follows from Hölder’s inequality (see Proposition D.3.1) that
$${h}^{2}(t) ={ \left ({\int \nolimits }_{a}^{t}\dot{h}(s)ds\right )}^{2} \leq (t - a){\int \nolimits }_{a}^{t}\dot{h}{(s)}^{2}ds \leq (t - a){\int \nolimits }_{a}^{b}\dot{{h}}^{2}(s)ds$$and thus
$${\int \nolimits }_{a}^{b}{h}^{2}(t)dt \leq \frac{1} {2}{(b - a)}^{2}{ \int \nolimits }_{a}^{b}\dot{{h}}^{2}(s)ds.$$Hence
$$\left \vert r({x}_{{_\ast}},h)\right \vert \leq C(1 + \frac{1} {2}{(b - a)}^{2})\left ({\int \nolimits }_{a}^{b}\dot{h}{(t)}^{2}dt\right ){\left \Vert h\right \Vert }_{ D}.$$Since x ∗ is an extremal, we have δI[x ∗ ](h) = 0, and from the calculation above, the second variation can be bounded below as
$${\delta }^{2}I[{x}_{ {_\ast}}](h) > \alpha {\int \nolimits }_{a}^{b}\dot{h}{(t)}^{2}dt.$$Thus, overall, we have that
$$\begin{array}{rcl} I[{x}_{{_\ast}} + h]& > I[{x}_{{_\ast}}] + \frac{1} {2}\alpha {\int \nolimits }_{a}^{b}\dot{h}{(t)}^{2}dt - C\left (1 + \frac{1} {2}{(b - a)}^{2}\right )\left ({\int \nolimits }_{ a}^{b}\dot{h}{(t)}^{2}dt\right ){\left \Vert h\right \Vert }_{ D}& \\ & = I[{x}_{{_\ast}}] + \left (\frac{1} {2}\alpha - C(1 + \frac{1} {2}{(b - a)}^{2}){\left \Vert h\right \Vert }_{ D}\right )\left ({\int \nolimits }_{a}^{b}\dot{h}{(t)}^{2}dt\right ) & \\ & > I[{x}_{{_\ast}}] & \\ \end{array}$$for sufficiently small $${\left \Vert h\right \Vert }_{D}$$ . Hence x ∗ is a weak local minimum. □
1.5 The Geometry of Conjugate Points and Envelopes
The Jacobi equation (1.29) provides a simple and efficient means to calculate conjugate points numerically. As an example, consider the problem of minimum surfaces of revolution. In this case,
$$L(x,\dot{x}) = x\sqrt{1 +\dot{ {x}}^{2}},$$and thus
$$\frac{\partial L} {\partial \dot{x}} (x,\dot{x}) = x \frac{\dot{x}} {\sqrt{1 +\dot{ {x}}^{2}}},\qquad \frac{{\partial }^{2}L} {\partial x\partial \dot{x}}(x,\dot{x}) = \frac{\dot{x}} {\sqrt{1 +\dot{ {x}}^{2}}},\qquad \frac{{\partial }^{2}L} {\partial \dot{{x}}^{2}} (x,\dot{x}) = \frac{x} {{\left (1 +\dot{ {x}}^{2}\right )}^{\frac{3} {2} }}.$$Along the extremal $${x}_{{_\ast}}(t) = \beta \cosh \left (\frac{t-\alpha } {\beta } \right )$$ , t ≥ 0, we have that
$$R(t) = \frac{{\partial }^{2}L} {\partial \dot{{x}}^{2}} ({x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) = \frac{{x}_{{_\ast}}(t)} {{\left (1 +\dot{ {x}}_{{_\ast}}{(t)}^{2}\right )}^{\frac{3} {2} }} = \frac{\beta } {{\cosh }^{2}\left (\frac{t-\alpha } {\beta } \right )}$$and
$$\begin{array}{rcl} Q(t)& = \frac{{\partial }^{2}L} {\partial {x}^{2}} ({x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) - \frac{d} {dt}\left ( \frac{{\partial }^{2}L} {\partial x\partial \dot{x}}({x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))\right ) & \\ & = - \frac{{\partial }^{3}L} {\partial x\partial \dot{{x}}^{2}} ({x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t)) \cdot \ddot{ {x}}_{{_\ast}}(t) = - \frac{\ddot{{x}}_{{_\ast}}(t)} {{\left (1+\dot{{x}}_{{_\ast}}{(t)}^{2}\right )}^{\frac{3} {2} }} & \\ & = -\frac{\frac{1} {\beta }\cosh \left (\frac{t-\alpha } {\beta } \right )} {{\cosh }^{3}\left (\frac{t-\alpha } {\beta } \right )} = -\frac{1} {\beta } \frac{1} {{\cosh }^{2}\left (\frac{t-\alpha } {\beta } \right )}. & \\ \end{array}$$Thus, in differentiated form, the Jacobi equation becomes
$$R(t)\ddot{y}(t) +\dot{ R}(t)\dot{y}(t) = Q(t)y(t)$$with
$$\dot{R}(t) = -2 \frac{\sinh \left (\frac{t-\alpha } {\beta } \right )} {{\cosh }^{3}\left (\frac{t-\alpha } {\beta } \right )}.$$Multiplying all terms by $$\frac{{\beta }^{2}} {R(t)} = \beta {\cosh }^{2}\left (\frac{t-\alpha } {\beta } \right )$$ , we obtain the following equivalent formulation of the Jacobi equation:
$${\beta }^{2}\ddot{y} - 2\beta \tanh \left (\frac{t - \alpha } {\beta } \right )\dot{y} + y = 0,\qquad y(0) = 0,\quad \dot{y}(0) = 1.$$Since we are interested only in the zeros of nontrivial solutions to the Jacobi equation, we can arbitrarily normalize the initial condition for the derivative, and thus there is no need to multiply by $$\frac{1} {R(t)}$$ . This equation is easily solved numerically.
Figure 1.11 shows the solution of the Jacobi equation for the extremal with values $$\alpha = -p\beta $$ and $$\beta = \frac{{x}_{0}} {\cosh p}$$ for x 0 = 1 and $$p = -1.2$$ as a dashed curve and the extremal as a solid curve. Note that the zero of the Jacobi equation exactly identifies the point of tangency between the extremal and the envelope of the family of catenaries as the conjugate point. This geometric feature is a general property of certain conjugate points and is related to the fact that the Jacobi equation is the variational equation of the Euler–Lagrange equation. We now develop these geometric properties.
Fig. 1.11 Solution to the Jacobi equation
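A computation of this kind can be sketched in a few lines of Python. The code below is a minimal illustration only and is not the authors' implementation: the function name `catenary_conjugate_time`, the fixed-step Runge–Kutta scheme, the step size, and the time horizon are our own assumptions. It integrates the Jacobi equation in the form β²ÿ − 2β tanh((t − α)∕β)ẏ + y = 0 with y(0) = 0, ẏ(0) = 1 for α = −pβ and β = x 0∕cosh p, and returns the first positive zero of y.

```python
import math

def catenary_conjugate_time(p, x0=1.0, t_end=5.0, dt=1e-4):
    """First positive zero of the Jacobi field along x(t) = beta * cosh((t - alpha) / beta).

    Integrates beta^2 y'' - 2 beta tanh((t - alpha) / beta) y' + y = 0,
    y(0) = 0, y'(0) = 1, with classical RK4; returns None if no zero before t_end.
    """
    beta = x0 / math.cosh(p)
    alpha = -p * beta

    def rhs(t, y, yd):
        ydd = (2.0 * beta * math.tanh((t - alpha) / beta) * yd - y) / beta ** 2
        return yd, ydd

    t, y, yd = 0.0, 0.0, 1.0
    while t < t_end:
        k1 = rhs(t, y, yd)
        k2 = rhs(t + dt / 2, y + dt / 2 * k1[0], yd + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, y + dt / 2 * k2[0], yd + dt / 2 * k2[1])
        k4 = rhs(t + dt, y + dt * k3[0], yd + dt * k3[1])
        y_new = y + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        yd_new = yd + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if y > 0.0 and y_new <= 0.0:
            return t + dt * y / (y - y_new)   # linear interpolation of the zero
        t, y, yd = t + dt, y_new, yd_new
    return None

# Parameter values quoted in the text for Fig. 1.11: x0 = 1, p = -1.2.
t_c = catenary_conjugate_time(p=-1.2)
print(t_c)
```

The returned time is the conjugate time t c (p) along this extremal, and the corresponding conjugate point is (t c (p), β cosh((t c (p) − α)∕β)).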
We henceforth assume that L ∈ C ³ and that the strengthened Legendre condition is satisfied along an extremal x ∗ . Under these conditions, the extremal x ∗ can be embedded into a 1-parameter family of extremals: it follows from Hilbert’s differentiability theorem that x ∗ ∈ C ²([a, b]), and the Euler–Lagrange equation can be rewritten in the form
$${L}_{\dot{x}t} + {L}_{\dot{x}x}\dot{x} + {L}_{\dot{x}\dot{x}}\ddot{x} = {L}_{x}.$$This is a regular second-order ordinary differential equation with continuous coefficients which, in a neighborhood of the reference extremal $$t\mapsto (t,{x}_{{_\ast}}(t),\dot{{x}}_{{_\ast}}(t))$$ , can be written in the form
$$\ddot{x} = F(t,x,\dot{x})$$with a continuously differentiable function F. For p in some neighborhood ( − ε, ε) of 0, there exists a solution x = x(t; p) to this equation for the initial conditions
$$x(a;p) = {x}_{{_\ast}}(a) = A\quad \mathrm{and}\quad \dot{x}(a;p) =\dot{ {x}}_{{_\ast}}(a) + p.$$For p = 0, this is the reference extremal x ∗ defined over [a, b]. Again, without loss of generality we may assume that x ∗ is defined on some open interval containing [a, b], and thus by continuous dependence of the solutions of an ordinary differential equation on initial conditions and parameters, for ε > 0 small enough all the solutions x( ⋅; p) will exist on some open interval containing [a, b]. Under our assumptions, the solutions are also continuously differentiable functions of the initial conditions. Thus x(t; p) is continuously differentiable in both t and p, and the partial derivative with respect to the parameter p can be computed as the solution to the corresponding variational equation. While this construction of a parameterized family that contains the reference extremal is canonical for the problem [CV], sometimes (like in the case of minimum surfaces of revolution) other parameterizations may be preferred or may be more natural. We thus, more generally, formalize the construction in the definition below.
Definition 1.5.1 (Parameterized family of extremals for problem [CV]).
A C ¹-parameterized family ℰ of extremals for problem [CV] consists of a family x = x( ⋅; p) of solutions to the Euler–Lagrange equation defined over some domain D = { (t, p) : p ∈ P, a ≤ t < t f (p)} with P an open interval parameterizing the extremals and t f : P → ℝ, p↦t f (p), a function determining the interval of definition of the extremal x( ⋅; p), so that the following properties hold:
1.
for all t ∈ [a, b], x(t; 0) ≡ x ∗ (t);
2.
all extremals in the family satisfy the initial conditions
$$x(a;p) = A\quad \mathrm{and}\quad \frac{\partial \dot{x}} {\partial p}(a;p)\neq 0\quad \mathrm{for\ all}\quad p \in P;$$(1.31)
3.
the partial derivatives of x with respect to the parameter p exist, are continuous as functions of both variables t and p, and satisfy
$$\frac{d} {dt}\left (\frac{\partial x} {\partial p}(t;p)\right ) = \frac{\partial } {\partial p}\left (\frac{\partial x} {\partial t} (t;p)\right ) = \frac{\partial \dot{x}} {\partial p}(t;p).$$(1.32)
Any member x( ⋅; p) of this family is said to be