showed the abilities that neural networks possess towards breaking this curse for data analysis3,26,27 and for solving differential equations28–30.

Our goal in this paper is to find an approximation of 𝑢(𝑡, 𝑥) for various 𝑡 and 𝑥 using machine learning and neural networks for semilinear parabolic PDEs in low and high-dimensional spaces, including traditional one- to three-dimensional spaces, a five-dimensional test PDE, and the Allen–Cahn equations in low and high-dimensional spaces, up to sixty dimensions. With the recent re-emergence of machine learning and its usefulness and versatility, combining SDEs and neural networks gives us a practical approach to solving low and high-dimensional PDEs31.

In Section 2, we review Brownian motion and Itô's lemma, followed by a proof that a solution to the BSDE is equivalent to a solution of the original PDE. Before using the BSDE for problems with unknown analytical solutions, we examine the BSDE solver algorithm on a benchmark diffusion–reaction equation in 5D. The 5D test problem was created to test the accuracy of the algorithm and to understand key parameters in the neural network, such as the number of layers, learning rate, initial guess, and activation functions. We discovered that the algorithm is flexible with respect to the initial guesses. It converges to desired solutions fairly quickly, and learning rate adjustment can improve the algorithm's efficiency. We analyze how the network's solution behaves at different final times, along with using different initial starting points, activation functions, and heavy learning rate manipulation, in Section 4. Section 5 deals with various Allen–Cahn equations in low and high-dimensional spaces with various orders of potential functions. With these equations, we present their solutions at different points in time and space. Specifically for the Allen–Cahn equation, we explore solutions using varying parameters, the interaction length 𝜖 and the order of the potential function 𝛼, in 1D, 3D, and 60D. A short conclusion is drawn in Section 6.

2. Brownian motion and SDE

A stochastic process is a collection of random variables indexed by time. It can be given in two ways, through discrete or continuous time, giving us {𝑋0, 𝑋1, …, 𝑋𝑡} or {𝑋𝑡}𝑡≥0, respectively. As an alternate definition, a stochastic process can be thought of as a probability distribution over a space of paths.

We now introduce what is known as Brownian motion, sometimes referred to as a Wiener process. We denote Brownian motion by 𝑊𝑡.

Definition 2.1 (32). A real-valued stochastic process 𝑊(⋅) is called a Brownian motion (or Wiener process) if
(i) 𝑊(0) = 0 a.s. (a property which is true except for an event of probability zero is said to hold almost surely, usually abbreviated ''a.s.''),
(ii) 𝑊(𝑡) − 𝑊(𝑠) is 𝑁(0, 𝑡 − 𝑠) for all 𝑡 ≥ 𝑠 ≥ 0,
(iii) for all times 0 < 𝑡1 < 𝑡2 < ⋯ < 𝑡𝑛, the random variables 𝑊(𝑡1), 𝑊(𝑡2) − 𝑊(𝑡1), …, 𝑊(𝑡𝑛) − 𝑊(𝑡𝑛−1) are independent (''independent increments'').

Note that 𝑁(𝜇, 𝜎²) represents a normal distribution with mean 𝜇 and variance 𝜎².

Definition 2.2. For a vector 𝜇 ∈ 𝐿1 and a matrix 𝜎 ∈ 𝐿2, 𝑋(𝑡) is an Itô process if
\[
X_t = X_0 + \int_0^t \mu\,ds + \int_0^t \sigma\,dW_s. \tag{2.1}
\]
We say that 𝑋(𝑡) has the stochastic differential form
\[
dX_t = \mu\,dt + \sigma\,dW_t \tag{2.2}
\]
for 0 ≤ 𝑡 ≤ 𝑇.

We will next review differentiation under Brownian motion, which is Itô's formula. Our goal is to show that the solution to an SDE can help us find the solution 𝑢 to (1.1). In the case of (1.1), Itô's formula takes the following form:

Theorem 2.1 (Itô's Formula33). Suppose that 𝑋(𝑡) has the stochastic differential
\[
dX_t = \mu\,dt + \sigma\,dW_t,
\]
for 𝜇 ∈ 𝐿1, 𝜎 ∈ 𝐿2. Assume 𝑢(𝑡, 𝑥): 𝐑 × [0, 𝑇] → 𝐑 is continuous and that ∂𝑢/∂𝑡, ∂𝑢/∂𝑥, ∂²𝑢/∂𝑥² exist and are continuous. Then 𝑌𝑡 = 𝑢(𝑡, 𝑋𝑡) is again an Itô process and has the stochastic differential form
\[
\begin{aligned}
dY_t = du(t, X_t) &= \frac{\partial u}{\partial t}\,dt + \frac{\partial u}{\partial x}\,dX_t + \frac{1}{2}\frac{\partial^2 u}{\partial x^2}(dX_t)^2 \\
&= \frac{\partial u}{\partial t}\,dt + \frac{\partial u}{\partial x}\,dX_t + \frac{1}{2}\frac{\partial^2 u}{\partial x^2}\sigma^2\,dt \\
&= \frac{\partial u}{\partial t}\,dt + \frac{\partial u}{\partial x}\mu\,dt + \frac{\partial u}{\partial x}\sigma\,dW_t + \frac{1}{2}\frac{\partial^2 u}{\partial x^2}\sigma^2\,dt. \qquad\square
\end{aligned}
\]

Theorem 2.2 (Multidimensional Itô's Formula33). For a vector 𝜇 and matrix 𝜎, let 𝑋𝑡 = (𝑋𝑡¹, 𝑋𝑡², …, 𝑋𝑡^𝑑)ᵀ be a vector of Itô processes such that
\[
dX_t = \mu\,dt + \sigma\,dW_t.
\]
Then
\[
\begin{aligned}
du(t, X_t) &= \frac{\partial u}{\partial t}\,dt + (\nabla u)^{T}\,dX_t + \frac{1}{2}(dX_t)^{T}\,\mathbf{H}_x u\,\,dX_t \\
&= \left\{\frac{\partial u}{\partial t} + (\nabla u)^{T}\mu + \frac{1}{2}\operatorname{Tr}\!\left[\sigma^{T}\sigma\,\mathbf{H}_x u\right]\right\}dt + (\nabla u)^{T}\sigma\,dW_t,
\end{aligned}
\tag{2.3}
\]
where ∇𝑢 is the gradient of 𝑢 w.r.t. 𝑥, 𝐇𝑥𝑢 is the Hessian matrix of 𝑢 w.r.t. 𝑥, and Tr is the trace operator. □

Theorem 2.3 (31). The semilinear parabolic differential Eq. (1.1) has a solution 𝑢(𝑡, 𝑥) if and only if 𝑢(𝑡, 𝑥) satisfies the following backward stochastic differential equation (BSDE):
\[
u(t, X_t) - u(0, X_0) = -\int_0^t f\big(s, X_s, u(s, X_s), \sigma^{T}(s, X_s)\nabla u(s, X_s)\big)\,ds + \int_0^t [\nabla u(s, X_s)]^{T}\sigma(s, X_s)\,dW_s. \qquad\square \tag{2.4}
\]

Proof. For simplicity, we rewrite (2.4) as follows:
\[
u(t, X_t) - u(0, X_0) = -\int_0^t f\,ds + \int_0^t [\nabla u(s, X_s)]^{T}\sigma(s, X_s)\,dW_s. \tag{2.5}
\]
Let
\[
Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T (Z_s)^{T}\,dW_s, \tag{2.6}
\]
\[
Z_t = \sigma^{T}(t, X_t)\nabla u(t, X_t). \tag{2.7}
\]
Then
\[
dY_t = -f(t, X_t, Y_t, Z_t)\,dt + (Z_t)^{T}\,dW_t. \tag{2.8}
\]
In addition, by Itô's formula, we have that
\[
d(u(t, X_t)) = \left\{u_t + \nabla u \cdot \mu + \frac{1}{2}\operatorname{Tr}\big(\sigma\sigma^{T}\mathbf{H}_x u\big)\right\}dt + [\nabla u]^{T}\sigma\,dW_t. \tag{2.9}
\]
• If 𝑢(𝑡, 𝑥) is a solution to the semilinear parabolic equation (1.1), then
\[
u_t + \nabla u \cdot \mu + \frac{1}{2}\operatorname{Tr}\big(\sigma\sigma^{T}\mathbf{H}_x u\big) = -f(t, X_t, Y_t, Z_t).
\]
Thus,
\[
d(u(t, X_t)) = -f(t, X_t, Y_t, Z_t)\,dt + [\nabla u]^{T}\sigma\,dW_t. \tag{2.10}
\]
Note the definitions of 𝑌𝑡 and 𝑍𝑡:
\[
d(u(t, X_t)) = -f(t, X_t, Y_t, Z_t)\,dt + (Z_t)^{T}\,dW_t = dY_t. \tag{2.11}
\]
Thus,
\[
\begin{aligned}
u(t, X_t) &= Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T (Z_s)^{T}\,dW_s, \\
u(0, X_0) &= Y_0 = g(X_T) + \int_0^T f(s, X_s, Y_s, Z_s)\,ds - \int_0^T (Z_s)^{T}\,dW_s.
\end{aligned}
\tag{2.12}
\]
Therefore,
\[
\begin{aligned}
u(t, X_t) - u(0, X_0) &= -\int_0^t f\big(s, X_s, u(s, X_s), \sigma^{T}(s, X_s)\nabla u(s, X_s)\big)\,ds + \int_0^t [\nabla u(s, X_s)]^{T}\sigma(s, X_s)\,dW_s \\
&= -\int_0^t f\,ds + \int_0^t (Z_s)^{T}\,dW_s.
\end{aligned}
\]
This is (2.4).
• If 𝑢(𝑡, 𝑋𝑡) is a solution of (2.4), then
\[
u(t, X_t) = u(0, X_0) - \int_0^t f\,ds + \int_0^t (Z_s)^{T}\,dW_s, \qquad
du(t, X_t) = -f\,dt + (Z_t)^{T}\,dW_t. \tag{2.13}
\]
Thus, by (2.6) we have that
\[
u(T, X_T) = u(0, X_0) - \int_0^T f\,ds + \int_0^T (Z_s)^{T}\,dW_s = u(0, X_0) + g(X_T) - Y_0 = g(X_T). \tag{2.14}
\]
Thus, 𝑢(𝑇, 𝑥) = 𝑔(𝑥). On the other hand, recalling Itô's lemma and combining (2.9) and (2.13), we have that
\[
\begin{aligned}
\left\{u_t + \nabla u \cdot \mu + \frac{1}{2}\operatorname{Tr}\big(\sigma\sigma^{T}\mathbf{H}_x u\big)\right\}dt + [\nabla u]^{T}\sigma\,dW_t &= -f\,dt + (Z_t)^{T}\,dW_t, \\
u_t + \nabla u \cdot \mu + \frac{1}{2}\operatorname{Tr}\big(\sigma\sigma^{T}\mathbf{H}_x u\big) &= -f, \\
u_t + \nabla u \cdot \mu + \frac{1}{2}\operatorname{Tr}\big(\sigma\sigma^{T}\mathbf{H}_x u\big) + f &= 0.
\end{aligned}
\tag{2.15}
\]
Thus, we have a solution 𝑢(𝑡, 𝑥) to (1.1). □

To approximate a solution to the PDE (1.1), especially in higher-dimensional spaces, (2.4) makes it possible to compute values of 𝑢 at the terminal time 𝑇 at any spatial point, where 𝑢(0, 𝑋0) is a given initial condition.

3. Numerical solution to the BSDE

We solve (2.4) numerically to compute an approximation for 𝑢(0, 𝑋0). Let 𝑢(0, 𝑋0) = 𝜃_{𝑢0} and ∇𝑢(0, 𝑋0) = 𝜃_{∇𝑢0} be parameters of the numerical procedure. First, we need a time discretization to propagate in time, and then we use a neural network to approximate derivatives in the spatial variables during each time step.

An explicit Euler method is used for the time discretization. We apply temporal discretization to (2.4) and partition the time interval [0, 𝑇] into 0 = 𝑡0 < 𝑡1 < ⋯ < 𝑡𝑁 = 𝑇. Consider the Euler method for 𝑛 = 1, …, 𝑁 − 1:
\[
X_{t_{n+1}} - X_{t_n} \approx \mu(t_n, X_{t_n})\,\Delta t_n + \sigma(t_n, X_{t_n})\,\Delta W_n, \tag{3.1}
\]
where Δ𝑡𝑛 = 𝑡𝑛+1 − 𝑡𝑛 and Δ𝑊𝑛 = 𝑊_{𝑡𝑛+1} − 𝑊_{𝑡𝑛}. Substituting (3.1) into (2.4) gives us
\[
u(t_{n+1}, X_{t_{n+1}}) - u(t_n, X_{t_n}) \approx -f\big(t_n, X_{t_n}, u(t_n, X_{t_n}), \sigma^{T}(t_n, X_{t_n})\nabla u(t_n, X_{t_n})\big)\,\Delta t_n + [\nabla u(t_n, X_{t_n})]^{T}\sigma(t_n, X_{t_n})\,\Delta W_n. \tag{3.2}
\]
This is a time-discretized version of (2.4). By using this equation, given the terminal condition, we can propagate in time provided that ∇𝑢(𝑡𝑛, 𝑋_{𝑡𝑛}) can be approximated numerically at each time step. The approximation of ∇𝑢(𝑡𝑛, 𝑋_{𝑡𝑛}) at each time step is done by approximating the function
\[
x \mapsto \sigma^{T}(t, x)\nabla u(t, x) \tag{3.3}
\]
at each time step 𝑡 = 𝑡𝑛 by a multilayer feed-forward neural network:
\[
\sigma^{T}(t_n, X_{t_n})\nabla u(t_n, X_{t_n}) = (\sigma^{T}\nabla u)(t_n, X_{t_n}) \approx (\sigma^{T}\nabla u)(t_n, X_{t_n} \mid \theta_n) \tag{3.4}
\]
for 𝑛 = 1, …, 𝑁 − 1, where 𝜃𝑛 represents the parameters of the neural network approximating 𝜎ᵀ(𝑡, 𝑥)∇𝑢(𝑡, 𝑥) at 𝑡 = 𝑡𝑛. We associate a sub-network with each time step 𝑡𝑛 and stack all these sub-networks together to form one deep neural network. This network takes the paths {𝑋_{𝑡𝑛}}_{0≤𝑛≤𝑁} and {𝑊_{𝑡𝑛}}_{0≤𝑛≤𝑁} as the input data and gives the final output, denoted by 𝑢̂({𝑋_{𝑡𝑛}}_{0≤𝑛≤𝑁}, {𝑊_{𝑡𝑛}}_{0≤𝑛≤𝑁}), as an approximation to 𝑢(𝑡_𝑁, 𝑋_{𝑡_𝑁}).

The error, which is the difference between the approximation and the given terminal condition 𝑔(𝑥) = 𝑢(𝑇, 𝑥), defines the loss function as follows:
\[
l(\theta) = \mathbf{E}\Big[\big|g(X_{t_N}) - \hat{u}\big(\{X_{t_n}\}_{0\le n\le N}, \{W_{t_n}\}_{0\le n\le N}\big)\big|^2\Big], \tag{3.5}
\]
where 𝜃 is the total set of parameters. We refer to18 for more details on the neural network, where the authors presented the network architecture shown in Fig. 1. The network employs 𝑁 − 1 fully-connected feed-forward sub-networks, each consisting of 4 layers (including 1 output layer) with 𝑑 + 10 hidden units in each hidden layer. Through extensive experiments, we realized that increasing the number of hidden layers did not improve our numerical accuracy much. Therefore, we employ the same network parameters throughout the paper.
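To make the discretized rollout (3.1)–(3.2) and the terminal loss (3.5) concrete, the following is a minimal TensorFlow sketch of this construction. It is not the authors' implementation: the problem callables `mu_fn`, `sigma_fn`, `f_fn`, `g_fn`, the batch size, and the layer sizes are illustrative placeholders chosen to mirror the description above.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch of the discretized BSDE rollout (3.1)-(3.2) and loss (3.5).
# mu_fn(t, x) -> [batch, d], sigma_fn(t, x) -> [batch, d, d],
# f_fn(t, x, y, z) -> [batch], g_fn(x) -> [batch] are illustrative placeholders.

d, N, T = 5, 30, 0.4                      # dimension, time intervals, final time
dt = T / N
batch = 64

# Trainable initial values playing the roles of theta_u0 and theta_grad_u0
# (here z0 directly parameterizes sigma^T grad u at t_0).
u0 = tf.Variable(tf.random.uniform([1], 0.5, 0.6))
z0 = tf.Variable(tf.random.uniform([1, d], -0.1, 0.1))

# One small feed-forward sub-network per interior time step t_1, ..., t_{N-1},
# each approximating x -> sigma^T(t_n, x) grad u(t_n, x) as in (3.3)-(3.4).
subnets = [
    tf.keras.Sequential([
        tf.keras.layers.Dense(d + 10, activation="relu"),
        tf.keras.layers.Dense(d + 10, activation="relu"),
        tf.keras.layers.Dense(d),
    ])
    for _ in range(N - 1)
]

def loss_fn(x0, mu_fn, sigma_fn, f_fn, g_fn):
    """Roll (3.1)-(3.2) forward along sampled Brownian increments and
    return the terminal loss (3.5)."""
    x = tf.tile(x0[None, :], [batch, 1])            # X_{t_0}
    y = tf.tile(u0, [batch])                        # u(t_0, X_{t_0})
    z = tf.tile(z0, [batch, 1])                     # sigma^T grad u at t_0
    for n in range(N):
        t_n = n * dt
        dw = tf.random.normal([batch, d], stddev=np.sqrt(dt))   # Delta W_n
        # Backward update (3.2) for u, then forward Euler step (3.1) for X.
        y = y - f_fn(t_n, x, y, z) * dt + tf.reduce_sum(z * dw, axis=1)
        x = x + mu_fn(t_n, x) * dt + tf.linalg.matvec(sigma_fn(t_n, x), dw)
        if n < N - 1:
            z = subnets[n](x)                       # sigma^T grad u at t_{n+1}
    return tf.reduce_mean((g_fn(x) - y) ** 2)       # loss (3.5)
```

In training, an optimizer (e.g., Adam) would repeatedly minimize `loss_fn`, updating `u0`, `z0`, and the sub-network weights; after convergence, `u0` is the approximation of 𝑢(0, 𝑋0).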
The BSDE solver used in this paper was collected from18. It was modified and implemented using Python version 3.9.12, TensorFlow version 2.9.1, NumPy version 1.21.5, and various other commonly used packages. The numerical experiments were performed on a PC with an AMD Ryzen 2700X boost-clocked at 4.0 GHz, an NVIDIA GeForce RTX 2060 Super, and 16 GB of RAM at 1367 MHz. The focus of the numerical experiments is on reaction–diffusion equations. The first experiment was done on a 5D test problem, and the second on the Allen–Cahn equation using various parameters and dimensions.
4. 5D test problem

We investigate the following reaction–diffusion equation
\[
\begin{cases}
\dfrac{\partial u(t, x)}{\partial t} = \Delta u(t, x) - 0.2u - 5e^{-0.2t}, & t > 0, \\
u(0, x) = (x_1^2 + \dots + x_5^2)/2 = \|x\|^2/2,
\end{cases}
\tag{4.1}
\]
for 𝑥 ∈ R⁵ and 𝑡 ∈ [0, 𝑇]. The analytical solution to (4.1) is given by
\[
u(t, x) = \frac{1}{2}\big(x_1^2 + \dots + x_5^2\big)e^{-0.2t}. \tag{4.2}
\]
In order to convert (4.1) into the form of (1.1), we consider the time reversal mapping 𝑡 ↦ 𝑇 − 𝑡 for 𝑇 > 0. This leads us to the following equation with a terminal condition:
\[
\begin{cases}
\dfrac{\partial u}{\partial t} + \Delta u - 0.2u - 5e^{-0.2(T-t)} = 0, \\
u(T, x) = (x_1^2 + \dots + x_5^2)/2 = \|x\|^2/2.
\end{cases}
\tag{4.3}
\]
This matches the semilinear parabolic form of (1.1) with 𝜎 = √2 𝐼₅, 𝜇(𝑡, 𝑥) = 0, and 𝑓(𝑡, 𝑢) = −0.2𝑢 − 5𝑒^{−0.2(𝑇−𝑡)}, where 𝐼₅ denotes the 5 × 5 identity matrix. Table 1 shows the network parameters used unless stated otherwise.

Table 1
Network parameters used to produce Fig. 2 and Fig. 3.
Initial starting range: [0.5, 0.6]
Number of time intervals: 30
Learning rates: 5 × 10⁻², 4 × 10⁻³
Learning rate boundary: 500
Number of iterations: 1000

The initial starting range shows where the initial guess of the solution lies. The number of time intervals is the number of steps used in the explicit Euler method, which is also the number of layers in the network. The learning rates represent our choices of what rate to use, corresponding to the learning rate boundaries; i.e., 5 × 10⁻² is the learning rate for the first 500 iterations, and a smaller learning rate, 4 × 10⁻³, is used for the remaining iterations.

It is possible to approximate solutions to (4.1) at any point. In the following, we present an approximated solution of (4.1) using the BSDE solver and its performance at the point (𝑇 = 0.4, 𝑥 = (0.8, 0.5, 1.0, 0.0, 0.2)). The analytical solution to (4.1) at this point, obtained using (4.2), is 0.965𝑒^{−0.08}.

The relative error used in the plots throughout the paper is defined as
\[
\text{Relative Error} = \frac{|\text{Analytical Solution} - \text{BSDE Solution}|}{\text{Analytical Solution}}. \tag{4.4}
\]
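For reference, (4.2) and (4.4) can be evaluated directly at any point; a small sketch follows, in which the BSDE output value passed to the error function is a hypothetical placeholder, not a reported result.

```python
import numpy as np

def u_exact(t, x):
    """Analytical solution (4.2): u(t, x) = 0.5 * ||x||^2 * exp(-0.2 t)."""
    x = np.asarray(x, dtype=float)
    return 0.5 * np.sum(x**2) * np.exp(-0.2 * t)

def relative_error(exact, approx):
    """Relative error (4.4)."""
    return abs(exact - approx) / abs(exact)

x_point = [0.8, 0.5, 1.0, 0.0, 0.2]
exact = u_exact(0.4, x_point)                 # 0.965 * exp(-0.08), approx 0.8908
print(exact, relative_error(exact, 0.89))     # 0.89 is a hypothetical BSDE output
```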
Fig. 2. Left: Approximated solution 𝑢(𝑇 = 0.4, 𝑥 = (0.8, 0.5, 1.0, 0.0, 0.2)) to (4.3) using the BSDE solver for different iterations of the network. Right: Relative errors of the approximation from the BSDE solver at different iterations of the network.

Fig. 2 shows the approximated solution on the left and the relative error on the right computed by the BSDE solver at the point (𝑇 = 0.4, 𝑥 = (0.8, 0.5, 1.0, 0.0, 0.2)) for (4.1). The shaded region depicts the mean ± one standard deviation from five independent runs. The figures show that the algorithm converges to an accurate solution (less than 0.1% relative error) after around 500 iterations, although five iterations already give less than 10% relative error.

The time evolution of (4.1), using the same 𝑥-point used in Fig. 2, is presented below. We use the following recursive relationship for the time values: 𝑇₀ = 0, Δ𝑇 = 0.1, 𝑇𝑛 = 𝑇₀ + 𝑛Δ𝑇, 𝑛 = 1, …, 6. Each 𝑇𝑛 is run for 5 independent runs, then the average between those runs is plotted. For each 𝑇𝑛 we plot the relative error to the right of the approximated solutions. Note that the relative error at 𝑇 = 0.0 is 0 and is excluded from the plot.

The results show that the BSDE solver can perform extremely well, especially using only 30 time steps. This is one of the advantages of the BSDE solver: the accuracy is not restricted to small time steps in the explicit Euler method, which was a limitation for traditional numerical techniques.

4.1. 5D test problem: Adaptive deep neural network

When using this solver, the quickest way to achieve convergence is to initialize the starting point within a range you believe the solution lies in. The issue with this is that you may not always know where the solution is. We consider (4.1) again and focus on the point (𝑇 = 0.2, 𝑥 = (0.8, 0.5, 1.0, 0.0, 0.2)), as seen in Fig. 3. The analytical solution was found to be 0.927162 at this point. Next, we compare the approximation from the solver with the analytical solution to see if the BSDE solver is able to find a good approximation even with a very distant initialization. In a situation like this, it is a good idea to heavily modify the learning rates.

We explore the solution given by these parameters for a single run. For case one, our network is initialized at 817.345 as its starting solution. Plots of the first 400 iterations, the last 1000 iterations, and the relative error over all iterations for a single run can be found in Fig. 4. Case 2 represents a slightly closer initialization, at 9.706. We use the network parameters found in Table 2 to achieve the solution. Fig. 5 presents the plots of the BSDE solution for the first 200 iterations, the last 500 iterations, and the relative error over all iterations.
Fig. 3. Analytical and approximated solution of the test problem (4.1) at 𝑥 = (0.8, 0.5, 1.0, 0.0, 0.2) for the time evolution, alongside the corresponding relative error.

Table 2
Network parameters used to solve the test problem (4.1) at distant initial guesses.
Case 1: Initial starting range [800, 900]; Learning rates 30, 3, 0.5, 0.3, 0.05, 0.01, 0.006, 0.003; Learning rate boundaries 250, 500, 600, 1250, 1500, 1700, 1850, 6000.
Case 2: Initial starting range [9.5, 10]; Learning rates 5, 0.5, 0.06, 0.004, 0.0004; Learning rate boundaries 50, 150, 300, 1200, 1500.
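A staged learning-rate schedule like the ones in Tables 1–2 can be expressed with a piecewise-constant decay in TensorFlow. The sketch below uses the Case 2 values of Table 2 and interprets the final boundary (1500) as the total iteration count; the choice of Adam and the exact wiring are illustrative, not necessarily what the authors used.

```python
import tensorflow as tf

# Piecewise-constant learning rate: 5.0 for steps <= 50, 0.5 until step 150,
# 0.06 until step 300, 0.004 until step 1200, then 0.0004 afterwards.
lr_values = [5.0, 0.5, 0.06, 0.004, 0.0004]
lr_boundaries = [50, 150, 300, 1200]   # one fewer boundary than values

schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=lr_boundaries, values=lr_values)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```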
Fig. 4. Left: First 400 iterations with an initialization at 817.345. Middle: The last 1000 iterations with an initialization at 817.345. Right: The relative error over all iterations with an initialization at 817.345.

Fig. 5. Left: First 200 iterations with an initialization at 9.706. Middle: The last 500 iterations with an initialization at 9.706. Right: The relative error over all iterations with an initialization at 9.706.

It can be seen from Figs. 4 and 5 that with both initializations the network converges to the same approximate solution. With an initialization of 817.345, the network begins to make an accurate approximation around the 175th iteration. When the initialization is slightly closer to the analytical solution, the network makes an accurate approximation around the 75th iteration. Both plots show oscillatory behavior of the network's solution during these single runs.

4.2. 5D test problem: Hyper-parameter comparison

As seen in the previous subsection, we changed the initialization zone, the number of iterations, and the learning rates to see if we could still retrieve a good approximation of the analytical solution. Another major network parameter that we explore in this subsection is the activation function.

We reconsider the time evolution of (4.1) for 𝑥 = (0.8, 0.5, 1.0, 0.0, 0.2). In all previous examples, the ReLU activation function was used to compute results. We present the following activation functions to be used for experimentation in this subsection, with input 𝑧 (a short code sketch for swapping activations follows the list):

• Sigmoid takes a real value as input and outputs another value between 0 and 1:
\[
\sigma(z) = \frac{1}{1 + e^{-z}}. \tag{4.5}
\]
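As a sketch of how such activation functions can be swapped in a Keras-style sub-network, the snippet below builds one per-step network with a configurable activation; the hidden width 𝑑 + 10 follows the architecture description above, while the number of layers and the helper name are illustrative assumptions.

```python
import tensorflow as tf

# Build one sub-network with a configurable activation; "relu" can be swapped
# for "sigmoid", "tanh", "elu", or "softplus" to compare activation functions.
def make_subnet(d, activation="relu"):
    return tf.keras.Sequential([
        tf.keras.layers.Dense(d + 10, activation=activation),
        tf.keras.layers.Dense(d + 10, activation=activation),
        tf.keras.layers.Dense(d),
    ])

subnet = make_subnet(d=5, activation="sigmoid")   # sigmoid as in (4.5)
```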
Table 3
Network parameters used in the BSDE solver for the
Allen–Cahn equations in 1D, 3D and 60D.
Initial starting range [0.4, 0.5]
Number of time intervals 30
Learning rates 4 × 10−2 , 4 × 10−4
Learning rate boundary 1000
Number of iterations 2000
Fig. 7. Left: Time evolution of the analytical solution and approximations using different activation functions for the 5D test problem (4.1) at 𝑥 = (0.8, 0.5, 1.0, 0.0, 0.2). Right: Corresponding relative errors obtained by using different activation functions with respect to time.

Fig. 8. Higher-order polynomial double-well potential functions (5.2) with 𝛼 = 2, 4, 6, and 8.

Allen–Cahn equation can be explicitly found, and the analytical solution to the initial condition problem is given in13 as follows:
\[
u(t, x) = \frac{1}{2}\left(1 - \tanh\!\left(\frac{x - 0.5 - st}{2\sqrt{2\epsilon}}\right)\right), \quad t > 0, \tag{5.5}
\]
where 𝑠 = 3/√(2𝜖). The initial condition is obtained by plugging 𝑡 = 0 into the analytical solution, 𝑢(0, 𝑥) = ½(1 − tanh((𝑥 − 0.5)/(2√(2𝜖)))).

We present numerical experiments for the 1-dimensional problem for 𝑥 ∈ [0, 4] when 𝜖 = 1, 1/3, 1/6, 1/9 and 𝑡 = 0.01 (see13). The analytical solution, the approximated solution, and the relative errors are shown in Fig. 9 on the left, middle, and right, respectively. It can be seen that the approximation gets better when 𝜖 is larger. Furthermore, the BSDE solver can handle 𝜖 as small as around 1/9 when 𝛼 = 2. The time evolution of the approximated solutions 𝑢(𝑡, 𝑥 = 0) for 𝑡 ∈ [0, 0.16] is presented on the left of Fig. 10 for different values of 𝛼. As we are not aware of analytical solutions for other values of 𝛼, we use the network's loss values as a replacement for the relative error. The right of Fig. 10 shows the loss profile.
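For reference, the exact traveling-wave profile (5.5) can be evaluated directly to produce the curves against which the relative errors in Fig. 9 are measured; a small sketch, assuming the reading 𝑠 = 3/√(2𝜖) given above:

```python
import numpy as np

def u_exact_ac1d(t, x, eps):
    """1D Allen-Cahn traveling-wave solution (5.5) with s = 3 / sqrt(2*eps)."""
    s = 3.0 / np.sqrt(2.0 * eps)
    return 0.5 * (1.0 - np.tanh((x - 0.5 - s * t) / (2.0 * np.sqrt(2.0 * eps))))

# Exact profiles at t = 0.01 on x in [0, 4] for the epsilon values used in Fig. 9.
x = np.linspace(0.0, 4.0, 401)
profiles = {eps: u_exact_ac1d(0.01, x, eps) for eps in (1.0, 1/3, 1/6, 1/9)}
```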
jor hyper-parameter changes, such as the number of iterations
and the learning rates, we were able to show that the network
5.2. Allen-Cahn: 3D and 60D, varying 𝛼 and 𝜖 converges towards the analytical solution even when initialized
far from the true solution. Throughout the iterations of the net-
Next, we experiment with the higher dimensional Allen–Cahn equa- work, the relative error generally stayed bounded between 1%
tion (5.1), for 𝑥 ∈ R3 and 𝑥 ∈ R60 . We approximate 𝑢(𝑇 = 0.16, 0) and 0.01%, further showing the capacity of the deep neural
using two sets of parameters on AC equations. Approximated solutions network. To keep with the theme of modification towards hyper-
and loss profiles over [0, 0.16] are illustrated. Fig. 11 shows numerical parameters, we altered the activation function used. The solver
results using the first set of parameters, which is 𝜖 = 1 and 𝛼 = 2, 4 and originally uses the well-known ReLU function and we then tried
Fig. 12 corresponds to the second set of parameters, which is 𝛼 = 2 and other activation functions such as the sigmoid, the hyperbolic
𝜖 = 1, 1∕3. tangent, ELU, and the softplus functions. In most cases, the ReLU
Fig. 9. AC 1D for 𝜖 = 1, 1/3, 1/6, 1/9 when 𝑡 = 0.01. Left: analytical solution. Middle: BSDE approximation. Right: relative error.

Fig. 10. Left: Time evolution of (5.4) with 𝜖 = 1 and for 𝛼 = 2, 4, 6, and 8. Right: Corresponding loss values.

Fig. 11. Top Left: Time evolution of 3D AC with 𝜖 = 1 and 𝛼 = 2, 4. Top Right: Calculated loss by the network for the time evolution of 3D AC with 𝜖 = 1 and 𝛼 = 2, 4. Bottom Left: Time evolution of 60D AC with 𝜖 = 1 and 𝛼 = 2, 4. Bottom Right: Calculated loss by the network for the time evolution of 60D AC with 𝜖 = 1 and 𝛼 = 2, 4.
Fig. 12. Top Left: Time evolution of 3D AC with 𝛼 = 2 and 𝜖 = 1, 1/3. Top Right: Calculated loss by the network for the time evolution of 3D AC with 𝛼 = 2 and 𝜖 = 1, 1/3. Bottom Left: Time evolution of 60D AC with 𝛼 = 2 and 𝜖 = 1, 1/3. Bottom Right: Calculated loss by the network for the time evolution of 60D AC with 𝛼 = 2 and 𝜖 = 1, 1/3.
6. Conclusion

In this paper, we focus on 1D, 3D, 5D, and 60D numerical simulations of reaction–diffusion equations:

a. 5D Test Problem: We started by creating a 5D test problem with a known analytical solution to test the efficiency and accuracy of the deep neural network. We analyzed the time evolution of our 5D reaction–diffusion equation at the origin. The relative errors were about 0.1%, in some cases reaching as small as 0.01%. This pattern of outstanding relative errors echoes throughout the remaining experiments.

b. The Neural Network: The network typically performs best when the initial guess of the solution is chosen in a range that is believed to contain the analytical solution. However, through major hyper-parameter changes, such as the number of iterations and the learning rates, we were able to show that the network converges towards the analytical solution even when initialized far from the true solution. Throughout the iterations of the network, the relative error generally stayed bounded between 1% and 0.01%, further showing the capacity of the deep neural network. To keep with the theme of modifying hyper-parameters, we also altered the activation function used. The solver originally uses the well-known ReLU function, and we then tried other activation functions such as the sigmoid, the hyperbolic tangent, ELU, and the softplus functions. In most cases, the ReLU performed the best and most consistently, besides a few exceptions where the hyperbolic tangent, softplus, or sigmoid function outperformed it at certain points in time. We also studied another reaction–diffusion equation, the Allen–Cahn (AC) equation, in both low and high-dimensional space.

c. The Allen–Cahn Equation: The Allen–Cahn equation contains two major parameters that we decided to analyze: 𝜖, the interaction length, and 𝛼, a potential function parameter. Our first analysis was done in 1D, finding the spatial solution of the AC equation with 𝛼 = 2 and varying 𝜖 at a final time of 𝑡 = 0.01, for which there is a known analytical solution. The relative error for these approximations ranged between 1% and 0.00001%. Furthermore, we observed that the smaller 𝜖 was, the larger the error. When working with the 3D and 60D Allen–Cahn equations under two different situations, we found the loss values of the approximations to commonly be between 10⁻² and 10⁻⁴, in some cases reaching as small as 10⁻⁶.

Through extensive experiments and comparisons of our approximations with analytical solutions to the 1D Allen–Cahn equation (AC), we found that the algorithm is extremely accurate, even with a small interaction length parameter 𝜖. In higher-dimensional space, we focus on the time evolution of the solution at the origin using various 𝜖 and orders of the potential function 𝛼. The solution to the AC as a function of time at the origin changes slightly as the dimension increases. The algorithm was able to approximate the solution with an interaction length of 𝜖 = 1/9 for the final time 𝑇 = 0.16 even in 60D. As the order of the potential function increases, the solution increases at a slower rate.

We have experienced a limitation with the solver when working with the Allen–Cahn equation. When using 𝜖 < 1/9, the network's loss and solution become very inconsistent, erratic, and unstable. The ability of the solver to converge quickly to a fair approximation of the solution when the initial guess is close to the analytical solution is very impressive. One downside of the solver in its current state is that it can only find the solution at a specific point in time and space, rather than providing solutions over a region 𝐷 ⊂ R. Ref.21 provided a solution for such problems over a region; it is time-consuming but still reasonable for realistic high-dimensional problems.

A major benefit of this solver and the method of finding local solutions to second-order semilinear parabolic PDEs is that computational speed is not affected by the size of the spatial dimension. From what we have experimented with, the speed and accuracy of the network are not solely determined by the dimensions. The network is flexible, with the ability to take various types of semilinear parabolic PDEs as input for it to solve.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.