Euro. Jnl of Applied Mathematics (2020), page 1 of 15. © The Author(s), 2020. Published by Cambridge University Press.
doi:10.1017/S0956792520000182

Solving parametric PDE problems with artificial neural networks

YUEHAW KHOO¹, JIANFENG LU² and LEXING YING³

¹ Department of Statistics, University of Chicago, IL 60615, USA (email: ykhoo@uchicago.edu)
² Department of Mathematics, Department of Chemistry and Department of Physics, Duke University, Durham, NC 27708, USA (email: jianfeng@math.duke.edu)
³ Department of Mathematics and ICME, Stanford University, Stanford, CA 94305, USA (email: lexing@stanford.edu)

(Received 30 August 2019; revised 21 April 2020; accepted 27 May 2020)

The curse of dimensionality is commonly encountered in numerical partial differential equations (PDE), especially when uncertainties have to be modelled into the equations as random coefficients. However, very often the variability of physical quantities derived from a PDE can be captured by a few features on the space of coefficient fields. Based on this observation, we propose using a neural network to parameterise the physical quantity of interest as a function of the input coefficients. The representability of such a quantity using a neural network can be justified by viewing the neural network as performing time evolution to find the solution of the PDE. We further demonstrate the simplicity and accuracy of the approach through notable examples of PDEs in engineering and physics.

Key words: Neural-network, parametric PDE, uncertainty quantification

2020 Mathematics Subject Classification: 65Nxx

1 Introduction
Uncertainty quantification in physical and engineering applications often involves the study of partial differential equations (PDE) with random coefficient fields. To understand the behaviour of a system in the presence of uncertainties, one can extract PDE-derived physical quantities as functionals of the coefficient fields. This can potentially require solving the PDE numerically an exponential number of times, even with a suitable discretisation of the PDE domain and of the range of the random variables. Fortunately, in most PDE applications these functionals depend only on a few characteristic 'features' of the coefficient fields, allowing them to be determined from solving the PDE a limited number of times.
A commonly used approach to uncertainty quantification is Monte Carlo sampling. An ensemble of solutions is built by repeatedly solving the PDE with different realisations of the coefficients. Then physical quantities of interest, for example, the mean of the solution at a given location, can be computed from the ensemble of solutions. Although applicable in many situations, the
computed quantity is inherently noisy. Moreover, the approach cannot produce new solutions that were not previously sampled. Other approaches exploit the underlying low-dimensionality assumption more directly. For example, the stochastic Galerkin method [13, 16] expands the random solution using certain prefixed basis functions (i.e. polynomial chaos [18, 19]) on the space of random variables, thereby reducing the high-dimensional problem to a few deterministic PDEs. This type of method requires careful treatment of the uncertainty distributions, and since the basis used is problem independent, the method can be expensive when the dimensionality of the random variables is high. There are data-driven approaches for basis learning, such as applying the Karhunen–Loève expansion to PDE solutions from different realisations of the PDE [3]. Similar to the related principal component analysis, such linear dimension-reduction techniques may not fully exploit the nonlinear interplay between the random variables. At the end of the day, the problem of uncertainty quantification is one of characterising the low-dimensional structure of the coefficient field that gives rise to the observed quantities.
On the other hand, the problem of dimensionality reduction has been central to the fields of statistics and machine learning. The fundamental task of regression seeks to find a function $h_\theta$, parameterised by a parameter vector $\theta \in \mathbb{R}^p$, such that

$$f(a) \approx h_\theta(a), \qquad a \in \mathbb{R}^n. \quad (1.1)$$

However, choosing a sufficiently large class of approximation functions without running into over-fitting remains a delicate business. As an example, in linear regression, the standard procedure is to fix a set of basis functions (or feature maps) $\{\phi_k(a)\}$ such that

$$f(a) = \sum_k \beta_k \phi_k(a) \quad (1.2)$$

and determine the coefficients $\beta_k$ from sampled data. The choice of basis is important to the quality of the regression, just as in the case of studying PDEs with random coefficients. Recently, deep neural networks have demonstrated unprecedented success in solving a variety of difficult regression problems related to pattern recognition [8, 11, 15]. A key advantage of neural networks is that they bypass the traditional need to handcraft a basis for spanning $f(a)$; instead, they learn the optimal basis that satisfies (1.1) directly from data. The performance of neural networks in machine learning applications, and more recently in physical applications such as representing quantum many-body states (e.g. [17, 2]), prompts us to study their use in the context of solving PDEs with random coefficients. More precisely, we want to learn the function $f(a)$ that maps the coefficient vector $a$ of a PDE to some physical quantity described by the PDE.
Our approach to solving for quantities arising from a PDE with random coefficients consists of the following simple steps (a minimal sketch of this pipeline is given after the list):

• Sample the random coefficients ($a$ in (1.1)) of the PDE from a user-specified distribution. For each set of coefficients, solve the deterministic PDE to obtain the physical quantity of interest ($f(a)$ in (1.1)).
• Use a neural network as the surrogate model $h_\theta(a)$ in (1.1) and train it using the previously obtained samples.
• Validate the surrogate forward model with more samples. The neural network is then ready for applications.
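The following Python sketch illustrates the three steps above in their simplest form. It is not the authors' code: `sample_coefficients`, `solve_pde_quantity` and `build_network` are hypothetical placeholders standing in for the problem-specific sampler, the deterministic PDE solver and a constructor returning a compiled Keras-style model, all of which are made concrete later in the paper.

```python
import numpy as np

def surrogate_pipeline(sample_coefficients, solve_pde_quantity, build_network,
                       num_train=10000, num_valid=10000):
    # Step 1: sample coefficient fields and solve the deterministic PDE for each.
    a_train = np.stack([sample_coefficients() for _ in range(num_train)])
    y_train = np.array([solve_pde_quantity(a) for a in a_train])
    a_valid = np.stack([sample_coefficients() for _ in range(num_valid)])
    y_valid = np.array([solve_pde_quantity(a) for a in a_valid])

    # Step 2: train the neural-network surrogate h_theta on the sampled pairs.
    model = build_network(a_train.shape[1:])
    model.fit(a_train, y_train, batch_size=100, epochs=50)

    # Step 3: validate on held-out samples before using the surrogate in applications.
    validation_error = model.evaluate(a_valid, y_valid)
    return model, validation_error
```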


Though this is a simple method, to the best of our knowledge, dimension reduction based on neural network representation has not been adapted to solving PDEs with uncertainties. We consider two simple but representative parametric PDE tasks in this work: elliptic homogenisation and a nonlinear Schrödinger eigenvalue problem. The main contributions of our work are as follows:

• We provide theoretical guarantees on the neural network representation of $f(a)$ through explicit construction for the parametric PDE problems under study;
• We show that even a rather simple neural network architecture can learn a good representation of $f(a)$ through training.

We note that our work is different from [10, 14, 12, 9], which solve deterministic PDEs numerically using a neural network. The goal of those works is to parameterise the solution of a deterministic PDE using a neural network and to replace Galerkin-type methods when performing model reduction. It is also different from [6], where a deterministic PDE is solved as a stochastic control problem using a neural network. In this paper, the function that we want to parameterise is defined over the coefficient field of the PDE.
The advantages of having an explicitly parameterised approximation to $f(a)$ are numerous; we list only a couple here. First, the neural-network-parameterised function can serve as a surrogate forward model for generating samples cheaply for statistical analysis. Second, the task of optimising some function of the physical quantity with respect to the PDE coefficients can be done with the help of a gradient calculated from the neural network. To summarise, obtaining a neural network parametrisation could limit the use of expensive PDE solvers in applications.
We demonstrate the success of neural networks in two PDE applications. In particular, we consider solving for the effective conductance in inhomogeneous media and the ground state energy of a nonlinear Schrödinger equation (NLSE) with an inhomogeneous potential. These are important physical models with wide applications in physics and engineering.
In Section 2, we provide background on the two PDEs of interest. In Section 3, we provide the theoretical justification for using a neural network (NN) to represent the physical quantities derived from the PDEs introduced in Section 2. In Section 4, we describe the neural network architecture for handling these PDE problems and report the numerical results. We conclude in Section 5.

2 Two examples of parametric PDE problems


This section introduces the two PDE models – the linear elliptic equation and the NLSE – we
want to solve for. We focus on the map from the coefficient field of these equations to certain
physical quantities of interest. In both cases, the boundary condition is taken to be periodic for
simplicity.

2.1 Effective coefficients for an inhomogeneous elliptic equation

Our first example is finding the effective conductance in an inhomogeneous medium. For this, we consider the elliptic equation

$$\nabla \cdot \big(a(x)(\nabla u(x) + \xi)\big) = 0, \qquad x \in [0,1]^d \quad (2.1)$$

with periodic boundary conditions, where $\xi \in \mathbb{R}^d$, $\|\xi\|_2^2 = 1$ ($\|\cdot\|_2$ is the Euclidean norm). To ensure ellipticity, we consider the class of coefficient functions

$$\mathcal{A} = \{a \in L^\infty([0,1]^d) \mid \lambda_1 \ge a \ge \lambda_0 > 0\}. \quad (2.2)$$

For a given $\xi$, we want to obtain the effective conductance functional $A_{\mathrm{eff}}: \mathcal{A} \to \mathbb{R}$ defined by

$$A_{\mathrm{eff}}(a) = \int_{[0,1]^d} a(x)\,\|\nabla u_a(x) + \xi\|_2^2\, dx = -\int_{[0,1]^d} \Big( u_a(x)\,\nabla\cdot\big(a(x)(\nabla u_a(x) + 2\xi)\big) - a(x) \Big)\, dx, \quad (2.3)$$

where $u_a$ satisfies (2.1) (the subscript '$a$' in $u_a$ denotes the dependence of the solution of (2.1) on the coefficient field $a$). The second equality follows from integration by parts.
In practice, to parameterise $A_{\mathrm{eff}}$ as a function of the coefficient field $a(x)$, we discretise the domain using a uniform grid with step size $h$ and grid points $x_i = ih$, where the multi-index $i \in \{(i_1, \ldots, i_d)\}$, $i_1, \ldots, i_d = 1, \ldots, n$ with $n = 1/h$. In this way, we can think of both the coefficient field $a(x)$ and the solution $u(x)$ evaluated on the grid points as vectors of length $n^d$. More precisely, let the action of the operator $\nabla \cdot (a(x)\nabla u(x))$ on $u$ be discretised using central differences as

$$\sum_{k=1}^{d} \frac{\tfrac{1}{2}(a_{i+e_k} + a_i)(u_{i+e_k} - u_i) + \tfrac{1}{2}(a_{i-e_k} + a_i)(u_{i-e_k} - u_i)}{h^2} \quad (2.4)$$

for each $i$, where $\{e_k\}_{k=1}^{d}$ denotes the canonical basis of $\mathbb{R}^d$. Here $a_i := a(i/n)$ and $u_i := u(i/n)$.
Then the discrete version of (2.1) is obtained as

$$(L_a u + b_{\xi,a})_i := \sum_{k=1}^{d} \frac{(a_{i+e_k} + a_i)u_{i+e_k} + (a_i + a_{i-e_k})u_{i-e_k} - (a_{i-e_k} + 2a_i + a_{i+e_k})u_i}{2h^2} + \sum_{k=1}^{d} \frac{\xi_k\,(a_{i+e_k} - a_{i-e_k})}{2h} = 0, \qquad \forall i, \quad (2.5)$$

where the first equality gives the definitions of $L_a$ and $b_{\xi,a}$. The discrete version of the effective conductance is obtained as $2E(u_a; a)$, where

$$E(u; a) = -\frac{1}{2}\, u^\top L_a u - u^\top b_{\xi,a} + \frac{1}{2n^d}\, a^\top \mathbf{1}, \quad (2.6)$$

and $\mathbf{1}$ is the all-one vector.
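For readers who want to reproduce the discrete objects, the following numpy sketch implements the action of $L_a$, the vector $b_{\xi,a}$ and the energy $E(u;a)$ on a 2D periodic grid, directly mirroring (2.5) and (2.6). The helper names and the use of `np.roll` for the periodic shifts are our own choices, not the paper's.

```python
import numpy as np

def apply_La(a, u, h):
    """Action of L_a on u, cf. (2.5), with periodic boundary conditions (d = 2)."""
    out = np.zeros_like(u)
    for axis in (0, 1):
        a_p = np.roll(a, -1, axis)   # a_{i+e_k}
        a_m = np.roll(a, +1, axis)   # a_{i-e_k}
        u_p = np.roll(u, -1, axis)
        u_m = np.roll(u, +1, axis)
        out += ((a_p + a) * u_p + (a + a_m) * u_m - (a_m + 2 * a + a_p) * u) / (2 * h**2)
    return out

def b_xi_a(a, xi, h):
    """The vector b_{xi,a} appearing in (2.5)."""
    out = np.zeros_like(a)
    for axis in (0, 1):
        out += xi[axis] * (np.roll(a, -1, axis) - np.roll(a, +1, axis)) / (2 * h)
    return out

def energy(u, a, xi, h):
    """E(u; a) of (2.6); the effective conductance is 2*E(u_a; a) at the minimiser."""
    n_d = a.size
    return (-0.5 * np.sum(u * apply_La(a, u, h))
            - np.sum(u * b_xi_a(a, xi, h))
            + np.sum(a) / (2 * n_d))
```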

2.2 NLSE with random potential

For the second example, we want to find the ground state energy $E_0$ of an NLSE with potential $a(x)$:

$$-\Delta u(x) + a(x)u(x) + \sigma u(x)^3 = E_0\, u(x), \qquad x \in [0,1]^d \quad (2.7)$$

subject to the normalisation constraint

$$\int_{[0,1]^d} u(x)^2\, dx = 1. \quad (2.8)$$

We take $\sigma = 2$ in this work and thus consider a defocusing cubic Schrödinger equation, which can be understood as a model for solitons in nonlinear photonics or for a Bose–Einstein condensate in an inhomogeneous medium. Similar to (2.5), we solve the discretised version of the NLSE

$$-(Lu)_i + a_i u_i + \sigma u_i^3 = E_0\, u_i \quad \forall i, \qquad \frac{1}{n^d}\sum_{i=1}^{n^d} u_i^2 = 1, \quad (2.9)$$

where

$$(Lu)_i := \sum_{k=1}^{d} \frac{u_{i+e_k} + u_{i-e_k} - 2u_i}{h^2}. \quad (2.10)$$

Due to the nonlinear cubic term, the NLSE is more difficult to solve numerically than (2.1). Therefore, in this case, the value of having a surrogate model of $E_0$ as a function of $a$ is more significant. We note that the solution $u$ of (2.9) (and thus $E_0$) can also be obtained from the variational problem

$$\min_{u:\ \|u\|_2^2 = n^d}\ -u^\top L u + u^\top \mathrm{diag}(a)\, u + \frac{\sigma}{2}\sum_i u_i^4, \quad (2.11)$$

where the $\mathrm{diag}(\cdot)$ operator forms a diagonal matrix from a vector.
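A corresponding numpy sketch for the NLSE quantities (again with our own helper names, for $d = 2$): the discrete Laplacian action of (2.10) and the objective of the variational problem (2.11).

```python
import numpy as np

def apply_L(u, h):
    """Discrete Laplacian (Lu)_i of (2.10) with periodic boundary conditions, d = 2."""
    out = np.zeros_like(u)
    for axis in (0, 1):
        out += (np.roll(u, -1, axis) + np.roll(u, +1, axis) - 2 * u) / h**2
    return out

def nlse_objective(u, a, sigma, h):
    """Objective of the variational problem (2.11), minimised over ||u||_2^2 = n^d."""
    return (-np.sum(u * apply_L(u, h))
            + np.sum(a * u**2)
            + 0.5 * sigma * np.sum(u**4))
```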

3 Theoretical justification of the deep neural network representation

The physical quantities introduced in Section 2 are determined through the solution of the PDEs given the coefficient field. Rather than solving the PDE, we will prove that the map from the coefficient field to such quantities can be represented using a convolutional NN. The main idea is to view the solution $u$ of the PDE as being obtained via time evolution, where each layer of the NN corresponds to the solution at a discrete time step. We focus here on the case of solving the elliptic equation with inhomogeneous coefficients. A similar line of reasoning can be used to demonstrate the representability of the ground state energy $E_0$ as a function of $a$ using an NN.

Theorem 1 Fix an error tolerance $\epsilon > 0$. There exists a neural network $h_\theta(\cdot)$ with $O(n^d)$ hidden nodes per layer and $O(n^2/\epsilon)$ layers such that for any $a \in \mathcal{A} = \{a \in L^\infty([0,1]^d) \mid a(x) \in [\lambda_0, \lambda_1]\ \forall x,\ \lambda_0 > 0\}$, we have

$$|h_\theta(a) - A_{\mathrm{eff}}(a)| \le \epsilon\, \lambda_1. \quad (3.1)$$
The proof of Theorem 1 is given in Appendix A. Note that due to the ellipticity assumption $a \in \mathcal{A}$, the effective conductance is bounded from below by $A_{\mathrm{eff}}(a) \ge \lambda_0 > 0$. Therefore, the theorem immediately implies the relative error bound

$$\frac{|h_\theta(a) - A_{\mathrm{eff}}(a)|}{A_{\mathrm{eff}}(a)} \le \frac{\epsilon\, \lambda_1}{\lambda_0}. \quad (3.2)$$
We illustrate the main idea of the proof in the rest of this section; the technical details of the proof are deferred to the supplementary materials.
First, observe that the effective coefficient obeys a variational principle,

$$A_{\mathrm{eff}}(a) = 2 \min_u E(u; a), \quad (3.3)$$

where $E(u;a)$ is given in (2.6) and $L_a$, $b_{\xi,a}$ are defined in (2.5). Therefore, to obtain $A_{\mathrm{eff}}(a)$, we may minimise $E(u; a)$ over the solution space using, e.g., steepest descent:

$$u^{m+1} = u^m - t\,\frac{\partial E(u^m; a)}{\partial u} = u^m + t\big(L_a u^m + b_{\xi,a}\big), \quad (3.4)$$

where $t$ is a step size chosen sufficiently small to ensure descent of the energy. Note that the optimisation problem is convex due to the ellipticity assumption (2.2) on the coefficient field $a$ (which ensures $-u^\top L_a u > 0$ except when $u$ is a constant vector), and the gradient is Lipschitz continuous; therefore, the iterative scheme converges to the minimiser for a proper choice of step size and any initial condition. Thus we can choose $u^0 = 0$.
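Putting (3.3) and (3.4) together, a direct (if slow) way to evaluate $A_{\mathrm{eff}}(a)$ numerically is sketched below, reusing the helpers `apply_La`, `b_xi_a` and `energy` from the earlier snippet (our own names); the step-size heuristic and iteration count are illustrative, not tuned.

```python
import numpy as np

def effective_conductance(a, xi=(1.0, 0.0), num_steps=5000, t=None):
    """Approximate A_eff(a) = 2 * min_u E(u; a) by the steepest-descent iteration (3.4)."""
    n = a.shape[0]
    h = 1.0 / n
    if t is None:
        t = h**2 / (8.0 * a.max())   # conservative step size; ||L_a|| = O(a.max() / h^2)
    xi = np.asarray(xi)
    b = b_xi_a(a, xi, h)
    u = np.zeros_like(a)             # initial condition u^0 = 0
    for _ in range(num_steps):
        u = u + t * (apply_La(a, u, h) + b)   # one steepest-descent step, cf. (3.4)
    return 2.0 * energy(u, a, xi, h)
```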
Now we can identify the iteration scheme with a convolutional NN architecture by viewing $m$ as an index of the NN layers. The input of the NN is the vector $a \in \mathbb{R}^{n^d}$, and the hidden layers are used to map between consecutive pairs of $(d+1)$-tensors $U^m_{i_0 i_1 \ldots i_d}$ and $U^{m+1}_{i_0 i_1 \ldots i_d}$. The zeroth dimension of each tensor $U^m$ is the channel dimension and the last $d$ dimensions are the spatial dimensions. If we let the channels in each $U^m$ consist of a copy of $a$ and a copy of $u^m$, i.e.

$$U^m_{0\, i_1 \ldots i_d} = a_{(i_1,\ldots,i_d)}, \qquad U^m_{1\, i_1 \ldots i_d} = u^m_{(i_1,\ldots,i_d)}, \quad (3.5)$$

then, in light of (3.4) and (2.5), one simply needs to perform a local convolution (to aggregate $a$ locally) and a nonlinearity (to approximate the quadratic form of $a$ and $u^m$) to get from $U^m_{1\, i_1 \ldots i_d} = u^m_{(i_1,\ldots,i_d)}$ to $U^{m+1}_{1\, i_1 \ldots i_d} = u^{m+1}_{(i_1,\ldots,i_d)}$, while the 0-channel is simply copied to carry along the information of $a$. Stopping at the $m = M$ layer and letting $u^M$ be the approximate minimiser of $E(u; a)$, based on (3.3) we obtain an approximation to $A_{\mathrm{eff}}(a)$. Note that the architecture of the NN used in the proof resembles a deep ResNet [7], as the coefficient field $a$ is passed from the first to the last layer. The detailed estimates of the approximation error and the number of parameters are deferred to the supplementary materials.
Let us point out that if we take the continuous-time limit of the steepest descent dynamics, we obtain a system of ODEs,

$$\partial_t u = L_a u + b_{\xi,a}, \quad (3.6)$$

which can be viewed as a spatial discretisation of a PDE. Thus our construction of the neural network in the proof is also related to the work [12], where multiple layers of convolutional NN are used to learn and solve evolutionary PDEs. However, the goal of the neural network here is to approximate the physical quantity of interest as a functional of the (high-dimensional) coefficient field, which is quite different from the viewpoint of [12].
We also remark that the number of layers of the NN required by Theorem 1 is rather large. This is due to the choice of the (unpreconditioned) steepest descent algorithm as the engine of optimisation used to generate the neural network architecture in the proof. With a better preconditioner, such as algebraic multigrid [20], we can effectively reduce the number of layers to $O(1)$ and thus achieve an optimal count of the parameters involved in the NN. In practice, as shown in the next section by actual training on parametric PDEs, the neural network architecture can be much simplified while maintaining a good approximation to the quantity of interest.

4 Proposed network architecture and numerical results

In this section, based on the discussion in Section 3, we propose using convolutional NNs to approximate the physical quantities given by the PDEs with periodic boundary conditions. The architecture of the neural network is described in Section 4.1, and the implementation details and numerical results are provided in Sections 4.2 and 4.3, respectively.


FIGURE 1. Construction of the NN in the proof of Theorem 1. The NN takes the coefficient field $a$ as input, and the convolutional layers are used to map from $(u^m, a)$ to $(u^{m+1}, a)$, $m = 0, \ldots, M-1$. At $u^M$, local convolutions and a nonlinearity are used to obtain $E(u^M; a)$.

FIGURE 2. The map between $(u^t, a)$ and $u^{t+1}$.

FIGURE 3. The map between $(u^M, a)$ and $E$.

4.1 Architectures
In this subsection, we present two different architectures. Since we have shown theoretically that the NN proposed in Figure 1 can approximate the effective conductance function, the NN in Figure 1 serves as the basis for the first architecture. In the second architecture, we simply use an NN that incorporates translational symmetry. For most of the numerical tests in the subsequent sections, we report results using the second architecture, since its simplicity demonstrates the effectiveness of an NN. However, results for the first NN architecture are also given in order to provide numerical evidence for Theorem 1 in the case of determining the effective conductance.
The first architecture is based on a ResNet [7]; the construction of the NN is illustrated in Figure 1. Figures 2 and 3 detail the explicit maps between $(u^t, a)$ and $u^{t+1}$, and between $(u^M, a)$ and $E$, respectively. For the sake of illustration, we assume a 2D unit square domain, though the construction generalises to PDEs in any dimension. The input to the NN is a matrix $a \in \mathbb{R}^{n\times n}$ representing the coefficient field on the grid points, and the output of the network gives the physical quantity of interest from the PDE. Based on (3.4) and (2.5), the cubic function that
maps from $(u^t, a)$ to $u^{t+1}$ is realised via a few convolutional layers, as in Figure 2. Since the function $E(u^M; a)$ has translational symmetry, in the sense that

$$E\big(u^M_{(i+\tau_1)(j+\tau_2)};\ a_{(i+\tau_1)(j+\tau_2)}\big) = E(u^M; a), \quad (4.1)$$

where the additions are done on $\mathbb{Z}_n$, a sum-pooling is used in Figure 3 to preserve the translational symmetry of the function $E$.

FIGURE 4. Single-convolutional-layer neural network for representing a translationally invariant function.
Figure 4 shows the second architecture for the 2D case. When designing this architecture, we forgo the PDE knowledge used to prove Theorem 1 and simply use the fact that

$$f(a) := E(u_a; a) \quad (4.2)$$

has translational symmetry in terms of the input $a$. Again, the main part of the network consists of convolutional layers with ReLU as the nonlinearity. This extracts the relevant features of the coefficient field around each grid point that contribute to the final output. The use of a sum-pooling followed by a linear map to obtain the final output is again based on the translational symmetry of the function $f$ to be represented. More precisely, let $a^{\tau_1\tau_2}_{ij} := a_{(i+\tau_1)(j+\tau_2)}$, where the additions are done on $\mathbb{Z}_n$. The output of the convolutional layers gives basis functions that satisfy

$$\tilde\phi_{kij}(a^{\tau_1\tau_2}) = \tilde\phi_{k(i-\tau_1)(j-\tau_2)}(a), \qquad \forall (\tau_1, \tau_2) \in \{1, \ldots, n\}^2. \quad (4.3)$$

When using the architecture in Figure 4, for any $\tau_1, \tau_2$,

$$f(a^{\tau_1\tau_2}) = \sum_{k=1}^{\alpha} \beta_k \sum_{i=1}^{n}\sum_{j=1}^{n} \tilde\phi_{k(i-\tau_1)(j-\tau_2)}(a) = \sum_{k=1}^{\alpha} \beta_k\, \phi_k(a), \qquad \phi_k := \sum_{i=1}^{n}\sum_{j=1}^{n} \tilde\phi_{kij}, \quad (4.4)$$

where the $\beta_k$'s are the weights of the last densely connected layer. Therefore, the translational symmetry of $f$ is preserved.
We note that all operations in Figures 3 and 4 are standard except the padding operation. Typically, zero-padding is used to enlarge the size of the input in image classification tasks, whereas here we extend the input periodically due to the assumed periodic boundary condition.
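To make the description above concrete, here is a minimal Keras sketch of a Figure-4-style architecture: periodic padding, a few convolutional layers with ReLU, a sum-pooling over the spatial dimensions and a final dense layer. The helper names (`periodic_pad_2d`, `build_model`) and the kernel size, depth and channel count are our own illustrative choices, not values taken from the paper.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def periodic_pad_2d(x, pad):
    # Extend the n x n input periodically, matching the periodic boundary condition.
    x = tf.concat([x[:, -pad:, :, :], x, x[:, :pad, :, :]], axis=1)
    x = tf.concat([x[:, :, -pad:, :], x, x[:, :, :pad, :]], axis=2)
    return x

def build_model(n, alpha=16, kernel=3, depth=3):
    a = keras.Input(shape=(n, n, 1))                 # coefficient field on the grid
    x = a
    for _ in range(depth):
        x = layers.Lambda(lambda t, p=kernel // 2: periodic_pad_2d(t, p))(x)
        x = layers.Conv2D(alpha, kernel, padding="valid", activation="relu")(x)
    # Sum-pooling over space gives the symmetrised basis functions phi_k of (4.4).
    x = layers.Lambda(lambda t: tf.reduce_sum(t, axis=[1, 2]))(x)
    out = layers.Dense(1)(x)                         # the weights beta_k of (4.4)
    return keras.Model(a, out)
```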

4.2 Implementation

The neural network is implemented using Keras [4], an application programming interface running on top of TensorFlow [1] (a library of toolboxes for training neural networks). A mean-squared-error loss function is used, and the optimisation is done using the NAdam optimiser [5].

Table 1. Error in approximating the effective conductance function $A_{\mathrm{eff}}(a)$ in 2D using the architecture in Figure 1. The definition of $\alpha$ is given in Figure 1. We test the NN with data models 1 and 2, where $a_1, \ldots, a_{n^2}$ are generated from an independent and a correlated random field, respectively. The mean and standard deviation of the effective conductance are computed from the samples in order to show the variability. The sample sizes for training and validation are the same.

Data model | n  | α  | Training error | Validation error | Average Aeff | No. of samples | No. of param.
1          | 8  | 10 | 2.0e-3         | 1.8e-3           | 1.58 ± 0.10  | 1.2e+4         | 4416
1          | 16 | 10 | 1.5e-3         | 1.4e-3           | 1.58 ± 0.052 | 2.4e+4         | 4416
2          | 8  | 10 | 1.2e-3         | 1.2e-3           | 4.87 ± 1.00  | 1.2e+4         | 4416
2          | 16 | 10 | 6.8e-4         | 6.7e-4           | 4.97 ± 1.01  | 2.4e+4         | 4416

The hyper-parameter we tune is the learning rate, which we lower if the training error fluctuates too much. The weights are initialised randomly from the normal distribution. The input to the neural network is whitened to have unit variance and zero mean in each dimension. The mini-batch size is always set between 50 and 200.
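A hypothetical training call consistent with this description (MSE loss, NAdam, whitened inputs, mini-batches in the 50 to 200 range) is sketched below. The data arrays are random placeholders so that the snippet runs stand-alone, and `build_model` refers to the architecture sketch in Section 4.1.

```python
import numpy as np
from tensorflow import keras

n = 16
a_train = np.random.uniform(0.3, 3.0, size=(1000, n, n, 1)).astype("float32")  # placeholder data
y_train = np.random.rand(1000, 1).astype("float32")                            # placeholder targets

# Whiten the input to zero mean and unit variance in each dimension.
mean, std = a_train.mean(axis=0), a_train.std(axis=0)
a_white = (a_train - mean) / std

model = build_model(n)
model.compile(optimizer=keras.optimizers.Nadam(learning_rate=1e-3), loss="mse")
model.fit(a_white, y_train, batch_size=100, epochs=20, validation_split=0.5)
```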

4.3 Numerical examples

4.3.1 Effective conductance

For the case of the effective conductance, we assume the following distributions for the input data (a sketch for generating samples from the second model is given after the list):

1. Independent and identically distributed random variables $a_i$, $i = 1, \ldots, n^d$, distributed according to $U[0.3, 3]$, where $U[\lambda_0, \lambda_1]$ denotes the uniform distribution on the interval $[\lambda_0, \lambda_1]$.
2. Correlated random variables $a_i$, $i = 1, \ldots, n^d$, with covariance matrix $PP^\top \in \mathbb{R}^{n^d \times n^d}$:

$$a = Pb \in \mathbb{R}^{n^d}, \quad (4.5)$$

where $b_1, \ldots, b_{n^d}$ are independently and identically distributed as $U[0.3, 5]$, and the covariance matrix is defined by

$$[PP^\top]_{ij} = \exp\Big(-\frac{\|x_i - x_j\|^2}{(2h)^2}\Big), \qquad i, j = 1, \ldots, n^d. \quad (4.6)$$
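A possible way to generate samples from the correlated model (4.5)-(4.6) is sketched below, using the symmetric square root of the prescribed covariance as the factor $P$; the paper does not specify its choice of $P$, so this is one valid option among many.

```python
import numpy as np

def correlated_coefficients(n, num_samples, low=0.3, high=5.0, seed=0):
    """Draw a = P b, with P P^T given by (4.6), on a 2D n x n grid (d = 2)."""
    h = 1.0 / n
    xs = np.arange(1, n + 1) * h
    grid = np.stack(np.meshgrid(xs, xs, indexing="ij"), axis=-1).reshape(-1, 2)
    dist2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    cov = np.exp(-dist2 / (2 * h) ** 2)                    # [PP^T]_{ij} from (4.6)
    # One valid factor P: the symmetric square root of the covariance matrix.
    w, V = np.linalg.eigh(cov)
    P = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    rng = np.random.default_rng(seed)
    b = rng.uniform(low, high, size=(num_samples, n * n))  # b_i ~ U[0.3, 5]
    return (b @ P.T).reshape(num_samples, n, n)
```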
The results of learning the effective conductance function are presented in Table 1. We use the same number of samples for training and validation. Both the training and validation errors are measured by

$$\frac{\sum_k \big(h_\theta(a^k) - A_{\mathrm{eff}}(a^k)\big)^2}{\sum_k A_{\mathrm{eff}}(a^k)^2}, \quad (4.7)$$

where the $a^k$'s can be either the training or the validation samples, drawn from the same distribution, and $h_\theta$ is the neural-network-parameterised approximation function. For this experiment in $d = 2$,
we report the results using two different architectures. In Table 1, the results for the architecture in Figure 1 with M = 5 provide numerical evidence for Theorem 1. In Table 2, the more economical NN in Figure 5 is used to demonstrate that by simply considering the symmetry in the function to be approximated (without using PDE knowledge), results with relatively good accuracy can already be obtained. The simple NN in Figure 5 gives results similar to the deep NN in Figure 1, indicating that we might be able to further reduce the complexity of the NN in Theorem 1 via a more careful construction.

Table 2. Error in approximating the effective conductance function $A_{\mathrm{eff}}(a)$ in 2D using the architecture in Figure 5. The definition of $\alpha$ is given in Figure 5. The sample sizes for training and validation are the same.

Data model | n  | α  | Training error | Validation error | Average Aeff | No. of samples | No. of param.
1          | 8  | 16 | 1.5e-3         | 1.4e-3           | 1.58 ± 0.10  | 1.2e+4         | 1057
1          | 16 | 16 | 2.2e-3         | 1.8e-3           | 1.58 ± 0.052 | 2.4e+4         | 4129
2          | 8  | 16 | 1.0e-3         | 1.0e-3           | 4.87 ± 1.00  | 1.2e+4         | 1057
2          | 16 | 16 | 2.5e-3         | 2.5e-3           | 4.97 ± 1.01  | 2.4e+4         | 4129

FIGURE 5. Neural network architecture for approximating $A_{\mathrm{eff}}(a)$ in the 1D case. Although the layers in the third stage are essentially densely connected layers, we still identify them as convolution layers to reflect the symmetry between the first and third stages.
Before concluding this subsection, we use the exercise of determining the effective conductance in 1D to provide another motivation for the usage of a neural network. In 1D, the effective conductance can be expressed analytically as the harmonic mean of the $a_i$'s:

$$A_{\mathrm{eff}}(a) = \left(\frac{1}{n}\sum_{i=1}^{n} \frac{1}{a_i}\right)^{-1}. \quad (4.8)$$

This function indeed corresponds approximately to the deep neural network shown in Figure 5. The neural network is separated into three stages. In the first stage, an approximation to the function $1/a_i$ is constructed for each $a_i$ by applying a few convolution layers with a size-1 kernel window. In this stage, the channel size of these convolution layers is chosen to be 16, except for the last layer, since the output of the first stage should be a vector of size $n$. In the second stage, a layer of sum-pooling with a size-$n$ window is used to perform the summation in (4.8), giving a scalar output.

Table 3. Error in approximating the lowest energy level $E_0(V)$ for $n = 8, 16$ discretisation using the architecture in Figure 5.

n  | α | Training error | Validation error | Average E0   | No. of samples | No. of param.
8  | 5 | 4.9 × 10⁻⁴     | 5.0 × 10⁻⁴       | 10.48 ± 0.51 | 4800           | 331
16 | 5 | 1.5 × 10⁻⁴     | 1.5 × 10⁻⁴       | 10.46 ± 0.27 | 1.05 × 10⁴     | 1291

FIGURE 6. The first stage’s output of the neural network in Figure 5 fitted by β1 /x + β2 .

The third and first stages have exactly the same architecture, except that the input to the third stage is a scalar. For training, 2560 samples are used, and another 2560 samples are used for validation. We let $a_i \sim U[0.3, 1.5]$, giving an effective conductance of $0.77 \pm 0.13$ for $n = 8$. A validation error of $4.9 \times 10^{-4}$ is obtained with the neural network in Figure 5, while with the network in Figure 4 the accuracy is $5.5 \times 10^{-3}$ with $\alpha = 16$. As a sanity check, Figure 6 shows that the output from the first stage is well fitted by the reciprocal function.
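As an aside, (4.8) also means that the 1D training data can be generated without any PDE solve; a small check of the numbers quoted above (our own script, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_samples = 8, 2560
a = rng.uniform(0.3, 1.5, size=(num_samples, n))   # a_i ~ U[0.3, 1.5], as in the text
A_eff = 1.0 / np.mean(1.0 / a, axis=1)             # harmonic mean, eq. (4.8)
print(A_eff.mean(), A_eff.std())                   # compare with the 0.77 +/- 0.13 quoted above
```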
We remark that although incorporating PDE domain knowledge to build a more sophisticated neural network architecture would likely boost the approximation quality, as we do in the constructive proof of Theorem 1, even a simple network such as that in Figure 4 can already give decent results.

4.3.2 Ground state energy of NLSE

We next focus on the 2D case of the NLSE example. The goal here is to obtain a neural network parametrisation of $E_0(a)$, with the input now being $a \in \mathbb{R}^{n^2}$ with i.i.d. entries distributed according to $U[1, 16]$. In order to generate training samples, for each realisation of $a$, the nonlinear eigenvalue problem (2.9) subject to the normalisation constraint (2.8) is solved by a homotopy method. First, the case $\sigma = 0$ is solved as a standard eigenvalue problem. Then $\sigma$ is increased from 0 to 2 with a step size of 0.4. For each $\sigma$, Newton's method is used to solve the NLSE for $u_a(x)$ and
$E_0(a)$, using the $u_a(x)$ and $E_0(a)$ corresponding to the previous $\sigma$ value as initialisation. The results are presented in Table 3.
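The paper's sample generation uses a homotopy in $\sigma$ with Newton's method at each step. As a simplified, hedged substitute, the sketch below follows the same homotopy but replaces Newton's method with projected gradient descent on the variational problem (2.11); it reuses `apply_L` from the sketch in Section 2.2, and the step size and iteration counts are illustrative only, not tuned.

```python
import numpy as np

def nlse_ground_state_energy(a, sigma_final=2.0, dsigma=0.4, step=2e-4, iters=5000):
    """Estimate E_0(a) for the discrete NLSE (2.9) on a 2D periodic grid."""
    n = a.shape[0]
    h = 1.0 / n
    u = np.ones_like(a)                       # already satisfies ||u||_2^2 = n^2
    for sigma in np.arange(0.0, sigma_final + 1e-9, dsigma):   # homotopy in sigma
        for _ in range(iters):
            grad = -apply_L(u, h) + a * u + sigma * u**3        # half the gradient of (2.11)
            u = u - step * grad
            u *= n / np.linalg.norm(u)        # project back onto ||u||_2^2 = n^d
    # Recover E_0 from the discrete NLSE (2.9) as a Rayleigh-type quotient.
    return np.sum(u * (-apply_L(u, h) + a * u + sigma_final * u**3)) / np.sum(u**2)

rng = np.random.default_rng(0)
a = rng.uniform(1.0, 16.0, size=(16, 16))     # i.i.d. U[1, 16] potential, as in the text
print(nlse_ground_state_energy(a))
```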

5 Conclusion
In this note, we present a method based on deep neural networks to solve PDEs with inhomogeneous coefficient fields. Physical quantities of interest are learned as functions of the coefficient field. Based on the time-evolution technique for solving PDEs, we provide theoretical motivation for representing these quantities using an NN. The numerical experiments on the diffusion equation and the NLSE show the effectiveness of simple convolutional neural networks in parametrising such functions to $10^{-3}$ accuracy. We remark that while many questions remain open, such as what the best network architecture is and which situations this approach can handle, the goal of this short note is simply to suggest neural networks as a promising tool for model reduction when solving PDEs with uncertainties.

Conflict of interest
None.

References
[1] ABADI, M., AGARWAL, A., BARHAM, P., BREVDO, E., CHEN, Z., CITRO, C., CORRADO, G. S.,
DAVIS, A., DEAN, J., DEVIN, M., GHEMAWAT, S., GOODFELLOW, I., HARP, A., IRVING, G.,
ISARD, M., JIA, Y., JOZEFOWICZ, R., KAISER, L., KUDLUR, J., LEVENBERG, M., MANE, D.,
MONGA, R., MOORE, S., MURRAY, D., OLAH, C., SCHUSTER, M., SHLENS, J., STEINER, B.,
SUTSKEVER, I., TALWAR, K., TUCKER, P., VANHOUCKE, V., VASUDEVAN, V., VIEGAS, F.,
VINYALS, O., WARDEN, P., WATTENBERG, M., WICKE, M., YU, Y. & ZHENG, X. (2016)
Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint
arXiv:1603.04467.
[2] CARLEO, G. & TROYER, M. (2017) Solving the quantum many-body problem with artificial neural
networks. Science 355(6325), 602–606.
[3] CHENG, M., HOU, T. Y., YAN, M. & ZHANG, Z. (2013) A data-driven stochastic method for elliptic
PDEs with random coefficients. SIAM/ASA J. Uncertainty Quant. 1(1), 452–493.
[4] CHOLLET, F. (2015) Keras. http://keras.io.
[5] DOZAT, T. (2016) Incorporating Nesterov momentum into ADAM. In: Proceedings of the ICLR
Workshop.
[6] HAN, J., JENTZEN, A. & WEINAN E. (2017) Overcoming the curse of dimensionality: solving high-
dimensional partial differential equations using deep learning. arXiv preprint arXiv:1707.02568.
[7] HE, K., ZHANG, X., REN, S. & SUN, J. (2016) Deep residual learning for image recognition. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
[8] HINTON, G. E. & SALAKHUTDINOV, R. R. (2006) Reducing the dimensionality of data with neural
networks. Science 313(5786), 504–507.
[9] KHOO, Y., LU, J. & YING, L. (2018) Solving for high dimensional committor functions using artificial
neural networks. arXiv preprint arXiv:1802.10275.
[10] LAGARIS, I. E., LIKAS, A. & FOTIADIS, D. I. (1998) Artificial neural networks for solving ordinary
and partial differential equations. IEEE Trans. Neural Networks 9(5), 987–1000.
[11] LECUN, Y., BENGIO, Y. & HINTON, G. (2015) Deep learning. Nature 521(7553), 436–444.
[12] LONG, Z., LU, Y., MA, X. & DONG, B. (2017) PDE-net: learning PDEs from data. arXiv preprint
arXiv:1710.09668.

[13] MATTHIES, H. G. & KEESE, A. (2005) Galerkin methods for linear and nonlinear elliptic stochastic
partial differential equations. Comput. Methods Appl. Mecha. Eng. 194(12), 1295–1331.
[14] RUDD, K. & FERRARI, S. (2015) A constrained integration (CINT) approach to solving partial
differential equations using artificial neural networks. Neurocomputing 155, 277–285.
[15] SCHMIDHUBER, J. (2015) Deep learning in neural networks: an overview. Neural Networks 61, 85–
117.
[16] STEFANOU, G. (2009) The stochastic finite element method: past, present and future. Comput.
Methods Appl. Mecha. Eng. 198(9), 1031–1051.
[17] TORLAI, G. & MELKO, R. G. (2016) Learning thermodynamics with Boltzmann machines. Phys. Rev.
B 94(16), 165134.
[18] WIENER, N. (1938) The homogeneous chaos. Am. J. Math. 60(4), 897–936.
[19] XIU, D. & KARNIADAKIS, G. E. (2002) The Wiener–Askey polynomial chaos for stochastic
differential equations. SIAM J. Sci. Comput. 24(2), 619–644.
[20] XU, J. & ZIKATANOV, L. (2017) Algebraic multigrid methods. Acta Numerica 26, 591–721.

A Appendix

A.1 Proof of representability of the effective conductance by an NN

As mentioned in Section 3, the first step of constructing an NN to represent the effective conductance is to perform time-evolution iterations of the form (3.4). However, since at each step we need to approximate the map from $u^m$ to $u^{m+1}$ in (3.4) using an NN, the process of time evolution is similar to applying noisy gradient descent to $E(u; a)$. More precisely, after performing a step of the gradient descent update, the NN approximation adds noise to the update, i.e.

$$v^0 = u^0 = 0, \qquad u^{m+1} = v^m - t\,\nabla E(v^m), \qquad v^{m+1} = u^{m+1} + t\,\varepsilon^{m+1}. \quad (A.1)$$

Here $E(u; a)$ is abbreviated as $E(u)$, and $\varepsilon^{m+1}$ is the error of each NN layer in approximating the exact time-evolution iterate $u^{m+1}$.
Now let the spectral norms of $L_a$ and $L_a^\dagger$ satisfy

$$\|L_a\|_2 \le \lambda_a, \qquad \|L_a^\dagger\|_2 \le 1/\mu_a, \quad (A.2)$$

where, for the case considered, $\lambda_a = O(\lambda_1 n^2)$ and $\mu_a = \Omega(\lambda_0)$. Assuming

$$\|\varepsilon^{m+1}\|_2 \le c\,\|\nabla E(v^m)\|_2, \qquad \mathbf{1}^\top \varepsilon^{m+1} = 0, \qquad m = 0, \ldots, M-1, \quad (A.3)$$

the following lemma can be obtained.

Lemma 1 The iterations in (3.4) satisfy

$$E(v^{m+1}) - E(v^m) \le -\frac{t}{2}\,\|\nabla E(v^m)\|_2^2, \quad (A.4)$$

if $t \le \delta$, where

$$\delta = \Big(1 - \frac{1}{2(1-c)}\Big)\frac{2}{\bar\lambda_a}, \qquad \bar\lambda_a = \Big(1 + \frac{c^2}{1-c}\Big)\lambda_a.$$

Furthermore,

$$\frac{t}{2}\sum_{m=0}^{M-1} \|\nabla E(v^m)\|_2^2 \le E(v^0) - E(v^M) \le E(v^0). \quad (A.5)$$

Proof From the Lipschitz property of $\nabla E(u)$ implied by (A.2),

$$\begin{aligned}
E(v^{m+1}) - E(v^m)
&\le \langle \nabla E(v^m),\, v^{m+1} - v^m \rangle + \frac{\lambda_a}{2}\,\|v^{m+1} - v^m\|_2^2 \\
&= \langle \nabla E(v^m),\, v^m - t(\nabla E(v^m) + \varepsilon^{m+1}) - v^m \rangle
   + \frac{\lambda_a}{2}\,\|v^m - t(\nabla E(v^m) + \varepsilon^{m+1}) - v^m\|_2^2 \\
&= -t\Big(1 - \frac{t\lambda_a}{2}\Big)\|\nabla E(v^m)\|_2^2
   + t\Big(1 - \frac{t\lambda_a}{2}\Big)\langle \varepsilon^{m+1}, \nabla E(v^m)\rangle
   + \frac{\lambda_a t^2}{2}\,\|\varepsilon^{m+1}\|_2^2 \\
&\le -t\Big(1 - \frac{t\lambda_a}{2}\Big)\|\nabla E(v^m)\|_2^2
   + ct\Big(1 - \frac{t\lambda_a}{2} + \frac{ct\lambda_a}{2}\Big)\|\nabla E(v^m)\|_2^2 \\
&= -t\Big((1-c) - \frac{t\lambda_a}{2}(1 - c + c^2)\Big)\|\nabla E(v^m)\|_2^2 \\
&= -t(1-c)\Big(1 - \frac{1 - c + c^2}{1-c}\cdot\frac{t\lambda_a}{2}\Big)\|\nabla E(v^m)\|_2^2 \\
&= -t(1-c)\Big(1 - \frac{t\bar\lambda_a}{2}\Big)\|\nabla E(v^m)\|_2^2.
\end{aligned} \quad (A.6)$$

Letting $t \le \big(1 - \frac{1}{2(1-c)}\big)\frac{2}{\bar\lambda_a}$, we get

$$E(v^{m+1}) - E(v^m) \le -\frac{t}{2}\,\|\nabla E(v^m)\|_2^2. \quad (A.7)$$

Summing the left- and right-hand sides over $m$ and using the fact that $E(u) \ge 0$ gives (A.5). This concludes the lemma.

Theorem 2 If $t$ satisfies the condition in Lemma 1, then, given any $\epsilon > 0$, $|E(v^M) - E(u^*)| \le \epsilon$ for $M = O\big((\lambda_1^2 + \lambda_1^2/\lambda_0 + \lambda_1)\, n^2/\epsilon\big)$.

Proof By convexity,

$$E(u^*) - E(v^m) \ge \langle \nabla E(v^m),\, u^* - v^m \rangle, \quad (A.8)$$

which, combined with Lemma 1, gives

$$\begin{aligned}
E(v^{m+1})
&\le E(u^*) + \langle \nabla E(v^m),\, v^m - u^* \rangle - \frac{t}{2}\|\nabla E(v^m)\|_2^2 \\
&= E(u^*) + \frac{1}{2t}\Big( 2t\langle \nabla E(v^m), v^m - u^*\rangle - t^2\|\nabla E(v^m)\|_2^2
   + \|v^m - u^*\|_2^2 - \|v^m - u^*\|_2^2 \Big) \\
&= E(u^*) + \frac{1}{2t}\Big( \|v^m - u^*\|_2^2 - \|v^m - t\nabla E(v^m) - u^*\|_2^2 \Big) \\
&= E(u^*) + \frac{1}{2t}\Big( \|v^m - u^*\|_2^2 - \|v^{m+1} - t\varepsilon^{m+1} - u^*\|_2^2 \Big) \\
&= E(u^*) + \frac{1}{2t}\Big( \|v^m - u^*\|_2^2 - \|v^{m+1} - u^*\|_2^2
   + 2t\langle \varepsilon^{m+1}, v^{m+1} - u^*\rangle - t^2\|\varepsilon^{m+1}\|_2^2 \Big) \\
&= E(u^*) + \frac{1}{2t}\Big( \|v^m - u^*\|_2^2 - \|v^{m+1} - u^*\|_2^2 + t^2\|\varepsilon^{m+1}\|_2^2
   + 2t\langle \varepsilon^{m+1}, v^m - u^*\rangle - 2t\langle \varepsilon^{m+1}, \nabla E(v^m)\rangle \Big) \\
&\le E(u^*) + \frac{1}{2t}\Big( \|v^m - u^*\|_2^2 - \|v^{m+1} - u^*\|_2^2 + t^2\|\varepsilon^{m+1}\|_2^2
   + 2t\,\|\varepsilon^{m+1}\|_2\big(\|v^m - u^*\|_2 + \|\nabla E(v^m)\|_2\big) \Big) \\
&\le E(u^*) + \frac{1}{2t}\Big( \|v^m - u^*\|_2^2 - \|v^{m+1} - u^*\|_2^2 + t^2\|\varepsilon^{m+1}\|_2^2
   + 2t(1 + 2/\mu_a)\,\|\varepsilon^{m+1}\|_2\,\|\nabla E(v^m)\|_2 \Big) \\
&\le E(u^*) + \frac{1}{2t}\Big( \|v^m - u^*\|_2^2 - \|v^{m+1} - u^*\|_2^2 + c^2 t^2\,\|\nabla E(v^m)\|_2^2
   + 2c(1 + 2/\mu_a)\,t\,\|\nabla E(v^m)\|_2^2 \Big).
\end{aligned} \quad (A.9)$$

The second-to-last inequality follows from (A.2), which implies $\|L_a u\|_2 \ge \mu_a \|u\|_2$ if $u^\top \mathbf{1} = 0$. More precisely, the facts that $v^0 = 0$, that $\nabla E(u)^\top \mathbf{1} = 0$ (which follows from the form of $L_a$ and $b_{\xi,a}$ defined in (2.5)) and that $(\varepsilon^m)^\top \mathbf{1} = 0$ for all $m$ (due to the assumption in (A.3)) imply $(v^m)^\top \mathbf{1} = 0$; hence $\frac{\mu_a}{2}\|v^m - u^*\|_2 \le \|\nabla E(v^m) - \nabla E(u^*)\|_2 = \|\nabla E(v^m)\|_2$. Reorganising (A.9), we get

$$E(v^{m+1}) - E(u^*) \le \frac{1}{2t}\Big( \|v^m - u^*\|_2^2 - \|v^{m+1} - u^*\|_2^2
   + ct\big(ct + 2(1 + 2/\mu_a)\big)\,\|\nabla E(v^m)\|_2^2 \Big). \quad (A.10)$$

Summing both the left- and right-hand sides results in

$$E(v^M) - E(u^*) \le \frac{1}{M}\sum_{m=0}^{M-1}\big( E(v^{m+1}) - E(u^*) \big)
\le \frac{1}{M}\Big( \frac{\|v^0 - u^*\|_2^2}{2t} + \frac{2c}{t}\big(ct + 2(1 + 2/\mu_a)\big) E(v^0) \Big), \quad (A.11)$$

where the second inequality follows from (A.5). In order to derive a bound for $\|v^0 - u^*\|_2^2$, we appeal to the strong convexity of $E(u)$:

$$E(v^0) - E(u^*) \ge \langle \nabla E(u^*),\, v^0 - u^* \rangle + \frac{\mu_a}{2}\|v^0 - u^*\|_2^2 = \frac{\mu_a}{2}\|v^0 - u^*\|_2^2, \quad (A.12)$$

since $\langle \mathbf{1},\, v^0 - u^* \rangle = 0$. Then

$$E(v^M) - E(u^*) \le \frac{1}{M}\Big( \frac{1}{\mu_a t} + \frac{2c}{t}\big(ct + 2(1 + 2/\mu_a)\big) \Big) E(v^0). \quad (A.13)$$

Since $E(v^0) = \frac{a^\top \mathbf{1}}{2n^d} = O(\lambda_1)$, along with $\lambda_a = O(\lambda_1 n^2)$ and $\mu_a = \Omega(\lambda_0)$, we establish the claim.