2412.14683v1


Numerical Robustness of PINNs for Multiscale Transport Equations

Alexander Jesser^{a,b,∗}, Kai Krycki^{c}, Ryan G. McClarren^{d}, Martin Frank^{a}


arXiv:2412.14683v1 [math.NA] 19 Dec 2024

a Karlsruhe Institute of Technology, Scientific Computing Center (SCC), Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
b Aachen Institute for Nuclear Training GmbH, Cockerillstrasse 100 (DLZ), 52222, Stolberg (Rhld.), Germany
c FH Aachen - University of Applied Sciences, Department of Aerospace Engineering, Hohenstaufenallee 6, 52064, Aachen, Germany
d University of Notre Dame, Department of Aerospace and Mechanical Engineering, 257 Fitzpatrick Hall, 46556, Notre Dame, USA

Abstract
We investigate the numerical solution of multiscale transport equations using
Physics Informed Neural Networks (PINNs) with ReLU activation functions.
To this end, we study the analogy between PINNs and Least-Squares Finite
Elements (LSFE), which lies in the shared approach of reformulating the PDE
solution as the minimization of a quadratic functional. We prove that in the
diffusive regime the correct limit is not reached, in agreement with known
results for first-order LSFE. A diffusive scaling is introduced that can be
applied to overcome this, again in full agreement with theoretical results for
LSFE. We provide numerical results in the case of slab geometry that support
our theoretical findings.
Keywords: Physics Informed Neural Networks, Diffusive Regime, Multiscale
Transport Equations, Numerical Analysis

1. Introduction

Physics Informed Neural Networks (PINNs) have recently attracted a great


deal of attention. First proposed in [1] in 2019, PINNs provide a method to
solve partial differential equations (PDEs) using (deep) neural networks (NN)
by incorporating the corresponding equation in a loss function in a training pro-
cess. Successively, points in the computational domain of the PDE are chosen
and the weights of the NN are adapted to solve the PDE pointwise, resulting
in an approximation of the solution on the complete computational domain. In
that way, the numerical solution of the PDE corresponds to the training of a NN.
The first computational packages are now available, providing easy-to-use
black-box solvers for PDEs, which makes the method attractive for users.

∗ Corresponding author
Email addresses: alexander.jesser@kit.edu (Alexander Jesser), krycki@fh-aachen.de
(Kai Krycki), rmcclarr@nd.edu (Ryan G. McClarren), martin.frank@kit.edu (Martin
Frank)

Preprint submitted to Journal of Computational Physics December 20, 2024

Although already heavily used in various applications [2, 3, 4, 5, 6], only


a limited number of results regarding the numerical analysis of PINNs exist
so far [7]. For the classes of linear second order elliptic and parabolic PDEs
convergence of the discretized solution to the PDE solution can be shown in a
strong sense [8]. A more detailed numerical analysis of PINNs (convergence,
error bounds, stability etc.) needs to be developed based on these first results
in order to gain a deeper understanding of this undoubtedly powerful and useful
method.
With this paper, we contribute to the numerical analysis of PINNs by study-
ing an analogy between PINNs and the Least Squares Finite Element method
(LSFE). For a specific class of PDEs (neutron transport equations) we study
similarities based on an existing theory for LSFE for numerical solution in the
so-called diffusive regime (cf. [9], [10], [11]). The LSFE is a numerical approach
used to solve partial differential equations and involves reformulating the prob-
lem into a minimization of a least-squares functional. A numerical solution is
obtained by solving for a finite element representation of the corresponding vari-
ational formulation.

The analogy between PINNs and the LSFE lies in their shared approach
of reformulating the solution of differential equations as the minimization of a
(typically quadratic) functional. In LSFE, this involves the construction of a
least-squares functional based on the residual of the differential equation, which,
when minimized, yields the solution to the problem in a variational form. Sim-
ilarly, PINNs define a loss function based on the residuals of the differential
equation and corresponding boundary conditions. Minimizing this loss function
drives the neural network to approximate the solution. Hence, although not
explicitly stated in variational terms, the minimization of the loss function in
PINNs can be viewed as finding an approximate solution that satisfies the dif-
ferential equation in an integral sense across the domain. Using an activation
function such as ReLU for PINNs, the corresponding neural network can be
interpreted as a piecewise linear approximation of the solution comparable to
the finite element representation.

For this reason, we expect both methods to show comparable behavior in


situations where numerically resolving multiscale effects in a PDE becomes dif-
ficult. A prominent and widely studied example is neutron transport in the
so-called diffusive regime, which is briefly introduced in the following.

Neutron Transport in the Diffusive Regime


The transport of neutrons in a general background medium is governed by
the linear Boltzmann equation [12]. For a time-independent source term in the
case of mono-energetic neutrons and isotropic scattering, the steady-state form
reads

\[
\Omega \cdot \nabla \psi(x, \Omega) + \sigma_t \psi(x, \Omega) = \frac{\sigma_s}{4\pi} \int_{4\pi} \psi(x, \Omega') \, d\Omega' + Q(x, \Omega), \tag{1}
\]

where ψ(x, Ω) is the angular flux of neutrons at position x and direction Ω, Ω·∇
is the transport operator, σt denotes the total and σs the scattering macroscopic
cross-section. Q(x, Ω) is the source term at position x in direction Ω, and the
integral $\frac{\sigma_s}{4\pi} \int_{4\pi} \psi(x, \Omega') \, d\Omega'$ represents the isotropic scattering contribution.

To simplify the analysis, it is common to investigate the slab geometry case


[13] (one space and directional dimension), where all relevant effects in the dif-
fusive regime can be shown. Here, the medium is assumed to be infinite in
two dimensions (e.g., the y- and z-directions) and finite in the x-direction. The
medium is therefore bounded by two parallel planes which form a slab through
which the neutrons travel. In the following, we denote the x-coordinates of
the planes that limit the slab by xl for the left plane and xr for the right
plane. The (one-dimensional) spatial coordinate x is used to describe the posi-
tion of a neutron within the slab, the direction of neutron movement becomes
a one-dimensional angle µ = cos(θ), where θ is the angle between the neutron’s
direction of travel and the x-axis. Fig. 1 depicts the slab geometry case.

Figure 1: In slab geometry, transport is projected onto the x-axis. The problem is symmetric
in the y/z-plane.

In the following, we assume an isotropic source, i.e. Q independent of µ. The


steady state, single energy group neutron transport equation in slab geometry
is then formulated as
LΨ(x, µ) = Q(x) (2)
with source term Q and operator

\[
L\Psi(x, \mu) = \left[ \mu \frac{\partial}{\partial x} + \sigma_t (I - P) + \sigma_a P \right] \Psi(x, \mu), \tag{3}
\]

where I denotes the identity operator and P the projection onto the space of
all L2-functions independent of angle,

\[
(P\psi)(x) = \frac{1}{2} \int_{-1}^{1} \psi(x, \mu) \, d\mu. \tag{4}
\]

Boundary conditions can be introduced by a boundary operator B, so that

BΨ(x, µ) = g(x, µ) ∀(x, µ) ∈ Γ− , (5)

with Γ− = {(x, µ) ∈ ∂D × [−1, 1] : n(x) · µ < 0}, where ∂D is the boundary of


the computational domain D and n is the outer normal vector. The boundary
conditions in this form specify the flux distribution entering D over ∂D. We
introduce the boundary conditions in detail later in Section 5.
The multiscale effects in this setting originate from two different scales relevant
to this problem: the mean-free path is small compared to the size of the phys-
ical domain. The numerical instability of PINNs that we discuss in this paper
is a consequence of the different scales and the stiffness they induce. In the
diffusive regime scattering dominates the effects of free transport or absorption
and the mean-free path of particles is small. The diffusive regime can typically
be characterized by a small parameter ε > 0, which can be interpreted as the
ratio of the mean-free path to the physical size of the domain of computation.
Introducing such ε > 0 we scale the cross sections and source term accordingly
[14] as
\[
\sigma_t \to \frac{1}{\varepsilon}, \qquad \sigma_a \to \alpha\varepsilon, \qquad Q \to \varepsilon Q, \tag{6}
\]
where α is assumed to be O(1). By applying this scaling on Eq. (2) we obtain

Lε Ψε (x, µ) = εQ(x) (7)

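where the neutron flux in the diffusion limit is denoted as Ψε and the scaled
operator Lε reads

\[
L_\varepsilon \Psi^\varepsilon(x, \mu) = \left[ \mu \frac{\partial}{\partial x} + \frac{1}{\varepsilon} (I - P) + \varepsilon\alpha P \right] \Psi^\varepsilon(x, \mu). \tag{8}
\]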

It can be shown [15] that in the limit of ε → 0 the solution of the transport
equation Ψε converges to a function ϕ0 , which is independent of µ and the
solution of an associated diffusion equation
\[
-\frac{\partial}{\partial x} \frac{1}{3\sigma_t} \frac{\partial \phi_0}{\partial x}(x) + \sigma_a \phi_0(x) = Q(x). \tag{9}
\]
For numerical methods it is therefore of key importance to reproduce this be-
havior on a discretized level. A method that ensures this is called asymptotic
preserving [16]. For the LSFE method a corresponding analysis has been carried
out (cf. [9], [10], [11]) and we build upon these results for the investigation of

PINNs. A related analysis using PINNs for multiscale time dependent linear
transport equations has been carried out in [17]. Using LSFE without addi-
tional scaling (see Section 4), the numerical solution produces an incorrect limit
as ε → 0. We show in this paper that the same happens when PINNs are applied
to solve this class of equations and that the same scaling as derived for LSFE
can be applied to ensure a correct limiting behavior for PINNs, underlining the
close connection between the two numerical methods.

The paper is structured as follows. Sections 2 and 3 introduce PINNs and


LSFE respectively as numerical methods for the solution of PDEs. Here, we
also introduce the notations of the methods that we use in the following. The
main result is stated in Section 4. We prove that numerical solutions of PINNs
in the diffusive regime do not converge to the correct solution, and show that
a diffusive scaling can be applied to overcome this, in full agreement with the
theoretical results for LSFE. Numerical results are presented in Section 5, and
consequences of our results as well as future research directions are discussed in
the final Section 6.

2. Physics Informed Neural Networks

Physics Informed Neural Networks, PINNs for short, are a numerical method
for solving general partial differential equations [1, 2]. A neural network serves
as an interpolation function for the solution of the PDE. It is obtained by
successively selecting points in the computational domain and adjusting the pa-
rameters of the neural network to obtain the desired output, which is commonly
referred to as training the NN. The method is therefore mesh-free and can serve
as a black-box solver for a PDE, since only the form of the equation needs to
be known. The mathematical formulation of the method is briefly presented
below.
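In the following, we use a notation similar to [8]. Let $\Psi^M(x, \mu) : \mathbb{R}^2 \to \mathbb{R}^1$
be a feed-forward neural network (FNN) with M layers. Then its output is
recursively defined by

\[
\begin{aligned}
\Psi^0(x, \mu) &= (x, \mu) \in \mathbb{R}^2, \\
\Psi^m(x, \mu) &= W^m \sigma(\Psi^{m-1}(x, \mu)) + b^m \in \mathbb{R}^{N_m}, \quad 2 \le m \le M - 1, \\
\Psi^M(x, \mu) &= W^M \sigma(\Psi^{M-1}(x, \mu)) + b^M \in \mathbb{R}^1,
\end{aligned} \tag{10}
\]

with the weight matrices $W^m \in \mathbb{R}^{N_m \times N_{m-1}}$, the biases $b^m \in \mathbb{R}^{N_m}$ and the
activation function σ. Let $\vec{n} = \{N_0, \dots, N_M\} \in \mathbb{N}^{M+1}$ describe the architecture
of the network and let $\theta = \{W^m, b^m\}_{1 \le m \le M}$ be its parameters. Since a
neural network $\Psi^M(x, \mu)$ depends on the architecture $\vec{n}$ and its parameters θ,
we can denote it as $\Psi^M = \Psi^M(x, \mu, \vec{n}, \theta)$. For a fixed $\vec{n}$, we write $\Psi^M(x, \mu, \theta)$.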
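The recursion in Eq. (10) can be illustrated with a minimal NumPy sketch. The function names `init_params` and `forward`, the He-style random initialization and the zero biases are assumptions made for this illustration only; they are not taken from the paper. The sketch also assumes that the first layer acts affinely on the input (x, µ) and that the activation is applied before every subsequent layer.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(z, 0.0)

def init_params(layer_sizes, seed=0):
    """Create weights W^m (shape N_m x N_{m-1}) and zero biases b^m.

    The He-style scaling is an illustrative choice, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, x, mu):
    """Evaluate the network at the points (x, mu); returns one value per point."""
    z = np.stack([np.atleast_1d(x), np.atleast_1d(mu)], axis=0).astype(float)
    W, b = params[0]
    z = W @ z + b[:, None]            # first layer: affine map of the input
    for W, b in params[1:]:
        z = W @ relu(z) + b[:, None]  # Psi^m = W^m sigma(Psi^{m-1}) + b^m
    return z[0]

# Architecture n = (2, 8, 8, 1): two inputs (x, mu), one output.
params = init_params([2, 8, 8, 1])
x = np.linspace(0.0, 1.0, 5)
mu = np.linspace(-1.0, 1.0, 5)
values = forward(params, x, mu)
```

With zero biases the ReLU network is positively homogeneous, which gives a quick sanity check of the implementation.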
A numerical solution of a PDE using such a FNN can be obtained by incor-
porating the PDE residual into the loss function during the training process,
thus obtaining a ’physics informed’ neural network. In the following, we assume

that a unique classical solution of the PDE exists. The PDE and its boundary
conditions are given by

L[Ψ](x, µ) = Q(x) ∀x ∈ D, B[Ψ](x, µ) = g(x, µ) ∀(x, µ) ∈ Γ− . (11)

Here L denotes a differential operator, Q is a source term and B is a boundary
operator. Hence, the PINN maps the variables (x, µ) to an approximate
solution of the PDE via $\Psi^M(x, \mu) \approx \Psi(x, \mu)$. The loss function of the neural
network is given by

\[
\mathrm{Loss}[\Psi] = \frac{1}{|T_D|} \sum_{(x, \mu) \in T_D} \left( L[\Psi](x, \mu) - Q(x) \right)^2, \tag{12}
\]

where TD denotes a set of training points within the domain D. This set is
often generated using a quasi-random low-discrepancy sequence like the Sobol
sequence [18]. The training process of the PINN is the equivalent of solving the
PDE in classical methods. It is a minimization of the loss function in the set of
functions V which is defined by

V = {ΨM (·, ·, θ) : R2 → R1 |θ = {W m , bm }1≤m≤M }. (13)

In general, V is not a vector space. The boundary conditions are incorporated


by a restriction of the set of functions so that

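\[
V_b = \left\{ \Psi^M(\cdot, \cdot, \theta) : \mathbb{R}^2 \to \mathbb{R}^1 \;\middle|\; \theta = \{W^m, b^m\}_{1 \le m \le M}, \ B\Psi^M(x, \mu) = g(x, \mu) \ \forall (x, \mu) \in \Gamma_- \right\}. \tag{14}
\]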

Alternatively, the boundary condition can be included in the loss function. We
will describe this approach in Subsec. 5.2. For now, we can write the training
process of the PINN as a minimization of the loss function in the set of functions
Vb:

\[
\Psi_{\min} = \min_{\Psi \in V_b} \mathrm{Loss}[\Psi]. \tag{15}
\]

Equivalently, we can interpret it as a minimization of the loss function in the
parameter space with gradient-based optimizers such as Adam [19] or L-BFGS
[20]:

\[
\Psi_{\min} = \min_{\theta} \mathrm{Loss}[\Psi](\theta). \tag{16}
\]
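To make the structure of the loss in Eq. (12) concrete, the following sketch draws Sobol training points with `scipy.stats.qmc` and evaluates the mean squared residual for a candidate function. The helper names and the toy operator µ ∂Ψ/∂x + σt Ψ with a manufactured, µ-dependent source are assumptions chosen for illustration; they are not the slab operator of Eq. (3) and not the authors' implementation.

```python
import numpy as np
from scipy.stats import qmc

def sobol_training_points(n_points, x_left, x_right, seed=0):
    """Draw (x, mu) training points in [x_left, x_right] x [-1, 1] from a Sobol sequence."""
    sampler = qmc.Sobol(d=2, scramble=True, seed=seed)
    unit = sampler.random(n_points)  # points in the unit square
    pts = qmc.scale(unit, [x_left, -1.0], [x_right, 1.0])
    return pts[:, 0], pts[:, 1]

def pinn_loss(residual, x, mu):
    """Mean squared PDE residual over the training set, as in Eq. (12)."""
    r = residual(x, mu)
    return np.mean(r**2)

def toy_residual(x, mu, amplitude=1.0, sigma_t=1.0):
    """Residual of the hypothetical operator mu * dPsi/dx + sigma_t * Psi.

    The candidate is Psi(x, mu) = amplitude * x; the source is manufactured
    so that amplitude = 1 solves the toy problem exactly.
    """
    psi = amplitude * x
    dpsi_dx = amplitude * np.ones_like(x)
    source = mu + sigma_t * x
    return mu * dpsi_dx + sigma_t * psi - source

x_train, mu_train = sobol_training_points(256, 0.0, 1.0)
loss_exact = pinn_loss(lambda x, mu: toy_residual(x, mu, amplitude=1.0), x_train, mu_train)
loss_wrong = pinn_loss(lambda x, mu: toy_residual(x, mu, amplitude=1.5), x_train, mu_train)
```

In an actual PINN the candidate would be the network output and the optimizer would adjust θ to reduce this loss; here the loss vanishes for the exact candidate and is positive otherwise.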

It is important to note that ΨM (x, µ) using the rectified linear unit (ReLU) as
activation function is piecewise linear. Moreover, it can be shown that Neural
Networks with ReLU activation functions and sufficiently many layers can re-
produce all piecewise linear basis functions from the finite element method [21].
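In one space dimension this correspondence can be made explicit: a first-order finite element hat function is a combination of three ReLU units. The sketch below is only meant to illustrate the statement; the function names are hypothetical, and the one-dimensional case with a single hidden layer is a simplification of the general result cited in [21].

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(z, 0.0)

def hat_from_relu(x, center, width):
    """Hat function with peak 1 at `center` and support [center - width, center + width],
    written as a combination of three ReLU units."""
    x = np.asarray(x, dtype=float)
    return (relu(x - (center - width))
            - 2.0 * relu(x - center)
            + relu(x - (center + width))) / width

def hat_reference(x, center, width):
    """Standard first-order Lagrange (hat) basis function, for comparison."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, 1.0 - np.abs(x - center) / width)

grid = np.linspace(-1.0, 3.0, 401)
relu_hat = hat_from_relu(grid, center=1.0, width=0.5)
fem_hat = hat_reference(grid, center=1.0, width=0.5)
```

Both functions agree on the whole grid, so any continuous piecewise linear finite element function on a 1D mesh can be written as a ReLU network by summing such hats.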

The problem considered in this work depends, in principle, on the architec-


ture of the underlying NN, including the number of layers and neurons. How-
ever, our analysis remains valid regardless of the specific architecture, provided

that a feed-forward NN is used, which is sufficiently large to accurately approx-
imate the solution to the PDE. The theoretical analysis focuses on networks
with ReLU activation functions, while the case of hyperbolic tangent activation
functions is also explored in the numerical experiments. In general, using differ-
ent activation functions (or a combination of them) or Non-Feedforward Neural
Networks might lead to different results in the diffusion limit.

3. Least-Squares Finite Elements


The Least-Squares Finite Element (LSFE) method relies on a finite element
discretization of a variational formulation of the underlying PDE. The least-
squares formulation of Eq. (2) (neutron transport in slab geometry) is given
by
\[
\min_{\Psi \in U} F[\Psi] \quad \text{with} \quad F[\Psi] = \int_{x_l}^{x_r} \int_{-1}^{1} \left( L\Psi(x, \mu) - Q(x, \mu) \right)^2 d\mu \, dx, \tag{17}
\]

where the boundary condition (Eq. 5) is incorporated by a restriction of the
function space:

\[
U_b = \left\{ u(x, \mu) \in L^2(D) : \mu \frac{\partial u}{\partial x} \in L^2(D), \ Bu(x, \mu) = 0 \ \forall (x, \mu) \in \Gamma_- \right\}. \tag{18}
\]
For Ψ to be a minimizer of F , it is a necessary condition that the first variation
vanishes for all admissible u ∈ Ub , resulting in the problem: find Ψ ∈ U such
that
a(Ψ, u) = l(u) (19)
for all admissible u ∈ Ub with the bilinear form
\[
a(\Psi, u) := \int_{x_l}^{x_r} \int_{-1}^{1} L\Psi(x, \mu) \, Lu(x, \mu) \, d\mu \, dx \tag{20}
\]

and the functional

\[
l(u) = \int_{x_l}^{x_r} \int_{-1}^{1} Q(x, \mu) \, Lu(x, \mu) \, d\mu \, dx. \tag{21}
\]

The system is then discretized by replacing the function space U with a finite
dimensional subspace Uh ⊂ U . Consequently, we obtain a finite dimensional
linear system for the discretized neutron flux Ψh :
a(Ψh , uh ) = l(uh ) ∀uh ∈ Uh . (22)
If {λp} is a basis of Uh, we can represent the solution as an expansion in said
basis:

\[
\Psi_h = \sum_p c_p \lambda_p, \tag{23}
\]

where cp are the expansion coefficients. This allows us to write Eq. (22) as a
linear system

\[
A \Psi_\lambda = b, \tag{24}
\]

where the components of A and b are given by $A_{pq} = a(\lambda_p, \lambda_q)$ and $b_p = l(\lambda_p)$,
while Ψλ is the vector of expansion coefficients cp.
For the spatial discretization in the variable x, first-order Lagrange polynomials
are used as basis functions. The discretization in the angular variable µ with
the method of moments is described in Subsec. 5.1.
A general introduction to the least-squares finite element methods can be found
in [22]. Details about least-squares finite element solutions of the neutron trans-
port equation in the diffusive regime are discussed in [10].
A key difference between LSFE and PINNs is that, after parametrization, LSFE
requires the solution of a linear problem (Eq. 24), whereas PINNs require the
solution of a nonlinear optimization problem (Eq. 16). This means that for the
PINN we look for weights and biases, and for LSFE we look for basis expansion
coefficients.
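The linear structure of the LSFE problem can be seen in a small one-dimensional sketch. It uses a hypothetical model operator Lu = u' + u on (0, 1) with u(0) = 0 instead of the transport operator, first-order hat functions, and two-point Gauss quadrature per element. The function name and the manufactured source q(x) = 1 + x (exact solution u(x) = x) are assumptions made for this illustration.

```python
import numpy as np

def manufactured_source(x):
    """Source for which u(x) = x solves u' + u = q with u(0) = 0."""
    return 1.0 + x

def lsfe_solve(n_elem=8, source=manufactured_source):
    """Least-squares FE solution of the toy problem u' + u = q, u(0) = 0.

    Assembles A_pq = a(lambda_p, lambda_q) and b_p = l(lambda_p) with hat
    functions lambda_p and solves the linear system for the coefficients.
    """
    nodes = np.linspace(0.0, 1.0, n_elem + 1)
    h = nodes[1] - nodes[0]
    gauss_x, gauss_w = np.polynomial.legendre.leggauss(2)
    A = np.zeros((n_elem + 1, n_elem + 1))
    b = np.zeros(n_elem + 1)
    for e in range(n_elem):
        xq = nodes[e] + 0.5 * h * (gauss_x + 1.0)  # quadrature points in the element
        wq = 0.5 * h * gauss_w                     # quadrature weights
        lam = np.array([(nodes[e + 1] - xq) / h, (xq - nodes[e]) / h])
        dlam = np.array([-np.ones_like(xq) / h, np.ones_like(xq) / h])
        L_lam = dlam + lam                         # operator applied to the local basis
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += (L_lam * wq) @ L_lam.T
        b[idx] += (L_lam * wq) @ source(xq)
    coeffs = np.zeros(n_elem + 1)
    # Impose u(0) = 0 by removing the first degree of freedom.
    coeffs[1:] = np.linalg.solve(A[1:, 1:], b[1:])
    return nodes, coeffs

nodes, coeffs = lsfe_solve()
```

Because the exact solution is itself piecewise linear, the least-squares minimizer reproduces it at the nodes. A PINN would instead have to find the same function through nonlinear optimization over weights and biases.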

4. PINNs in the asymptotic limit

In the diffusive limit ε → 0 as introduced in Section 1, the neutron transport
equation becomes singular and numerical methods often fail to produce valid
approximations in this regime. Fig. 2 shows the behavior of a numerical solution
obtained with a PINN with ReLU activation functions and a first-order LSFE
solution in the asymptotic limit ($\varepsilon = 10^{-4}$), compared to a Monte-Carlo (MC)
reference solution. This test case and its setting will be discussed in detail in
Subsec. 5.3.
It can be seen that the neutron fluxes resulting from both numerical methods
are close to zero and far away from the reference solution. The numerical
analysis of the LSFE method explains why this behavior occurs, and the
methodological similarities between LSFE and PINNs suggest a corresponding
behavior for PINNs.
In the following, we provide a numerical analysis that shows that a PINN with
ReLU activation functions does not, in general, yield a correct solution in the
diffusion limit.

Figure 2: Results for PINN with ReLU activation functions and first order LSFE in the
asymptotic limit ($\varepsilon = 10^{-4}$) compared to a MC reference solution. The setting is discussed
in detail in Subsec. 5.3.

We can express the solution of the transport equation by an expansion in
the Legendre moments in angle. For n ∈ N, let Pn(µ) be the n-th
order Legendre polynomial on the interval [−1, 1]. In this section, we use the
normed Legendre polynomials defined as $p_n(\mu) = \sqrt{2n + 1}\, P_n(\mu)$ to simplify the
coefficients. Let

\[
\phi_n(x) = \int_{-1}^{1} p_n(\mu) \psi(x, \mu) \, d\mu \tag{25}
\]

be the n-th angular moment of ψ. Then the expansion of the neutron flux in
angle is given by

\[
\Psi(x, \mu) = \sum_{n=0}^{\infty} p_n(\mu) \phi_n(x). \tag{26}
\]

Furthermore, we note that in the diffusion limit, it is sufficient to take only the
first two moments of the expansion into account [23].
In this situation we consider the set of functions

\[
V_h = \{ v_h \in V_b : v_h(x, \mu) = \phi_0(x) + \mu \phi_1(x), \text{ where } \phi_0, \phi_1 \in P(T_h) \}, \tag{27}
\]

where P(Th) denotes the space of piecewise linear functions on the partition
Th of the slab. Here the term partition corresponds to an interval within the
slab, i.e. [x1, x2] ⊂ [xl, xr]. In the diffusive regime, the training of a PINN
corresponds to the solution of the following minimization problem:

\[
\Psi^\varepsilon_{\min} = \min_{\Psi^\varepsilon \in V_h} \frac{1}{|T_D|} \sum_{i,j} \left( L_\varepsilon \Psi^\varepsilon(x_i, \mu_j) - \varepsilon Q(x_i, \mu_j) \right)^2 \tag{28}
\]

In the following, we use the fact that for all ξ > 0 it is possible to find a number
of training points |TD (ξ)| so that the absolute difference between the discrete
sum divided by the number of training points and the integral is smaller than
the parameter ξ:
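\[
\left| \frac{1}{|T_D(\xi)|} \sum_{i,j} \left( L_\varepsilon \Psi^\varepsilon(x_i, \mu_j) - \varepsilon Q(x_i, \mu_j) \right)^2 - \int_{x_l}^{x_r} \int_{-1}^{1} \left( L_\varepsilon \Psi^\varepsilon(x, \mu) - \varepsilon Q(x, \mu) \right)^2 d\mu \, dx \right| < \xi. \tag{29}
\]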

This is ensured by the sequence we use to construct the training points. The
training points are generated with a Sobol sequence which is constructed in
a way that the sum divided by the number of (training) points converges to
the integral [18]. Hence, if we replace in the following the term 'minimum' by
'ξ-suboptimal solution', i.e. the L2-norm of the difference between the minimum
and the ξ-suboptimal solution is smaller than ξ, the results remain valid. For the sake
of readability, in the following, we neglect this subtlety and we assume that a
function $\Psi^\varepsilon_{\min}$ that minimizes Eq. (30) also minimizes Eq. (28) and use the
term 'minimum'. Hence, instead of Eq. (28) we directly consider the following
continuous minimization problem:

\[
\min_{\Psi \in V_h} F[\Psi] \quad \text{with} \quad F[\Psi] = \int_{x_l}^{x_r} \int_{-1}^{1} \left( L_\varepsilon \Psi^\varepsilon(x, \mu) - \varepsilon Q(x, \mu) \right)^2 d\mu \, dx. \tag{30}
\]

To summarize, in the following analysis Ψεmin denotes the neural network trained
on a Sobol sequence TD being a solution of Eq. (28). This neural network is
an approximate solution of the neutron transport equation and is furthermore
a piecewise linear, hence integrable function. By making use of the property of
ξ-suboptimality we also consider Ψεmin to minimize the continuous functional
Eq. (30).
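The replacement of the discrete sum by the integral can be checked numerically on a simple integrand. The sketch below compares Sobol averages with a known integral over [0, 1] × [−1, 1]. The function name and the test integrand f(x, µ) = (µx)² are assumptions for illustration. The sample mean is multiplied by the area of the domain, which is an assumption about the normalization intended in Eq. (29).

```python
import numpy as np
from scipy.stats import qmc

def sobol_integration_errors(f, exact_integral, x_left, x_right, m_values=(4, 8, 12), seed=0):
    """Absolute errors of Sobol estimates of the integral of f over [x_left, x_right] x [-1, 1].

    For each m, 2**m scrambled Sobol points are used.
    """
    area = (x_right - x_left) * 2.0
    errors = []
    for m in m_values:
        sampler = qmc.Sobol(d=2, scramble=True, seed=seed)
        unit = sampler.random_base2(m)
        pts = qmc.scale(unit, [x_left, -1.0], [x_right, 1.0])
        estimate = area * np.mean(f(pts[:, 0], pts[:, 1]))
        errors.append(abs(estimate - exact_integral))
    return errors

def squared_integrand(x, mu):
    """Smooth test integrand standing in for a squared residual."""
    return (mu * x) ** 2

# Exact value: int_0^1 x^2 dx * int_{-1}^1 mu^2 dmu = (1/3) * (2/3) = 2/9.
errors = sobol_integration_errors(squared_integrand, 2.0 / 9.0, 0.0, 1.0)
```

For this smooth integrand the error with 4096 points is far below typical loss values, which is the property used to pass from Eq. (28) to Eq. (30).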

4.1. Characterization of the minimizer


In the following, we formally expand the solution Ψ in powers of ε and de-
rive relations between the expansion coefficients. We then use these relations to
prove that the PINN method with ReLU activation functions yields an incorrect
solution in the diffusion limit. For our analysis we build on the theory developed
in [24] for LSFE, which we adapt for PINNs.

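Lemma 1. Let the functional F and the set of functions Vh be as given in Eqs.
(30) and (27). Suppose $\Psi^\varepsilon_{\min}$ minimizes F restricted to Vh. Suppose further
that ε ≤ 1 and that $\Psi^\varepsilon_{\min}$ has an expansion

\[
\Psi^\varepsilon_{\min}(x, \mu) = \phi^\varepsilon_0(x) + \mu \phi^\varepsilon_1(x) \tag{31}
\]

with

\[
\phi^\varepsilon_0(x) = \sum_{\nu=0}^{\infty} \varepsilon^\nu \eta_\nu(x) \quad \text{and} \quad \phi^\varepsilon_1(x) = \sum_{\nu=0}^{\infty} \varepsilon^\nu \delta_\nu(x), \tag{32}
\]

where ην and δν are independent of ε. Then we have:

\[
\delta_0(x) = 0 \quad \text{and} \quad \eta_0'(x) = -\delta_1(x). \tag{33}
\]

Proof. By inserting Eq. (31) into Eq. (8) and using that P is a projection
onto the zeroth moment (PΨ = ϕ0), while I − P projects onto moment one and
all higher order moments, we obtain

\[
L_\varepsilon \Psi^\varepsilon_{\min} = \mu \frac{\partial \phi^\varepsilon_0}{\partial x} + \mu^2 \frac{\partial \phi^\varepsilon_1}{\partial x} + \frac{1}{\varepsilon} \mu \phi^\varepsilon_1 + \varepsilon\alpha \phi^\varepsilon_0. \tag{34}
\]

Inserting Eq. (32) and sorting the terms in the order of ε yields

\[
L_\varepsilon \Psi^\varepsilon_{\min} = \frac{1}{\varepsilon} [\mu \delta_0] + [\mu \eta_0' + \mu^2 \delta_0' + \mu \delta_1] + O(\varepsilon). \tag{35}
\]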
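By inserting Eq. (35) into Eq. (30), we obtain a representation of F as a power
series

\[
F(\Psi^\varepsilon_{\min}) = \sum_{\nu=-2}^{\infty} \varepsilon^\nu F_\nu(\Psi^\varepsilon_{\min}) \tag{36}
\]

with

\[
F_{-2}(\Psi^\varepsilon_{\min}) = \int_{x_l}^{x_r} \int_{-1}^{1} \mu^2 \delta_0^2(x) \, d\mu \, dx = \frac{2}{3} \int_{x_l}^{x_r} \delta_0^2(x) \, dx \tag{37}
\]

and

\[
F_{-1}(\Psi^\varepsilon_{\min}) = 2 \int_{x_l}^{x_r} \int_{-1}^{1} \left[ \mu^2 \eta_0'(x) \delta_0(x) + \mu^3 \delta_0(x) \delta_0'(x) + \mu^2 \delta_0(x) \delta_1(x) \right] d\mu \, dx = \frac{4}{3} \int_{x_l}^{x_r} \left[ \eta_0'(x) \delta_0(x) + \delta_0(x) \delta_1(x) \right] dx. \tag{38}
\]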

Since 0 ∈ Vh and $\Psi^\varepsilon_{\min}$ minimizes F, we obtain for |ε| ≤ 1

\[
F(\Psi^\varepsilon_{\min}) \le F(0) = \varepsilon^2 \int_{x_l}^{x_r} q(x)^2 \, dx \le \int_{x_l}^{x_r} q(x)^2 \, dx. \tag{39}
\]

Therefore, we must have F−2 (Ψεmin ) = 0 and F−1 (Ψεmin ) = 0, since otherwise
F (Ψ) diverges for ε → 0, which would contradict Eq. (39). We conclude that

δ0 (x) = 0 (40)

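Using Eq. (40), we can restrict the set of functions to

\[
W_h = \Big\{ w_h \in V : w_h(x, \mu) = \sum_{\nu=0}^{\infty} \varepsilon^\nu \eta_\nu(x) + \mu \sum_{\nu=1}^{\infty} \varepsilon^\nu \delta_\nu(x), \ w_h(x_l, \mu) = 0 \text{ for } \mu > 0, \ w_h(x_r, \mu) = 0 \text{ for } \mu < 0 \Big\}. \tag{41}
\]

A necessary condition for the minimum is that the derivative of F with
respect to Ψ is zero:

\[
\int_{x_l}^{x_r} \int_{-1}^{1} \left( L_\varepsilon \Psi^\varepsilon_{\min}(x, \mu) - \varepsilon Q(x, \mu) \right) L_\varepsilon \Psi^\varepsilon_{\min}(x, \mu) \, d\mu \, dx = 0. \tag{42}
\]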

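We have

\[
\begin{aligned}
L_\varepsilon \Psi^\varepsilon_{\min} &= \mu \frac{\partial \phi^\varepsilon_0}{\partial x} + \mu^2 \frac{\partial \phi^\varepsilon_1}{\partial x} + \frac{1}{\varepsilon} \mu \phi^\varepsilon_1 + \varepsilon\alpha \phi^\varepsilon_0 \\
&= [\mu \eta_0' + \mu \delta_1] + \varepsilon [\mu \eta_1' + \mu^2 \delta_1' + \mu \delta_2 + \alpha \eta_0] + O(\varepsilon^2)
\end{aligned} \tag{43}
\]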

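and by inserting Eq. (43) into Eq. (42) we obtain

\[
\int_{x_l}^{x_r} \int_{-1}^{1} \mu^2 (\eta_0' + \delta_1)^2 \, d\mu \, dx + \varepsilon I_1 + O(\varepsilon^2) = 0 \tag{44}
\]

with

\[
\begin{aligned}
I_1 &= 2 \int_{x_l}^{x_r} \int_{-1}^{1} \Big( \mu [\alpha \eta_0 \eta_0' + \alpha \delta_1 \eta_0] + \mu^2 [\eta_0' \eta_1' + \eta_0' \delta_2 + \delta_1 \eta_1' + \delta_1 \delta_2] + \mu^3 [\delta_1' \eta_0' + \delta_1 \delta_1'] \Big) \, d\mu \, dx \\
&= \int_{x_l}^{x_r} \int_{-1}^{1} 2 \mu^2 [\eta_0' \eta_1' + \eta_0' \delta_2 + \delta_1 \eta_1' + \delta_1 \delta_2] \, d\mu \, dx.
\end{aligned} \tag{45}
\]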

In Eq. (44), O(1) and O(ε) terms show up only on the left-hand side. Therefore,
both terms need to vanish, which is implied by the relation

\[
\eta_0' = -\delta_1. \tag{46}
\]

This shows that only terms of order ε² remain, which completes the proof. □
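The angular integrals behind the leading-order terms in Eqs. (37) and (38) can be checked symbolically. The sketch below uses SymPy and treats the values of δ0, δ0′, η0′ and δ1 at a fixed x as independent symbols; these symbol names are assumptions for illustration. The residual is truncated after the O(1) term of Eq. (35), which is sufficient because the source εQ and all higher-order terms only enter F at order ε⁰ and above.

```python
import sympy as sp

mu, eps = sp.symbols("mu epsilon", real=True)
# Values of delta_0, delta_0', eta_0' and delta_1 at a fixed position x.
delta0, ddelta0, deta0, delta1 = sp.symbols("delta0 ddelta0 deta0 delta1", real=True)

# Leading orders of L_eps Psi according to Eq. (35).
residual = mu * delta0 / eps + (mu * deta0 + mu**2 * ddelta0 + mu * delta1)

# Integrate the squared residual over the angle mu in [-1, 1].
angular_integral = sp.integrate(sp.expand(residual**2), (mu, -1, 1))

# Multiply by eps^2 so that the coefficients of eps^0 and eps^1
# correspond to the integrands of F_{-2} and F_{-1}.
scaled = sp.expand(angular_integral * eps**2)
F_minus2_integrand = scaled.coeff(eps, 0)
F_minus1_integrand = scaled.coeff(eps, 1)
```

The two coefficients reduce to (2/3) δ0² and (4/3) δ0 (η0′ + δ1), matching the x-integrands in Eqs. (37) and (38); the µ³ term drops out by symmetry.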

Using Lemma 1, we can prove the main theorem on PINN solutions in the
diffusion limit.

Theorem 1. Let the functional F and the set of functions Vh be as given in


Eqs. (30) and (27). Suppose that Ψεmin minimizes F restricted to Vh . Suppose
further that ε ≤ 1 and that Ψεmin has an expansion as defined in Eqs. (31) and
(32).
Then we have:

Ψεmin → 0 as ε → 0 pointwise, for all x ∈ D. (47)

Proof. If both ην(x) and δν(x) are continuous piecewise linear functions, then
η0′ is piecewise constant, and by Eq. (46) the continuous function δ1 = −η0′ must
be constant. It follows that η0 is a linear function. With the boundary conditions
we obtain η0 = 0. Therefore, we have $\Psi^\varepsilon_{\min} \to 0$ as ε → 0. Since all η, δ are
continuous, the convergence holds pointwise, for all x ∈ D. □

As a consequence, PINNs using feed forward neural networks with ReLU


activation functions do not give a correct approximation of the neutron flux ϕ
in the diffusion limit, except for Q = 0.

4.2. Scaling for the numerical solution
As shown in the main theorem, directly solving the neutron transport equa-
tion using a PINN with ReLU activation functions leads to a method that does
not preserve the diffusion limit. Solving the neutron transport equation using
first order least-squares finite elements leads to similar results ([10], [25]).
In [10], a scaling of the equations that solves this issue is proposed. In the
following, we also use this scaling for PINNs.
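Recall that

\[
P = \frac{1}{2} \int_{-1}^{1} \cdot \; d\mu. \tag{48}
\]

Using this operator, which is a projection onto the zeroth Legendre moment, we
define the scaling operator

\[
S = P + \tau (I - P). \tag{49}
\]

By applying S on both sides of Eq. (7), we obtain

\[
S \mu \frac{\partial \Psi}{\partial x} + \frac{\tau}{\varepsilon} (I - P) \Psi + \varepsilon\alpha P \Psi = \varepsilon Q. \tag{50}
\]

We use the same expansion for Ψ as in Eq. (31) to project on the first two
Legendre moments. The system of scaled equations then reads

\[
\begin{aligned}
\frac{1}{3} \frac{\partial \phi^\varepsilon_1}{\partial x} + \alpha\varepsilon \phi^\varepsilon_0 &= \varepsilon Q, \\
\tau \frac{\partial \phi^\varepsilon_0}{\partial x} + \frac{\tau}{\varepsilon} \phi^\varepsilon_1 &= 0.
\end{aligned} \tag{51}
\]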
In the following, we are only interested in the behavior of the leading orders, so
we take the terms in Eqs. (32) only up to order O(ε) into account. In addition, we
use that δ0 = 0 according to Eq. (33). Therefore, we obtain

\[
\begin{aligned}
\phi_0 &= \eta_0 + \varepsilon \eta_1, \\
\phi_1 &= \varepsilon \delta_1.
\end{aligned} \tag{52}
\]

By inserting Eqs. (52) into Eqs. (51) and neglecting higher order terms, we
obtain

\[
\begin{aligned}
\frac{1}{3} \varepsilon \frac{\partial \delta_1}{\partial x} + \varepsilon\alpha \eta_0 &= \varepsilon Q, \\
\tau \frac{\partial \eta_0}{\partial x} + \tau \delta_1 &= 0.
\end{aligned} \tag{53}
\]

The case τ = 1 corresponds to the unscaled equation, where the first equation
is O(ε) and the second equation O(1), so that Eqs. (53) are unbalanced for
ε → 0. Choosing τ = O(ε) results in a balancing of the terms in orders of ε.
This is a common strategy to improve the convergence behavior. The numerical
results in the following section demonstrate that the scaling leads to the correct
diffusion limit. Here we use $\tau = \sqrt{\sigma_a / \sigma_t} = \sqrt{\alpha}\, \varepsilon$, since this scaling directly
relates to the physical parameters.
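As a small numerical illustration of this choice, the sketch below evaluates the scaling factor from the cross sections under the diffusive scaling of Eq. (6), σt = 1/ε and σa = αε. Under that scaling the ratio σa/σt equals αε², so τ is proportional to ε. The function name and the value α = 10⁻² are used here only as an example.

```python
import math

def diffusive_scaling_factor(sigma_a, sigma_t):
    """Scaling factor tau = sqrt(sigma_a / sigma_t) computed from the cross sections."""
    return math.sqrt(sigma_a / sigma_t)

alpha = 1.0e-2
taus = {}
for eps in (1.0e-2, 1.0e-3, 1.0e-4):
    sigma_t = 1.0 / eps      # scaled total cross section, Eq. (6)
    sigma_a = alpha * eps    # scaled absorption cross section, Eq. (6)
    taus[eps] = diffusive_scaling_factor(sigma_a, sigma_t)
    print(f"eps = {eps:.0e}: tau = {taus[eps]:.3e}, tau / eps = {taus[eps] / eps:.3f}")
```

The ratio τ/ε stays constant at √α, so the weight of the first-moment equation shrinks at the same rate as the zeroth-moment equation.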

5. Numerical Results

In this section we present numerical results which illustrate the theory de-
veloped in Sec. 4. It is structured as follows: in Subsec. 5.1 we introduce the
PN method which was used for the angular discretization. In Subsec. 5.2 the
implementation is described. In Subsec. 5.3 and Subsec. 5.4 we present numer-
ical results for PINNs with ReLU activation function and first order LSFE to
demonstrate how the scaling leads to a huge improvement in accuracy for both
methods. In Subsec. 5.5 we demonstrate that the scaling can also improve the
solution for activation functions other than ReLU.

5.1. Angular Discretization


For the angular discretization of Eq. (8), we use the method of moments,
which is a spectral Galerkin method in µ [26]. Let ϕn (x) be the n-th angu-
lar moment of ψ as defined in Eq. (25). We then take the first N Legendre
polynomial moments of Eq. (2). Using the recursion relations for Legendre
polynomials, we obtain the slab geometry PN equations:
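\[
\frac{\partial \phi_1}{\partial x} + \sigma_a \phi_0 = Q, \tag{54}
\]

\[
\frac{n}{2n + 1} \frac{\partial \phi_{n-1}}{\partial x} + \frac{n + 1}{2n + 1} \frac{\partial \phi_{n+1}}{\partial x} + \sigma_t \phi_n = 0 \quad (n > 0). \tag{55}
\]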
We close the equations by setting

ϕn = 0, n > N. (56)

Note that while we used the normed Legendre polynomials in the previous section
to simplify the equations, we do not norm the Legendre polynomials here, since
otherwise we would not obtain the standard formulation of the PN equations in
this case. In the following we use vacuum boundary conditions, which assume
no incoming flux, and reflective boundary conditions, which assume that all
outgoing particles are reflected back into the domain. For an N-th order expansion
where N is odd, there are (N + 1)/2 vacuum boundary conditions that read

\[
0 = \sum_{n=0}^{N} \frac{2n + 1}{2} \phi_n(x) \int_{0}^{\pm 1} P_{2m-1}(\mu) P_n(\mu) \, d\mu \quad \text{for } m = 1, 2, \dots, (N + 1)/2, \tag{57}
\]

where Pn(µ) is the n-th order Legendre polynomial on the interval [−1, 1].
Reflective boundary conditions are obtained by setting the odd moments to zero
at the boundary. More details about the boundary conditions for slab geometry
PN equations can be found in [27]. It is known that in the limit ε → 0 a solution
of Eq. (8) converges to the solution of a corresponding diffusion equation,
which is independent of the angular variable µ. Therefore, in practice a small
N (N = 1 or N = 3) is sufficient as an approximate model.
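The PN system of Eqs. (54) to (56) can be written compactly as A dϕ/dx + Σ ϕ = q with ϕ = (ϕ0, ..., ϕN) and q = (Q, 0, ..., 0). The sketch below builds the two matrices; the function name and the matrix notation A, Σ are assumptions introduced for this illustration.

```python
import numpy as np

def pn_matrices(order, sigma_t, sigma_a):
    """Matrices of the slab-geometry P_N system A dphi/dx + Sigma phi = q.

    Row n couples phi_{n-1} and phi_{n+1} with the coefficients n/(2n+1)
    and (n+1)/(2n+1). The closure phi_{N+1} = 0 removes the coupling of
    the last row to the next higher moment.
    """
    size = order + 1
    A = np.zeros((size, size))
    for n in range(size):
        if n > 0:
            A[n, n - 1] = n / (2 * n + 1)
        if n < order:
            A[n, n + 1] = (n + 1) / (2 * n + 1)
    Sigma = sigma_t * np.eye(size)
    Sigma[0, 0] = sigma_a  # the zeroth moment only sees absorption
    return A, Sigma

A_p1, Sigma_p1 = pn_matrices(order=1, sigma_t=100.0, sigma_a=1.0e-4)
A_p3, Sigma_p3 = pn_matrices(order=3, sigma_t=100.0, sigma_a=1.0e-4)
```

For N = 1 this gives the familiar P1 system with coupling coefficients 1 and 1/3.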

5.2. Implementation
The PINN solver used for the numerical computations was implemented
using the Python library DeepXDE [28] with TensorFlow [29] as a backend. The
boundary conditions were included in the loss function as an implementation of
the restriction of the underlying set of functions. In order to do this, the term

\[
\mathrm{Loss}_{\partial D}[\Psi] = \frac{w_{\partial D}}{|T_{\partial D}|} \sum_{m=1}^{(N+1)/2} \sum_{x \in T_{\partial D}} \left( B_m[\Psi](x) - g(x) \right)^2, \tag{58}
\]

where w∂D is a weight, T∂D a set of training points on the boundary and Bm
the operator representing the mth boundary condition as defined in Subsec. 5.1,
was added to the loss function (Eq. 12).
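The combination of the interior loss (Eq. 12) and the weighted boundary term (Eq. 58) can be sketched in plain NumPy as follows. This is not DeepXDE code and not the authors' implementation; the function names and the array layout (one row of boundary residuals per boundary condition) are assumptions for illustration.

```python
import numpy as np

def interior_loss(residuals):
    """Mean squared PDE residual over the interior training points, Eq. (12)."""
    r = np.asarray(residuals, dtype=float)
    return np.mean(r**2)

def boundary_loss(boundary_residuals, weight):
    """Weighted boundary term in the form of Eq. (58).

    boundary_residuals has shape (number of boundary conditions,
    number of boundary points) and holds B_m[Psi](x) - g(x).
    """
    b = np.asarray(boundary_residuals, dtype=float)
    n_boundary_points = b.shape[1]
    return weight * np.sum(b**2) / n_boundary_points

def total_loss(residuals, boundary_residuals, weight=1.0):
    """Sum of interior loss and weighted boundary loss."""
    return interior_loss(residuals) + boundary_loss(boundary_residuals, weight)

example = total_loss([1.0, -1.0], [[2.0, 0.0], [0.0, 2.0]], weight=0.5)
```

The weight controls how strongly boundary violations are penalized relative to the PDE residual; its value is a tuning parameter.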
We implemented a LSFE solver using the FEniCS library [30, 31]. We use
PETSc [32] as linear algebra backend in conjunction with hypre [33], a library
of high performance preconditioners. For this work we use a combination of a
gmres solver [34] and an algebraic multigrid preconditioner [35]. The boundary
condition was implemented by adding the term
\[
a_b(\Psi, u) := \int_{x_l}^{x_r} \int_{\partial D} B\Psi(x, \mu) \, Bu(x, \mu) \, d\mu \, ds \tag{59}
\]

to the least-squares bilinear form (Eq. 20).


The errors in this section are computed as follows: we define G points that
are equidistantly distributed over the computational domain and compute the
squared difference between the PINN or LSFE neutron flux and the reference
solution divided by the squared reference solution. The relative error is then
given by
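\[
\xi_{\mathrm{rel}} = \sum_{g=0}^{G} \frac{\left( \phi_{0,\mathrm{PINN/LSFE}}(x_g) - \phi_{0,\mathrm{ref}}(x_g) \right)^2}{\left( \phi_{0,\mathrm{ref}}(x_g) \right)^2}. \tag{60}
\]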

We use the zeroth flux moment since it is identical to the angular-integrated


total flux. Since PINNs may converge to different solutions from different initial
values [36], we train PINNs from random initialization three times. The PINN
results shown in this section are to be understood as the average of the three
networks.
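A literal reading of the error measure in Eq. (60), together with the averaging over three independently trained networks, might be sketched as follows. The function names are hypothetical, and the sketch applies no normalization beyond what Eq. (60) states, which is an assumption about the intended definition. The reference flux must be nonzero at all evaluation points.

```python
import numpy as np

def relative_error(phi0_approx, phi0_ref):
    """Sum over the evaluation points of the squared difference divided by
    the squared reference value, following Eq. (60) literally."""
    approx = np.asarray(phi0_approx, dtype=float)
    ref = np.asarray(phi0_ref, dtype=float)
    return np.sum((approx - ref) ** 2 / ref**2)

def averaged_flux(runs):
    """Pointwise average of the zeroth moment over several trained networks."""
    return np.mean(np.asarray(runs, dtype=float), axis=0)

reference = [1.0, 2.0, 4.0]
runs = [[1.0, 2.0, 4.0], [1.1, 2.2, 4.4], [0.9, 1.8, 3.6]]
mean_flux = averaged_flux(runs)
error_of_mean = relative_error(mean_flux, reference)
error_of_one_run = relative_error(runs[1], reference)
```

In this made-up example the deviations of the individual runs cancel in the average, which illustrates why averaged results can look better than single runs.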

5.3. Asymptotic Test of the Diffusion Limit


The neutron transport equation in the diffusion limit is given by

\[
\left[ \mu \frac{\partial}{\partial x} + \frac{1}{\varepsilon} (I - P) + \varepsilon\alpha P \right] \Psi(x, \mu) = \varepsilon Q(x, \mu). \tag{61}
\]

With the source term

\[
Q(x, \mu) = 1 + \alpha \phi_0(x) \tag{62}
\]

and vacuum boundary conditions, the neutron flux Ψ(x, µ) tends asymptotically
to the solution of the diffusion equation:

\[
\phi_0(x) = -\frac{3}{2} x^2 + 15x. \tag{63}
\]
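As a consistency check, this limit solution can be recovered from the diffusion equation (9) with the scaling (6). The short derivation below assumes that the slab extends over x ∈ [0, 10] and that ϕ0 vanishes at both ends; this is inferred from the roots of the quadratic above and is an assumption, since the domain is not stated explicitly at this point.

\[
\begin{aligned}
-\frac{\partial}{\partial x} \frac{\varepsilon}{3} \frac{\partial \phi_0}{\partial x} + \alpha\varepsilon \phi_0 &= \varepsilon (1 + \alpha \phi_0)
&&\Longrightarrow\quad -\frac{1}{3} \phi_0'' = 1, \\
\phi_0(x) &= -\frac{3}{2} x^2 + c_1 x + c_0,
&&\phi_0(0) = 0 \ \Rightarrow\ c_0 = 0, \\
\phi_0(10) &= -150 + 10 c_1 = 0
&&\Longrightarrow\quad c_1 = 15 .
\end{aligned}
\]

The absorption term cancels against the αϕ0 part of the source, so the limit solution does not depend on α.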
In the following, we investigate the asymptotic behavior of the PINN and LSFE
solutions in the diffusion limit. We choose $\alpha = 10^{-2}$. It should be noted that
for $\varepsilon = 10^{-2}$, this choice leads to a problem described in [25].
For the PINN computations, we use feed forward neural networks, each with 5
hidden layers and 50 nodes per layer. An Adam optimizer with learning rate
$lr = 2.5 \cdot 10^{-4}$ is used for the training process. A total of 300 training points is
used. For the LSFE computations, we use a mesh with 20 first order Lagrange
finite elements.
The results for three different values of ε are depicted in Fig. 3, while the
relative errors can be found in Tab. 1.
For $\varepsilon = 10^{-2}$, all errors are below 1%. The scaling reduces the already small
error even further for both PINN and LSFE. For $\varepsilon = 10^{-3}$, the unscaled
solutions deviate significantly from the reference. While the error for the PINNs is
36.8%, the LSFE error is 18.9%. The scaling, however, reduces the error for both
methods substantially: the scaled PINN error is only 0.3%, while the scaled LSFE
error is 0.5%. For $\varepsilon = 10^{-4}$, the unscaled solutions are close to zero. This is
expected, since the unscaled solution is supposed to converge to zero as ε → 0
according to the analysis in Sec. 4. The scaled solutions, however, still have a
small error, 0.6% for the PINN and 1.8% for the LSFE.
These results are in accordance with the theoretical predictions in Sec. 4. As
expected, for smaller values of ε the unscaled versions of PINN and LSFE
converge to zero. The scaled solution is also always better than the unscaled
solution. The results for both methods are similar; however, the exact numerical
values deviate, as one would expect for two methods with vast differences in the
implementation.

Table 1: Errors in Fig. 3

                  ε = 10⁻²    ε = 10⁻³    ε = 10⁻⁴
PINN unscaled     0.6%        36.8%       100.0%
PINN scaled       0.1%        0.3%        0.6%
LSFE unscaled     0.7%        18.9%       95.8%
LSFE scaled       0.3%        0.5%        1.8%

Figure 3: Results for PINN and LSFE in an asymptotic test of the diffusion limit for three
different values of ε: (a) ε = 10⁻², (b) ε = 10⁻³, (c) ε = 10⁻⁴.

5.4. Diffusive Test with an Interface


In this subsection we investigate a problem with an internal interface. The
left side of the slab consists of a pure absorber (σa = σt = 2, x ∈ (0, 2)),
while the right side is a strong scatterer (σt = 100, σa = 10⁻⁴, x ∈ (2, 10))
with only a very small absorption cross section. A constant source Q = 1 is
applied on the left side of the slab, i.e. for x ∈ (0, 2). On the left boundary,
reflective boundary conditions are imposed, while on the right boundary vacuum
boundary conditions are used.
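
The material layout of this test can be written down as a small sketch. The snippet below encodes the two regions and derives two standard quantities that indicate how diffusive each region is: the scattering ratio (σt − σa)/σt and the diffusion length sqrt(D/σa) with D = 1/(3σt). Setting Q = 0 in the right region, assigning the interface point x = 2 to the right region, and the function names are assumptions for illustration.

```python
import math


def cross_sections(x):
    """Return (sigma_t, sigma_a, Q) at position x in the slab [0, 10]."""
    if not 0.0 <= x <= 10.0:
        raise ValueError("x must lie inside the slab [0, 10]")
    if x < 2.0:
        return 2.0, 2.0, 1.0  # pure absorber with constant source
    return 100.0, 1e-4, 0.0   # strong scatterer; zero source is assumed here


def scattering_ratio(sigma_t, sigma_a):
    """Fraction of collisions that are scattering events."""
    return (sigma_t - sigma_a) / sigma_t


def diffusion_length(sigma_t, sigma_a):
    """sqrt(D / sigma_a) with the diffusion coefficient D = 1 / (3 sigma_t)."""
    return math.sqrt(1.0 / (3.0 * sigma_t * sigma_a))


for x in (1.0, 6.0):
    sigma_t, sigma_a, q = cross_sections(x)
    print(f"x = {x}: sigma_t = {sigma_t}, sigma_a = {sigma_a}, Q = {q}, "
          f"c = {scattering_ratio(sigma_t, sigma_a):.6f}")
print(f"diffusion length on the right: {diffusion_length(100.0, 1e-4):.3f}")
```

In the right region the scattering ratio is 0.999999 and the diffusion length of about 5.77 is far larger than the mean free path 1/σt = 0.01, which is why this region lies in the diffusive regime.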
For the PINN computations, we use the same architecture and learning rate as
in Subsec. 5.3. For this test case, 450 training points within the domain are
used. For the LSFE computations we use a mesh with 50 first-order Lagrange
finite elements.
The reference solution was computed with the OpenMC Monte Carlo code [37].
It should be noted that OpenMC does not approximate the neutron angle, but
includes the full angular dependence, i.e., all Legendre moments. Since the
higher-order moments are small in the diffusive regime, we do not expect large
deviations, but we still use the P3-approximation for the angular discretization
to minimize potential errors.
Fig. 4 depicts the results for the test with an internal interface. In both cases,
the scaling leads to a huge improvement of the solution. While the unscaled
PINN solution deviates from the reference by 10.6%, the scaled PINN solution
deviates by only 1.4%. The LSFE results are similar: while the error for the
unscaled LSFE is 9.9%, the error of the scaled solution is only 0.6%. We again
see the similarity of the PINN and LSFE solutions.

Figure 4: Results for PINN and LSFE in a diffusive test with an internal interface. In both
cases, the scaling leads to a huge improvement of the solution.

5.5. Diffusive Test with Hyperbolic Tangent Activation Functions


While the analysis in Subsec. 4.1 shows that PINNs with ReLU activation
functions lead to a method that does not preserve the diffusion limit, the scaling
introduced in Subsec. 4.2 to correct this does not assume the usage of a specific
activation function. Therefore, it is natural to investigate the effect of
the scaling on PINNs with a different activation function. In this subsection, we
investigate the problem with the internal interface introduced in Subsec. 5.4 for
PINNs with hyperbolic tangent activation functions. We use the same number
of training points as before. L-BFGS is used as the optimizer since it showed
better convergence properties for this test case than the Adam optimizer. Fig.
5 depicts the results. The unscaled solution deviates from the reference by
23.4% and is therefore worse than the unscaled solution obtained with ReLU
activation functions. The error of the scaled solution is only 0.5% and therefore
smaller than in the case with ReLU activation functions. This shows that the
proposed scaling leads to a vast improvement of the solution not only for ReLU
activation functions, but also for the hyperbolic tangent.

Figure 5: Results for PINN with hyperbolic tangent activation functions for the diffusive test
with an internal interface discussed in Subsec. 5.4. The scaling leads to a huge improvement
of the solution.

6. Discussion
In this paper, we investigated the numerical stability of PINNs for multiscale
transport problems. As an exemplary problem we studied neutron transport in
the so-called diffusive regime. We used an analogy between PINNs and LSFE
that lies in their shared approach of reformulating the solution of differential
equations as the minimization of a (typically quadratic) functional. By making
use of this analogy we were able to build on a theory for LSFE and adapt it
for PINNs. It was shown that PINNs with ReLU activation functions yield
incorrect approximate solutions in the diffusion limit. It was also demonstrated
that a scaling can be applied to PINNs that leads to the correct diffusion limit.
These theoretical findings were supported by numerical results.

A formal proof for the convergence of the scaled solution remains a goal for
future research. For LSFE, a corresponding result exists. The proof uses the
fact that the quadratic functional needs to be minimized for all admissible test
functions. Then, by choosing specific test functions, in addition to the
relations in [Proposition 1], it can be shown that the leading order in the formal
expansion of ϕ_0^ε satisfies a variational form of the diffusion equation. Details
can be found in [24]. Since no direct analogue of the test functions exists in
the PINN context, this argument cannot be mimicked with PINNs. Therefore, a
different way to formally prove the convergence of the scaled PINN solution
needs to be found.
Here, we see the possibility to build on further results of the existing theory
for LSFE and adapt them for PINNs, too. For instance, there are cases known for
LSFE where scalings other than the one we applied to PINNs are used. By taking
additional parameters into account, such as the optical thickness of the medium
or the local cell size, these scalings provide a further improved accuracy. In [38],
as an example, such a scaling is introduced which is especially useful for
optically thin materials.

We showed that unscaled PINNs with ReLU activation functions do not
converge to the correct diffusion limit. First-order LSFE exhibit the same
behavior. For LSFE, the existing theory shows that using higher-order
finite elements instead of a scaling can be sufficient to mitigate this issue, even
though applying the scaling still improves the results numerically [24]. It is not
apparent whether a PINN analogue to higher-order finite elements exists and what
it would look like. However, it is clear that simply choosing other activation
functions is not sufficient, as our test case with the hyperbolic tangent shows.

Acknowledgments
The work of A.J. was performed within the project RAPID, funded by
the German Federal Ministry of Education and Research (BMBF) under
grant number 033RK094B. The work of K.K. was performed in part within
the project RAPID, funded by the German Federal Ministry of Education and
Research (BMBF) under grant number 033RK094A. R.G.M. was supported by
the International Atomic Energy Agency under contract No. USA-26472
"University of Notre Dame Contribution to the AI for Fusion Coordinated
Research Project".
The authors are solely responsible for the content of this publication.

References

[1] M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks:
A deep learning framework for solving forward and inverse problems
involving nonlinear partial differential equations, Journal of Computational
Physics 378 (2019) 686–707. doi:https://doi.org/10.1016/j.jcp.2018.10.045.
[2] G. Karniadakis, I. Kevrekidis, L. Lu, et al., Physics-informed machine
learning, Nature Reviews Physics 3 (2021) 422–440.
doi:https://doi.org/10.1038/s42254-021-00314-5.
[3] S. Cai, Z. Wang, S. Wang, P. Perdikaris, G. Karniadakis, Physics-informed
neural networks for heat transfer problems, J. Heat Transfer 143 (6) (2021).
doi:https://doi.org/10.1115/1.4050542.
[4] S. Cai, Z. Mao, Z. Wang, M. Yin, G. Karniadakis, Physics-informed neural
networks (PINNs) for fluid mechanics: A review, Acta Mechanica Sinica
37 (12) (2021). doi:10.1007/s10409-021-01148-1.
[5] S. Cuomo, V. S. Di Cola, F. Giampaolo, G. Rozza, M. Raissi, F. Piccialli,
Scientific machine learning through physics-informed neural networks:
Where we are and what’s next, Journal of Scientific Computing
88 (3) (2022). doi:10.1007/s10915-022-01939-z.
[6] S. Mishra, R. Molinaro, Physics informed neural networks for simulating
radiative transfer, Journal of Quantitative Spectroscopy and Radiative Transfer
270 (2021) 107705. doi:https://doi.org/10.1016/j.jqsrt.2021.107705.

[7] T. De Ryck, S. Mishra, Numerical analysis of physics-informed neural
networks and related models in physics-informed machine learning, Acta
Numerica 33 (2024) 633–713. doi:10.1017/S0962492923000089.
[8] Y. Shin, J. Darbon, G. Karniadakis, On the convergence of physics informed
neural networks for linear second-order elliptic and parabolic type pdes,
Communications in Computational Physics 28 (5) (2020) 2042–2074. doi:
https://doi.org/10.4208/cicp.OA-2020-0193.
[9] T. Manteuffel, K. Ressel, Multilevel methods for transport equations in
the diffusive regime, NASA Langley Research Center, The Sixth Copper
Mountain Conference on Multigrid Methods, Part 2 (1993).

[10] T. Manteuffel, K. Ressel, Least-squares finite element solution of the
neutron transport equation in diffusive regimes, SIAM Journal on
Numerical Analysis 35 (2) (1998).
doi:https://doi.org/10.1137/S0036142996299708.

[11] T. Manteuffel, K. Ressel, G. Starke, A Boundary Functional for the
Least-Squares Finite-Element Solution of Neutron Transport Problems, SIAM
Journal on Numerical Analysis 37 (2) (2000) 556–586.
doi:https://doi.org/10.1137/S0036142998344706.
[12] C. Cercignani, The Boltzmann Equation and Its Applications, Springer,
1988.
[13] E. Lewis, W. Miller, Computational Methods of Neutron Transport, Wiley,
1984.
[14] E. W. Larsen, The asymptotic diffusion limit of discretized transport
problems, Nuclear Science and Engineering 112 (4) (1992) 336–346.
doi:https://doi.org/10.13182/NSE92-A23982.
[15] E. W. Larsen, Diffusion theory as an asymptotic limit of transport theory
for nearly critical systems with small mean free path, Ann. Nuclear En-
ergy 7 (1980) 249–255. doi:https://doi.org/10.1016/0306-4549(80)
90072-9.
[16] J. Hu, S. Jin, Q. Li, Chapter 5 - asymptotic-preserving schemes for multi-
scale hyperbolic and kinetic equations, in: R. Abgrall, C.-W. Shu (Eds.),
Handbook of Numerical Methods for Hyperbolic Problems, Vol. 18 of Hand-
book of Numerical Analysis, Elsevier, 2017, pp. 103–129. doi:https:
//doi.org/10.1016/bs.hna.2016.09.001.

[17] S. Jin, Z. Ma, K. Wu, Asymptotic-preserving neural networks for multiscale
time-dependent linear transport equations, Journal of Scientific Computing
94 (2023). doi:10.1007/s10915-023-02100-0.
[18] I. Sobol, The distribution of points in a cube and the accurate evaluation
of integrals, Zh. Vychisl. Mat. i Mat. Phys. 7 (1967) 784–802.
[19] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in:
International Conference on Learning Representations, 2015.
[20] R. Byrd, P. Lu, J. Nocedal, C. Zhu, A limited memory algorithm for bound
constrained optimization, SIAM Journal on Scientific Computing 16 (1995)
1190–1208. doi:10.1137/0916069.
[21] J. He, L. Li, J. Xu, C. Zheng, ReLU deep neural networks and linear finite
elements, Journal of Computational Mathematics 38 (3) (2020) 502–527.
doi:https://doi.org/10.4208/jcm.1901-m2018-0160.

[22] P. Bochev, M. Gunzburger, Least-Squares Finite Element Methods,
Applied Mathematical Sciences, Springer New York, 2009.
[23] K. Case, P. Zweifel, Linear Transport Theory, Addison-Wesley Publishing
Company, 1967.

[24] K. J. Ressel, Least-squares finite-element solution of the neutron transport
equation in diffusive regimes, PhD thesis, University of Colorado, Denver
(1994).
[25] E. Varin, G. Samba, Spherical harmonics finite element transport equation
solution using a least-squares approach, Nuclear Science and Engineering
151 (2) (2005) 167–183. doi:10.13182/NSE05-A2538.
[26] J. S. Hesthaven, S. Gottlieb, D. Gottlieb, Spectral Methods for Time-
Dependent Problems, Cambridge Monographs on Applied and Computa-
tional Mathematics, Cambridge University Press, 2007.

[27] S. P. Hamilton, T. M. Evans, Efficient solution of the simplified PN
equations, Journal of Computational Physics 284 (2015) 155–170.
doi:https://doi.org/10.1016/j.jcp.2014.12.014.
[28] L. Lu, X. Meng, Z. Mao, G. E. Karniadakis, DeepXDE: A deep learning
library for solving differential equations, SIAM Review 63 (1) (2021) 208–
228. doi:10.1137/19M1274067.
[29] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Cor-
rado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp,
G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Lev-
enberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster,
J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke,
V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke,
Y. Yu, X. Zheng, TensorFlow: Large-scale machine learning on heteroge-
neous systems, software available from tensorflow.org (2015).
URL https://www.tensorflow.org/
[30] M. Alnæs, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg,
C. Richardson, J. Ring, M. Rognes, G. Wells, The FEniCS Project Version
1.5, Archive of Numerical Software 3 (100) (2015) 9–23.
[31] A. Logg, K. Mardal, G. Wells, Automated Solution of Differential Equa-
tions by the Finite Element Method, Springer, 2012.

[32] S. Balay, S. Abhyankar, M. Adams, J. Brown, P. Brune, K. Buschelman,
L. Dalcin, A. Dener, V. Eijkhout, W. Gropp, D. Karpeyev, D. Kaushik,
M. Knepley, D. May, L. C. McInnes, R. Mills, T. Munson, K. Rupp,
P. Sanan, B. Smith, S. Zampini, H. Zhang, H. Zhang, PETSc Users Manual,
Tech. Rep. ANL-95/11 - Revision 3.14, Argonne National Laboratory
(2020).
[33] R. Falgout, U. Meier Yang, hypre: A library of high performance
preconditioners, in: International Conference on Computational Science, Springer,
Amsterdam, The Netherlands, April 21–24, 2002.

[34] Y. Saad, M. Schultz, GMRES: a generalized minimal residual algorithm
for solving nonsymmetric linear systems, SIAM Journal on Scientific and
Statistical Computing 7 (1986) 856–869.
[35] J. Ruge, K. Stüben, Algebraic multigrid, in: S. McCormick (Ed.), Multigrid
methods, volume 3 of Frontiers in Applied Mathematics, SIAM, 1987, pp.
73–130.
[36] G. Pang, L. Lu, G. E. Karniadakis, fPINNs: Fractional physics-informed
neural networks, SIAM Journal on Scientific Computing 41 (4) (2019)
A2603–A2626. doi:10.1137/18M1229845.
[37] P. Romano, N. Horelik, B. Herman, A. Nelson, B. Forget, K. Smith,
OpenMC: A state-of-the-art Monte Carlo code for research and development,
Annals of Nuclear Energy 82 (2015) 90–97.
[38] W. Zheng, R. G. McClarren, Accurate least-squares PN scaling based on
problem optical thickness for solving neutron transport problems, Progress
in Nuclear Energy 101 (2017) 394–400.
doi:https://doi.org/10.1016/j.pnucene.2017.06.001.
