A Jacobian-Free Newton-Krylov Method For Time-Implicit Multidimensional Hydrodynamics

A&A 586, A153 (2016)
DOI: 10.1051/0004-6361/201527339
Astronomy & Astrophysics
© ESO 2016
Physics-based preconditioning for sound waves and thermal diffusion
M. Viallet (1), T. Goffrey (2), I. Baraffe (2, 3, 1), D. Folini (3), C. Geroux (2), M. V. Popov (3), J. Pratt (2), and R. Walder (3)

(1) Max-Planck-Institut für Astrophysik, Karl Schwarzschild Strasse 1, 85741 Garching, Germany
e-mail: mviallet@mpa-garching.mpg.de
(2) College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, EX4 4QL, UK
(3) École Normale Supérieure, Lyon, CRAL, UMR CNRS 5574, Université de Lyon, Lyon, France
Received 10 September 2015 / Accepted 7 December 2015
ABSTRACT
This work is a continuation of our efforts to develop an efficient implicit solver for multidimensional hydrodynamics for the purpose
of studying important physical processes in stellar interiors, such as turbulent convection and overshooting. We present an implicit
solver that results from the combination of a Jacobian-free Newton-Krylov method and a preconditioning technique tailored to the
inviscid, compressible equations of stellar hydrodynamics. We assess the accuracy and performance of the solver for both 2D and
3D problems for Mach numbers down to $10^{-6}$. Although our applications concern flows in stellar interiors, the method can be applied
to general advection and/or diffusion-dominated flows. The method presented in this paper opens up new avenues in 3D modeling of
realistic stellar interiors allowing the study of important problems in stellar structure and evolution.
Key words. hydrodynamics; methods: numerical; stars: interiors
1. Introduction

The transport of heat, chemical species, and angular momentum in stellar interiors is governed by three-dimensional, nonlinear (magneto-)hydrodynamical processes that develop over a wide range of temporal and spatial scales. The study of these processes with numerical simulations is a powerful way to improve our understanding of stellar structure and stellar evolution. Unfortunately, the integration of the compressible hydrodynamical equations with time-explicit methods comes with a constraint on the time step resulting from the propagation of sound waves. This is the well-known Courant-Friedrichs-Lewy (CFL) stability condition. We define $\mathrm{CFL_{hydro}}$ as the ratio between the time step and the largest explicit time step allowed by the CFL condition:

$$\mathrm{CFL_{hydro}} = \max\left[\frac{(|u| + c_s)\,\Delta t}{\Delta x}\right], \qquad (1)$$

where $\Delta t$ is the time step, $\Delta x$ the mesh spacing, $c_s$ the adiabatic sound speed, and $u$ the flow velocity. Time-explicit methods require $\mathrm{CFL_{hydro}} \lesssim 1$. This results in time steps that are much smaller than the typical time scale of the relevant processes (e.g., the convective turnover time scale), making this approach computationally demanding. Nevertheless, explicit time integration remains the method of choice for multidimensional hydrodynamics in the astrophysical community (see, e.g., Bazán et al. 2003; Meakin & Arnett 2007; Mocák et al. 2011; Herwig et al. 2014). One way to overcome this limitation is to rely on sound-proof models, which filter out sound waves. Popular sound-proof models are the Boussinesq, the anelastic, and the pseudo-incompressible models (see, e.g., Glatzmaier 2013, for a review). The use of such models, however, comes at the cost of a restricted range of applications due to the underlying approximations. Ideally, one seeks a way to solve the hydrodynamical equations efficiently regardless of the wide range of physical conditions characterizing stellar interiors (e.g., density stratification and a wide range of Mach numbers).

The MUlti-dimensional Stellar Implicit Code (MUSIC) follows a different approach by solving the fully compressible hydrodynamical equations implicitly (Viallet et al. 2011, 2013). The challenge for an implicit solver lies in the need to solve a large nonlinear system at each time step. In Viallet et al. (2013), the best performance was obtained with Newton-Krylov methods, which combine the Newton-Raphson method with an iterative linear solver. It was shown that the iterative solver requires preconditioning to converge quickly at large $\mathrm{CFL_{hydro}}$. In fact, within the framework of Newton-Krylov methods, the preconditioner is the crucial ingredient of the implicit solver. One of the main performance bottlenecks identified by the authors, particularly for three-dimensional calculations, is the inefficiency of black-box algebraic preconditioning techniques, such as incomplete LU (ILU) factorizations, for computations at large CFL numbers. Furthermore, the memory required to store the Jacobian matrix and its ILU factorization increases significantly in 3D, restricting the range of problems that can be addressed with such a method.

In this paper, we present an implicit solver that aims to overcome these limitations. This is achieved by combining a Jacobian-free Newton-Krylov (JFNK) method with a preconditioner tailored to our physical equations, as described in more detail in Sect. 2. In Sect. 3, we design semi-implicit schemes that treat sound waves and thermal diffusion implicitly;
Article published by EDP Sciences A153, page 1 of 17

in Sect. 4, we show how these semi-implicit schemes can be used to form an efficient preconditioner for the Newton-Krylov method. In Sect. 5, we present results that illustrate the performance of the solver for idealized test problems and for stellar interiors. We conclude in Sect. 6.

2. Numerical description

MUSIC solves the equations describing the evolution of density, momentum, and internal energy, taking external gravity and thermal diffusion into account:

$$\partial_t \rho = -\nabla\cdot(\rho\mathbf{u}), \qquad (2)$$

$$\partial_t(\rho e) = -\nabla\cdot(\rho e\,\mathbf{u}) - p\,\nabla\cdot\mathbf{u} + \nabla\cdot(\chi\nabla T), \qquad (3)$$

$$\partial_t(\rho\mathbf{u}) = -\nabla\cdot(\rho\,\mathbf{u}\otimes\mathbf{u}) - \nabla p + \rho\,\mathbf{g}, \qquad (4)$$

where $\rho$ is the density, $e$ the specific internal energy, $\mathbf{u}$ the velocity, $p$ the gas pressure, $T$ the temperature, $\mathbf{g}$ the gravitational acceleration, and $\chi$ the thermal conductivity. The system of equations is closed with an equation of state (EOS). For stellar interiors, these equations describe radiation hydrodynamics in the diffusion limit, which is appropriate when the plasma is optically thick. In this case, the thermal conductivity due to photons is given by

$$\chi = \frac{16\,\sigma T^3}{3\,\kappa\rho}, \qquad (5)$$

where $\kappa$ is the Rosseland mean opacity and $\sigma$ the Stefan-Boltzmann constant. Furthermore, the EOS includes the contribution of radiation to the internal energy and pressure.

Optionally, MUSIC can solve the total energy equation in place of the internal energy equation:

$$\partial_t(\rho\varepsilon_t) = -\nabla\cdot(\rho\varepsilon_t\,\mathbf{u} + p\,\mathbf{u}) + \rho\,\mathbf{u}\cdot\mathbf{g} + \nabla\cdot(\chi\nabla T), \qquad (6)$$

where $\varepsilon_t = e + u^2/2$ is the specific total energy.

We follow the method of lines and perform the spatial discretization independently of the time discretization (see, e.g., LeVeque 2007). The spatial discretization uses a finite volume method with staggered velocity components located at cell interfaces. The numerical fluxes are calculated using the upwind, monotonicity-preserving method of van Leer (1974). The resulting scheme is second-order in space and total variation diminishing.

The conserved variables, for which the conservation Eqs. (2)-(4) and (6) are solved, are represented as the column vector $\mathbf{U} = (\rho, \rho e, \rho\mathbf{u})$ when solving the internal energy equation, or $\mathbf{U} = (\rho, \rho\varepsilon_t, \rho\mathbf{u})$ when solving the total energy equation. In MUSIC, the unknowns differ from the conserved variables $\mathbf{U}$ and are represented as the column vector $\mathbf{X} = (\rho, e, \mathbf{u})$ when solving the internal energy equation, or $\mathbf{X} = (\rho, \varepsilon_t, \mathbf{u})$ when solving the total energy equation.

The spatial discretization yields a system of ordinary differential equations:

$$\frac{d\mathbf{U}}{dt} = \mathbf{R}_U(\mathbf{X}), \qquad (7)$$

where $\mathbf{R}_U$ contains the flux differencing and source terms. The time discretization is carried out using the second-order Crank-Nicolson method:

$$\mathbf{U}(\mathbf{X}^{n+1}) = \mathbf{U}(\mathbf{X}^n) + \frac{\Delta t}{2}\left[\mathbf{R}_U(\mathbf{X}^{n+1}) + \mathbf{R}_U(\mathbf{X}^n)\right]. \qquad (8)$$

We define the nonlinear residual function as

$$\mathbf{F}_U(\mathbf{X}) = \frac{\mathbf{U}(\mathbf{X}) - \mathbf{U}(\mathbf{X}^n)}{\Delta t} - \frac{1}{2}\left[\mathbf{R}_U(\mathbf{X}) + \mathbf{R}_U(\mathbf{X}^n)\right], \qquad (9)$$

so that

$$\mathbf{F}_U(\mathbf{X}^{n+1}) = 0 \qquad (10)$$

defines the solution at time step $n+1$. Equation (10) is solved with a Newton-Raphson method. At each Newton iteration, a linear problem of the form

$$J\,\delta\mathbf{X} = -\mathbf{F}_U(\mathbf{X}) \qquad (11)$$

must be solved, where $J = \partial\mathbf{F}_U/\partial\mathbf{X}$ is the Jacobian matrix.

The use of a Krylov iterative method such as GMRES (Saad & Schultz 1986) is standard practice for solving Eq. (11) when the matrix is large. However, Viallet et al. (2013) found that, for $\mathrm{CFL_{hydro}} \gtrsim 10$, the iterative method requires preconditioning to remain effective. ILU factorizations perform adequately at modest CFL numbers ($\mathrm{CFL_{hydro}} \lesssim 100$) but become inefficient at larger values of the CFL number. Furthermore, we find that such a preconditioning technique significantly increases the memory requirements.

In this work, we adopt an approach in which the Jacobian matrix is never explicitly formed. Jacobian-free Newton-Krylov methods are a popular choice for solving large nonlinear systems of equations; see Knoll & Keyes (2004) for a review. Since we do not form the Jacobian matrix, algebraic preconditioning techniques, such as ILU factorizations, have to be abandoned. Preconditioning the JFNK method nevertheless remains important for performance, particularly when the CFL number is large. One of the main goals of this paper is the design of an efficient preconditioner adapted specifically to the physics of stellar interiors.

From a physical point of view, at large CFL numbers ($\mathrm{CFL_{hydro}} \gtrsim 100$), waves propagate over a large portion, if not all, of the computational domain during a single time step. Effectively, it is as if information propagates at an infinite speed, as in parabolic problems. This changes the mathematical nature of the problem from hyperbolic to parabolic, resulting in numerical stiffness. To be efficient, the numerical method has to take this property into account. Multigrid methods attempt to exploit this property by exchanging information between large and small scales; see, for example, Kifonidis & Müller (2012). We adopt another approach, which consists of using legacy methods, known as semi-implicit (SI) schemes, as preconditioners for the Krylov solver. This strategy is known as physics-based preconditioning (PBP), because the preconditioner is tailored to the physical problem; see, e.g., Mousseau et al. (2000), Knoll & Keyes (2004), Reisner et al. (2005), and Park et al. (2009). SI schemes treat implicitly only the terms that are responsible for numerical stiffness. A scheme derived this way is numerically stable for $\mathrm{CFL_{hydro}}$ greater than one, because the stiff physics is treated implicitly. However, the accuracy of the solution obtained with an SI scheme is usually quite poor because of the approximations involved in its derivation (see Sect. 3). Good accuracy can be achieved by embedding such a scheme as a preconditioner within a Newton-Krylov method. This work closely follows Park et al. (2009), adapting their method to our numerical scheme and physical equations.

3. Semi-implicit schemes for gas dynamics

In this section, we derive SI schemes for the hydrodynamic Eqs. (2)-(4) and (6) that treat sound waves and thermal diffusion implicitly. The remaining terms (e.g., advection) are treated
M. Viallet et al.: A JFNK method for time-implicit multi-D hydrodynamics
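explicitly. In this section, our only concern is to design schemes that are stable and inexpensive, rather than accurate. Later, we will use these schemes as preconditioners for a fully implicit and accurate method.

3.1. Equations for p, e, and u

Our SI schemes are derived from the evolution equations for the primitive variables $\mathbf{V} = (p, e, \mathbf{u})$. These are

$$\partial_t p + \mathbf{u}\cdot\nabla p = -\Gamma_1 p\,\nabla\cdot\mathbf{u} + (\Gamma_3 - 1)\,\nabla\cdot(\chi\nabla T), \qquad (12)$$

$$\partial_t e + \mathbf{u}\cdot\nabla e = -\frac{p}{\rho}\,\nabla\cdot\mathbf{u} + \frac{1}{\rho}\,\nabla\cdot(\chi\nabla T), \qquad (13)$$

$$\partial_t \mathbf{u} + \mathbf{u}\cdot\nabla\mathbf{u} = -\frac{1}{\rho}\,\nabla p + \mathbf{g}, \qquad (14)$$

where $\Gamma_1$ and $\Gamma_3$ are the generalized adiabatic indices for a general equation of state. For a perfect gas without internal degrees of freedom, these adiabatic indices reduce to $\Gamma_1 = \Gamma_3 = \gamma$, where $\gamma$ is the usual adiabatic index. The detailed derivation of the pressure equation, Eq. (12), is given in Appendix A.1.

To simplify the notation, and without loss of generality, we consider the one-dimensional version of these equations:

$$\partial_t p + u\,\partial_x p = -\Gamma_1 p\,\partial_x u + (\Gamma_3 - 1)\,\partial_x(\chi\,\partial_x T), \qquad (15)$$

$$\partial_t e + u\,\partial_x e = -\frac{p}{\rho}\,\partial_x u + \frac{1}{\rho}\,\partial_x(\chi\,\partial_x T), \qquad (16)$$

$$\partial_t u + u\,\partial_x u = -\frac{1}{\rho}\,\partial_x p - g, \qquad (17)$$

where we assumed that $\mathbf{g} = -g\,\mathbf{e}_x$, with $\mathbf{e}_x$ the unit vector in the $x$-direction. Extension of the numerical scheme to higher dimensions is straightforward.

3.2. Transformation matrices

Having introduced the conserved variables $\mathbf{U}$, the independent variables $\mathbf{X}$, and the primitive variables $\mathbf{V}$, we need the transformation matrices $\partial\mathbf{V}/\partial\mathbf{U}$ and $\partial\mathbf{X}/\partial\mathbf{V}$, which are defined by

$$\delta\mathbf{V} = \frac{\partial\mathbf{V}}{\partial\mathbf{U}}\,\delta\mathbf{U}, \qquad (18)$$

$$\delta\mathbf{X} = \frac{\partial\mathbf{X}}{\partial\mathbf{V}}\,\delta\mathbf{V}. \qquad (19)$$

These matrices are given below for both the internal and the total energy equations; the details of their derivation are postponed to Appendix B.

3.2.1. Internal energy equation

When solving the internal energy equation, $\mathbf{U} = (\rho, \rho e, \rho u)$ and $\mathbf{X} = (\rho, e, u)$, and the transformation matrices take the form

$$\frac{\partial\mathbf{V}}{\partial\mathbf{U}} = \begin{pmatrix} \left.\frac{\partial p}{\partial\rho}\right|_e - \frac{e}{\rho}\left.\frac{\partial p}{\partial e}\right|_\rho & \frac{1}{\rho}\left.\frac{\partial p}{\partial e}\right|_\rho & 0 \\ -\frac{e}{\rho} & \frac{1}{\rho} & 0 \\ -\frac{u}{\rho} & 0 & \frac{1}{\rho} \end{pmatrix}, \qquad (20)$$

and

$$\frac{\partial\mathbf{X}}{\partial\mathbf{V}} = \begin{pmatrix} \left(\left.\frac{\partial p}{\partial\rho}\right|_e\right)^{-1} & -\left(\left.\frac{\partial p}{\partial\rho}\right|_e\right)^{-1}\left.\frac{\partial p}{\partial e}\right|_\rho & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (21)$$

The required derivatives are those typically provided by EOS routines.

3.2.2. Total energy equation

When solving the total energy equation, $\mathbf{U} = (\rho, \rho\varepsilon_t, \rho u)$ and $\mathbf{X} = (\rho, \varepsilon_t, u)$, and the matrices take the form

$$\frac{\partial\mathbf{V}}{\partial\mathbf{U}} = \begin{pmatrix} \left.\frac{\partial p}{\partial\rho}\right|_e + \frac{u^2 - \varepsilon_t}{\rho}\left.\frac{\partial p}{\partial e}\right|_\rho & \frac{1}{\rho}\left.\frac{\partial p}{\partial e}\right|_\rho & -\frac{u}{\rho}\left.\frac{\partial p}{\partial e}\right|_\rho \\ \frac{u^2 - \varepsilon_t}{\rho} & \frac{1}{\rho} & -\frac{u}{\rho} \\ -\frac{u}{\rho} & 0 & \frac{1}{\rho} \end{pmatrix}, \qquad (22)$$

$$\frac{\partial\mathbf{X}}{\partial\mathbf{V}} = \begin{pmatrix} \left(\left.\frac{\partial p}{\partial\rho}\right|_e\right)^{-1} & -\left(\left.\frac{\partial p}{\partial\rho}\right|_e\right)^{-1}\left.\frac{\partial p}{\partial e}\right|_\rho & 0 \\ 0 & 1 & u \\ 0 & 0 & 1 \end{pmatrix}. \qquad (23)$$

3.3. SI scheme for sound waves

We first design an SI scheme that treats sound waves implicitly. In Sect. 3.3.1, we start by deriving the propagation equation for adiabatic acoustic fluctuations, which identifies the terms in the equations that need to be treated implicitly.

3.3.1. Propagation equation for acoustic fluctuations

In this section, we neglect thermal diffusion and gravity. We linearize the 1D Eqs. (15) and (17) around a uniform background state at rest:

$$\rho = \rho_0 + \rho', \qquad (24)$$

$$p = p_0 + p', \qquad (25)$$

$$u = 0 + u'. \qquad (26)$$

Keeping only terms linear in the perturbations, we obtain

$$\partial_t p' = -\Gamma_1 p_0\,\partial_x u', \qquad (27)$$

$$\partial_t u' = -\frac{1}{\rho_0}\,\partial_x p'. \qquad (28)$$

Next, we differentiate Eq. (27) with respect to $t$ and Eq. (28) with respect to $x$. Substituting the second result into the first to eliminate $\partial_t\partial_x u'$, we obtain the wave equation that describes the adiabatic propagation of sound waves:

$$\partial_t^2 p' - a^2\,\partial_x^2 p' = 0, \qquad (29)$$

where $a = \sqrt{\Gamma_1 p_0/\rho_0}$ is the adiabatic sound speed.

The terms on the right-hand sides of Eqs. (27) and (28) are responsible for the propagation of sound waves. To overcome the corresponding CFL limit, we treat them implicitly in the following section.

3.3.2. Pressure equation

To treat sound waves implicitly, Sect. 3.3.1 suggests treating the $\Gamma_1 p\,\partial_x u$ term in the pressure equation (Eq. (15)) implicitly, with a simple backward Euler method:

$$\frac{\delta p}{\Delta t} + \Gamma_1 p^n\,\partial_x u^{n+1} = -u\,\partial_x p\big|^n + (\Gamma_3 - 1)\,\partial_x(\chi\,\partial_x T)\big|^n. \qquad (30)$$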
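The benefit of treating the acoustic terms implicitly can be illustrated with a short numerical experiment. The sketch below integrates the linearized Eqs. (27)-(28) on a periodic 1D grid at $\mathrm{CFL_{hydro}} = 10$, once with forward Euler and once with backward Euler applied to both acoustic terms. The backward-Euler step eliminates $u^{n+1}$ and solves a single Helmholtz-type equation for $p^{n+1}$, which is the same elimination carried out in the derivation that follows; in this linear, advection-free setting it reproduces backward Euler exactly. This is a sketch under stated assumptions, not MUSIC's implementation: the uniform background ($\rho_0 = p_0 = 1$, $\Gamma_1 = 1.4$), the collocated centered differences (MUSIC uses staggered finite volumes), the dense direct solve, the Gaussian pulse, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical setup (not MUSIC's): uniform background at rest, periodic domain,
# collocated centered differences instead of staggered finite volumes.
N = 64
DX = 1.0 / N
RHO0, P0, GAMMA1 = 1.0, 1.0, 1.4
SOUND_SPEED = np.sqrt(GAMMA1 * P0 / RHO0)  # a in Eq. (29)

# Periodic centered first-derivative matrix: (D q)_i = (q_{i+1} - q_{i-1}) / (2 dx).
EYE = np.eye(N)
D = (np.roll(EYE, 1, axis=1) - np.roll(EYE, -1, axis=1)) / (2.0 * DX)


def acoustic_energy(p, u):
    """Discrete acoustic energy, conserved by the semi-discrete Eqs. (27)-(28)."""
    return 0.5 * DX * (np.sum(p**2) / (GAMMA1 * P0) + RHO0 * np.sum(u**2))


def forward_euler_step(p, u, dt):
    """Explicit treatment of the acoustic terms (subject to the CFL limit)."""
    return p - dt * GAMMA1 * P0 * (D @ u), u - (dt / RHO0) * (D @ p)


def backward_euler_step(p, u, dt):
    """Implicit acoustic terms: eliminate u^{n+1}, solve for p^{n+1}, update u."""
    # (I - a^2 dt^2 D D) p^{n+1} = p^n - dt Gamma1 p0 D u^n
    lhs = EYE - (SOUND_SPEED * dt) ** 2 * (D @ D)
    p_new = np.linalg.solve(lhs, p - dt * GAMMA1 * P0 * (D @ u))
    # u^{n+1} = u^n - (dt / rho0) D p^{n+1}
    u_new = u - (dt / RHO0) * (D @ p_new)
    return p_new, u_new


def energy_ratio(step, cfl_hydro=10.0, nsteps=20):
    """Final / initial acoustic energy for a Gaussian pressure pulse."""
    x = (np.arange(N) + 0.5) * DX
    p = 1e-3 * np.exp(-(((x - 0.5) / 0.05) ** 2))
    u = np.zeros(N)
    dt = cfl_hydro * DX / SOUND_SPEED  # Eq. (1) with u = 0
    e0 = acoustic_energy(p, u)
    for _ in range(nsteps):
        p, u = step(p, u, dt)
    return acoustic_energy(p, u) / e0


print("forward Euler :", energy_ratio(forward_euler_step))   # much larger than 1
print("backward Euler:", energy_ratio(backward_euler_step))  # smaller than 1
```

The discrete Laplacian is formed as `D @ D` so that the pressure equation is exactly consistent with the derivative used in the velocity update. With this choice, the implicit step damps the acoustic energy at any time step (stable but dissipative, hence inaccurate at large CFL numbers), whereas the explicit step amplifies it at this CFL number.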
Here $\delta p = p^{n+1} - p^n$, $n$ being the temporal index. We use Picard linearization to keep the scheme linear¹. All other terms in the equation are treated explicitly using the forward Euler method. Using Eq. (28), we approximate $u^{n+1}$ with

$$u^{n+1} \simeq u^n - \frac{\Delta t}{\rho^n}\,\partial_x p^{n+1}, \qquad (31)$$

which we substitute into Eq. (30) to obtain

$$\frac{\delta p}{\Delta t} - a^2\Delta t\,\partial_x^2 p^{n+1} = -u\,\partial_x p\big|^n - \Gamma_1 p^n\,\partial_x u^n + (\Gamma_3 - 1)\,\partial_x(\chi\,\partial_x T)\big|^n, \qquad (32)$$

where $a = \sqrt{\Gamma_1 p^n/\rho^n}$ is the adiabatic sound speed evaluated at time step $n$. The right-hand side of Eq. (32) corresponds to the explicit discretization of the original equation, but the Laplacian operator on the left-hand side reveals the parabolic character of this equation.

¹ Picard linearization refers to the fact that we write $p^n\,\partial_x u^{n+1}$ instead of $p^{n+1}\,\partial_x u^{n+1}$ when applying the implicit discretization. This is an approximation, but it has the advantage of keeping the scheme linear in the new variables.

3.3.3. Internal energy equation

We approach the internal energy equation, Eq. (16), in the same way as the pressure equation. Advection and thermal diffusion terms are discretized using an explicit scheme, and the compressional work is discretized using an implicit scheme. This produces

$$\frac{\delta e}{\Delta t} + \frac{p^n}{\rho^n}\,\partial_x u^{n+1} = -u\,\partial_x e\big|^n + \frac{1}{\rho^n}\,\partial_x(\chi\,\partial_x T)\big|^n, \qquad (33)$$

where $\delta e = e^{n+1} - e^n$. Again, we used Picard linearization to discretize the compressional work. We use Eq. (31) again to eliminate $u^{n+1}$ in Eq. (33) and obtain

$$\frac{\delta e}{\Delta t} - \Delta t\,\frac{p^n}{(\rho^n)^2}\,\partial_x^2 p^{n+1} = -u\,\partial_x e\big|^n - \frac{p^n}{\rho^n}\,\partial_x u^n + \frac{1}{\rho^n}\,\partial_x(\chi\,\partial_x T)\big|^n. \qquad (34)$$

The resulting equation is similar in form to the implicit version of the pressure equation, Eq. (32), as it contains the Laplacian of the pressure field.

3.3.4. Velocity equation

We discretize the pressure gradient in the velocity equation, Eq. (17), implicitly, using Picard linearization. All remaining terms are discretized explicitly. We obtain

$$\frac{\delta u}{\Delta t} + \frac{1}{\rho^n}\,\partial_x p^{n+1} = -u\,\partial_x u\big|^n - g, \qquad (35)$$

where $\delta u = u^{n+1} - u^n$.

3.3.5. $\delta$-form of the equations

By substituting $q^{n+1} = \delta q + q^n$ in all implicit terms of Eqs. (32), (34), and (35), rather than only in the terms involving time derivatives, one obtains the following system of equations:

$$\frac{\delta p}{\Delta t} - a^2\Delta t\,\partial_x^2\,\delta p = -F_p(p^n), \qquad (36)$$

$$\frac{\delta e}{\Delta t} - \Delta t\,\frac{p^n}{(\rho^n)^2}\,\partial_x^2\,\delta p = -F_e(e^n, p^n), \qquad (37)$$

$$\frac{\delta u}{\Delta t} + \frac{1}{\rho^n}\,\partial_x\,\delta p = -F_u(u^n, p^n), \qquad (38)$$

where we introduced the following residual functions:

$$F_p(p) = \frac{p - p^n}{\Delta t} - a^2\Delta t\,\partial_x^2 p + u\,\partial_x p\big|^n + \Gamma_1 p^n\,\partial_x u^n - (\Gamma_3 - 1)\,\partial_x(\chi\,\partial_x T)\big|^n, \qquad (39)$$

$$F_e(e, p) = \frac{e - e^n}{\Delta t} - \Delta t\,\frac{p^n}{(\rho^n)^2}\,\partial_x^2 p + u\,\partial_x e\big|^n + \frac{p^n}{\rho^n}\,\partial_x u^n - \frac{1}{\rho^n}\,\partial_x(\chi\,\partial_x T)\big|^n, \qquad (40)$$

$$F_u(u, p) = \frac{u - u^n}{\Delta t} + \frac{1}{\rho^n}\,\partial_x p + u\,\partial_x u\big|^n + g. \qquad (41)$$

The solution at time $n+1$ satisfies $F_p(p^{n+1}) = 0$, $F_e(e^{n+1}, p^{n+1}) = 0$, and $F_u(u^{n+1}, p^{n+1}) = 0$. We write the system in matrix form:

$$J_V\,\delta\mathbf{V} = -\mathbf{F}_V(\mathbf{V}^n), \qquad (42)$$

with $\delta\mathbf{V} = (\delta p, \delta e, \delta u)$ and $\mathbf{F}_V = (F_p, F_e, F_u)$. This formulation of the equations is known as the $\delta$-form. The block structure of $J_V$ is

$$J_V = \begin{pmatrix} J_{p,p} & 0 & 0 \\ J_{e,p} & J_{e,e} & 0 \\ J_{u,p} & 0 & J_{u,u} \end{pmatrix}. \qquad (43)$$

In this form, the system can be solved by operator splitting: the pressure Eq. (36) is first solved for $\delta p$, then $\delta e$ is deduced from the internal energy Eq. (37), and $\delta u$ from the velocity Eq. (38). Note that in the energy equation, one can use the equality

$$\Delta t\,\partial_x^2\,\delta p = \frac{1}{a^2}\left(\frac{\delta p}{\Delta t} + F_p\right), \qquad (44)$$

which follows from the pressure equation, rather than evaluating the Laplacian of $\delta p$ explicitly. In Sect. 3.5, we discuss how we solve the parabolic equation for $\delta p$ numerically.

3.4. SI scheme for sound waves and thermal diffusion

When thermal diffusion is important, it can cause numerical stiffness. In this case, it also needs to be treated implicitly. This is easily implemented in the framework of the previous section: all that is required is to treat thermal diffusion implicitly in the pressure and internal energy equations. Equation (32) now becomes

$$\frac{\delta p}{\Delta t} - a^2\Delta t\,\partial_x^2 p^{n+1} - (\Gamma_3 - 1)\,\partial_x(\chi^n\,\partial_x T^{n+1}) = -u\,\partial_x p\big|^n - \Gamma_1 p^n\,\partial_x u^n, \qquad (45)$$
and Eq. (34) becomes

$$\frac{\delta e}{\Delta t} - \Delta t\,\frac{p^n}{(\rho^n)^2}\,\partial_x^2 p^{n+1} - \frac{1}{\rho^n}\,\partial_x(\chi^n\,\partial_x T^{n+1}) = -u\,\partial_x e\big|^n - \frac{p^n}{\rho^n}\,\partial_x u^n. \qquad (46)$$

In both Eqs. (45) and (46), we use Picard linearization to treat the diffusion term. The new system for the variables $\mathbf{V}$ in $\delta$-form is

$$\frac{\delta p}{\Delta t} - a^2\Delta t\,\partial_x^2\,\delta p - (\Gamma_3 - 1)\,\partial_x(\chi^n\,\partial_x\,\delta T) = -F_p, \qquad (47)$$

$$\frac{\delta e}{\Delta t} - \Delta t\,\frac{p^n}{(\rho^n)^2}\,\partial_x^2\,\delta p - \frac{1}{\rho^n}\,\partial_x(\chi^n\,\partial_x\,\delta T) = -F_e, \qquad (48)$$

$$\frac{\delta u}{\Delta t} + \frac{1}{\rho^n}\,\partial_x\,\delta p = -F_u, \qquad (49)$$

where the residuals $F$ are unchanged and given by Eqs. (39)-(41).

We use the linearized equation of state to express $\delta T$ as

$$\delta T = \frac{1}{c_v}\,\delta e + \left.\frac{\partial T}{\partial\rho}\right|_e\,\delta\rho, \qquad (50)$$

where $c_v = \partial e/\partial T|_\rho$ is the specific heat capacity at constant volume². In general, the contribution due to density fluctuations is much smaller³, and we neglect it:

$$\delta T \simeq \frac{1}{c_v}\,\delta e. \qquad (51)$$

This approximation is used to replace $\delta T$ with $\delta e$ in the previous system, which gives

$$\frac{\delta p}{\Delta t} - a^2\Delta t\,\partial_x^2\,\delta p - (\Gamma_3 - 1)\,\partial_x\!\left(\frac{\chi^n}{c_v}\,\partial_x\,\delta e\right) = -F_p, \qquad (52)$$

$$\frac{\delta e}{\Delta t} - \Delta t\,\frac{p^n}{(\rho^n)^2}\,\partial_x^2\,\delta p - \frac{1}{\rho^n}\,\partial_x\!\left(\frac{\chi^n}{c_v}\,\partial_x\,\delta e\right) = -F_e, \qquad (53)$$

$$\frac{\delta u}{\Delta t} + \frac{1}{\rho^n}\,\partial_x\,\delta p = -F_u. \qquad (54)$$

We now have a system of two coupled parabolic equations, as seen from the block structure of the matrix $J_V$:

$$J_V = \begin{pmatrix} J_{p,p} & J_{p,e} & 0 \\ J_{e,p} & J_{e,e} & 0 \\ J_{u,p} & 0 & J_{u,u} \end{pmatrix}. \qquad (55)$$

The solution strategy for a system in which the thermal diffusion terms are treated implicitly is therefore more complicated than in the previous section: the two coupled parabolic Eqs. (52) and (53) have to be solved jointly for $\delta e$ and $\delta p$, and $\delta u$ is finally obtained from Eq. (54).

² Here it is understood that partial derivatives are evaluated at time step $n$.
³ It is zero for a perfect gas, for which $e = e(T)$.

3.5. Numerical solution of the parabolic system

For both SI schemes presented above, the numerical solution of a system of linear parabolic equations is required. This is accomplished in MUSIC using the Trilinos library (see Heroux et al. 2005). Specifically, MUSIC uses the iterative linear solver GMRES implemented in the AztecOO package to solve the parabolic system. The convergence of the linear solver is checked with the criterion

$$\|P\mathbf{x} - \mathbf{b}\|_2 < \epsilon_0\,\|\mathbf{b}\|_2, \qquad (56)$$

where $P$ is the system matrix, $\mathbf{x}$ the solution vector, $\mathbf{b}$ the right-hand side of the linear system, and $\epsilon_0$ controls the accuracy of the solution. When setting $\epsilon_0 \lesssim 10^{-6}$, we find that the preconditioner performs as well as when a direct solver is used for the parabolic system. In practice, however, it is not necessary to solve the parabolic problem this accurately, because the preconditioner is only meant to provide an approximate solution of the problem. The results presented in this work were obtained with $\epsilon_0 = 10^{-4}$, a value that ensures sufficient accuracy for preconditioning. The performance could possibly be improved by adopting even larger values of $\epsilon_0$, as the loss in quality of the preconditioner could be offset by the decrease in its computational cost. This is left for future investigation.

A multi-level preconditioner is applied to speed up the convergence of the linear solver. We use the ML package of the Trilinos library to set up a multi-level preconditioner (Gee et al. 2006). Our preconditioner is based on the default parameters provided for the smoothed-aggregation setup in the ML package (parameter set SA), with two modifications. First, instead of using the default method to estimate the eigenvalues of the matrix, we use the 1-norm of the matrix. The default eigenvalue estimate is based on a conjugate-gradient solution of the system seeded by a random vector. This random vector caused simulations continued after a restart to differ from simulations without the restart, removing the ability to reproduce results. Second, we reduce the damping factor for the preconditioner from the default of 1.33 to 1.2, which, in our case, results in fewer iterations for the parabolic solver to converge.

3.6. Time-stepping with SI schemes

The SI schemes designed in this section can be used as time-stepping methods to solve Eqs. (2)-(4) and (6). The time-marching algorithm is:

1. Given the solution at time step $n$, $\mathbf{X}^n$, compute $\mathbf{F}_U(\mathbf{X}^n) = -\mathbf{R}_U(\mathbf{X}^n)$ (see Eq. (9));
2. Transform $\mathbf{F}_U$ into a residual for the primitive variables $\mathbf{V}$:
$$\mathbf{F}_V = \frac{\partial\mathbf{V}}{\partial\mathbf{U}}\,\mathbf{F}_U; \qquad (57)$$
3. Compute $J_V(\mathbf{V}^n)$ corresponding to the desired SI scheme and solve
$$J_V\,\delta\mathbf{V} = -\mathbf{F}_V \qquad (58)$$
for $\delta\mathbf{V}$;
4. Transform $\delta\mathbf{V}$ into $\delta\mathbf{X}$:
$$\delta\mathbf{X} = \frac{\partial\mathbf{X}}{\partial\mathbf{V}}\,\delta\mathbf{V}; \qquad (59)$$
5. Set $\mathbf{X}^{n+1} = \mathbf{X}^n + \delta\mathbf{X}$.

The scheme is linear (we used Picard linearization to deal with nonlinear terms) and only first-order in time (we used the forward and backward Euler methods). However, the CFL limit is less restrictive, as the terms associated with acoustic fluctuations were
discretized implicitly. Similarly, thermal diffusion does not im- We introduce the advective CFL number:
ply any stability restriction on the time step if the second SI u t
scheme is used. However, since the advective terms were dis- CFLadv = , (69)
cretized explicitly, a time step restriction based on the flow speed x
remains. where t is the time step and x the mesh spacing. For a vortex
advected at u , the advective CFL number provides a measure
of the number of grid cells crossed per time step t.
3.7. 2D isentropic vortex test To characterise the accuracy of the SI scheme, we perform
In this section, we test the SI scheme for sound waves. We use a temporal convergence study. The resolution of the domain is
the isentropic vortex advection test originally proposed in Yee set to 642 and we choose different time steps in order to cover a
et al. (2000), and we adopt a setup similar to the one used by broad range of CFLadv , between 102 and 3. This corresponds
Kifonidis & Mller (2012) and Viallet et al. (2013) to test the to CFLhydro as large as 4 105 . Later, we will compare the result
accuracy of the SI scheme. with a more accurate time integration method. We do not study
The initial state consists of an isentropic vortex (i.e., zero convergence with spatial resolution here, as our spatial method
entropy perturbation) embedded in an uniform flow of norm remains the same in all schemes presented in this work, and
u = 1. We use a Cartesian system of coordinates where the is unchanged as compared to previous publications. A spatial
x-axis is taken in the direction of the flow. The vortex corre- resolution study is presented in Viallet et al. (2011).
sponds to the following perturbations in the state variables: We evolve the isentropic vortex varying both the Mach num-
ber and CFLadv , and monitor the numerical errors. We expect two
1r2 behavioral regimes. At low values of CFLadv , the error should be
(u, v) = e 2 (y, x), (60)
2 approximately independent of the time step, as the spatial error
( 1)2 1r2 dominates. At higher values of CFLadv , the temporal error should
T = e , (61) dominate, and be proportional to t as the SI scheme is first-
82
order in time. The results are presented in Fig. 1. The expected
where r = x2 + y2 , T = p/ (we set the gas constant R = 1 and
p
behavior is recovered, although the flat regime at low values of
work with dimensionless quantities), is the adiabatic index, the time step is not clearly seen. Temporal truncation errors re-
and the vortex strength. We use = 1.4 and = 0.75, with main significant for small time steps, as a result of the approx-
initial conditions imations introduced when designing the SI scheme. The first-
1 order character of the temporal discretization appears clearly for
= (T + T ) 1 , (62) larger values of the time step. However, the most important con-
u = u + u, (63) clusion is that the numerical error is independent of the Mach
v = v, (64) number. Effectively, we achieved our goal of designing a scheme
that is independent of the stiffness of the background pressure
1 field. We stress that this behavior is observed over a range of
e= , (65)
1 Mach numbers spanning five orders of magnitude.
Finally, although we successfully removed the stability con-
where the subscript indicates the background value. The straint on the time step caused by sound waves, there is still a
sound speed of the background is c = T . The maximum CFL-like condition based on the advective velocity. Such a sta-
velocity of the vortex is vmax = max ||u|| = /2. We define the bility limit is not evident from Fig. 1, as only a few models are
vortex Mach number as Ms = vmax /c = /(2 T ). By vary- computed for the largest time steps. Empirically, we determined
ing T , we change the Mach number of the flow. We consider that the SI scheme becomes unstable for CFLadv & 0.2.
T = 1, 102 , 106 , 1010 , which corresponds to Ms = 101 , 102 ,
104 , 106 respectively.
The computations are performed on a 2D Cartesian domain 4. Jacobian-free Newton-Krylov method
[4, 4] [4, 4]. Initially, the vortex is centered on the origin. and physics-based preconditioning
The vortex is advected until t = 0.4. The exact solution cor-
responds to the initial vortex profile being shifted by a distance 4.1. Newton-Krylov method
equal to 0.4 in the x direction. To test the accuracy of the scheme, To solve the nonlinear system of equations, FU (X n+1 ) = 0,
we compare the velocity in the direction of advection, u, with the resulting from our fully implicit method we perform Newton-
expected analytic solution u0 . Kifonidis & Mller (2012) and Raphson iterations. The Newton-Raphson procedure is initiated
Viallet et al. (2013) used the density field to monitor the error, by taking an initial guess for the solution, typically X (0) = X n . At
but here the background density is changed when T is changed, the kth Newton-Raphson iteration, the solution of a linear system
which is not the case for the velocity field. We monitor three is required:
different norms of the error:
1 X J (k) X (k) = FU (X (k) ), (70)
L1 error : ||u u0 ||1 = |ui, j u0i, j |, (66)
N x Ny i, j where X = X(k) (k+1) (k)
X . The variable X (k)
is the solution at
s iteration k, and
1 X
L2 error : ||u u ||2 =
0
(ui, j u0i, j )2 , (67) FU (k)
N x Ny i, j J (k) = (X ) (71)
X
L error : ||u u0 || = max |ui, j u0i, j |, (68) is the Jacobian matrix at iteration k.
i, j
The components of X and FU can have considerably differ-
where N x , Ny are the grid dimensions, and the indices i, j range ent numerical values as they represent different physical quan-
over the simulation grid. tities in different units. For instance, densities can have typical
A153, page 6 of 17
CFLhydro CFLhydro
1 101 100 101 1 100 101 102
10 10
L1-error L1-error
2
L2-error 2
L2-error
10 10
L-error L-error
Norms of the error
Norms of the error

103 103
104 104
t t
105 105
1 2
Ms = 10 Ms = 10
6 6
10 10
102 101 100 101 102 101 100 101
CFLadv CFLadv
CFLhydro CFLhydro
101 102 103 104 103 104 105 106
101 101
L1-error L1-error
L2-error L2-error
102 102
L-error L-error
Norms of the error
Norms of the error

103 103
104 104
t t
105 105
Ms = 104 Ms = 106
106 106
102 101 100 101 102 101 100 101
CFLadv CFLadv
Fig. 1. Convergence tests of the SI scheme treating sound waves implicitly advection of an isentropic vortex at different Mach numbers. The
continuous lines show the norm of the errors measured in the velocity component parallel to the direction of advection.
values around 104 g/cc, whereas specific internal energies have sound speed using the parameters 1 , 2 in the definitions of L
values around $10^{14}$ erg/g. Also, due to the stratification of stellar interiors, some variables, such as the density, can vary by several orders of magnitude throughout the domain. Such a wide range of values can cause numerical difficulties due to round-off errors. Therefore, before the system (70) can be solved, it is necessary to scale it. We introduce two diagonal matrices $L$ and $R$ to scale Eq. (70):

$$L^{-1} J^{(k)} R \, R^{-1} \delta X^{(k)} = -L^{-1} F_U\big(X^{(k)}\big). \tag{72}$$

As $L$ and $R$ are diagonal matrices, we use the same symbol to represent their diagonal entries as a vector. The size of these vectors is equal to the number of variables multiplied by the number of cells. Each cell is treated in the same way, and the definitions of $R$ and $L$ only differ for different variables:

$$L_\rho = \rho^{(k)}, \qquad R_\rho = \rho^{(k)},$$
$$L_e = \rho^{(k)} e^{(k)}, \qquad R_e = e^{(k)},$$
$$L_u = \rho^{(k)} \max\big(|u^{(k)}|, \alpha_1 c_s^{(k)}\big), \qquad R_u = \max\big(|u^{(k)}|, \alpha_2 c_s^{(k)}\big),$$

where $c_s^{(k)}$ is the adiabatic sound speed computed from the solution at iteration $k$. $R$ represents the typical value of the unknown vector $X^{(k)}$ and attempts to remove both the effects of units and stratification. We follow a similar idea for $L$ and use the typical value of the conserved variables to scale the residual vector $F_U$. However, as velocities can be arbitrarily small, it is necessary to introduce a minimum velocity, here measured relative to the sound speed through the parameters $\alpha_1$ and $\alpha_2$ for $L$ and $R$, respectively. The work described in Viallet et al. (2011) and Viallet et al. (2013) used $\alpha_1 = \alpha_2 = 1$. After testing, we found that $\alpha_1 = 10^{-5}$ and $\alpha_2 = 1$ give good performance for a wide range of Mach numbers, typically $10^{-6} \lesssim M_s \lesssim 10^{-1}$; see the discussion in Sect. 5.

The Newton-Raphson procedure is terminated when the relative corrections fall below a certain value $\varepsilon$:

$$\big\| R^{-1} \delta X^{(k)} \big\| < \varepsilon. \tag{73}$$

In Viallet et al. (2013), it was shown that the nonlinear tolerance has to be chosen small enough so that the truncation errors of the scheme dominate the numerical error. We follow their recommendation and set $\varepsilon = 10^{-6}$. Finally, if Eq. (73) is already fulfilled at the first iteration, we enforce a second Newton-Raphson iteration. For the sake of clarity, we drop from now on the superscript $k$ of the outer nonlinear iteration of the Newton-Raphson procedure, and we do not carry the scaling matrices $L$ and $R$ in the notation in the rest of the paper.

We use the GMRES method to solve Eq. (72) iteratively. We start from an initial guess $\delta X_0$, and we define the initial residual as $r_0 = -F_U(X) - J\,\delta X_0$. In practice, we choose $\delta X_0 = 0$ so that $r_0 = -F_U(X)$. At the $p$th iteration, the GMRES method seeks an approximation $\delta X_p$ of the solution by minimizing the residual norm $\|J\,\delta X_p + F_U(X)\|_2$ over the $p$th Krylov space $\mathcal{K}_p$ of $J$:

$$\delta X_p \in \mathcal{K}_p(J) = \mathrm{span}\{r_0, J r_0, \ldots, J^{p-1} r_0\}. \tag{74}$$
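Because $L$ and $R$ are diagonal, the scaling of Eq. (72) reduces to per-cell multiplications. The following Python sketch builds the entries of $L$ and $R$ for a single cell from the definitions above and applies them to a residual and a correction. It assumes that $|u|$ denotes the velocity magnitude of the cell (the text does not say whether it is taken per component), and the helper names are hypothetical, not MUSIC routines.

```python
import math


def cell_scaling(rho, e, u, cs, alpha1=1e-5, alpha2=1.0):
    """Diagonal entries of L and R for one cell, ordered (rho, e, u_1, ..., u_d).

    Follows the definitions given after Eq. (72):
      L_rho = rho,  L_e = rho * e,  L_u = rho * max(|u|, alpha1 * cs)
      R_rho = rho,  R_e = e,        R_u = max(|u|, alpha2 * cs)
    |u| is taken here as the velocity magnitude of the cell (an assumption).
    """
    speed = math.sqrt(sum(v * v for v in u))
    L = [rho, rho * e] + [rho * max(speed, alpha1 * cs)] * len(u)
    R = [rho, e] + [max(speed, alpha2 * cs)] * len(u)
    return L, R


def scale_residual(F, L):
    """Scaled residual L^{-1} F; L is diagonal, so this is elementwise."""
    return [f / l for f, l in zip(F, L)]


def unscale_correction(dY, R):
    """Recover dX = R dY from the scaled correction dY = R^{-1} dX."""
    return [r * d for r, d in zip(R, dY)]


# Example: a slowly moving cell in a stellar interior (illustrative numbers).
L, R = cell_scaling(rho=1.0, e=1e14, u=(1e3, 0.0), cs=1e7)
print(L)  # density, energy-density and momentum scales
print(R)  # density, specific-energy and velocity scales
```

With $\alpha_2 = 1$ the velocity scale in $R$ never drops below the sound speed, while $\alpha_1 = 10^{-5}$ lets the momentum scale in $L$ follow the actual (slow) flow.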
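The outer Newton-Raphson loop with the stopping rule (73) and the enforced second iteration can be sketched as follows. This is a minimal illustration, not the MUSIC driver: `residual` and `solve_linear` are hypothetical callables (the latter would be the preconditioned GMRES solve in practice), and the use of the maximum norm in Eq. (73) is an assumption, since the norm is not specified.

```python
import math


def newton_raphson(residual, solve_linear, x0, R, tol=1e-6, max_iter=50):
    """Newton-Raphson driver with the stopping rule of Eq. (73).

    residual(x)        -> F(x) as a list
    solve_linear(x, F) -> dx solving J(x) dx = -F (e.g. with GMRES)
    R                  -> diagonal scaling of the unknowns (list)
    Stops when ||R^{-1} dx|| < tol, but always performs at least two
    iterations. The max norm used here is an assumption.
    """
    x = list(x0)
    for k in range(1, max_iter + 1):
        dx = solve_linear(x, residual(x))
        x = [xi + di for xi, di in zip(x, dx)]
        rel = max(abs(d / r) for d, r in zip(dx, R))
        if rel < tol and k >= 2:
            return x, k
    raise RuntimeError("Newton-Raphson did not converge")


# Example: solve x^2 - 2 = 0 with an exact scalar "linear solve".
root, iters = newton_raphson(
    residual=lambda x: [x[0] ** 2 - 2.0],
    solve_linear=lambda x, F: [-F[0] / (2.0 * x[0])],
    x0=[1.0],
    R=[1.0],
)
print(root[0], iters)
```

If the initial guess already satisfies the criterion, the `k >= 2` condition still forces a second iteration, as described in the text.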
The dimension of the search space increases at each iteration until convergence is achieved. The convergence of the linear solver is tested with the criterion

$$\|J\,\delta X_p + F_U(X)\|_2 < \eta\, \|F_U(X)\|_2, \tag{75}$$

where $\eta$ is a parameter that determines the accuracy of the solution. Typical values of $\eta$ that are used in this paper are $\eta = 10^{-2}$ and $\eta = 10^{-4}$; see the discussion in Sect. 5.

4.2. Jacobian-free approach

To build successive Krylov spaces, the GMRES algorithm computes the action of the Jacobian matrix on a vector. This is the only use of the Jacobian operator, and we take advantage of the fact that this operation can be approximated by finite-differencing:

$$J(u)\,\delta u \approx \frac{F(u + \epsilon\,\delta u) - F(u)}{\epsilon}, \tag{76}$$

where $\epsilon$ is a small number. We rely on the implementation of matrix-free operators available from the Trilinos package NOX. This package contains two preset options for calculating $\epsilon$:

$$\epsilon = \lambda \left( \lambda + \frac{\|u\|}{\|\delta u\|} \right), \tag{77}$$

and

$$\epsilon = \lambda \left( \frac{10^{-12}}{\lambda} + \frac{|u \cdot \delta u|}{\delta u \cdot \delta u} \right) \mathrm{sign}(u \cdot \delta u). \tag{78}$$

In both cases, $\lambda$ is a small parameter with a default value of $10^{-6}$. The standard choice in MUSIC is to use Eq. (77), as it gives the best results (see the discussion in Sect. 5).

In this Jacobian-free approach, the Jacobian matrix is not needed explicitly, lowering the memory cost of the scheme. Instead, computing the action of the Jacobian on a given vector requires one evaluation of the nonlinear residual in Eq. (76), assuming that $F(u)$ has already been computed and stored.

When $J$ has a large condition number, the Krylov method fails to converge in an acceptable number of iterations (a few dozen), as the Krylov space is dominated by the direction of the eigenvector associated with the largest eigenvalue. In such cases, preconditioning is necessary. In this work, we use the SI schemes presented in Sect. 3 as preconditioners for the Krylov method. This is detailed in the next section.

4.3. Right-preconditioning of GMRES with SI schemes

Right-preconditioning of system (72) corresponds to solving the equivalent system:

$$J M^{-1}\, \delta X' = -F_U(X), \tag{79}$$
$$M\, \delta X = \delta X', \tag{80}$$

where $M$ is the preconditioning matrix. $\delta X'$ is an intermediate solution vector which, once known, is used to find the solution $\delta X$. If the preconditioning matrix is a good approximation of $J$, i.e., if $J M^{-1}$ has a low condition number, the Krylov space of $J M^{-1}$ is better suited to construct an approximation of the solution:

$$\delta X'_p \in \mathcal{K}_p(J M^{-1}) = \mathrm{span}\{r_0, J M^{-1} r_0, \ldots, (J M^{-1})^{p-1} r_0\}. \tag{81}$$

Once a suitable solution $\delta X'$ has been found in the search space, based on the same convergence criterion as (75), a final linear system, Eq. (80), has to be solved to get the actual solution $\delta X$.

The key part of the right-preconditioning process is the application of $J M^{-1}$ to a Krylov vector $v$ provided by GMRES. This operation is required at each iteration to build the successive Krylov spaces. In right-preconditioning, $J M^{-1} v$ is computed in two steps:
1. Solve $M w = v$ for $w$;
2. Apply $J$ to $w$.

The first step requires the inversion of a linear system; the second step requires the action of the Jacobian on the vector $w$ and is approximated by a finite-difference formula (Jacobian-free approach).

The basic idea of physics-based preconditioning is to interpret the system $M w = v$ in step 1 above as a system corresponding to a linear time-stepping scheme written in $\delta$-form:

$$M w = v \quad \Longleftrightarrow \quad M\, \delta X = G(X), \tag{82}$$

where $(M, G)$ describes a numerical scheme that approximates the full nonlinear scheme $(J, F)$. Another way to understand physics-based preconditioning is that Eq. (82) defines, through $M$, a mapping from residuals to perturbations $\delta X$. Therefore, the Jacobian matrix is always applied to a $\delta X$ to yield a residual vector $F_U$, which is used to build the Krylov spaces.

The SI schemes designed in Sect. 3 are good candidates for the scheme in Eq. (82). These schemes provide a good approximation of the solution (i.e., $M \approx J$) and, most importantly, they remove the numerical stiffness by solving the stiff physics (sound waves and thermal diffusion) implicitly. The physics-based preconditioner therefore injects physical insight into the heart of the linear method, improving its convergence. However, the Krylov vector is a residual for the variables $U$, and it needs to be transformed into a residual for the variables $V$ before an SI scheme can be used. Furthermore, the SI scheme provides $\delta V$, which needs to be transformed into $\delta X$ before $J$ can be applied. As in the time-stepping algorithm described in Sect. 3.6, we use the matrices derived in Sect. 3.2 to do these transformations. The complete algorithm to use the SI scheme as a preconditioner is:

Input: the GMRES method provides a vector $v \in \mathcal{K}_p$. $v$ can be interpreted as a residual vector for the conservative variables $U$, which we denote $F_U$;
1. Transform $F_U$ into a residual for the primitive variables $V$:
$$F_V = \frac{\partial V}{\partial U} F_U; \tag{83}$$
2. Apply the SI scheme to get $\delta V$:
$$J_V\, \delta V = F_V; \tag{84}$$
3. Transform $\delta V$ into $\delta X$:
$$\delta X = \frac{\partial X}{\partial V}\, \delta V; \tag{85}$$
4. Compute $J\,\delta X$ using the Jacobian-free method.
Output: the vector $J\,\delta X$, which is provided to GMRES to build successive Krylov spaces.

We note that the scheme is not fully matrix free: the SI scheme requires the resolution of a linear problem for which the matrix is explicitly formed and stored. However, thanks to the simplifications made in deriving the SI scheme, this matrix is significantly smaller and sparser than the Jacobian matrix, which keeps the memory demand low.

In the remainder of the paper, the combination of the JFNK method with the physics-based preconditioning presented in this section will be referred to as the JFNK+PBP method.
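The finite-difference product of Eq. (76), with the two perturbation rules (77) and (78), can be illustrated with the following stdlib-only Python sketch. It is not the NOX implementation; the function names are hypothetical, and treating $\mathrm{sign}(0)$ as $+1$ in Eq. (78) is an assumption.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def eps_norm_ratio(u, du, lam=1e-6):
    """Eq. (77): eps = lam * (lam + ||u|| / ||du||)."""
    return lam * (lam + math.sqrt(_dot(u, u)) / math.sqrt(_dot(du, du)))


def eps_dot_product(u, du, lam=1e-6):
    """Eq. (78): eps = lam * (1e-12 / lam + |u.du| / (du.du)) * sign(u.du)."""
    s = _dot(u, du)
    sign = -1.0 if s < 0.0 else 1.0   # sign(0) taken as +1 (assumption)
    return lam * (1e-12 / lam + abs(s) / _dot(du, du)) * sign


def jf_matvec(F, u, Fu, du, eps_rule=eps_norm_ratio):
    """Approximate J(u) du by the forward difference of Eq. (76).

    Fu = F(u) is passed in because it is already known at each Newton
    iteration, so each product costs one extra residual evaluation.
    """
    if _dot(du, du) == 0.0:
        return [0.0] * len(du)
    eps = eps_rule(u, du)
    Fp = F([ui + eps * di for ui, di in zip(u, du)])
    return [(a - b) / eps for a, b in zip(Fp, Fu)]


def F(u):
    """Toy residual with J(u) du = (2 u0 du0 + 3 du1, u1 du0 + u0 du1)."""
    return [u[0] ** 2 + 3.0 * u[1], u[0] * u[1]]


u, du = [1.0, 2.0], [0.5, -1.0]
for rule in (eps_norm_ratio, eps_dot_product):
    print(rule.__name__, jf_matvec(F, u, F(u), du, rule))  # close to [-2.0, 0.0]
```

For this smooth toy residual both rules give the exact product to within about $10^{-6}$; the text reports that only Eq. (77) remained reliable at the lowest Mach numbers in MUSIC.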
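The right-preconditioned GMRES iteration of Eqs. (79) to (81), with the stopping criterion (75), is sketched below in plain Python. This is a textbook restart-free GMRES (Arnoldi with modified Gram-Schmidt and Givens rotations), not the Trilinos solver used in MUSIC; `matvec` and `precond` are hypothetical callables standing for the Jacobian-free product and for one application of $M^{-1}$.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gmres_right(matvec, precond, F, eta=1e-4, max_iter=300):
    """Right-preconditioned GMRES for J dX = -F, following Eqs. (79)-(81).

    matvec(v)  -> J v       (e.g. the Jacobian-free product of Eq. 76)
    precond(v) -> M^{-1} v  (e.g. one application of the SI scheme)
    Starts from dX_0 = 0 (so r_0 = -F) and stops when
    ||J dX_p + F||_2 < eta ||F||_2, i.e. criterion (75).
    Returns (dX, number of GMRES iterations).
    """
    n = len(F)
    beta = math.sqrt(_dot(F, F))             # ||r_0|| = ||F||
    if beta == 0.0:
        return [0.0] * n, 0
    V = [[-f / beta for f in F]]             # orthonormal basis of K_p(J M^{-1})
    H, cs, sn = [], [], []                   # triangularized columns, Givens rotations
    g = [beta]                               # rotated right-hand side beta * e_1
    for j in range(max_iter):
        w = matvec(precond(V[j]))            # J M^{-1} v_j
        h = []
        for v in V:                          # modified Gram-Schmidt
            hij = _dot(w, v)
            w = [wi - hij * vi for wi, vi in zip(w, v)]
            h.append(hij)
        hnext = math.sqrt(_dot(w, w))
        h.append(hnext)
        for i in range(j):                   # apply earlier rotations to the new column
            h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                              -sn[i] * h[i] + cs[i] * h[i + 1])
        denom = math.hypot(h[j], h[j + 1])
        if denom == 0.0:                     # breakdown: keep the columns built so far
            break
        cs.append(h[j] / denom)
        sn.append(h[j + 1] / denom)
        h[j] = denom
        g.append(-sn[j] * g[j])
        g[j] = cs[j] * g[j]
        H.append(h[:j + 1])
        if abs(g[j + 1]) < eta * beta or hnext == 0.0:  # |g_{j+1}| = ||J dX + F||
            break
        V.append([wi / hnext for wi in w])
    k = len(H)
    y = [0.0] * k                            # back substitution for H y = g
    for i in range(k - 1, -1, -1):
        y[i] = (g[i] - sum(H[c][i] * y[c] for c in range(i + 1, k))) / H[i][i]
    z = [sum(y[c] * V[c][r] for c in range(k)) for r in range(n)]  # dX' = V y
    return precond(z), k                     # dX = M^{-1} dX', Eq. (80)


# Example: a small SPD system with a Jacobi (diagonal) preconditioner.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
Fv = [1.0, 2.0, 3.0]


def mv(v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def jacobi(v):
    return [v[i] / A[i][i] for i in range(3)]


dX, its = gmres_right(mv, jacobi, Fv, eta=1e-10)
print(dX, its)
```

Note that the preconditioner is applied once per iteration inside the Krylov loop and once more at the end to recover $\delta X$ from $\delta X'$, mirroring Eq. (80).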
[Figure: four panels showing the L1, L2 and L∞ norms of the error versus CFLadv (bottom axis) and CFLhydro (top axis), for Ms = 10^-1, 10^-2, 10^-4 and 10^-6, each with a (Δt)^2 reference slope.]
Fig. 2. Same as Fig. 1, but using the JFNK+PBP scheme to advect the isentropic vortex.
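Steps 1 to 4 of the algorithm in Sect. 4.3 compose into the operator $J M^{-1}$ that GMRES sees. The Python sketch below shows this composition with the transformation matrices and the SI solve passed in as callables; all names are hypothetical stand-ins for MUSIC internals. The toy check uses a single cell with $U = (\rho, \rho u)$ and $V = (\rho, u)$ and takes $X = U$ (an assumption made only for this example), so that the chain rule makes $(\partial X/\partial V)(\partial V/\partial U)$ the identity.

```python
def make_pbp_preconditioner(dV_dU, solve_si, dX_dV):
    """Return v -> M^{-1} v built from steps 1-3 of the algorithm.

    dV_dU(F_U)    -> F_V = (dV/dU) F_U            (Eq. 83)
    solve_si(F_V) -> dV, solution of J_V dV = F_V  (Eq. 84, one SI solve)
    dX_dV(dV)     -> dX = (dX/dV) dV               (Eq. 85)
    """
    def precond(v):
        F_V = dV_dU(v)      # step 1: Krylov vector seen as a residual for V
        dV = solve_si(F_V)  # step 2: linear SI scheme (the only stored matrix)
        return dX_dV(dV)    # step 3: perturbation of the unknowns X
    return precond


def preconditioned_operator(jf_matvec, precond):
    """Step 4: the vector J M^{-1} v handed back to GMRES."""
    return lambda v: jf_matvec(precond(v))


# Toy single cell at rho = 2, u = 3, with U = (rho, m = rho*u), V = (rho, u), X = U.
rho, vel = 2.0, 3.0


def dV_dU(F):
    """d(rho, u)/d(rho, m) applied to F, using u = m / rho."""
    return [F[0], -vel / rho * F[0] + F[1] / rho]


def dX_dV(dV):
    """d(rho, m)/d(rho, u) applied to dV, using m = rho * u."""
    return [dV[0], vel * dV[0] + rho * dV[1]]


identity_si = make_pbp_preconditioner(dV_dU, lambda F_V: list(F_V), dX_dV)
print(identity_si([1.0, 4.0]))  # identity SI solve: the chain rule gives back [1.0, 4.0]
```

In a real run `solve_si` would be the sparse SI linear solve and `jf_matvec` the finite-difference product of Eq. (76); the resulting operator is what a right-preconditioned GMRES iterates on.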
5. Results

In this section, we assess the performance of our JFNK+PBP method in both 2D and 3D. In Sect. 5.1, we test the accuracy and efficiency of the method using idealized tests that use an ideal-gas equation of state and a Cartesian geometry. In Sect. 5.2, we test the method to model stellar interiors in a spherical geometry. An important goal of this section is to demonstrate the good performance, robustness and accuracy of the solver for a wide range of Mach numbers, typically from $M_s = 10^{-1}$ down to $M_s = 10^{-6}$.

5.1. Ideal test cases

5.1.1. 2D isentropic vortex

We first investigate the accuracy of the solver by considering the 2D isentropic vortex problem that we used to test the SI scheme in Sect. 3.7. We perform the same set of runs with the JFNK+PBP method, and the computed errors are shown in Fig. 2. Compared with the error of the semi-implicit scheme (see Fig. 1), the JFNK+PBP scheme achieves an overall reduction in the error. The range of time steps where spatial truncation errors dominate is larger, and we observe the second-order character of the temporal error at large time steps. The use of an SI scheme as a preconditioner does not impact the overall accuracy of the JFNK+PBP method. Figure 2 also shows that the results of the JFNK+PBP method are independent of the Mach number; this desirable property of the SI scheme has been inherited by the nonlinear method.

Four free parameters enter the JFNK+PBP method: the choice of perturbation strategy (Eqs. (77) and (78)); the two scaling coefficients for the velocity components of the solution and residual vectors (parameters $\alpha_1$ and $\alpha_2$ in Sect. 4.1); and the tolerance required for the solution of the Jacobian equation (parameter $\eta$ in Eq. (75)). These free parameters were determined by testing.

We find that the accuracy of the solver for the higher end of the Mach number range being considered ($M_s > 10^{-4}$) is good regardless of the choice of perturbation strategy. However, for lower Mach numbers ($M_s \lesssim 10^{-4}$) we find that only Eq. (77), with $\lambda = 10^{-7}$, is able to yield accurate results. When using Eq. (78), the Jacobian operator is poorly approximated, regardless of the value of $\lambda$, resulting in a failure of the nonlinear method.

In the scaling of the linear system, we find that a value of $\alpha_1 = 10^{-5}$ gives the most consistent errors across Mach numbers for the range being considered, i.e., $10^{-6} \le M_s \le 10^{-1}$. However, this range can be adjusted by tuning $\alpha_1$ to the problem at hand, with a higher value producing more accurate solutions for high Mach number flows. We find that $\alpha_2 = 1$ enables us to obtain accurate results for the range of Mach numbers being considered.

We find that a linear tolerance $\eta = 10^{-4}$ produces solutions with similar errors for the full range of Mach numbers considered in this work. A choice of $\eta = 10^{-2}$ produces similar results for the higher Mach numbers, but the quality of the solutions for low Mach numbers degrades seriously.
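The parameter choices reported in Sects. 4 and 5 can be collected in one place. The following Python sketch is only a summary of the values quoted in the text, written as a frozen dataclass; the class and field names are hypothetical and do not correspond to MUSIC or NOX input keywords.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JfnkPbpParameters:
    """Solver parameters reported in Sects. 4 and 5 (field names are hypothetical)."""
    perturbation: str = "eq77"  # Eq. (77); Eq. (78) fails for Ms <~ 1e-4
    lam: float = 1e-7           # lambda in Eq. (77); the NOX default is 1e-6
    alpha1: float = 1e-5        # minimum velocity for L, in units of c_s
    alpha2: float = 1.0         # minimum velocity for R, in units of c_s
    eta: float = 1e-4           # linear (GMRES) tolerance, Eq. (75)
    newton_tol: float = 1e-6    # nonlinear tolerance, Eq. (73)
    max_gmres: int = 300        # iteration cap used for the tests of Fig. 3


# Baseline used with physics-based preconditioning.
pbp = JfnkPbpParameters()
# Best case reported for the unpreconditioned runs of Fig. 3 (alpha1 = 1).
no_pbp = replace(pbp, alpha1=1.0)
# Looser tolerance: similar errors at high Ms, degraded solutions at low Ms.
loose = replace(pbp, eta=1e-2)
print(pbp, no_pbp, loose, sep="\n")
```

Freezing the dataclass and deriving variants with `dataclasses.replace` keeps each test configuration an explicit, immutable record of the values quoted in the text.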
[Figure: four panels showing the number of GMRES iterations versus the hydro CFL number, for the 2D isentropic vortex without PBP (upper left) and with PBP (upper right), and for the 3D Taylor-Green vortex without PBP (lower left) and with PBP (lower right); curves for Ms = 10^-1, 10^-2, 10^-4 and 10^-6.]
Fig. 3. Convergence of the GMRES solver without preconditioning (left panels) and with the physics-based preconditioner (right panels). The upper panels correspond to the 2D isentropic vortex, and the lower panels to the 3D Taylor-Green vortex. In both cases, the Mach numbers considered are Ms = 10^-1, 10^-2, 10^-4 and 10^-6 (the right panels use the same legend as the left ones). The maximum allowed number of GMRES iterations was set to 300. The mean number of iterations required for convergence is plotted, with shaded areas showing the maximum and minimum values. For each Mach number, the location of CFLhydro corresponding to CFLadv = 1 is shown by a vertical dashed line.
Next, we assess the efficiency of the SI scheme as a preconditioner. This is done by considering the number of GMRES iterations necessary to reach convergence without preconditioning and with physics-based preconditioning, for different values of CFLhydro and different Mach numbers (the linear tolerance is set to $\eta = 10^{-4}$). When solving the unpreconditioned linear system, we found that for $\alpha_1 = 10^{-5}$ the majority of linear problems, particularly for higher Mach numbers, fail to converge. Instead, we present for the unpreconditioned case the convergence behavior for $\alpha_1 = 1$, as a best-case scenario. When solving the linear system with physics-based preconditioning, we use the optimal parameters described previously.

The results of the convergence tests for the iterative method are shown in Fig. 3. For these tests, the simulations are run for 100 time steps. Without preconditioning, the different Mach number cases behave similarly: the number of GMRES iterations increases rapidly for $\mathrm{CFL_{hydro}} \gtrsim 1$, and above $\mathrm{CFL_{hydro}} \gtrsim 10$ no convergence is achieved despite the large number of iterations allowed. Such behavior is due to the stiffness of acoustic waves, which increases with CFLhydro. Our physics-based preconditioner is tailored to treat this effect, and the improvement is demonstrated in the right panels of Fig. 3, as compared to the left panels without preconditioning. In each case, the increase in the number of GMRES iterations takes place at larger values of the time step. With physics-based preconditioning, the failure of the linear solver is now caused by the unstable behavior of the SI scheme for too large a CFLadv.

5.1.2. 3D Taylor-Green vortex

We consider the Taylor-Green vortex problem to test our physics-based preconditioner for an adiabatic (i.e., no thermal diffusion) flow in 3D. We consider a Cartesian domain $(x, y, z) \in [0, 2\pi L]^3$, where $L$ is a length scale that sets the physical size of the domain. The initial conditions for the velocity field are

$$u_x(x, y, z) = u_0 \sin\frac{x}{L} \cos\frac{y}{L} \cos\frac{z}{L}, \tag{86}$$
$$u_y(x, y, z) = -u_0 \cos\frac{x}{L} \sin\frac{y}{L} \cos\frac{z}{L}, \tag{87}$$
$$u_z(x, y, z) = 0. \tag{88}$$

The domain has a uniform density $\rho_0$. The initial pressure field is

$$p(x, y, z) = p_0 + \frac{\rho_0 u_0^2}{16} \left(2 + \cos\frac{2z}{L}\right) \left(\cos\frac{2x}{L} + \cos\frac{2y}{L}\right), \tag{89}$$

which ensures that

$$\partial_t (\nabla \cdot \mathbf{u}) = 0 \quad \text{at } t = 0, \tag{90}$$

i.e., the initial conditions do not induce any acoustic modes. The initial amplitude of the vortex is measured in terms of the Mach
number $M_s = u_0/c_s$, where $c_s = \sqrt{\gamma p_0/\rho_0}$ is the adiabatic sound speed. The adiabatic index is taken as $\gamma = 7/5 = 1.4$. We take $L$ as our unit of length, $u_0$ as our unit of velocity, and $\rho_0 L^3$ as our unit of mass. In this normalization, time is measured in units of $L/u_0$, and energy density in units of $\rho_0 u_0^2$. We change $p_0$ to vary the Mach number in the range $10^{-6} \le M_s \le 10^{-1}$. We consider a numerical domain with a resolution of $64^3$. For this test case, we define the advective CFL number as

$$\mathrm{CFL_{adv}} = \max\left(\frac{|u|\,\Delta t}{\Delta x}\right), \tag{91}$$

where $u$ is the velocity, $\Delta t$ the time step, and $\Delta x$ the mesh spacing.

Similarly to the 2D isentropic vortex, we first investigate the efficiency of the physics-based preconditioner in reducing the number of iterations required by the linear solver to converge to the desired accuracy. As condition (90) is never exactly fulfilled in the discretized problem, some acoustic fluctuations are produced at the first time step. To remove these transients, we evolve each case for 100 time steps at a fixed $\mathrm{CFL_{hydro}} = 1$. We then compute another 100 time steps with different values of $\Delta t$, corresponding to different values of CFLhydro. We monitor the number of iterations required for convergence, with and without physics-based preconditioning⁴. The results are shown in Fig. 3. The conclusions are very similar to the ones drawn from the 2D vortex test presented in the previous section: physics-based preconditioning allows for fast convergence over a broad range of hydrodynamical CFL numbers. Here again, the convergence of the linear solver becomes difficult as $\mathrm{CFL_{adv}} = 1$ is approached.

Next, we use the Taylor-Green vortex to benchmark the implicit JFNK+PBP method against the second-order accurate Adams-Bashforth explicit scheme. Starting at $t = 0$, we evolve the vortex using both the JFNK+PBP method and the Adams-Bashforth method. We run the tests for a fixed wall-clock time of six hours and record the final time achieved by each method, varying the Mach number of the test case. In the explicit case, the time step is limited by stability to $\mathrm{CFL_{hydro}} = 0.1$; in the implicit case, the time step is limited to $\mathrm{CFL_{adv}} = 0.5$ for accuracy and for the stability of the underlying SI scheme. The results are recorded in Table 1. The final times obtained with the explicit solver scale approximately with the Mach number, due to the scaling of the CFL time step with the background sound speed. The final times obtained with the JFNK+PBP method show less of a clear pattern, with performance peaking at a Mach number of $M_s = 10^{-4}$. Nevertheless, the JFNK+PBP method is already more than five times faster than the explicit solver for $M_s = 10^{-2}$. For lower Mach numbers, the speed-up is larger than two orders of magnitude. We observe a dramatic drop in performance at $M_s = 10^{-6}$ when using the criterion $\mathrm{CFL_{adv}} = 0.5$ on the time step. From analyzing the performance of the scheme for this run (see Table 1), it appears that the physics-based preconditioner becomes less effective, resulting in a very large number of GMRES iterations and a substantial loss of performance. Such a loss of effectiveness of the preconditioner at a very low Mach number as CFLadv approaches 1 can already be seen in Fig. 3, and seems to highlight the limit of what is currently feasible with the solver. We repeated the timing test for this Mach number with $\mathrm{CFL_{hydro}} = 5 \times 10^4$, which corresponds to $\mathrm{CFL_{adv}} \approx 0.05$. This improves the final time by approximately an order of magnitude. It remains the least efficient case, but it is still more than three orders of magnitude faster than the corresponding explicit calculation.

Finally, we monitor the decay of the Taylor-Green vortex for the range of Mach numbers explored here. We simulate up to a fixed time of $t = 20$, a time at which most of the dissipation has occurred. We show in Fig. 4 the evolution of the decay rate of kinetic energy. The left panel shows a global view in which the different curves are indistinguishable from each other. The right panel shows a zoom on the peak of the decay rate; the difference between the curves is less than one percent. In Table 2 we record the maximum decay rate and the time at which it occurs. The purpose of Fig. 4 is twofold: firstly, it complements the performance results presented previously, as it shows that the results are independent of the Mach number at the percent level; secondly, it provides confidence in using the code as an ILES tool to model turbulent flows over a wide range of Mach numbers. In the ILES framework, dissipation of kinetic energy is due to the truncation errors of the scheme, and it is not obvious that these behave similarly for different Mach numbers.

5.2. Stellar test cases

In this section, we examine how the JFNK+PBP method performs in realistic stellar models. We use the same models as in Viallet et al. (2013), a 2D young Sun and a red giant in which convection is fully developed and has reached a quasi-steady state. Both models are first considered in a 2D spherically axisymmetric geometry. The red giant model is then considered in a full 3D spherical wedge geometry.

5.2.1. 2D stellar models

We compute 100 time steps of the red giant and young Sun models using the JFNK+PBP method. We limit CFLadv (defined as in Eq. (91)) to values of 0.5, 1, and 1.5. We compare the performance of the JFNK+PBP method with the best method identified in Viallet et al. (2013). The results are summarized in Table 3 for the red giant and in Table 4 for the young Sun. They show that the JFNK+PBP method is less efficient than the Broyden+ILU method. The JFNK+PBP method also becomes less and less effective as CFLadv increases, because the physics-based preconditioner fails as the underlying SI scheme becomes unstable. In practice, the JFNK+PBP method should not be used with CFLadv larger than one when computing an unsteady flow. This limitation is not very penalizing, as numerical accuracy is expected to decrease when $\mathrm{CFL_{adv}} > 1$, meaning that larger time steps are not desirable anyway⁵. For both the red giant and young Sun models, the performance of the JFNK+PBP solver with a linear tolerance of $10^{-1}$ is about the same for $\mathrm{CFL_{adv}} = 0.5$ and $\mathrm{CFL_{adv}} = 1$, as the increase in the time step is compensated by the increase in the number of GMRES iterations per Newton iteration. One should keep in mind that the performance of the JFNK+PBP solver presented here could probably be improved by fine-tuning the parameters discussed in Sect. 5.1.1. For instance, the red giant and young Sun models differ in their average Mach number, the red giant having a larger Mach number ($\sim 0.1$) than the young Sun ($\sim 0.01$), so that $\alpha_1$ could be tuned separately for each model. Although the performance of the JFNK+PBP method could be brought closer to that of the Broyden method, we expect the latter to remain the most efficient option for these cases.

⁴ The parameters of the solver ($\alpha_1$, $\alpha_2$, ...) are adjusted as discussed for the 2D test case.
⁵ Based on the 2D vortex advection test, Viallet et al. (2013) showed that one could use $\mathrm{CFL_{adv}} \approx 2$ without degrading the accuracy too much. However, this conclusion might not hold for an unsteady, turbulent flow, which could require smaller time steps.
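The Taylor-Green initial state of Eqs. (86) to (89), and the choice of $p_0$ that sets the Mach number, can be written down directly. In the normalized units of the text ($L = u_0 = \rho_0 = 1$), $M_s = u_0/c_s$ with $c_s = \sqrt{\gamma p_0/\rho_0}$ gives $p_0 = \rho_0 u_0^2/(\gamma M_s^2)$. The Python sketch below evaluates the initial fields at a point and checks numerically that the velocity field is divergence-free; the function names are hypothetical.

```python
import math


def p0_for_mach(Ms, rho0=1.0, u0=1.0, gamma=1.4):
    """Background pressure giving Ms = u0 / c_s with c_s = sqrt(gamma p0 / rho0)."""
    return rho0 * u0 ** 2 / (gamma * Ms ** 2)


def taylor_green(x, y, z, p0, L=1.0, u0=1.0, rho0=1.0):
    """Initial velocity and pressure of Eqs. (86)-(89) at one point."""
    ux = u0 * math.sin(x / L) * math.cos(y / L) * math.cos(z / L)
    uy = -u0 * math.cos(x / L) * math.sin(y / L) * math.cos(z / L)
    uz = 0.0
    p = p0 + rho0 * u0 ** 2 / 16.0 * (2.0 + math.cos(2.0 * z / L)) * (
        math.cos(2.0 * x / L) + math.cos(2.0 * y / L))
    return ux, uy, uz, p


def divergence(x, y, z, p0, h=1e-5):
    """Central-difference div(u); u_z = 0, so only two terms contribute."""
    dux = (taylor_green(x + h, y, z, p0)[0] - taylor_green(x - h, y, z, p0)[0]) / (2 * h)
    duy = (taylor_green(x, y + h, z, p0)[1] - taylor_green(x, y - h, z, p0)[1]) / (2 * h)
    return dux + duy


p0 = p0_for_mach(1e-6)                    # about 7.1e11 in units of rho0 u0^2
print(p0, divergence(0.3, 0.7, 1.1, p0))  # divergence vanishes up to round-off
```

Because the pressure fluctuation is only of order $\rho_0 u_0^2$ while $p_0 \propto M_s^{-2}$, the relative pressure perturbation shrinks as $M_s^2$, which is what makes the low-Mach cases numerically demanding.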
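Table 1. Summary of the results for the Taylor-Green vortex tests.

Mach No. | CFLhydro | CFLadv | Newton/Δt | GMRES/Newton | Parabolic/GMRES | Final time (implicit) | Final time (explicit)
---------|----------|--------|-----------|--------------|-----------------|-----------------------|----------------------
10^-1    | 7.1e+00  | 0.5    | 3.8       | 16.6         | 2.4             | 17.8                  | 34.8
10^-2    | 7.0e+01  | 0.5    | 3.1       | 15.4         | 2.5             | 20.9                  | 3.62
10^-3    | 8.2e+02  | 0.5    | 2.7       | 16.2         | 2.5             | 31.6                  | 0.300
10^-4    | 1.1e+04  | 0.5    | 2.0       | 16.1         | 2.8             | 55.1                  | 3.65e-02
10^-5    | 8.1e+04  | 0.5    | 2.0       | 22.2         | 2.9             | 30.9                  | 3.64e-03
10^-6    | 4.9e+05  | 0.49   | 4.5       | 291.5        | 3.0             | 0.6                   | 3.64e-04
10^-6    | 5.0e+04  | 0.05   | 2.0       | 7.0          | 2.8             | 4.5                   | 3.64e-04

Notes. The columns for the implicit case represent: the average hydrodynamical and advective CFL numbers, the average number of Newton iterations per time step, the average number of GMRES iterations per Newton iteration, the average number of parabolic iterations for the preconditioner per GMRES iteration, and the final time reached within the fixed six-hour wall-clock time. The last column gives the final time reached by the explicit Adams-Bashforth scheme in the same wall-clock time.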
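Since both solvers were given the same six-hour wall-clock budget, the ratio of the final times in Table 1 is directly the speed-up of JFNK+PBP over the explicit scheme. The short Python sketch below computes these ratios from the tabulated values; for $M_s = 10^{-6}$ it uses the $\mathrm{CFL_{adv}} \approx 0.05$ run, which is a choice made for this example.

```python
# Final times reached in six hours of wall-clock time, copied from Table 1
# (units of L/u0). For Ms = 1e-6 the CFLadv = 0.05 run is used.
IMPLICIT = {1e-1: 17.8, 1e-2: 20.9, 1e-3: 31.6, 1e-4: 55.1, 1e-5: 30.9, 1e-6: 4.5}
EXPLICIT = {1e-1: 34.8, 1e-2: 3.62, 1e-3: 0.300, 1e-4: 3.65e-2, 1e-5: 3.64e-3, 1e-6: 3.64e-4}


def speedups(implicit, explicit):
    """Speed-up of JFNK+PBP over Adams-Bashforth: ratio of final times at equal wall time."""
    return {ms: implicit[ms] / explicit[ms] for ms in implicit}


if __name__ == "__main__":
    for ms, s in sorted(speedups(IMPLICIT, EXPLICIT).items(), reverse=True):
        print(f"Ms = {ms:.0e}: speed-up = {s:.3g}")
```

The ratios are below one at $M_s = 10^{-1}$ (the explicit scheme wins), about 6 at $M_s = 10^{-2}$, and grow by roughly a factor of ten per decade in Mach number down to $10^{-4}$, because the explicit time step is tied to the sound speed.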
[Figure: left panel, decay rate of kinetic energy (dK/dt axis) versus time from 0 to 20 for Ms = 10^-1 to 10^-6; right panel, zoom on the peak between t = 8.0 and t = 8.6, with decay rates between 0.0122 and 0.0125.]
Fig. 4. Decay rate of the Taylor-Green vortex for different Mach numbers. The right panel shows a zoom on the peak of the decay rate. Time is measured in units of L/u0, the decay rate in units of ρ0 u0^3/L.
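The quantities in Fig. 4 and Table 2 are the decay rate of kinetic energy, $-\mathrm{d}K/\mathrm{d}t$, and the time and value of its maximum. The text does not say how the rate was computed from the simulations; the Python sketch below assumes $K$ is sampled at output times and uses finite differences. The synthetic $K(t)$ in the example is invented for illustration and is not MUSIC data.

```python
import math


def decay_rate(times, K):
    """-dK/dt from sampled kinetic energy.

    Central differences in the interior (second order on a uniform grid),
    one-sided first-order differences at both ends.
    """
    n = len(times)
    rate = [0.0] * n
    rate[0] = -(K[1] - K[0]) / (times[1] - times[0])
    rate[-1] = -(K[-1] - K[-2]) / (times[-1] - times[-2])
    for i in range(1, n - 1):
        rate[i] = -(K[i + 1] - K[i - 1]) / (times[i + 1] - times[i - 1])
    return rate


def peak(times, rate):
    """Time and value of the maximum decay rate, the quantities listed in Table 2."""
    i = max(range(len(rate)), key=rate.__getitem__)
    return times[i], rate[i]


# Synthetic K(t) whose decay rate is a Gaussian peaking at t = 8.17 (not MUSIC data).
times = [0.01 * i for i in range(2001)]  # 0 <= t <= 20
K = [0.125 * (1.0 - math.erf((t - 8.17) / 2.0)) for t in times]
print(peak(times, decay_rate(times, K)))  # close to (8.17, 0.125 / sqrt(pi))
```

Resolving differences below one percent between Mach numbers, as in Fig. 4, requires the output interval to be small compared with the width of the peak.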
Table 2. Maximum decay rate measured during the decay of the Taylor-Green vortex for different Mach numbers.

Mach No. | Time of maximum | Maximum decay rate
---------|-----------------|-------------------
10^-1    | 8.1656          | 1.2504e-02
10^-2    | 8.1695          | 1.2496e-02
10^-3    | 8.1695          | 1.2496e-02
10^-4    | 8.1696          | 1.2496e-02
10^-5    | 8.1695          | 1.2498e-02
10^-6    | 8.1681          | 1.2520e-02

Notes. Time is measured in units of L/u0, the decay rate in units of ρ0 u0^3/L.

5.2.2. 3D red giant models

The efficient computation of 3D models is the main motivation for moving beyond the framework of quasi-Newton methods. We cannot, however, meaningfully compare the performance of the latter to that of the JFNK+PBP method for 3D stellar models, as was done in the previous section. As shown previously, quasi-Newton methods, such as the Broyden method, perform well in 2D. However, their cost increases significantly in 3D. The reasons are twofold. Firstly, in 3D, the Jacobian matrix has a more complex structure than in 2D, due to the third dimension. This implies an increase in the cost for the construction and storage of the Jacobian matrix and its ILU factorization. As a result, for the same number of degrees of freedom (i.e., the same matrix size), a 3D computation is inherently more expensive than a 2D computation. Secondly, in 3D, the typical size of a problem is much larger than in 2D, essentially due to the larger number of cells, but also due to the extra variable (the third velocity component). For instance, a $128^2$ computation has $4 \times 128^2 = 65\,536$ degrees of freedom, whereas a $128^3$ computation has $5 \times 128^3 = 10\,485\,760$ degrees of freedom. The computational costs (CPU time and memory) of some of the components of the quasi-Newton methods do not scale linearly with the problem size. Thus, this increase in degrees of freedom corresponds to a prohibitive increase in both CPU time and memory. For these reasons, we can only perform a comparison at an extremely low resolution, not necessarily relevant to the analysis of physical processes in stars. Since JFNK+PBP is now the method implemented in MUSIC for running 3D simulations, we want to illustrate the potential of this method.

We do so by performing computations of the red giant model for a grid size of $72 \times 65^2$ (roughly 1.5 million degrees of freedom), using both the Broyden and JFNK+PBP methods. For the reasons presented previously, this is the largest problem size that we could consider using the serial version of MUSIC on a single node of the supercomputer Zen at the University of Exeter. Each node has 12 cores and 24 GB of RAM, and a full node was requested to benefit from the available memory. It is clear that the memory requirement of the ILU factorization restricts the range of accessible resolutions, even if domain decomposition is used to distribute the problem among several computer nodes.
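The degree-of-freedom counts quoted above are simply the number of cells times the number of variables per cell. The Python sketch below reproduces them; it assumes, following the scaling of Sect. 4.1, that the variables are density, energy and the velocity components (four in 2D, five in 3D).

```python
from math import prod


def degrees_of_freedom(cells, n_variables):
    """Unknowns of the Newton system = number of cells x variables per cell."""
    return prod(cells) * n_variables


# 2D: density, energy and two velocity components; 3D adds a third component.
dof_2d = degrees_of_freedom((128, 128), 4)       # 65 536
dof_3d = degrees_of_freedom((128, 128, 128), 5)  # 10 485 760
dof_rg = degrees_of_freedom((72, 65, 65), 5)     # 1 521 000, i.e. roughly 1.5 million
print(dof_2d, dof_3d, dof_3d // dof_2d, dof_rg)
```

Going from $128^2$ to $128^3$ multiplies the number of unknowns by $5 \times 128 / 4 = 160$, before accounting for the denser 3D Jacobian stencil.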
Table 3. Comparison of the performance of the JFNK+PBP method presented in this paper with the Broyden methods presented in Viallet et al. (2013), for the 2D red giant test case.

Method                 | CFLhydro | CFLadv | CFLrad | Newton/Δt | GMRES/Newton | Simulated time/Wall time
-----------------------|----------|--------|--------|-----------|--------------|-------------------------
CFLadv,max = 0.5
Broyden(10^-2)+ILU(1)  | 17.7     | 0.46   | 4.8    | 7.3       | 4.0          | 227
JFNK(10^-1)+PBP        | 18.9     | 0.49   | 4.9    | 6.2       | 7.2          | 202
JFNK(10^-2)+PBP        | 18.9     | 0.49   | 4.9    | 4.6       | 18.4         | 161
JFNK(10^-4)+PBP        | 18.9     | 0.49   | 4.9    | 4.6       | 37.8         | 80
CFLadv,max = 1
Broyden(10^-1)+ILU(1)  | 37.9     | 0.93   | 8.2    | 9.5       | 4.3          | 382
JFNK(10^-1)+PBP        | 40.2     | 0.98   | 8.6    | 6.6       | 18.5         | 200
JFNK(10^-2)+PBP        | 40.0     | 0.98   | 8.5    | 5.7       | 41.3         | 124
JFNK(10^-4)+PBP        | 40.0     | 0.98   | 8.5    | 5.7       | 200.7        | 32
CFLadv,max = 1.5
Broyden(10^-1)+ILU(1)  | 45.2     | 1.10   | 9.6    | 13.2      | 4.9          | 383
JFNK(10^-1)+PBP        | 55.2     | 1.42   | 11.0   | 8.9       | 37.8         | 121
JFNK(10^-2)+PBP        | 55.8     | 1.43   | 11.0   | 7.8       | 79.3         | 73
JFNK(10^-4)+PBP        | 55.7     | 1.43   | 11.0   | 7.7       | 252.0        | 33

Notes. The value of the linear tolerance is given in parentheses after the name of the method; ILU(k) refers to an incomplete LU factorization of order k; PBP refers to physics-based preconditioning for sound waves only.
Table 4. Similar to Table 3, for the 2D young Sun models.

Method                 | CFLhydro | CFLadv | CFLrad   | Newton/Δt | GMRES/Newton | Simulated time/Wall time
-----------------------|----------|--------|----------|-----------|--------------|-------------------------
CFLadv,max = 0.5
Broyden(10^-1)+ILU(2)  | 235.5    | 0.50   | 2.5e-07  | 6.6       | 12.3         | 74
JFNK(10^-1)+PBP        | 235.0    | 0.50   | 2.5e-07  | 6.7       | 10.4         | 39
JFNK(10^-2)+PBP        | 227.4    | 0.50   | 2.4e-07  | 5.2       | 15.8         | 35
JFNK(10^-4)+PBP        | 235.0    | 0.50   | 2.5e-07  | 5.2       | 19.9         | 28
CFLadv,max = 1
Broyden(10^-1)+ILU(2)  | 474.8    | 1.00   | 5.1e-07  | 8.2       | 16.7         | 106
JFNK(10^-1)+PBP        | 474.9    | 1.00   | 5.1e-07  | 7.5       | 19.5         | 39
JFNK(10^-2)+PBP        | 471.8    | 0.99   | 5.1e-07  | 8.1       | 20.0         | 35
JFNK(10^-4)+PBP        | 471.8    | 0.99   | 5.1e-07  | 8.0       | 20.0         | 34
CFLadv,max = 1.5
Broyden(10^-2)+ILU(2)  | 681.3    | 1.50   | 7.3e-07  | 9.9       | 19.8         | 121
JFNK(10^-1)+PBP        | 662.8    | 1.45   | 7.1e-07  | 15.6      | 19.9         | 26
JFNK(10^-2)+PBP        | 656.9    | 1.44   | 7.1e-07  | 15.9      | 20.0         | 25
JFNK(10^-4)+PBP        | 668.0    | 1.46   | 7.2e-07  | 16.0      | 20.0         | 25
Table 5. Similar to Table 3, for the 3D red giant models.

Method                             | CFLhydro | CFLadv | CFLrad | Newton/Δt | GMRES/Newton | Simulated time/Wall time
-----------------------------------|----------|--------|--------|-----------|--------------|-------------------------
CFLadv,max = 0.5
JFNK(10^-1)+PBP                    | 107.6    | 0.50   | 0.74   | 10.8      | 7.8          | 188
JFNK(10^-2)+PBP                    | 107.6    | 0.50   | 0.74   | 7.3       | 13.3         | 183
JFNK(10^-4)+PBP                    | 107.6    | 0.50   | 0.74   | 5.1       | 25.6         | 147
Broyden(10^-1) w/o preconditioner  | 107.6    | 0.50   | 0.74   | 7.6       | 219.7        | 122
Broyden(10^-1)+ILU(2)              | 107.6    | 0.50   | 0.74   | 6.8       | 8.7          | 109
CFLadv,max = 1
Broyden(10^-1)+ILU(2)              | 234.5    | 1.00   | 1.4    | 8.7       | 14.0         | 203
JFNK(10^-1)+PBP                    | 227.4    | 0.97   | 1.4    | 21.5      | 15.7         | 109
CFLadv,max = 1.5
Broyden(10^-1)+ILU(2)              | 342.5    | 1.50   | 1.9    | 10.9      | 17.1         | 257
JFNK(10^-1)+PBP                    | 227.0    | 0.97   | 1.4    | 25.0      | 22.5         | 67
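Tables 3 to 5 characterize each run by its CFLadv, defined in Eq. (91), and by its CFLrad, the radiative CFL number defined in Eq. (92) of Sect. 5.2.2. The Python sketch below evaluates both from per-cell quantities and computes the largest time step compatible with a CFLadv limit such as 0.5, 1 or 1.5. Allowing the mesh spacing to vary from cell to cell (as on a spherical grid) is an assumption of this example, and the numbers are illustrative only.

```python
def cfl_adv(speeds, dxs, dt):
    """Eq. (91): maximum over cells of |u| dt / dx (dx may vary from cell to cell)."""
    return max(abs(u) * dt / dx for u, dx in zip(speeds, dxs))


def cfl_rad(chis, dxs, dt):
    """Eq. (92): maximum over cells of chi dt / dx^2, chi being the thermal diffusivity."""
    return max(chi * dt / dx ** 2 for chi, dx in zip(chis, dxs))


def dt_for_cfl_adv(target, speeds, dxs):
    """Largest dt with cfl_adv <= target, e.g. target = 0.5, 1 or 1.5 as in Tables 3-5.

    Cells at rest do not constrain dt; raises ValueError if all cells are at rest.
    """
    return min(target * dx / abs(u) for u, dx in zip(speeds, dxs) if u != 0.0)


speeds = [1.0, -3.0, 2.0]    # illustrative cell velocities
dxs = [0.1, 0.2, 0.1]        # illustrative mesh spacings
chis = [1e-3, 2e-3, 1e-3]    # illustrative thermal diffusivities
dt = dt_for_cfl_adv(0.5, speeds, dxs)
print(dt, cfl_adv(speeds, dxs, dt), cfl_rad(chis, dxs, dt))
```

Because CFLrad scales as $\Delta t/\Delta x^2$ while CFLadv scales as $\Delta t/\Delta x$, refining the mesh makes thermal diffusion stiffer faster than advection, which is why an implicit treatment of diffusion can become necessary at high resolution.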
Test runs are performed similarly to the previous section, and the results are summarized in Table 5. The JFNK+PBP method is more efficient for $\mathrm{CFL_{adv}} = 0.5$, but the Broyden method remains more efficient for $\mathrm{CFL_{adv}} = 1$ and $\mathrm{CFL_{adv}} = 1.5$. Surprisingly, for this particular case the unpreconditioned Broyden method performs about as well as (here even slightly better than) the preconditioned version, showing that the ILU preconditioner becomes inefficient due to its cost. However, we stress that an unpreconditioned Broyden method is not a viable option for scientific applications. The preconditioned version will not be viable for larger problems either, due to the increasing cost of computing and storing the ILU factorization. We expect a more substantial gain compared to the quasi-Newton methods when larger problems are considered, but the serial tests performed here limit us to relatively small 3D problems. Such small problems already appear to be on the edge of the capabilities of quasi-Newton methods. The implementation of the JFNK+PBP method in MUSIC now allows us to perform 3D simulations with resolutions of $512^3$ of a large fraction (80% in radius) of a partly convective star, as an initial step toward the study of turbulent convection and overshooting under realistic stellar interior conditions and over relevant physical time scales (Pratt et al., in prep.).

Finally, as stellar models include radiative diffusion, we have the possibility of using the second version of the physics-based preconditioner, in which thermal diffusion is also treated implicitly. The stiffness of thermal diffusion is measured by the radiative CFL number, defined as

$$\mathrm{CFL_{rad}} = \max\left(\frac{\chi\,\Delta t}{\Delta x^2}\right), \tag{92}$$

where $\Delta t$ is the time step, $\Delta x$ the mesh spacing, and $\chi$ the thermal diffusivity. As for sound waves, solving thermal diffusion with a time-explicit method requires $\mathrm{CFL_{rad}} \lesssim 1$ for stability. Implicit methods allow for $\mathrm{CFL_{rad}} \gg 1$, but preconditioning is necessary to improve the convergence of the iterative method. For the red giant model, however, our particular treatment of the surface implies that the radiative diffusion is not very stiff ($\mathrm{CFL_{rad}} \sim 1$), and as such we do not see a substantial difference between the two versions of the physics-based preconditioner. Concrete examples of stellar cases where preconditioning of thermal diffusion is necessary will be presented elsewhere.

6. Conclusion

This work is a continuation of previous efforts devoted to developing an efficient, accurate, fully implicit solver for multidimensional hydrodynamics. In Sect. 4 we presented a Jacobian-free Newton-Krylov method, which avoids the explicit construction of the Jacobian matrix. The use of iterative methods to solve the Jacobian equation requires preconditioning at large hydrodynamical CFL numbers. The main purpose of this paper was to present an efficient preconditioner that specifically targets the physical processes responsible for numerical stiffness, hence the name physics-based preconditioner.

This strategy is very different from the more usual algebraic preconditioners (e.g., ILU factorization), which try to address the stiffness of the system by only looking at the structure and numerical values of the Jacobian matrix, without considering the underlying equations. In the context of stellar hydrodynamics, the preconditioning step relies on a semi-implicit solver (inexpensive but rather inaccurate⁶) that treats sound waves (and thermal diffusion, if required) implicitly in order to overcome the associated CFL limit on the time step (see Sect. 3). Although we aim at using MUSIC to model stellar interiors, the JFNK+PBP method can be applied to general advection- and/or diffusion-dominated problems. Although many approximations enter the derivation of our SI scheme, they do not restrict its range of applicability.

Section 5 presented the results of extensive tests assessing the performance and accuracy of the new method. A strong emphasis was put on exploring a wide range of Mach numbers, namely six orders of magnitude from $M_s = 10^{-1}$ down to $M_s = 10^{-6}$. The tests assessed the ability of the physics-based preconditioner to reduce the number of iterations required by the linear solver. Using the 3D Taylor-Green vortex test, we showed that this solver is computationally efficient, beating the Adams-Bashforth explicit scheme for $M_s \lesssim 10^{-2}$. We emphasize that the method has several parameters that can be tuned to improve its performance. To achieve the best performance, these parameters should be tuned for the particular problem being considered. Therefore, we do not claim to have found the best set of parameters, but rather a set that gives very satisfying performance for the various tests performed in this paper. Furthermore, the performance does not come with any loss of accuracy: our method exhibits accuracy that is consistent with the second-order nature of its discretization and, most importantly, the numerical errors are independent of the Mach number, at least in the investigated range. However, it appears that $M_s \approx 10^{-6}$ is on the edge of the abilities of the solver, as fine-tuning of some parameters and a reduction of the time step were necessary to pass the tests at such a low Mach number.

The JFNK+PBP method is now the workhorse of the MUSIC code and is used to investigate long-standing problems in stellar hydrodynamics such as shear mixing and convective overshooting. Further developments are now devoted to the parallelization of the method, in order to take advantage of the multi-core/multi-node high-performance computers that are now routinely used in computational physics. Performance and scalability tests will be presented elsewhere.

Acknowledgements. M.V. would like to thank Dana Knoll and Ryosuke Park for useful discussions on physics-based preconditioning during several visits at LANL that motivated this work. M.V. also thanks Hannes Grimm-Strele and Philipp Edelmann for discussions related to the work presented in this paper. M.V. acknowledges support from a Newton International Fellowship and Alumni program from the Royal Society during the earlier part of this work. Part of this work was funded by the Royal Society Wolfson Merit award WM090065, the Consolidated STFC grant ST/J001627/1, the French Programme National de Physique Stellaire (PNPS) and Programme National Hautes Énergies (PNHE), and by the European Research Council through grants ERC-AdG No. 320478-TOFU and ERC-AdG No. 341157-COCO2CASA. The calculations for this paper were performed on the DiRAC Complexity machine, jointly funded by STFC and the Large Facilities Capital Fund of BIS, and the University of Exeter Supercomputer, a DiRAC Facility jointly funded by STFC, the Large Facilities Capital Fund of BIS, and the University of Exeter.

References

Bazán, G., Dearborn, D. S. P., Dossa, D. D., et al. 2003, in 3D Stellar Evolution, eds. S. Turcotte, S. C. Keller, & R. M. Cavallo, ASP Conf. Ser., 293, 1
ics, stiffness results from acoustic perturbations that propagate
on a time scale much shorter than the fluid bulk motion and 6
We applied Picard linearization to derive the scheme and used
possibly from thermal diffusion. Therefore, the preconditioning first-order time discretization methods.
A153, page 14 of 17
Gee, M., Siefert, C., Hu, J., Tuminaro, R., & Sala, M. 2006, ML 5.0 Smoothed Meakin, C. A., & Arnett, D. 2007, ApJ, 667, 448
Aggregation Users Guide, Tech. Rep. Sandia National Laboratories Mock, M., Siess, L., & Mller, E. 2011, A&A, 533, A53
Glatzmaier, G. 2013, Introduction to Modeling Convection in Planets and Stars: Mousseau, V., Knoll, D., & Rider, W. 2000, J. Comput. Phys., 160, 743
Magnetic Field, Density Stratification, Rotation (Princeton University Press) Park, H., Nourgaliev, R. R., Martineau, R. C., & Knoll, D. A. 2009, J. Comp.
Heroux, M. A., Bartlett, R. A., Howle, V. E., et al. 2005, ACM Trans. Math. Phys., 228, 9131
Softw., 31, 397 Reisner, J. M., Mousseau, A., Wyszogrodzki, A. A., & Knoll, D. A. 2005,
Herwig, F., Woodward, P. R., Lin, P.-H., Knox, M., & Fryer, C. 2014, ApJ, 792, Monthly Weather Rev., 133, 1003
L3 Saad, Y., & Schultz, M. H. 1986, SIAM J. Sci. Stat. Comput., 7, 856
Kifonidis, K., & Mller, E. 2012, A&A, 544, A47 van Leer, B. 1974, J. Comput. Phys., 14, 361
Knoll, D. A., & Keyes, D. E. 2004, J. Comput. Phys., 193, 357 Viallet, M., Baraffe, I., & Walder, R. 2011, A&A, 531, A86
LeVeque, R. J. 2007, Finite Difference Methods for Ordinary and Partial Viallet, M., Baraffe, I., & Walder, R. 2013, A&A, 555, A81
Differential Equations: Steady-State and Time-Dependent Problems (SIAM) Yee, H., Vinokur, M., & Djomehri, M. 2000, J. Comp. Phys., 162, 33
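The stability contrast behind the radiative CFL number of Eq. (92) can be illustrated with a toy problem. The sketch below is a minimal, hypothetical example and not part of MUSIC: it assumes a constant diffusivity `chi` on a uniform periodic 1D grid, evaluates $\mathrm{CFL}_{\rm rad}$ as in Eq. (92), and advances a grid-scale temperature perturbation with forward Euler and with backward Euler at $\mathrm{CFL}_{\rm rad} = 5$. For the three-point stencil used here, the explicit update is stable only for $\mathrm{CFL}_{\rm rad} \le 1/2$, which is the precise form of the $\mathrm{CFL}_{\rm rad} \lesssim 1$ requirement in this simple setting. All function names and parameter values are illustrative assumptions.

```python
import numpy as np


def cfl_rad(chi, dt, dx):
    """Radiative CFL number of Eq. (92): max over cells of chi * dt / dx**2."""
    return float(np.max(np.asarray(chi) * dt / dx**2))


def periodic_laplacian(n, dx):
    """Three-point second-difference operator on a uniform periodic 1D grid."""
    lap = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    lap[0, -1] = lap[-1, 0] = 1.0  # periodic wrap-around
    return lap / dx**2


def explicit_step(T, chi, dt, lap):
    """Forward Euler: T^{n+1} = T^n + dt * chi * L T^n.
    With this stencil it is stable only for CFL_rad <= 1/2."""
    return T + dt * chi * (lap @ T)


def implicit_step(T, chi, dt, lap):
    """Backward Euler: solve (I - dt * chi * L) T^{n+1} = T^n.
    Stable for any CFL_rad. A dense direct solve is used only because the toy
    grid is tiny; an implicit hydro code would use a preconditioned iterative solver."""
    return np.linalg.solve(np.eye(T.size) - dt * chi * lap, T)


# Hypothetical setup: constant diffusivity, 64 cells, time step giving CFL_rad = 5.
n = 64
dx = 1.0 / n
chi = 1.0e-3
dt = 5.0 * dx**2 / chi
lap = periodic_laplacian(n, dx)

# Uniform state plus a small grid-scale (odd-even) perturbation.
T0 = 1.0 + 0.01 * np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
T_exp, T_imp = T0.copy(), T0.copy()
for _ in range(10):
    T_exp = explicit_step(T_exp, chi, dt, lap)
    T_imp = implicit_step(T_imp, chi, dt, lap)

print("CFL_rad =", cfl_rad(chi, dt, dx))
print("explicit max deviation:", np.max(np.abs(T_exp - 1.0)))  # amplified each step
print("implicit max deviation:", np.max(np.abs(T_imp - 1.0)))  # damped each step
```

The odd-even mode is an eigenvector of the periodic stencil, so each explicit step multiplies its amplitude by $1 - 4\,\mathrm{CFL}_{\rm rad} = -19$, whereas each implicit step multiplies it by $1/(1 + 4\,\mathrm{CFL}_{\rm rad}) = 1/21$.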
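Appendix A: Evolution equation for pressure and acoustic fluctuations

A.1. Evolution equation for pressure

The linearized equation of state yields

$$\delta p = \left(\frac{\partial p}{\partial \rho}\right)_e \delta\rho + \left(\frac{\partial p}{\partial e}\right)_\rho \delta e. \tag{A.1}$$

We have the thermodynamic relationships

$$\left(\frac{\partial p}{\partial \rho}\right)_e = \left(\Gamma_1 - \Gamma_3 + 1\right)\frac{p}{\rho}, \tag{A.2}$$

$$\left(\frac{\partial p}{\partial e}\right)_\rho = \rho\left(\Gamma_3 - 1\right), \tag{A.3}$$

where $\Gamma_1$ and $\Gamma_3$ are the first and third generalised adiabatic indices. We now replace the variations $\delta$ in Eq. (A.1) by Lagrangian derivatives $D_t = \partial_t + \mathbf{u}\cdot\nabla$:

$$D_t p = \left(\Gamma_1 - \Gamma_3 + 1\right)\frac{p}{\rho}\,D_t\rho + \rho\left(\Gamma_3 - 1\right) D_t e, \tag{A.4}$$

and we use the Lagrangian equations

$$D_t \rho = -\rho\,\nabla\cdot\mathbf{u}, \tag{A.5}$$

$$D_t e = -\frac{p}{\rho}\,\nabla\cdot\mathbf{u} + \frac{1}{\rho}\,\nabla\cdot\left(\chi\nabla T\right), \tag{A.6}$$

to obtain

$$\partial_t p + \mathbf{u}\cdot\nabla p = -\Gamma_1\, p\,\nabla\cdot\mathbf{u} + \left(\Gamma_3 - 1\right)\nabla\cdot\left(\chi\nabla T\right). \tag{A.7}$$

Appendix B: Transformation matrices

B.1. Internal energy equation

In this case $U = (\rho, \rho e, \rho u)$, $X = (\rho, e, u)$, and $V = (p, e, u)$. The transformation matrix $\partial U/\partial X$ between variables $U$ and $X$ is such that $\delta U = (\partial U/\partial X)\,\delta X$. We have

$$\frac{\partial U}{\partial X} = \begin{pmatrix} 1 & 0 & 0 \\ e & \rho & 0 \\ u & 0 & \rho \end{pmatrix}. \tag{B.1}$$

The inverse transformation is

$$\frac{\partial X}{\partial U} = \left(\frac{\partial U}{\partial X}\right)^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -e/\rho & 1/\rho & 0 \\ -u/\rho & 0 & 1/\rho \end{pmatrix}. \tag{B.2}$$

The transformation matrix $\partial V/\partial X$ is

$$\frac{\partial V}{\partial X} = \begin{pmatrix} (\partial p/\partial\rho)_e & (\partial p/\partial e)_\rho & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{B.3}$$

and its inverse $\partial X/\partial V$ is

$$\frac{\partial X}{\partial V} = \begin{pmatrix} (\partial p/\partial\rho)_e^{-1} & -(\partial p/\partial e)_\rho\,(\partial p/\partial\rho)_e^{-1} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{B.4}$$

We have

$$\frac{\partial V}{\partial U} = \frac{\partial V}{\partial X}\frac{\partial X}{\partial U} = \begin{pmatrix} (\partial p/\partial\rho)_e & (\partial p/\partial e)_\rho & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -e/\rho & 1/\rho & 0 \\ -u/\rho & 0 & 1/\rho \end{pmatrix} = \begin{pmatrix} (\partial p/\partial\rho)_e - \frac{e}{\rho}(\partial p/\partial e)_\rho & \frac{1}{\rho}(\partial p/\partial e)_\rho & 0 \\ -e/\rho & 1/\rho & 0 \\ -u/\rho & 0 & 1/\rho \end{pmatrix}. \tag{B.5}$$

B.2. Total energy equation

In this case $U = (\rho, \rho\epsilon_t, \rho u)$, $X = (\rho, \epsilon_t, u)$, and $V = (p, e, u)$. The transformation matrix $\partial U/\partial X$ is

$$\frac{\partial U}{\partial X} = \begin{pmatrix} 1 & 0 & 0 \\ \epsilon_t & \rho & 0 \\ u & 0 & \rho \end{pmatrix}. \tag{B.6}$$

Its inverse is

$$\frac{\partial X}{\partial U} = \begin{pmatrix} 1 & 0 & 0 \\ -\epsilon_t/\rho & 1/\rho & 0 \\ -u/\rho & 0 & 1/\rho \end{pmatrix}. \tag{B.7}$$

The transformation matrix from $(\rho, e, u)$ to $(\rho, \epsilon_t, u)$ is

$$\frac{\partial(\rho, \epsilon_t, u)}{\partial(\rho, e, u)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & u \\ 0 & 0 & 1 \end{pmatrix}, \tag{B.8}$$

so that the transformation matrix $\partial V/\partial X$ is

$$\frac{\partial V}{\partial X} = \begin{pmatrix} (\partial p/\partial\rho)_e & (\partial p/\partial e)_\rho & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & u \\ 0 & 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} (\partial p/\partial\rho)_e & (\partial p/\partial e)_\rho & -u\,(\partial p/\partial e)_\rho \\ 0 & 1 & -u \\ 0 & 0 & 1 \end{pmatrix}. \tag{B.9}$$

The inverse transformation is

$$\frac{\partial X}{\partial V} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & u \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} (\partial p/\partial\rho)_e^{-1} & -(\partial p/\partial e)_\rho\,(\partial p/\partial\rho)_e^{-1} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} (\partial p/\partial\rho)_e^{-1} & -(\partial p/\partial e)_\rho\,(\partial p/\partial\rho)_e^{-1} & 0 \\ 0 & 1 & u \\ 0 & 0 & 1 \end{pmatrix}. \tag{B.10}$$

Finally, we have

$$\frac{\partial V}{\partial U} = \frac{\partial V}{\partial X}\frac{\partial X}{\partial U} = \begin{pmatrix} (\partial p/\partial\rho)_e & (\partial p/\partial e)_\rho & -u\,(\partial p/\partial e)_\rho \\ 0 & 1 & -u \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -\epsilon_t/\rho & 1/\rho & 0 \\ -u/\rho & 0 & 1/\rho \end{pmatrix} = \begin{pmatrix} (\partial p/\partial\rho)_e - \frac{\epsilon_t - u^2}{\rho}(\partial p/\partial e)_\rho & \frac{1}{\rho}(\partial p/\partial e)_\rho & -\frac{u}{\rho}(\partial p/\partial e)_\rho \\ -(\epsilon_t - u^2)/\rho & 1/\rho & -u/\rho \\ -u/\rho & 0 & 1/\rho \end{pmatrix}. \tag{B.11}$$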
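The matrices of Eqs. (B.5) and (B.11) can be checked numerically against a finite-difference Jacobian of the map $U \mapsto V$. The sketch below assumes a hypothetical ideal-gas equation of state $p = (\gamma - 1)\rho e$, for which $\Gamma_1 = \Gamma_3 = \gamma$, so that Eqs. (A.2) and (A.3) reduce to $(\partial p/\partial \rho)_e = (\gamma - 1)e$ and $(\partial p/\partial e)_\rho = (\gamma - 1)\rho$. It works in 1D and uses $e = \epsilon_t - u^2/2$, consistent with Eq. (B.8). The function names, $\gamma = 5/3$, and the sample state are illustrative choices, not taken from MUSIC.

```python
import numpy as np

GAMMA = 5.0 / 3.0  # hypothetical ideal-gas adiabatic index (assumption)


def eos_derivs(rho, e):
    """Assumed ideal gas p = (gamma - 1) rho e: returns p, (dp/drho)_e, (dp/de)_rho."""
    p = (GAMMA - 1.0) * rho * e
    return p, (GAMMA - 1.0) * e, (GAMMA - 1.0) * rho


def dV_dU_internal(rho, e, u):
    """Eq. (B.5): U = (rho, rho e, rho u) -> V = (p, e, u)."""
    _, p_rho, p_e = eos_derivs(rho, e)
    return np.array([
        [p_rho - e / rho * p_e, p_e / rho, 0.0],
        [-e / rho, 1.0 / rho, 0.0],
        [-u / rho, 0.0, 1.0 / rho],
    ])


def dV_dU_total(rho, eps_t, u):
    """Eq. (B.11): U = (rho, rho eps_t, rho u) -> V = (p, e, u), e = eps_t - u^2/2."""
    e = eps_t - 0.5 * u**2
    _, p_rho, p_e = eos_derivs(rho, e)
    w = (eps_t - u**2) / rho
    return np.array([
        [p_rho - w * p_e, p_e / rho, -u / rho * p_e],
        [-w, 1.0 / rho, -u / rho],
        [-u / rho, 0.0, 1.0 / rho],
    ])


def V_of_U_internal(U):
    """Map conservative (rho, rho e, rho u) to (p, e, u)."""
    rho, rho_e, rho_u = U
    e = rho_e / rho
    return np.array([eos_derivs(rho, e)[0], e, rho_u / rho])


def V_of_U_total(U):
    """Map conservative (rho, rho eps_t, rho u) to (p, e, u)."""
    rho, rho_et, rho_u = U
    u = rho_u / rho
    e = rho_et / rho - 0.5 * u**2
    return np.array([eos_derivs(rho, e)[0], e, u])


def fd_jacobian(f, U, h=1e-6):
    """Central finite-difference Jacobian; column j approximates dF/dU_j."""
    J = np.zeros((3, 3))
    for j in range(3):
        dU = np.zeros(3)
        dU[j] = h * max(1.0, abs(U[j]))
        J[:, j] = (f(U + dU) - f(U - dU)) / (2.0 * dU[j])
    return J


# Illustrative state (arbitrary values).
rho, e, u = 1.3, 2.0, 0.7
eps_t = e + 0.5 * u**2
J_int_fd = fd_jacobian(V_of_U_internal, np.array([rho, rho * e, rho * u]))
J_tot_fd = fd_jacobian(V_of_U_total, np.array([rho, rho * eps_t, rho * u]))
print("B.5 max error: ", np.max(np.abs(J_int_fd - dV_dU_internal(rho, e, u))))
print("B.11 max error:", np.max(np.abs(J_tot_fd - dV_dU_total(rho, eps_t, u))))
```

Only the first rows depend on the equation of state, so the same check could be repeated for another equation of state by swapping in an `eos_derivs` that returns a consistent pressure and its two partial derivatives.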

