Abstract: This paper provides an overview of the new features of the finite element library
deal.II, version 9.2.
1 Overview
deal.II version 9.2.0 was released May 20, 2020. This paper provides an overview of the
new features of this release and serves as a citable reference for the deal.II software library
version 9.2. deal.II is an object-oriented finite element library used around the world in the
development of finite element solvers. It is available for free under the GNU Lesser General Public
License (LGPL). Downloads are available at and https://github.
The major changes of this release are:
These major changes are discussed in detail in Section 2. There are a number of other noteworthy
changes in the current deal.II release that we briefly outline in the remainder of this section:
– deal.II has had decent support for solving complex-valued problems for a while already
(e.g., ones in quantum mechanics – like the equation used in the step-58 tutorial program
covered below – or for time-harmonic problems). However, there were two areas in which
support was missing. First, the UMFPACK direct solver packaged with deal.II did not
support solving complex-valued linear problems. This has now been addressed: UMFPACK
can in fact solve such systems; we just needed to write the appropriate interfaces. Second,
the DataOut class that is responsible for converting nodal data into information that can
then be written into files for visualization did not know how to deal with vector- and
tensor-valued fields whose components are complex numbers. An example is the
time-harmonic version of the Maxwell equations, whose solution consists of the electric and
magnetic fields. This, too, has been addressed in this release.
– The new DiscreteTime class provides a more consistent, more readable, and less error-
prone approach to control time-stepping algorithms within time-dependent simulations.
While providing a rich read-only interface, the non-const interface of this class is designed
to be minimal to enforce a number of important programming invariants, reducing the
possibility of mistakes in the user code. For instance, DiscreteTime ensures that the final
time step ends precisely on a predefined end time, automatically lengthening or shortening
the final time step.
– Key components of deal.II are the FEValues and FEFaceValues classes that evaluate finite
element functions at quadrature points located on cells and faces of a cell, respectively [13].
This release now contains a class FEInterfaceValues that considers the restriction of the
shape functions from both sides of a face and allows evaluating jumps and averages of
shape functions along this face. Jumps and averages are common components of the bilinear
forms of discontinuous Galerkin schemes (as well as schemes for fourth-order equations, see the
discussion of step-47 below), and the new class greatly simplifies the implementation of these methods.
The changelog lists more than 240 other features and bugfixes.
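The end-time invariant described for DiscreteTime above can be illustrated with a small stand-alone sketch. The class TimeSketch below is hypothetical and is not the deal.II implementation; in particular, the 5% tolerance that decides when the final step is lengthened rather than followed by a tiny extra step is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for illustration only; this is NOT deal.II's DiscreteTime.
class TimeSketch
{
public:
  TimeSketch(const double start, const double end, const double step)
    : current_time(start)
    , end_time(end)
    , step_size(step)
    , step_number(0)
  {
    assert(end >= start);
    assert(step > 0.0);
  }

  // Read-only interface.
  double       get_current_time() const { return current_time; }
  unsigned int get_step_number() const { return step_number; }
  bool         is_at_end() const { return current_time == end_time; }

  // The only mutating function: advance by one step.
  void advance_time()
  {
    assert(!is_at_end());
    // Assumed rule: if the next step would pass the end time, or stop within
    // 5% of a step size before it, land exactly on the end time instead.
    const double tolerance = 0.05 * step_size;
    const double next_time = current_time + step_size;
    current_time = (next_time > end_time - tolerance) ? end_time : next_time;
    ++step_number;
  }

private:
  double       current_time;
  double       end_time;
  double       step_size;
  unsigned int step_number;
};
```

With a step size of 0.3 on the interval [0, 1], the sketch takes three full steps and then shortens the fourth so that it ends exactly at 1; with a step size of 0.49 it lengthens the second step instead of adding a third, very small one.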
way cannot be adaptively refined after construction, though we plan to improve this for the next
The new fully distributed triangulation class supports 1D, 2D, and 3D meshes including geometric
multigrid hierarchies, periodic boundary conditions, and hanging nodes.
As part of this effort, we ran benchmarks on the TACC Frontera system, where we were able to
apply the matrix-free geometric multigrid framework to a variable viscosity Stokes system and
achieved weak and strong scaling up to 114K MPI ranks with up to $2.1 \times 10^{11}$ unknowns. This
is likely the largest block system currently solved with deal.II and required various optimizations
and fixes on top of the ones mentioned above: (i) Bug fixes to concurrent point-to-point
communications. (ii) Fixes to multigrid transfer with adaptive refinement and more than $4 \times 10^9$
unknowns. (iii) Fixes to index sets in block indices with more than $4 \times 10^9$ unknowns. (iv) Fixes to
computations with more than $4 \times 10^9$ active cells. (v) Implementation of IDR(s) solvers to reduce
memory overhead. For more details, see [19].
Since the previous release, deal.II has had support for hp-adaptive finite element methods
on distributed memory systems [6]. We implemented the bare functionality for hp-adaptive
methods with the objective to offer the greatest flexibility in their application. Here, reference
finite elements still had to be assigned manually to each cell, which may not lead to an optimal
mesh and is tedious.
With the current release, hp-adaptive finite element methods have been further expanded: New
features like decision strategies have been added and the user interface has been overhauled,
effectively making hp-methods more attractive to use. We introduced many new functions that
automate the general workflow for applying hp-decision strategies, which run on top of the
previous low-level implementation for both serial and parallel applications.
The interface is now as simple to use as the one for h-adaptive mesh refinement. Consider the
following (incomplete) listing as an example: We estimate both error and smoothness of the finite
element approximation. Further, we flag certain fractions of cells with the highest and lowest
errors for refinement and coarsening, respectively (here: 30%/3%). From those cells listed for
adaptation, we designate a subset for h- and p-adaptation. The parameters of the corresponding
hp::Refinement function specify the fraction of cells to be p-adapted from those subsets flagged
for refinement and coarsening, respectively (here: 90%/80%), while the remaining fraction will be
h-adapted (here: 10%/20%).
C++ code
Vector<float> estimated_error_per_cell(n_active_cells);
KellyErrorEstimator<dim>::estimate(
  dof_handler, ..., solution, estimated_error_per_cell, ...);
GridRefinement::refine_and_coarsen_fixed_number(
  triangulation, estimated_error_per_cell, 0.3, 0.03);
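To make the nested fractions concrete, the following stand-alone sketch computes how many cells end up in each category for the values quoted in the text (30%/3% flagged for refinement/coarsening, of which 90%/80% are p-adapted). The struct, the function name, and the rounding rule are assumptions made for illustration; deal.II operates on flags attached to actual cells and may round differently.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of deal.II: cell counts per adaptation type.
struct AdaptationCounts
{
  long p_refine;
  long h_refine;
  long p_coarsen;
  long h_coarsen;
};

AdaptationCounts
split_adaptation_counts(const long   n_cells,
                        const double refine_fraction,
                        const double coarsen_fraction,
                        const double p_refine_fraction,
                        const double p_coarsen_fraction)
{
  assert(n_cells >= 0);
  assert(refine_fraction + coarsen_fraction <= 1.0);

  // First stage: fractions of all cells flagged for refinement/coarsening.
  const long n_refine  = std::lround(refine_fraction * n_cells);
  const long n_coarsen = std::lround(coarsen_fraction * n_cells);

  // Second stage: fractions of the flagged subsets that are p-adapted;
  // the remainder of each subset is h-adapted.
  const long n_p_refine  = std::lround(p_refine_fraction * n_refine);
  const long n_p_coarsen = std::lround(p_coarsen_fraction * n_coarsen);

  return {n_p_refine,
          n_refine - n_p_refine,
          n_p_coarsen,
          n_coarsen - n_p_coarsen};
}
```

For a mesh with 1000 active cells, split_adaptation_counts(1000, 0.3, 0.03, 0.9, 0.8) yields 270 cells for p-refinement, 30 for h-refinement, 24 for p-coarsening, and 6 for h-coarsening.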
whether the choice of adaptation in the previous cycle was justified, and provide a criterion for
the choice in the next cycle [54].
In general, p-refinement is favorable over h-refinement in smooth regions of the finite element
approximation [9, Thm. 3.4]. Thus, estimating its smoothness provides a suitable decision
indicator for hp-adaptation. For this purpose, we express the finite element approximation in an
orthogonal basis of increasing frequency, and take the decay of its expansion coefficients
as an estimate of smoothness. This has been implemented for both Fourier coefficients [14]
and Legendre coefficients [53, 41, 42, 26].
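The idea of measuring smoothness through coefficient decay can be sketched in one dimension as follows. Assuming the expansion coefficients behave like $|a_k| \approx C e^{-\sigma k}$, a least-squares fit of $\ln |a_k|$ against $k$ gives an estimate of the decay rate $\sigma$; a larger $\sigma$ indicates a smoother function. The function name is hypothetical, and the treatment of multi-dimensional expansions in the library differs in detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: estimate sigma in |a_k| ~ C * exp(-sigma * k) by
// linear regression of ln|a_k| on the mode index k.
double estimate_decay_rate(const std::vector<double> &coefficients)
{
  double      sum_k = 0.0, sum_y = 0.0, sum_kk = 0.0, sum_ky = 0.0;
  std::size_t n = 0;

  for (std::size_t k = 0; k < coefficients.size(); ++k)
    {
      const double magnitude = std::abs(coefficients[k]);
      if (magnitude == 0.0)
        continue; // the logarithm is undefined for vanishing coefficients

      const double x = static_cast<double>(k);
      const double y = std::log(magnitude);
      sum_k += x;
      sum_y += y;
      sum_kk += x * x;
      sum_ky += x * y;
      ++n;
    }

  // At least two data points are needed to fit a straight line.
  assert(n >= 2);

  const double slope =
    (n * sum_ky - sum_k * sum_y) / (n * sum_kk - sum_k * sum_k);
  return -slope; // decay rate sigma
}
```

A cell whose coefficients decay quickly would then be a candidate for p-adaptation, while slow decay suggests h-adaptation.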
functions. In this example, the numerical result will be of type double,
and the LLVM JIT optimizer will be invoked. It will employ common subexpression elimination
and aggressive optimizations during compilation. The user then informs the optimizer of all of
the independent variables and the dependent expressions, and invokes the optimization process.
This is an expensive call, as it determines an equivalent code path to evaluate all of the dependent
functions at once. However, in many cases each evaluation has significantly less computational
cost than evaluating the symbolic expressions directly. Evaluation is performed when the user
constructs a substitution map, giving each independent variable a numerical representation,
and passes those to the optimizer. After this step, the numerical equivalent of the individual
dependent expressions may finally be retrieved from the optimizer.
C++ code
using namespace Differentiation::SD;
The expense of invoking the optimizer may be offset not only by the number of evaluations
performed, but also by the amount of reuse each instance of a BatchOptimizer has. In certain
circumstances, this can be maximized by generalizing the way in which the dependent expressions
are formulated. For example, in the context of constitutive modelling the material coefficients
may be made symbolic rather than encoding these into the dependent expressions as numerical
values. The optimizer may then be used to evaluate an entire family of constitutive laws, and
not a specific one that describes the response of a single material. Thereafter, serializing the
optimizer instance and reloading the contents during subsequent simulations permits the user to
skip the optimization process entirely. Serialization also enables these complex expressions to be
compiled offline.
The class VectorizedArray<Number> is a key component to achieve the high node-level
performance of the matrix-free algorithms in deal.II [46, 47]. It is a wrapper class around a
short vector of n entries of type Number and maps arithmetic operations to appropriate
single-instruction/multiple-data (SIMD) concepts by intrinsic functions. The class VectorizedArray
has been made more user-friendly in this release by making it compatible with the STL
algorithms found in the header <algorithm>. The length of the vector can now be queried by
VectorizedArray::size() and its underlying number type by VectorizedArray::value_type.
The VectorizedArray class now supports range-based iteration over its entries. In addition
Table 1: Supported vector lengths of the class VectorizedArray and the corresponding
instruction-set-architecture (ISA) extensions.
double                        float                         ISA
VectorizedArray<double, 1>    VectorizedArray<float, 1>     (auto-vectorization)
VectorizedArray<double, 2>    VectorizedArray<float, 4>     SSE2/AltiVec
VectorizedArray<double, 4>    VectorizedArray<float, 8>     AVX/AVX2
VectorizedArray<double, 8>    VectorizedArray<float, 16>    AVX-512
deal.II now supports ternary operations on vectorized data, where a true or false value is
selected according to the outcome of a given binary comparison. For example, the vectorized
equivalent of (left < right) ? true_value : false_value can be expressed as
C++ code
auto result = compare_and_apply_mask<SIMDComparison::less_than>(
  left, right, true_value, false_value);
which compares every element of the vectorized array individually and selects the corresponding
value from the true_value or false_value array.
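The lane-wise semantics of this operation can be spelled out with a scalar emulation. The function below is a hypothetical stand-in that uses std::array in place of VectorizedArray and a plain loop in place of SIMD mask instructions; it only illustrates what result each lane receives, not how the library computes it.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar emulation of a "less than" compare-and-select:
// for each lane, pick true_value if left < right, otherwise false_value.
template <typename Number, std::size_t width>
std::array<Number, width>
select_if_less_than(const std::array<Number, width> &left,
                    const std::array<Number, width> &right,
                    const std::array<Number, width> &true_value,
                    const std::array<Number, width> &false_value)
{
  std::array<Number, width> result{};
  for (std::size_t lane = 0; lane < width; ++lane)
    result[lane] =
      (left[lane] < right[lane]) ? true_value[lane] : false_value[lane];
  return result;
}
```

Each lane is decided independently, so a single call can return a mixture of entries from both value arrays.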
In previous deal.II releases, the vector length was set at compile time of the library to match the
highest value supported by the given processor architecture. Now, a second optional template
argument can be specified as VectorizedArray<Number, size>, where size explicitly controls
the vector length within the capabilities of a particular instruction set. (A full list of supported
vector lengths is presented in Table 1.) This allows users to select the vector length/ISA and, as
a consequence, the number of cells to be processed at once in matrix-free operator evaluations.
For example, the deal.II-based library [56], which solves the 6D Vlasov–Poisson
equation with high-order discontinuous Galerkin methods (with more than a thousand degrees
of freedom per cell), constructs a tensor product of two MatrixFree objects of different
SIMD-vector length in the same application and benefits, in terms of performance, from the
possibility of decreasing the number of cells processed by a single SIMD instruction.
The new interface of VectorizedArray also enables replacement by any type with a matching
interface. Specifically, this prepares deal.II for the std::simd class that is slated to become part
of the C++23 standard. Table 2 compares the deal.II-specific SIMD classes and the equivalent
C++23 classes. These changes also prepare for specialized code paths exploiting vectorization
within an element, see [47].
require communication. Before evaluating the matrix-free operator at these degrees of freedom,
we need to communicate ghost DoFs from other processors. Similarly once the operator has
been evaluated, we need to update the resulting global vector with values from other processors.
The strategy that we are now using consists in splitting the DoFs into three groups: one group
of DoFs that are on the local boundary and two groups each owning half of the interior DoFs.
When evaluating the matrix-free operator, we start the MPI communication to get the ghosted
DoFs and evaluate the operator on one of the two interior DoF groups. When the evaluation
is done, we wait for the MPI communication to be over, and then evaluate the operator on the
boundary DoFs group. When this evaluation is completed, we start the communication to update
the global vector, we evaluate the operator on the last group of DoFs, and finally wait for the MPI
communication to finish.
– step-47 is a new program that solves the biharmonic equation $\Delta^2 u = f$ with “clamped”
boundary conditions given by $u = g$, $\partial u/\partial n = h$. This program is based on the $C^0$ interior
penalty ($C^0$ IP) method for fourth-order problems [17]. In order to overcome shortcomings of
classical approaches, this method uses $C^0$ Lagrange finite elements and introduces “jump”
and “average” operators on interfaces of elements that penalize the jump of the gradient of
the solution in order to obtain convergence to the $H^2$-regular solution of the equation.
The $C^0$ IP approach is a modern alternative to classical methods that use $C^1$-conforming
elements such as the Argyris element, the Clough-Tocher element and others, all developed
in the late 1960s. From a twenty-first century perspective, they can only be described as
bizarre in their construction. They are also exceedingly cumbersome to implement if one
wants to use general meshes. As a consequence, they have largely fallen out of favor and
deal.II currently does not contain implementations of these shape functions.
– step-50 is a program that demonstrates the parallel geometric multigrid features in deal.II
as described in [20]. The problem considered is a variable viscosity Laplace equation and
it is solved with three different approaches: (i) using a matrix-based geometric multigrid
based on Trilinos or PETSc; (ii) using a matrix-free geometric multigrid; (iii) using algebraic
multigrid (Trilinos ML). The tutorial demonstrates the superiority of the matrix-free method
for the problem under consideration, and shows that, for matrix-based formulations, the
performance of algebraic and geometric multigrid methods is roughly comparable.
– step-58 is a program that solves the nonlinear Schrödinger equation, which in
non-dimensional form reads
$$-i \frac{\partial \psi}{\partial t} - \frac{1}{2} \Delta \psi + V \psi + \kappa |\psi|^2 \psi = 0,$$
augmented by appropriate initial and boundary conditions and using an appropriate form
for the potential V = V(x). The tutorial program focuses on two specific aspects for which
this equation serves as an excellent test case: (i) Solving complex-valued problems without
splitting the equation into its real and imaginary parts (as step-29 does, for example).
(ii) Using operator splitting techniques. The equation is a particularly good test case for
this technique because the only nonlinear term, $\kappa |\psi|^2 \psi$, does not contain any derivatives
and consequently forms an ODE at each node point, to be solved in each time step in
an operator splitting scheme (for which, furthermore, there exists an analytic solution),
whereas the remainder of the equation is linear and easily solved using standard finite
element techniques.
– step-65 presents TransfiniteInterpolationManifold, a manifold class that can propagate
curved boundary information into the interior of a computational domain by transfinite
interpolation [32]. This manifold is a prototype for many other manifolds in that it is
relatively expensive to compute the new points, especially for higher order mappings. Since
typical programs query higher-order geometries in a large variety of contexts, the
contribution of the mapping to the run time can be significant. As a solution, the tutorial also
presents the class MappingQCache, which samples the information of expensive manifolds
in the points of a MappingQ and caches it. The tutorial shows that this makes all queries to
the geometry very cheap.
– step-67 is an explicit time integrator for the compressible Euler equations discretized with
a high-order discontinuous Galerkin (DG) scheme using the matrix-free infrastructure.
Besides the use of matrix-free evaluators for systems of equations and over-integration, it also
presents MatrixFreeOperators::CellwiseInverseMassMatrix, a fast implementation of
the action of the inverse mass matrix in the DG setting using tensor products. Furthermore,
this tutorial demonstrates the usage of new pre and post operations, which can be passed
to MatrixFree::cell_loop(), to schedule operations on sections of vectors close to the
matrix-vector product to increase data locality.
– step-69 presents a first-order scheme solving the compressible Euler equations of gas
dynamics with a graph-viscosity stabilization technique. Besides the usual conservation
properties of mass, momentum, and total energy, the method also guarantees that the
constructed solution obeys pointwise stability constraints (in particular positivity of density
and internal energy, and a local minimum principle on the specific entropy). As such, step-69
is, strictly speaking, more a collocation-type discretization than a variational formulation,
even though it is implemented with finite elements.
The time-update at each node requires the evaluation of a right-hand side that depends
(nonlinearly) on information from the previous time-step that spans more than one cell.
Therefore, assembly loops operate directly on the sparsity graph in order to retrieve
information from the entire stencil associated with each node. From a programming perspective,
step-69 features a number of techniques that are of interest for a wider audience: It discusses
a hybrid thread and MPI parallelized scheme with efficient MPI node-local numbering of
degrees of freedom. It showcases how to perform asynchronous write-out of results using
a background thread with std::async, and discusses a simple but effective checkpointing
and restart technique.
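The analytic solution mentioned for the operator-splitting step of step-58 can be written down explicitly. Restricted to the nonlinear term, the equation at each node reads $-i\,\partial\psi/\partial t + \kappa|\psi|^2\psi = 0$. Since $|\psi|$ remains constant along this ODE, the solution is a pure phase rotation, $\psi(t) = \psi(0)\,e^{-i\kappa|\psi(0)|^2 t}$. The following is a minimal sketch of that update for a single nodal value; the function name is hypothetical and this is not taken from the tutorial's source code.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper: exact solution of the pointwise ODE
//   -i dpsi/dt + kappa |psi|^2 psi = 0
// over a time interval of length dt, starting from the value psi.
std::complex<double> nonlinear_phase_step(const std::complex<double> psi,
                                          const double               kappa,
                                          const double               dt)
{
  // std::norm returns the squared magnitude |psi|^2.
  const double phase = -kappa * std::norm(psi) * dt;

  // Multiply by exp(i * phase); the magnitude of psi is unchanged.
  return psi * std::polar(1.0, phase);
}
```

In a splitting scheme, this update would be applied to every nodal value in the nonlinear sub-step, while the remaining linear part of the equation is advanced with standard finite element techniques.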
There are also new programs in the code gallery (a collection of user-contributed programs that
often solve more complicated problems than tutorial programs, and are intended as starting points
for further research rather than as teaching tools):
– A program that solves the biphasic nonlinear poro-viscoelasticity equations based on
Ogden hyperelasticity, to explore the porous and viscous contributions in brain mechanics
applications. This program was contributed by Ester Comellas and Jean-Paul Pelteret.
The program also serves as a demonstration of the automatic differentiation capabilities of
triangulation.execute_coarsening_and_refinement()
The introspective nature of the Python language makes it easy to infer the list of supported
methods from the Python objects, for example by typing dir(PyDealII.Release.Triangulation).
The current Python interface does not yet provide access to deal.II’s finite element machinery,
i.e., classes such as DoFHandler, FE_*, FEValues, etc.
The original deal.II paper containing an overview of its architecture is [13]. If you rely on
specific features of the library, please consider citing any of the following:
– For geometric multigrid: [44, 43, 20];
– For distributed parallel computing: [12];
– For hp adaptivity: [14];
– For partition-of-unity (PUM) and enrichment methods of the finite element space:
– For matrix-free and fast assembly techniques: [46, 47];
– For computations on lower-dimensional manifolds: [25];
– For curved geometry representations and manifolds: [34];
– For integration with CAD files and tools:
– For boundary element computations: [31];
– For LinearOperator and PackagedOperation facilities: [51, 52];
– For uses of the WorkStream interface: [66];
– For uses of the ParameterAcceptor concept, the MeshWorker::ScratchData base class, and
the ParsedConvergenceTable class: [62];
– For uses of the particle functionality in deal.II: [28].
Please consider citing the appropriate references if you use interfaces to these libraries. We note
that the nanoflann and NetCDF interfaces are now deprecated and will be removed in deal.II
version 9.3.
The two previous releases of deal.II can be cited as [1, 6].
4 Acknowledgments
deal.II is a world-wide project with dozens of contributors around the globe. Other than the
authors of this paper, the following people contributed code to this release:
Pasquale Africa, Ashna Aggarwal, Giovanni Alzetta, Mathias Anselmann, Kirana Bergstrom,
Manaswinee Bezbaruah, Benjamin Brands, Yong-Yong Cai, Fabian Castelli, Joshua Christopher,
Ester Comellas, Katherine Cosburn, Denis Davydov, Elias Dejene, Stefano Dominici, Brett Dong,
Luel Emishaw, Niklas Fehn, Rebecca Fildes, Menno Fraters, Andres Galindo, Daniel
Garcia-Sanchez, Rene Gassmoeller, Melanie Gerault, Nicola Giuliani, Brandon Gleeson, Anne Glerum,
Krishnakumar Gopalakrishnan, Graham Harper, Mohammed Hassan, Nicole Hayes, Bang He,
Johannes Heinz, Jiuhua Hu, Lise-Marie Imbert-Gerard, Manu Jayadharan, Daniel Jodlbauer, Marie
Kajan, Guido Kanschat, Alexander Knieps, Uwe Köcher, Paras Kumar, Konstantin Ladutenko,
Charu Lata, Adam Lee, Wenyu Lei, Katrin Mang, Mae Markowski, Franco Milicchio, Adriana
Morales Miranda, Bob Myhill, Emily Novak, Omotayo Omosebi, Alexey Ozeritskiy, Rebecca
Pereira, Geneva Porter, Laura Prieto Saavedra, Roland Richter, Jonathan Robey, Irabiel Romero,
Matthew Russell, Tonatiuh Sanchez-Vizuet, Natasha S. Sharma, Doug Shi-Dong, Konrad Simon,
Stephanie Sparks, Sebastian Stark, Simon Sticko, Jan Philipp Thiele, Jihuan Tian, Sara Tro,
Ferdinand Vanmaele, Michal Wichrowski, Julius Witte, Winnifried Wollner, Ming Yang, Mario Zepeda
Aguilar, Wenjuan Zhang, Victor Zheng.
Their contributions are much appreciated!
deal.II and its developers are financially supported through a variety of funding sources:
D. Arndt and B. Turcksin: Research sponsored by the Laboratory Directed Research and
Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U. S.
Department of Energy.
W. Bangerth, T. C. Clevenger, and T. Heister were partially supported by the Computational
Infrastructure in Geodynamics initiative (CIG), through the National Science Foundation under
Award No. EAR-1550901 and The University of California – Davis.
W. Bangerth was also partially supported by award OAC-1835673 as part of the
Cyberinfrastructure for Sustained Scientific Innovation (CSSI) program, DMS-1821210, and EAR-1925595.
B. Blais was partially supported by the Natural Sciences and Engineering Research Council of
Canada (NSERC) through the RGPIN-2020-04510 Discovery Grant.
T. C. Clevenger was also partially supported by EAR-1925575 and OAC-2015848.
A. V. Grayver was partially supported by the European Space Agency Swarm DISC program.
T. Heister was also partially supported by the National Science Foundation (NSF) Awards
DMS-2028346, OAC-2015848, EAR-1925575, and by Technical Data Analysis, Inc. through US
Navy STTR Contract N68335-18-C-0011.
L. Heltai was partially supported by the Italian Ministry of Instruction, University and Research
(MIUR), under the 2017 PRIN project NA-FROM-PDEs MIUR PE1, “Numerical Analysis for Full
and Reduced Order Methods for the efficient and accurate solution of complex systems governed
by Partial Differential Equations”.
M. Kronbichler was supported by the German Research Foundation (DFG) under the project
“High-order discontinuous Galerkin for the exa-scale” (ExaDG) within the priority program
“Software for Exascale Computing” (SPPEXA) and the Bayerisches Kompetenznetzwerk für
