Final Report
MATT BEAL & MIKE BRESKE

We consider the task of solving a parabolic problem numerically. This problem is normally solved via time marching; here, however, we seek a way to solve the problem in parallel in time. Solving the evolution problem in parallel may be approached by first representing the solution of a parabolic partial differential equation as a contour integral in the left half of the complex plane via a Laplace transform, then using a change of variables to transform this complex integral into a real integral along the interval [−1, 1], and finally applying a standard quadrature scheme to this integral. This method requires that we solve a finite set of elliptic partial differential equations, which are independent and thus naturally parallel.

Approach and Methodology [1], [2]

We consider the initial value problem

    u_t + Au = f(t)  for t > 0,  with u(0) = u_0,                        (1.1)

where A is a symmetric, positive definite operator with a compact inverse. Begin by applying the Laplace transform defined by

    w(z) = û(z) = ∫_0^∞ e^(−zt) u(t) dt                                  (1.2)

to the initial value problem (1.1). In this way we derive

    (zI + A) w(z) = u_0 + f̂(z),                                          (1.3)

and thus w(z) may be represented as

    w(z) = (zI + A)^(−1) (u_0 + f̂(z)),  z ∈ Γ.                           (1.4)
After solving for w(z) we can obtain u(t) by inverting the Laplace transform (1.2). Hence, the solution may be written in the form

    u(t) = (1/2πi) ∫_Γ e^(zt) w(z) dz = ∫_{−∞}^{∞} v(t, y) dy,           (1.5)

where Γ is a conveniently chosen path in the left half of the complex plane and, with z(y) parametrizing Γ, v(t, y) = (1/2πi) e^(z(y)t) w(z(y)) z′(y). We will use Γ = {z : z = φ(y) + iy, y ∈ ℝ, y increasing}, where φ : ℝ → ℝ is a smooth function. In our application, we chose φ(y) = γ − √(y² + δ²) for y ∈ ℝ, with suitable parameters γ ∈ ℝ and δ > 0. Our approach to approximating the solution of (1.1) is then to apply a quadrature scheme to (1.5). We develop a quadrature formula over the real axis ℝ = (−∞, ∞) by making a transformation to the finite interval (−1, 1) and then applying the trapezoidal rule. Under appropriate conditions the resulting quadrature formula has a high order of accuracy. We then apply this formula to the representation (1.5) of the solution of the parabolic problem. To define an appropriate quadrature
formula, we first make a change of variables y = y(θ), where y(θ) is a smooth increasing function mapping (−1, 1) onto ℝ. More specifically, with λ a positive parameter, we choose y(θ) to be

    y(θ) = λ^(−1) ψ(θ),  where  ψ(θ) = log((1 + θ)/(1 − θ)).             (1.6)
Applying the composite trapezoidal rule with spacing 1/N to the integral over (−1, 1), we now define

    Q_{N,λ}(v) = (1/(λN)) Σ_{j=−N+1}^{N−1} ω_j v(y_j),                   (1.7)

where

    θ_j = j/N,  ω_j = ψ′(θ_j) = 2/(1 − θ_j²),  y_j = y(θ_j) = λ^(−1) ψ(θ_j).
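As a quick sanity check, the transformation (1.6) and the quadrature rule (1.7) can be coded in a few lines; here we test them on ∫ e^(−y²) dy = √π over the real line. The test integrand and the choices λ = 1, N = 20 are ours, for illustration only.

```python
import numpy as np

def quad_rule(v, N, lam):
    """Apply Q_{N,lambda} of (1.7): the transformed composite trapezoidal rule.

    theta_j = j/N, omega_j = psi'(theta_j) = 2/(1 - theta_j^2),
    y_j = psi(theta_j)/lambda, with psi(theta) = log((1 + theta)/(1 - theta)).
    """
    j = np.arange(-N + 1, N)                    # j = -N+1, ..., N-1 (2N-1 nodes)
    theta = j / N
    omega = 2.0 / (1.0 - theta**2)
    y = np.log((1.0 + theta) / (1.0 - theta)) / lam
    return np.sum(omega * v(y)) / (lam * N)

# The integral of exp(-y^2) over the whole real line equals sqrt(pi).
approx = quad_rule(lambda y: np.exp(-y**2), N=20, lam=1.0)
```

Because the transformed integrand and all its derivatives vanish at θ = ±1, the trapezoidal rule converges much faster here than its generic second-order bound suggests.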
The method for approximating a solution to (1.1) can be summarized as follows: we use a representation of the solution as an integral along a smooth curve in the left half of the complex plane, which is first transformed to a finite interval and then evaluated to high accuracy by a quadrature rule. The problem is thereby reduced to a finite set of elliptic equations with complex coefficients, which may be solved in parallel. The method is combined with a finite-element discretization in the spatial variables.

Numerical Example

Example 1. We now consider the spatially one-dimensional problem

    u_t − u_xx = f(x, t),   for 0 < x < π, t > 0,
    u(x, t) = 0,            for x = 0 and x = π, t > 0,
    u(x, 0) = u_0(x),       for 0 < x < π.
We choose f and u_0 so that the exact solution is

    u(x, t) = (1 + t) e^(−t) sin x + cos t e^(−2t) sin 2x,

giving a forcing term

    f(x, t) = e^(−t) sin x + e^(−2t) (2 cos t − sin t) sin 2x

and u_0(x) = sin x + sin 2x. Thus

    f̂(x, z) = (1/(1 + z)) sin x + ((2z + 3)/((z + 2)² + 1)) sin 2x.
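As an illustration of the complete procedure on this example, the sketch below combines the contour representation (1.5) with the quadrature rule (1.7). For brevity it replaces the finite-element discretization by a second-order finite-difference approximation of −u_xx; the parameter values (γ = 0.5, δ = 1, λ = 0.25, N = 40, and the grid size) are our own illustrative choices, not the values used in the report.

```python
import numpy as np

# Spatial discretization: second-order finite differences for -u_xx on (0, pi)
# with homogeneous Dirichlet data (a stand-in for the report's finite elements).
Mx = 100
h = np.pi / Mx
x = np.linspace(h, np.pi - h, Mx - 1)           # interior grid points
A = (np.diag(2.0 * np.ones(Mx - 1))
     + np.diag(-np.ones(Mx - 2), 1)
     + np.diag(-np.ones(Mx - 2), -1)) / h**2

u0 = np.sin(x) + np.sin(2 * x)                  # initial data from the example

def fhat(z):
    """Laplace transform of the forcing term, as given in the text."""
    return np.sin(x) / (1 + z) + (2 * z + 3) * np.sin(2 * x) / ((z + 2) ** 2 + 1)

# Contour and quadrature parameters (illustrative choices of ours).
gamma, delta, lam, N, t = 0.5, 1.0, 0.25, 40, 1.0

acc = np.zeros(Mx - 1, dtype=complex)
for j in range(-N + 1, N):
    theta = j / N
    wj = 2.0 / (1.0 - theta**2)                  # omega_j = psi'(theta_j)
    y = np.log((1 + theta) / (1 - theta)) / lam  # y_j = psi(theta_j)/lambda
    z = gamma - np.sqrt(y**2 + delta**2) + 1j * y           # z(y) on Gamma
    dz = -y / np.sqrt(y**2 + delta**2) + 1j                 # z'(y)
    w = np.linalg.solve(z * np.eye(Mx - 1) + A, u0 + fhat(z))   # solve (1.4)
    acc += wj * np.exp(z * t) * w * dz / (2j * np.pi)       # v(t, y_j) terms
u_num = (acc / (lam * N)).real                   # quadrature sum (1.7)

u_exact = (1 + t) * np.exp(-t) * np.sin(x) + np.cos(t) * np.exp(-2 * t) * np.sin(2 * x)
err = np.max(np.abs(u_num - u_exact))
```

Each pass through the loop is one independent complex elliptic solve, which is exactly what gets distributed across cores in the parallel implementation.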
To solve the spatial problem in this case we use piecewise linear finite elements on a uniform mesh with spacing h = π/M, where M = 10, 20, 40, 80 is the number of basis functions on the interval [0, π] used for the finite-element approximation of (1.4). Table 1 shows the absolute maximum error with N = 20 for the discretization in time and M = 10, 20, 40, and 80 for the finite-element discretization.
Table 1: Absolute maximum error

    t      M = 10      M = 20      M = 40      M = 80
    0.2   1.322E-02   2.182E-02   2.401E-02   2.455E-02
    0.4   7.267E-03   4.675E-03   4.277E-03   4.176E-03
    0.6   7.472E-03   1.893E-03   5.057E-04   1.594E-04
    0.8   6.231E-03   1.573E-03   4.009E-04   1.411E-04
    1.0   4.453E-03   1.124E-03   2.792E-04   6.706E-05
    1.2   2.803E-03   7.161E-04   1.082E-04   4.533E-05
    1.4   1.472E-03   3.813E-04   9.627E-05   2.479E-05
    1.6   1.260E-03   3.207E-04   8.034E-05   1.991E-05
    1.8   1.175E-03   2.974E-04   7.448E-05   1.853E-05
    2.0   1.216E-03   3.045E-04   7.615E-05   1.904E-05
    3.0   1.434E-03   3.584E-04   8.953E-05   2.231E-05
    4.0   1.050E-03   2.632E-04   6.563E-05   1.619E-05
Parallel Implementation

Our main task is to solve 2N − 1 elliptic partial differential equations in parallel. Each processing core must solve about (2N − 1)/P of these equations, where P is the total number of processing cores in the parallel environment. Since 2N − 1 is odd and we generally choose P to be a power of two, we choose N such that (2N − 1)/P always yields a remainder of one. This is the best load balancing that can be achieved in this case, because we are solving an odd number of problems with an even number of cores. In this method, each core generates its set of roughly (2N − 1)/P points z_j along the contour Γ. Then, for each z_j, the stiffness matrix and load vector are generated, and the vector of nodal values is computed using the LAPACK solver ZGESV. The output of the solver is then used to evaluate the solution of (1.4) for the given z_j. This solution is used in (1.7) to compute each core's partial sum Q_{N,λ,p}(v), where p = 1, 2, 3, ..., P. Once this is completed by each core, MPI_Reduce is called to sum the partial sums Q_{N,λ,p}(v), yielding the full sum (1.7) on the master core. This is our final approximation to the solution of the parabolic partial differential equation (1.1).

Parallelization Results

The method described above is naturally parallel because the 2N − 1 complex-valued elliptic problems required for the evaluation of the quadrature scheme are completely independent. Each elliptic problem can be solved individually in a parallel computing environment. As a result, we expect to see nearly perfect linear speedup when solving the problem in parallel. However, several practical problems arise in the 1-D case that make the expected speedup difficult to attain. First, load balancing across the communicator becomes problematic due to the 2N − 1 elliptic problems that must be solved. This number is always odd, and so using the standard grouping of cores (4, 8, 16, etc.)
means that there will always be at least one remaining problem to be solved by a designated core. Having remainders in the system can add a significant amount of time to the total needed to solve the linear systems and thus adversely affect speedup, because the cores that have already completed their computations must wait for the core handling the remaining problems to finish. Second, the linear solver used to solve the system is highly optimized, and the computation time required to solve the linear system for reasonable numbers of basis functions is much less than the communication time, which dominates the total time for smaller 1-D systems. In an attempt to attain the predicted speedup values, we choose a large number of basis functions in order to strain the linear solver and allow computation time to dominate communication time. Table 2 shows computing time, speedup ratio, and parallel efficiency for N = 33 and M = 8000. Table 3 shows the experimental serial fraction and the maximum theoretical speedup via Amdahl's Law for P = 2, 4, 8, 16, and 32.
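The index distribution just described can be mocked up serially as follows (the helper name is ours; in the actual code each MPI rank would compute only its own slice):

```python
def partition_indices(N, P):
    """Split the 2N - 1 quadrature indices j = -N+1, ..., N-1 among P cores.

    With P a power of two, 2N - 1 is odd, so the split is never exact.
    Choosing N so that (2N - 1) % P == 1 leaves a single remainder index,
    which is assigned to one designated core.
    """
    idx = list(range(-N + 1, N))
    base, rem = divmod(len(idx), P)
    parts, start = [], 0
    for p in range(P):
        size = base + (1 if p < rem else 0)   # early ranks absorb the remainder
        parts.append(idx[start:start + size])
        start += size
    return parts

# N = 33 gives 2N - 1 = 65 problems; on P = 32 cores, one core solves 3
# problems while the remaining 31 cores solve 2 each.
parts = partition_indices(33, 32)
```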
Table 2: Computing time (T in s), speedup ratio (S, relative to the preceding core count), and parallel efficiency (E_P) for the problem size N = 33, M = 8000

    P       1        2        4        8       16       32
    T     217.31   144.21   108.25   54.28    28.90    15.26
    S       --     1.507    1.332    1.994    1.878    1.893
    E_P     --     0.753    0.333    0.249    0.117    0.0592
Table 3: Experimental serial fraction (exp_f) and Amdahl's Law estimate for maximum speedup

    P              2        4        8       16       32
    exp_f        0.327    0.331    0.143    0.075    0.040
    Max speedup  1.507    2.008    4.000    7.5202   14.236

We see from Table 3 that, for a fixed problem size, the experimental serial fraction shows a decreasing trend as the number of cores increases. An algorithm is said to be strongly scalable if, for a fixed data size, the execution time decreases in inverse proportion to the number of processing cores. From the computation times given in Table 2 and in the figure below, we see that this method is strongly scalable.
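The experimental fraction exp_f in Table 3 is consistent with the Karp–Flatt metric, i.e. the serial fraction obtained by inverting Amdahl's Law with the measured speedup S = T_1/T_P. A quick check against the timings of Table 2 (assuming this is indeed how exp_f was computed):

```python
def karp_flatt(T1, TP, P):
    """Experimentally determined serial fraction: (1/S - 1/P) / (1 - 1/P)."""
    S = T1 / TP
    return (1.0 / S - 1.0 / P) / (1.0 - 1.0 / P)

def amdahl_max_speedup(f, P):
    """Amdahl's Law bound on speedup: 1 / (f + (1 - f)/P)."""
    return 1.0 / (f + (1.0 - f) / P)

# Timings from Table 2 (N = 33, M = 8000).
T = {1: 217.31, 2: 144.21, 4: 108.25, 8: 54.28, 16: 28.90, 32: 15.26}
expf = {P: karp_flatt(T[1], T[P], P) for P in (2, 4, 8, 16, 32)}
```

Rounded to three digits, these values reproduce the exp_f row of Table 3 (0.327, 0.331, 0.143, 0.075, 0.040).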
[Figure: Computation time vs. number of processing cores for N = 33 and M = 8000.]
When the size of the finite-element problem grows larger, the speedup ratios approach two with each doubling of the core count, reflecting our expectation that as the size of the distributed problems increases, the parallelization becomes more effective. Since the total time for the problem we have selected continues to decrease roughly in proportion to the number of processors used, the parallelization has been successful. We expect that the problems with communication time and load balancing will become less of an issue when the problem is extended to two and three spatial dimensions.
References
[1] Dongwoo Sheen, Ian H. Sloan, and Vidar Thomée. A parallel method for time-discretization of parabolic problems based on contour integral representation and quadrature. Mathematics of Computation, 69(229):177–195, 1999.

[2] Dongwoo Sheen, Ian H. Sloan, and Vidar Thomée. A parallel method for time discretization of parabolic equations based on Laplace transformation and quadrature. IMA Journal of Numerical Analysis, 23:269–299, 2003.