Domain Decomposition — 2D Poisson Equation
Elliptic Equation — 2D Poisson Equation
The Model
We will solve the equation
\[ \nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial x^2} = S(x,y) \]
We solve for the function $u(x,y)$ in a 2D domain, $x \in [0,1]$ and $y \in [0,1]$. The boundary conditions are assumed of Dirichlet type, namely$ u(0,y)$, $u(1,y)$, $u(x,0)$, and $u(x,1)$ are given. In addition, the function $u(x,y)$ can also be specified on specific points in the interior domain.
In the following discussion, the source $S(x,y)$ is taken to be a simple displaced Gaussian for simplicity.
The domain is discretized on a 2D numerical lattice, with $x_i = (i-1) h$, $y_j = (j-1)h$, and $h = 1.0/(N-1)$. $N$ is the number of grid points in both directions. The Poisson equation is re-written in finite difference form:
\[ \frac{1}{h^2} \left( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4 u_{i,j} \right) = S_{i,j} \]
This leads to a large set of linear equations to solve for the field values $u_{i,j}$ on the grid within the domain, keeping $u_{i,j}$ fixed at the boundary. This set of equations can be solved by an iterative method, whereby
\[ u_{i,j}^{k+1} = \frac{1}{4} \left( u_{i+1,j}^k + u_{i-1,j}^k + u_{i,j+1}^k + u_{i,j-1}^k - h^2 S_{i,j} \right) \]
is iterated in place until the changes in $u_{i,j}$
\[ diff = Max \| u_{i,j}^{k+1} - u_{i,j}^{k} \| \]
become less than some predefined tolerance criteria. An arbitrary guess for $u_{i,j}$ is assumed to start with. The method is stable, i.e., the solution will converge.
The old and new values of $u_{i,j}$ are mixed to accelerate the convergence, leading to the Successive Over Relaxation scheme.
\[ \begin{aligned} u_{i,j}^{k+\frac{1}{2}} = \frac{1}{4} \left( u_{i+1,j}^k + u_{i-1,j}^{k+\frac{1}{2}} + u_{i,j+1}^k + u_{i,j-1}^{k+\frac{1}{2}} - h^2 S_{i,j} \right) \\ u_{i,j}^{k+1} = \omega u_{i,j}^{k+\frac{1}{2}} - ( 1-\omega) u_{i,j}^k \end{aligned} \]
The accuracy of the solution remains second order in $h$.
Serial Implementation
The code poisson_2d.c implements the above scheme. The steps in the code are:
- Initialize the numerical grid.
- Provide an initial guess for the solution.
- Set the boundary values & source term.
- Iterate the solution until convergence.
- Output the solution for plotting.
Compile and execute with
gcc poisson_2d.c -lm -o poisson_2d ./poisson_2d N Nx Ny | ./plot_image.py -s Nx,Ny -t "2d Poisson"
where N
, Nx
and Ny
are the
(arbitrary) maximum number of iterations and number of grid points
(image size) respectively. Good numbers to use could be 10000, 300,
300. We introduced plot_image.py in Programming Strategies: Color
Images.
The code stops once good convergence (parameter EPS
)
is obtained.
Note:
- The
chi
parameter which implements the over-relaxation approach is chosen arbitrarily for simplicity (see Numerical Receipes for an optimum choice). - The code is written in the simplest possible way.
- The code outputs the field values to pipe into plot_image.py for field display.
Parallel Implementation
Domain decomposition consists in dividing the original grid among the processes with an equal number of grid points in each process for good load balance. We use here the simplest possible division, i.e., in simple stripes. This results in the following arrangement on each processor.
If the number of grid points does not divide equally among the processes, the extra grid points are distributed one each to a subset of processors as indicated schematically in the following figure.
The domain is divided in stripes which correspond to a split of the
first index in the two dimensional arrays containing the fields. This
scheme is adopted to facilitate the transmission of
the GHOST
lines of points. Recall that C stores two
dimensional arrays in row fashion; therefore transmitting
row ip
of the array phi
containing jj
numbers simply requires
MPI_Ssend( &phi[ip][0] , jj, ... )
with no further packing. This is efficient and easy to implement. Note that a FORTRAN programmer would divide the domain in stripes along the second index of the array since FORTRAN stores two dimensional arrays in columns.
The code should follow very much the approach of the 1D code. Some functions, like
decompose_domain()
, could even be borrowed as is since
the stripe decomposition is essentially a 1D domain
decomposition. The flowchart below shows the structure that the code
should have:
Red and Black algorithm
The parallel and serial implementations of the Poisson equation
solver in 2D as implemented above yield the same results to within
the tolerance only. Yet these two implementations are not strictly
identical. This stems from the handling of the GHOST
lines in the parallel implementation.
Consider the update of the field at the location marked 0 below. The Gauss-Siedel solution calls for an in-place update of the field via the finite difference formula in a systematic sweep of the numerical lattice. This is done via the 2D finite difference formula. The values of the field at the points 2 and 3 have already been updated if the sweep originates from the upper-left corner of the lattice, while the field at the point 1 and 4 are yet to be updated.
The sub-domain decomposition requires the transmission of
the GHOST
line of values at each iteration. This is
normally done before each Gauss-Siedel step. The values of the field
in these GHOST
lines are those of the previous
Gauss-Siedel iteration. The value of the field at point "2" below is
no longer the updated one. Therefore the parallel implementation is
strictly not Gauss-Siedel.
The solution to this problem is to modify the Gauss-Siedel algorithm into the the Red and Black algorithm. In this approach the grid is scanned alternatively on the "red" then on the "black" dots. The name is reminiscent of a checker board.
It is clear from the picture that the needed values of the field at
points 1, 2, 3, and 4 are the old field values (black) when updating
the field at point 0, and vice-versa for any red point. The algorithm
then calls fro two subsequent sweep of the lattice alternately on the
red and black points. In a parallel implementation,
the GHOST
lines are exchanged twice at each iteration:
- Exchange the
GHOST
lines. - Update the red points on the lattice.
- Exchange the
GHOST
lines. - Update the black points on the lattice.
This algorithm yields numerically equivalent solutions between the serial and parallel implementations, but might be somewhat slower than the strict Gauss-Siedel.