
At this point it is assumed that you have worked through the first tutorial in which permanent guardcell storage was allocated.
We have arranged the tutorials in this order because we believe this presents a new user with a much more manageable learning curve. The absence of permanent guardcell storage imposes a more complex structure on the user's application. In this tutorial we will show how an application written for permanent guardcell memory allocation can be modified to run without it.
Before we begin to modify code, a brief discussion of the key issues associated with the absence of permanent guardcell storage is necessary.
The absence of permanent guardcell storage requires that blocks acquire their guardcell data when they need it. PARAMESH does this by setting up a data structure for a single working block which does have guardcell storage allocation. Therefore, to advance the solution on a given block, block A say,
We need to store a second snapshot of the entire solution at certain `synchronization' points in the application's flowchart. Why? Suppose we have just one copy of the solution in memory. The working block acquires it's data from this copy, and the time-advance routine writes back into it. Midway through the process of time-advancing the individual blocks, some of the blocks have data which is time-advanced and some do not. The guardcell filling operation will be using a mixture of time levels as it's source data, which is of course unacceptable. Therefore, before any blocks are allowed to advance their solution, we make a copy of the entire solution. The guardcell filling operation uses this copy. This enables the time-advance code to update the original `master' copy, UNK, without interfering with the guardcell data which later blocks will acquire. We provide a set of routines with very simple argument lists to accomplish these extra steps.
Step 1. - Modifications to the Header files.
#define NO_PERMANENT_GUARDCELLS
Step 2. - Modify the time-advance routine.
real :: old_soln(il_bnd1:iu_bnd1,jl_bnd1:ju_bnd1,kl_bnd1:ku_bnd1)
! Store a copy of the current solution in gt_unk
call amr_1blk_copy_soln(-1)
! restrict the data from leaf blocks to their parent blocks
if (.not.advance_all_levels) then
iempty = 0
iopt = 1
call amr_restrict(mype,iopt,iempty)
endif
! exchange data between processors in preparation for guardcell filling
tag_offset = 100
iopt = 1
lguard = .true.
lprolong = .false.
lflux = .false.
ledge = .false.
lrestrict = .false.
lfulltree = .false.
lcc = .true.
lfc = .false.
lec = .false.
lnc = .false.
call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
call mpi_amr_comm_setup(mype,nprocs, &
lguard,lprolong,lflux,ledge, &
lrestrict,lfulltree, &
iopt,lcc,lfc,lec,lnc,tag_offset)
! Copy data from current block into working block and fill its guardcells
idest = 1
iopt = 1
nlayers = nguard
lcc = .true.
lfc = .false.
lec = .false.
lnc = .false.
l_srl_only = .false.
icoord = 0
ldiag = .true.
call amr_1blk_guardcell(mype,iopt,nlayers,lb,mype, &
lcc,lfc,lec,lnc, &
l_srl_only,icoord,ldiag)
old_soln(:,:,:) = unk1(1,:,:,:,idest)
! set values for unk1
do k=kl_bnd1+nguard*k3d,ku_bnd1-nguard*k3d
do j=jl_bnd1+nguard*k2d,ju_bnd1-nguard*k2d
do i=il_bnd1+nguard,iu_bnd1-nguard
unk1(1,i,j,k,idest) = old_soln(i,j,k) + dt/(dx*dx)* ( &
old_soln(i+1,j,k) + old_soln(i-1,j,k) + &
old_soln(i,j+1,k) + old_soln(i,j-1,k) - &
old_soln(i,j,k)*4.0 )
enddo
enddo
enddo
! Store new solution for this block in secondary copy
call amr_1blk_to_perm( lcc,lfc,lec,lnc,lb,iopt,idest)
! Expose interfaces to paramesh subroutines used here !
use paramesh_interfaces, only : amr_1blk_copy_soln, &
amr_1blk_guardcell, &
amr_1blk_to_perm, &
amr_restrict
use paramesh_mpi_interfaces, only : mpi_amr_comm_setup
integer :: iempty
logical :: lcc,lfc,lec,lnc,ldiag,l_srl_only
logical :: lguard,lprolong,lflux,ledge,lrestrict,lfulltree
integer :: iopt,tag_offset,nprocs,ierr
Explanation:
The call to amr_1blk_copy_soln creates the second
copy of the complete solution, which is needed so that the data used
in guardcell filling is all associated with the same simulation time,
regardless of how many blocks have since updated their solutions. The
call to amr_restrict is needed so that guardcell filling will operate
correctly at refinement jumps in the case where we are only advancing
the solution at the 'leaf block' level. The call to mpi_amr_comm_setup
performs the mpi pre-fetching needed to support the calls to
amr_1blk_guardcell
in the following loop over blocks (see next paragraph). We then loop
over the grid blocks. The
call to amr_1blk_guardcell does two things, it first copies the
solution for the current block, here identified as block LB, into the
working space, and then fills the working space guardcells
appropriately. The logical variable declarations are required for the
arguments to this routine. The working block copy of the solution is
the copy which gets time advanced, hence the change from UNK to
UNK1. The loop limits are changed to use the correct variables
(il_bnd1, iu_bnd1, etc.) since they represent indeces WITH
guardcells. Finally, the solution in the interior of the working block
(ie excluding guard cells) is copied back to the `master' copy, UNK,
by the call to amr_1blk_to_perm. Finally, it is VERY IMPORTANT to
expose the paramesh interfaces by using the 'use paramesh_interfaces
... ' statement in any routines that call paramesh subroutines. This
is good practice and
it is required since some paramesh routines now
have optional arguments.
Explanation of mpi_amr_comm_setup call:
This routine is responsible for communicating data between MPI
processes for, in this case, guardcell filling. The variables that
control this subroutine are set before it is called. These set which
communications operation to perform such as guarcell filling,
prolongation, flux correction, or edge averaging. Here we wish to do
guardcell filling, so 'lguard' is set to .true., while the other
variables are .false. The variable 'iopt' controls whether or not the
work array is being worked on or not. The logical variables 'lcc',
'lfc', 'lec', and 'lnc' control which set of variables are worked on
in the case iopt = 1. Here we are using cell-centered variables, so
lcc is set to .true. We also include a call the MPI routine which
gives us the number of processors since it is an argument to
'mpi_amr_comm_setup' Once again we 'use' the paramesh module which
contains the interface to this routine which is contained in
'paramesh_mpi_interfaces'.
Step 3. - Modify the main program.
do j=1,nyb+2*nguard
write(*,50) j,(unk(1,i,j,1,lb),i=1,nxb+2*nguard)
wherever it appears, becomes
do j=1,nyb+2*nguard*npgs
write(*,50) j,(unk(1,i,j,1,lb),i=1,nxb+2*nguard*npgs)
Explanation:
Our original routine i/o included guardcells. These
are no longer present, so we modify the index ranges to allow for the
absence of guardcells. The parameter NPGS is automatically set to 0
for us when we define the preprocessor variable
NO_PERMANENT_GUARDCELLS in AMRDIR/headers/paramesh_preprocessor.fh.
The calls to 'amr_guardcell' are removed since in
NO_PERMANENT_GUARDCELLS mode we fill guardcells on a block-by-block
basis by calls to 'amr_1blk_guardcell'.
Step 4. - Modify the refinement testing routine.
if (.not.advance_all_levels) then
iopt = 1
iempty = 0
call amr_restrict(mype,iopt,iempty)
end if
lcc = .true.
lfc = .false.
lec = .false.
lnc = .false.
tag_offset = 100
lguard = .true.
lprolong = .false.
lflux = .false.
ledge = .false.
lrestrict = .false.
lfulltree = .false.
call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
call mpi_amr_comm_setup(mype,nprocs, &
lguard,lprolong,lflux,ledge, &
lrestrict,lfulltree, &
iopt,lcc,lfc,lec,lnc,tag_offset)
iopt = 2
pe = mype ! pe on which block to be treated is stored
lcc = .true. ! fill cell centered guardcell data
lfc = .false. ! do not fill cell-face-centered data
lec = .false. ! do not fill cell-edge-centered data
lnc = .false. ! do not fill cell-corner data
l_srl_only = .false. ! do not limit to same refinement level
! neighbors
icoord = 0 ! apply to all block faces
ldiag = .true. ! include diagonal cells at edges and corners
call amr_1blk_guardcell(mype,iopt,nlayers,lb,pe, &
lcc,lfc,lec,lnc, &
l_srl_only,icoord,ldiag)
logical :: lcc,lfc,lec,lnc,l_srl_only,ldiag
integer :: pe
integer :: tag_offset
logical :: lguard, lprolong, lflux, ledge, lrestrict, lfulltree
include 'mpif.h'
error1 = abs(work1(i+1,j,k,1)-work1(i,j,k,1))
error2 = abs(work1(i-1,j,k,1)-work1(i,j,k,1))
error3 = abs(work1(i,j+k2d,k,1)-work1(i,j,k,1))
error4 = abs(work1(i,j-k2d,k,1)-work1(i,j,k,1))
error5 = abs(work1(i,j,k+k3d,1)-work1(i,j,k,1))
error6 = abs(work1(i,j,k-k3d,1)-work1(i,j,k,1))
error_num = max( error1,error2,error3,error4, &
error5,error6 )
error_den = max( work1(i,j,k,1) ,work1(i+1,j,k,1), &
work1(i-1,j,k,1),work1(i,j+k2d,k,1), &
work1(i,j-k2d,k,1),work1(i,j,k+k3d,1), &
work1(i,j,k-k3d,1), &
1.0e-6 )
use paramesh_interfaces, only : amr_restrict, &
amr_1blk_guardcell
use paramesh_mpi_interfaces, only : mpi_amr_comm_setup
Explanation:
There are essentially two changes here. First, our original routine
used the WORK array as input for the calculation of the error. Now
we must use the working block version, called WORK1, in its place.
The array bounds for WORK1 are ilw1:iuw1, etc. The ERROR array which we
use here must be conformable with WORK1. The second change is that we
must fill the guardcells of
WORK1 as they are needed, rather than in a global guardcell filling
operation, as before. Therefore we have removed the call to
amr_guardcell, and inserted a call to amr_1blk_guardcell inside
the loop over grid blocks. Finally we have added a call to
amr_restrict. This is a global
operation which ensures that the parent blocks of leaf blocks have
valid data. The routine amr_1blk_guardcell may require this data
in the neighborhood of spatial resolution jumps. Obviously, if we
have chosen to advance the solution at all refinement levels, then
this restriction operation should not be performed, hence the
'if (advance_all_levels)' test.
Step 5. - Modify the makefile.
main := tutorial_npgc.F90 \
sources := \ amr_initial_soln.F90 \ amr_test_refinement_npgc.F90 \ advance_soln_npgc.F90 \ amr_timestep.F90 \ amr_1blk_bcset.F90
CMD := tutor_npgc
The executable which will be made will be called tutor_npgc.
Step 6. - Build the executable.
gmake -f make_tutor clean
gmake -f make_tutor your_tutorial
Step 7. - Run the executable.
mpirun -np 1 your_tutorial/tutor_npgc
Verify that the solution is the same as in the original version with guardcell storage. Note, after 250 timesteps a single precision result may differ from the original run with guardcell allocation, with accuracy to about the fourth decimal place. This is because the two runs have significant order of calculation differences which introduce round-off error.