Running Athena on Parallel Processors
Athena is parallelized using domain decomposition based on the Message Passing Interface (MPI). The code can be run on any distributed-memory cluster (or any other multi-processor system) on which MPI is installed using the following steps:
- During the configure step, the MPI option must be enabled via

      % configure --enable-mpi

  This sets precompiler macros to include the appropriate MPI code.
- During the compile step, the appropriate MPI libraries and include files must be linked. Often this requires using a compile script, usually called `mpicc`. There are several ways to invoke `mpicc`, and to set the paths to the libraries and include files it needs (see the first sketch after this list):
    - environment variables such as `CC`, `LDR`, `MPIINC`, and `MPILIB` can be set using Linux commands
    - the default values of these environment variables can be set in the file `./athena/Makeoptions.in`
    - multiple independent values for these environment variables can be set using the `MACHINE` macro provided in the file `./athena/Makeoptions.in`. The value of the `MACHINE` macro can be specified from the command line using

          % make all MACHINE=mymachine

      where `mymachine` specifies a valid block in `./athena/Makeoptions.in` in which the environment variables corresponding to this system are set. Note that if the `./athena/Makeoptions.in` file is edited, configure must be run again.
- The input file for the problem of interest must be edited to specify the desired domain decomposition within each `<domain>` block. For example, the following segment of a `<domain>` block in the input file

      <domain1>
      ...
      NGrid_x1 = 1
      NGrid_x2 = 10
      NGrid_x3 = 1

  will result in a slab decomposition with 10 slabs in the y-direction (a total of 10 processors are needed); while

      <domain1>
      ...
      NGrid_x1 = 1
      NGrid_x2 = 2
      NGrid_x3 = 3

  will result in a pencil decomposition with two pencils in the y-direction and three in the z-direction (a total of 6 processors are needed); while

      <domain1>
      ...
      NGrid_x1 = 4
      NGrid_x2 = 4
      NGrid_x3 = 4

  will result in a block decomposition with four blocks in each direction (a total of 64 processors are needed). Any decomposition is allowed on each Domain, and with SMR the decomposition of each Domain is independent (see the SMR sketch after this list). In all cases, however, there can be no fewer than four active zones along any direction in any Grid.
  Alternatively, Athena contains algorithms to automatically compute the optimal domain decomposition (based on minimizing the amount of data that must be communicated). For example, to automatically decompose a Domain into 10 Grids, use

      <domain1>
      ...
      AutoWithNProc = 10

  The `AutoWithNProc` parameter takes precedence over the `NGrid_x*` parameters; if it is specified in a `<domain>` block, the values of these other parameters are ignored.

- MPI jobs must be run using the `mpiexec` or `mpirun` command. The total number of processors to be used must be specified, generally through the command-line option `-np #` (where `#` is the number of processors to be used). The number of processors used at run time must agree with the total number of MPI blocks (Grids) specified over all the `<domain>` blocks in the input file, or Athena will print an error message and terminate (see the launch example after this list). A useful script for the Portable Batch System (PBS), which is often used to schedule jobs on parallel clusters, is included in the `./athena/doc` directory.
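As a concrete illustration of the configure and compile steps above, the following shell commands build an MPI-enabled executable by setting the environment variables by hand. This is only a sketch: `mpicc` as the compiler/linker and the `/opt/mpi/...` paths are placeholders for whatever your MPI installation actually provides.

    % configure --enable-mpi
    % export CC=mpicc                        # MPI compiler wrapper (placeholder)
    % export LDR=mpicc                       # linker; assumed to be the same wrapper
    % export MPIINC="-I/opt/mpi/include"     # placeholder include path
    % export MPILIB="-L/opt/mpi/lib -lmpi"   # placeholder library path and flags
    % make all

On a system that already has a suitable block defined in `./athena/Makeoptions.in`, the four `export` commands can be replaced by `% make all MACHINE=mymachine` as described above.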
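To see how the Grid counts set the required number of processors on a run with SMR, here is a hypothetical fragment of an input file with two Domains decomposed independently. Only the `NGrid_x*` lines are shown; all other parameters in each `<domain>` block are elided, and the `#` comments are annotations rather than required input.

    <domain1>        # root Domain: 2 x 2 x 1 = 4 Grids
    ...
    NGrid_x1 = 2
    NGrid_x2 = 2
    NGrid_x3 = 1

    <domain2>        # refined Domain: 4 x 2 x 1 = 8 Grids
    ...
    NGrid_x1 = 4
    NGrid_x2 = 2
    NGrid_x3 = 1

Each Grid is handled by one MPI process, so this job must be launched on 4 + 8 = 12 processors, and each Domain must be large enough that every one of its Grids retains at least four active zones in each decomposed direction.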
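Finally, a sketch of launching such a job from the command line. The `-i` option for naming the input file follows the usual Athena convention, but the input-file name and processor count here are placeholders; `-np` must equal the total number of Grids over all `<domain>` blocks (12 for the SMR example above).

    % mpiexec -np 12 athena -i athinput.myproblem

On clusters that provide `mpirun` instead of `mpiexec`, the same line with `mpirun` substituted normally works.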
Note that data generated by MPI parallel jobs will be written to separate files for each process (except for history or pdf files, which contain the appropriate MPI calls to do global sums). A useful program for joining together the multiple vtk files generated by a parallel job is included in `./athena/vis/vtk`.
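For completeness, one way to build that joiner is sketched below; it assumes the program is a single stand-alone C source file (named `join_vtk.c` here, an assumption) that links only against the math library. Check the actual contents of `./athena/vis/vtk` for the real file name and for the arguments the program expects.

    % cd athena/vis/vtk
    % cc -o join_vtk join_vtk.c -lm    # assumption: single C file, links only against libm
    % ./join_vtk                       # run without arguments, or read the source, to see the expected usage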
See also the sections on MPI in the Tutorial and the section on SMR with MPI in the User Guide.