Running Athena on Parallel Processors
Athena is parallelized using domain decomposition based on the Message Passing Interface (MPI). The code can be run on any distributed-memory cluster (or any other multi-processor system) on which MPI is installed using the following steps:
- During the configure step, the MPI option must be enabled via

      % configure --enable-mpi

  This sets precompiler macros to include the appropriate MPI code.
- During the compile step, the appropriate MPI libraries and include files must be linked. Often this requires using a compile script, usually called `mpicc`. There are several ways to invoke `mpicc`, and to set the paths to the libraries and include files it needs (see the first sketch after this list):
    - environment variables such as `CC`, `LDR`, `MPIINC`, and `MPILIB` can be set using Linux commands
    - the default values of these environment variables can be set in the file `./athena/Makeoptions.in`
    - multiple independent values for these environment variables can be set using the `MACHINE` macro provided in the file `./athena/Makeoptions.in`. The value of the `MACHINE` macro can be specified from the command line using

          % make all MACHINE=mymachine

      where `mymachine` specifies a valid block in `./athena/Makeoptions.in` in which the environment variables corresponding to this system are set. Note that if the `./athena/Makeoptions.in` file is edited, configure must be run again.
- The input file for the problem of interest must be edited to specify the desired domain decomposition within each `<domain>` block. For example, the following segment of a `<domain>` block in the input file

      <domain1>
      ...
      NGrid_x1 = 1
      NGrid_x2 = 10
      NGrid_x3 = 1

  will result in a slab decomposition with 10 slabs in the y-direction (a total of 10 processors are needed); while

      <domain1>
      ...
      NGrid_x1 = 1
      NGrid_x2 = 2
      NGrid_x3 = 3

  will result in a pencil decomposition with two pencils in the y-direction and three in the z-direction (a total of 6 processors are needed); while

      <domain1>
      ...
      NGrid_x1 = 4
      NGrid_x2 = 4
      NGrid_x3 = 4

  will result in a block decomposition with four blocks in each direction (a total of 64 processors are needed). Any decomposition is allowed on each Domain, and with SMR the decomposition of each Domain is independent (see the SMR sketch after this list). In all cases, however, there can be no fewer than four active zones along any direction in any Grid.
  Alternatively, Athena contains algorithms to automatically compute the optimal domain decomposition (based on minimizing the amount of data that must be communicated). For example, to automatically decompose a Domain into 10 Grids, use

      <domain1>
      ...
      AutoWithNProc = 10

  The `AutoWithNProc` parameter takes precedence over the `NGrid_x*` parameters; if it is specified in a `<domain>` block, the values of these other parameters are ignored.

- MPI jobs must be run using the `mpiexec` or `mpirun` command. The total number of processors to be used must be specified, generally through the command-line option `-np #` (where `#` is the number of processors to be used). The number of processors used at run time must agree with the total number of MPI blocks (Grids) specified over all the `<domain>` blocks in the input file, or Athena will print an error message and terminate (see the launch example after this list). A useful script for the Portable Batch System (PBS), which is often used to schedule jobs on parallel clusters, is included in the `./athena/doc` directory.
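As a concrete illustration of the configure and compile steps above, the following shell commands build an MPI-enabled executable by setting the environment variables by hand. This is only a sketch: `mpicc` as the compiler/linker and the `/opt/mpi/...` paths are placeholders for whatever your MPI installation actually provides.

    % configure --enable-mpi
    % export CC=mpicc                        # MPI compiler wrapper (placeholder)
    % export LDR=mpicc                       # linker; assumed to be the same wrapper
    % export MPIINC="-I/opt/mpi/include"     # placeholder include path
    % export MPILIB="-L/opt/mpi/lib -lmpi"   # placeholder library path and flags
    % make all

On a system that already has a suitable block defined in `./athena/Makeoptions.in`, the four `export` commands can be replaced by `% make all MACHINE=mymachine` as described above.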
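To see how the Grid counts set the required number of processors on a run with SMR, here is a hypothetical fragment of an input file with two Domains decomposed independently. Only the `NGrid_x*` lines are shown; all other parameters in each `<domain>` block are elided, and the `#` comments are annotations rather than required input.

    <domain1>        # root Domain: 2 x 2 x 1 = 4 Grids
    ...
    NGrid_x1 = 2
    NGrid_x2 = 2
    NGrid_x3 = 1

    <domain2>        # refined Domain: 4 x 2 x 1 = 8 Grids
    ...
    NGrid_x1 = 4
    NGrid_x2 = 2
    NGrid_x3 = 1

Each Grid is handled by one MPI process, so this job must be launched on 4 + 8 = 12 processors, and each Domain must be large enough that every one of its Grids retains at least four active zones in each decomposed direction.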
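Finally, a sketch of launching such a job from the command line. The `-i` option for naming the input file follows the usual Athena convention, but the input-file name and processor count here are placeholders; `-np` must equal the total number of Grids over all `<domain>` blocks (12 for the SMR example above).

    % mpiexec -np 12 athena -i athinput.myproblem

On clusters that provide `mpirun` instead of `mpiexec`, the same line with `mpirun` substituted normally works.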
Note that data generated by MPI parallel jobs will be written to separate files for each process (except for history or pdf files, which contain the appropriate MPI calls to do global sums). A useful program for joining together the multiple vtk files generated by a parallel job is included in `./athena/vis/vtk`.
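For completeness, one way to build that joiner is sketched below; it assumes the program is a single stand-alone C source file (named `join_vtk.c` here, an assumption) that links only against the math library. Check the actual contents of `./athena/vis/vtk` for the real file name and for the arguments the program expects.

    % cd athena/vis/vtk
    % cc -o join_vtk join_vtk.c -lm    # assumption: single C file, links only against libm
    % ./join_vtk                       # run without arguments, or read the source, to see the expected usage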
See also the sections on MPI in the Tutorial and the section on SMR with MPI in the User Guide.