Documentation/UserGuide/MPI with SMR

The SMR algorithm in Athena allows for very flexible domain decompositions with MPI. Each Domain at every level can be decomposed into Grids independently, with each Grid updated by a single processor. The code automatically determines the overlap between Grids on different processors, and handles the communication necessary for the restriction and prolongation operations.

Since different decompositions require different amounts of data to be communicated, users are strongly encouraged to experiment in order to find the most efficient layout for their specific problem.
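As a rough illustration of why the layout matters, the sketch below (not part of Athena, and written assuming a ghost-zone width of four cells) simply counts the cells exchanged across internal Grid faces for two candidate decompositions of a 128x64x64 Domain.

    /* Back-of-the-envelope estimate of ghost-zone traffic for candidate
     * MPI decompositions.  Not part of Athena: assumes each Grid exchanges
     * nghost-deep face data with its neighbours in every direction. */
    #include <stdio.h>

    static long ghost_cells(int Nx1, int Nx2, int Nx3,
                            int NG1, int NG2, int NG3, int nghost)
    {
        /* interior size of each Grid */
        int nx1 = Nx1 / NG1, nx2 = Nx2 / NG2, nx3 = Nx3 / NG3;

        /* number of internal Grid faces in each direction */
        long faces1 = (long)(NG1 - 1) * NG2 * NG3;
        long faces2 = (long)NG1 * (NG2 - 1) * NG3;
        long faces3 = (long)NG1 * NG2 * (NG3 - 1);

        /* faces times face area times ghost depth */
        return nghost * (faces1 * (long)nx2 * nx3 +
                         faces2 * (long)nx1 * nx3 +
                         faces3 * (long)nx1 * nx2);
    }

    int main(void)
    {
        /* root Domain from the example below; nghost = 4 is an assumed width */
        int Nx1 = 128, Nx2 = 64, Nx3 = 64, nghost = 4;

        printf("2x2x2 layout: %ld ghost cells\n",
               ghost_cells(Nx1, Nx2, Nx3, 2, 2, 2, nghost));
        printf("8x1x1 layout: %ld ghost cells\n",
               ghost_cells(Nx1, Nx2, Nx3, 8, 1, 1, nghost));
        return 0;
    }

By this simple count the block-shaped 2x2x2 layout exchanges noticeably less data than an 8x1x1 pencil layout, but actual performance also depends on the network and on load balance, so direct timing tests remain the best guide.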

The decomposition of Domains into Grids with MPI is specified using parameters in the <domain> blocks of the input file (see Domain Blocks in the User Guide). For example, to specify two levels in a 3D calculation, with eight Grids per level arranged in a 2x2x2 configuration on the root level and a 4x2x1 configuration on the level=1 Domain, use

    <domain1>
    level           = 0         # refinement level of this Domain (root=0)
    Nx1             = 128       # Number of zones in X1-direction
    x1min           = 0.0       # minimum value of X1
    x1max           = 3.0       # maximum value of X1
    bc_ix1          = 4         # boundary condition flag for inner-I (X1)
    bc_ox1          = 4         # boundary condition flag for outer-I (X1)
    NGrid_x1        = 2         # with MPI, number of Grids in X1 coordinate
    AutoWithNProc   = 0         # set to Nproc for auto domain decomposition
    
    Nx2             = 64        # Number of zones in X2-direction
    x2min           = 0.0       # minimum value of X2
    x2max           = 1.5       # maximum value of X2
    bc_ix2          = 4         # boundary condition flag for inner-J (X2)
    bc_ox2          = 4         # boundary condition flag for outer-J (X2)
    NGrid_x2        = 2         # with MPI, number of Grids in X2 coordinate
    
    Nx3             = 64        # Number of zones in X3-direction
    x3min           = 0.0       # minimum value of X3
    x3max           = 1.5       # maximum value of X3
    bc_ix3          = 4         # boundary condition flag for inner-K (X3)
    bc_ox3          = 4         # boundary condition flag for outer-K (X3)
    NGrid_x3        = 2         # with MPI, number of Grids in X3 coordinate
    
    <domain2>
    level           = 1         # refinement level of this Domain (root=0)
    Nx1             = 128       # Number of zones in X1-direction
    Nx2             = 64        # Number of zones in X2-direction
    Nx3             = 64        # Number of zones in X3-direction
    iDisp           = 32        # i-displacement measured in cells of this level
    jDisp           = 16        # j-displacement measured in cells of this level
    kDisp           = 16        # k-displacement measured in cells of this level
    AutoWithNProc   = 0         # set to Nproc for auto domain decomposition
    NGrid_x1        = 4         # with MPI, number of Grids in X1 coordinate
    NGrid_x2        = 2         # with MPI, number of Grids in X2 coordinate
    NGrid_x3        = 1         # with MPI, number of Grids in X3 coordinate

This will create Grids of size 64x32x32 on the root level, and 32x32x64 on level=1.
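Alternatively, the AutoWithNProc parameter shown in the blocks above can be used to let the code choose the decomposition itself. As a sketch (assuming, per the comment in the example above, that setting it to the number of processors assigned to a Domain enables automatic decomposition, in which case the NGrid_x* lines are not needed), each block would instead contain

    <domain1>
    AutoWithNProc   = 8         # decompose the root Domain over 8 processors automatically

    <domain2>
    AutoWithNProc   = 8         # decompose the level=1 Domain over 8 processors automatically

with the remaining parameters in each block unchanged.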

This configuration could be run on either 8 or 16 processors. In the former case, each processor would run one Grid from each level. In the latter case, each Grid would be run on a separate processor.
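For instance, a 16-processor run might be launched with a command along the lines of the following, where the executable and input-file names are placeholders for your own build and problem:

    mpirun -np 16 athena -i athinput.my_smr_problem

Running with -np 8 instead gives the two-Grids-per-processor arrangement described above.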