Parallel Programming

 

Basic parallel concepts on Hydra

On the Hydra HPC cluster, two basic parallelization methods are available.

MPI
The Message Passing Interface provides maximum portability for distributed-memory parallelization across a large number of nodes.
OpenMP
OpenMP is a standardized set of compiler directives for shared-memory parallelism. On Hydra, a pure OpenMP code is restricted to a single node.
It is possible to mix MPI and OpenMP parallelism within the same code to achieve large-scale parallelism.
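
As an illustration, the following minimal C sketch combines the two models; the file name, program structure, and requested thread-support level are only examples and are not taken from the Hydra documentation.

    /* hybrid.c -- minimal hybrid MPI/OpenMP sketch (illustrative only) */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int provided, rank;

        /* request thread support, since OpenMP threads coexist with MPI */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* each MPI task spawns OMP_NUM_THREADS OpenMP threads */
        #pragma omp parallel
        {
            printf("MPI task %d, OpenMP thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

With the Intel toolchain such a code could be built with, e.g., "mpiicc -qopenmp hybrid.c -o hybrid" (the exact wrapper name and OpenMP flag depend on the compiler version and loaded modules).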

IBM's Parallel Operating Environment (POE) with IBM MPI

If you have never used POE, you may find the Overview of IBM's Parallel Environment interesting to read.

Simple interactive example

To run an MPI program interactively (for debugging purposes), please follow these steps (a complete example session is shown after the list):

  • Compile your program, e.g.

    mpiifort myprog.f -o myprog

  • Create a file named host.list with a line "localhost" for each processor you intend to use. To use, say, 4 processors of a node, host.list must contain four lines reading "localhost" (without quotes).

  • Run your program on hydra-i.rzg.mpg.de using the command

    poe ./myprog -procs 4
    (poe's options must appear after the name of your binary)
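
Put together, an interactive test run with 4 tasks might look like the following session sketch (file names are examples):

    # create host.list with one "localhost" line per MPI task
    for i in 1 2 3 4; do echo localhost; done > host.list

    # compile and run with 4 tasks on the local node
    mpiifort myprog.f -o myprog
    poe ./myprog -procs 4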

To run a hybrid MPI/OpenMP program interactively, set the following environment variables in addition to the steps described above (a complete launch example follows the list):

  • export OMP_NUM_THREADS=<nr_of_threads_per_task>
  • export MP_TASK_AFFINITY=core:$OMP_NUM_THREADS
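
For example, a hybrid run with 2 MPI tasks and 4 OpenMP threads per task (8 cores in total; the binary name is illustrative) could be launched as follows:

    export OMP_NUM_THREADS=4
    export MP_TASK_AFFINITY=core:$OMP_NUM_THREADS
    poe ./myhybrid -procs 2    # host.list must provide at least one "localhost" line per MPI task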

Batch jobs

For production runs, the MPI program must be run as a batch job. Please see the sample batch job scripts for further information.
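
As a rough orientation only, a job script for an IBM LoadLeveler batch system (an assumption made here; the official sample scripts are authoritative) could look like the sketch below. The node count, tasks per node, and time limit are illustrative values.

    # @ shell = /bin/bash
    # @ job_type = parallel
    # @ job_name = myprog
    # @ node = 2
    # @ tasks_per_node = 20
    # @ wall_clock_limit = 01:00:00
    # @ output = job.out.$(jobid)
    # @ error = job.err.$(jobid)
    # @ queue

    poe ./myprog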

Environment variables

There are many environment variables and command-line flags to the 'poe' command that influence the operation of the PE tools and the execution of parallel programs. Useful defaults are set on Hydra. A complete list of these environment variables can be found in the poe man page or in the poe documentation.
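
For example, the -procs flag used above has a corresponding environment variable, and further variables control input and output handling (an illustrative subset; see the man page for the complete list):

    export MP_PROCS=4              # number of MPI tasks, equivalent to 'poe ... -procs 4'
    export MP_HOSTFILE=host.list   # host file to use
    export MP_LABELIO=yes          # prefix each line of output with the task id
    poe ./myprog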

 

Accelerator technologies available on Hydra

Part of the Hydra HPC system is equipped with NVIDIA K20X GPUs, and a smaller part with Intel Xeon Phi accelerator cards.

GPGPU computing with the NVIDIA K20X GPUs

338 IvyBridge nodes are equipped with 2 NVIDIA K20X GPGPUs each. NVIDIA CUDA and the PGI compiler are provided via environment modules. After loading the respective module, GPGPU code can be compiled and linked on the login nodes. For information on how to submit a GPGPU job, please have a look at the respective sample batch job script.
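
A typical compile step on a login node could look as follows; the module name and version are assumptions, so please check 'module avail' for what is actually installed:

    module load cuda                        # load the CUDA toolkit
    nvcc -arch=sm_35 mycode.cu -o mycode    # K20X GPUs have compute capability 3.5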

Using the Intel Xeon Phi accelerators

12 IvyBridge nodes are equipped with 2 Intel Xeon Phi cards each. The default Intel compiler supports the compilation of offload code out of the box; no special command-line flags are required. An introduction to the topic and reference information are provided by the official Intel Compiler Documentation. Please note that only the offload model is supported, not the native mode. For information on how to submit a job to the Intel Xeon Phi nodes, please have a look at the respective sample batch job script.
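
As a minimal illustration of the offload model, the following C sketch marks a loop for execution on the Xeon Phi card; the file name and array size are examples only.

    /* offload.c -- minimal Xeon Phi offload sketch (illustrative only) */
    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        double a[N];
        int i;

        /* the marked loop runs on the Xeon Phi card; out(a) copies the result back */
        #pragma offload target(mic) out(a)
        for (i = 0; i < N; i++)
            a[i] = 2.0 * i;

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }

Such a code can be compiled with the default Intel compiler without additional flags, e.g. "icc offload.c -o offload".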

 
