The LoadLeveler Batch System on HYDRA

The batch system on the HPC cluster HYDRA is IBM's LoadLeveler. To run test or production jobs, submit a job script (see below) to LoadLeveler, which will find and allocate the resources your job requires (e.g. the compute nodes on which it will run).
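For orientation, a minimal parallel job script could look like the following sketch. The node count, wall-clock limit and program name are placeholders rather than recommended settings; the sample scripts referenced below show the exact form used on HYDRA.

#!/bin/bash
# @ shell = /bin/bash
# @ job_type = parallel
# @ node = 2
# @ tasks_per_node = 16
# @ resources = ConsumableCpus(1)
# @ wall_clock_limit = 01:00:00
# @ output = job.out.$(jobid)
# @ error = job.err.$(jobid)
# @ queue

poe ./my_program

The script is submitted with llsubmit (see the commands below).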

Short test jobs (< 15 min) using 2, 4 or 8 cores will run on a dedicated node with short turnaround times.

By default, the job run limit is set to 8 on HYDRA. If your batch jobs cannot run independently of each other, please use job steps or contact the helpdesk on the MPCDF web page.
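If you do need dependent job steps, they can be sketched roughly as follows; the step names, node counts and executables are placeholders, and the dependency syntax is documented in IBM's LoadLeveler manual. The second step only starts after the first has finished with exit code 0.

# @ step_name = step1
# @ job_type = parallel
# @ node = 1
# @ tasks_per_node = 16
# @ queue
# @ step_name = step2
# @ dependency = (step1 == 0)
# @ job_type = parallel
# @ node = 1
# @ tasks_per_node = 16
# @ queue

# LoadLeveler sets LOADL_STEP_NAME to the name of the step being executed
case "$LOADL_STEP_NAME" in
  step1) poe ./prepare ;;
  step2) poe ./compute ;;
esac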

The Intel processors on HYDRA support hyperthreading, which may increase the performance of your application by up to 20%. To use hyperthreading, increase the number of MPI tasks_per_node in your job script from 16 (20 on Ivy Bridge nodes) to 32 (40 on Ivy Bridge nodes). Please be aware that with 32 (or 40) MPI tasks per node, each process gets only half of the memory by default. If you need more memory per MPI task, you have to specify it in the variable "ConsumableMemory". In the HYDRA cluster, 20 Sandy Bridge compute nodes and 2 x 100 Ivy Bridge compute nodes are available with 128 GB of real memory (120 GB for the application).
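For illustration, the relevant directives for a hyperthreaded run on Sandy Bridge nodes could be sketched as follows; the node count is arbitrary and the ConsumableMemory value (in MB per MPI task) is purely illustrative:

# @ node = 4
# @ tasks_per_node = 32
# @ resources = ConsumableCpus(1) ConsumableMemory(3000)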

The default parallel environment on HYDRA is POE with IBM MPI, but you may use Intel MPI as well: either run executables built with Intel MPI through the poe call in your job script, or use a pure Intel MPI environment in your job (with '@ job_type = mpich'). However, we recommend using IBM's MPI/POE because it shows somewhat better performance than Intel MPI.
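A pure Intel MPI job could be sketched as follows; the module names and the mpiexec invocation are assumptions and should be checked against the sample scripts:

# @ job_type = mpich
# @ node = 2
# @ tasks_per_node = 16
# @ resources = ConsumableCpus(1)
# @ queue

module load intel impi        # assumed module names
mpiexec -n 32 ./my_program    # 2 nodes x 16 tasks_per_node = 32 MPI tasks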

Small batch jobs using fewer than 16 nodes will mainly run on the Sandy Bridge nodes with 16 cores per node. If you request more than 16 cores per node for such small jobs, expect a longer waiting time in the batch queue, because the Ivy Bridge nodes are primarily dedicated to big jobs.

For detailed information about LoadLeveler, please see IBM's manual "Using and Administering IBM LoadLeveler for Linux".

 

The most important LoadLeveler commands are listed below; a short usage example follows the list.

llsubmit job_script_name
Submit a job script for execution.
llq
Check the status of your job(s).
llcancel job_id
Cancel a job.
llclass
List the available batch classes.
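A typical sequence of calls might look like this (the script name and the job ID are placeholders):

llsubmit my_job.cmd        # submit the job script
llq -u $USER               # list only your own jobs
llcancel hydra01.12345.0   # cancel a specific job by its ID
llclass                    # show the available classes and their limits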

 

Notes on job scripts:

Sample batch job scripts can be found here.

The variable

# @ node = <nr. of nodes>

gives the number of iDataPlex nodes that your program will use.

The variable

# @ tasks_per_node = <nr. of cpus>

specifies the number of MPI processes to be started on each node. If you are using pure OpenMP, you have to set this variable to 1. The parameter tasks_per_node cannot be greater than 32 (40 on Ivy Bridge nodes), because one iDataPlex node has 16 (20) physical cores with 2 threads each, i.e. 32 (40) logical CPUs in hyperthreading mode.

The variable

# @ resources = ConsumableCpus(nr. of threads)

specifies the number of OpenMP threads per task. In the case of pure MPI, set this value to 1.

Along with ConsumableCpus(xx), you can specify ConsumableMemory(yyyy) as the memory (in MB) that your job needs per MPI task.
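For example, a pure OpenMP job using all 16 physical cores of one Sandy Bridge node could be sketched as follows; the ConsumableMemory value and the program name are placeholders:

# @ node = 1
# @ tasks_per_node = 1
# @ resources = ConsumableCpus(16) ConsumableMemory(60000)

export OMP_NUM_THREADS=16    # match the number of ConsumableCpus
./my_openmp_program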

The expression

tasks_per_node * ConsumableCpus()

may not exceed 32 (40).

The expression

node * tasks_per_node * ConsumableCpus()

gives the total number of CPUs that your job will use.
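For example, a job with node = 4, tasks_per_node = 4 and ConsumableCpus(4) uses 4 * 4 * 4 = 64 CPUs in total (the values are purely illustrative).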

The variable

# @ first_node_tasks = <nr. of tasks>

determines the number of tasks that run on the first of the nodes allocated for a job. It is useful when the total number of tasks is not a multiple of the number of cores per node. For example, to run a job with 128 MPI tasks on the Hydra Sandy Bridge partition, 8 nodes with 16 tasks each have to be allocated. To run the same job on the Hydra Ivy Bridge partition, only 7 nodes are necessary in total: 6 nodes then run 20 tasks each, and 1 node runs the remaining 8 tasks, which is what the "first_node_tasks" variable specifies. Please see the sample script page for a complete example.
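For the Ivy Bridge case described above, the corresponding directives would read:

# @ node = 7
# @ tasks_per_node = 20
# @ first_node_tasks = 8

This yields 8 tasks on the first node plus 6 x 20 tasks on the remaining nodes, i.e. 128 MPI tasks in total.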
