Bits & Bytes online Edition




System Environments for HPC

Ingeborg Weidl, Johannes Reetz

IBM Power6 system 'vip'

The IBM Power6 supercomputer 'vip' (207 compute nodes, 6 I/O nodes, and 1 node for Hierarchical Storage Management, all connected by a fast 8-plane InfiniBand network) has been in production since June 2008. Each Power6 node has 32 processors with 2 hardware threads each, so 6624 processors (or 13248 logical CPUs) are available for computing, with a total main memory of 18.5 TB and a peak performance of 120 TF/s.
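The processor counts above follow directly from the number of compute nodes. A minimal Python sketch of the arithmetic, using only the figures given in the text (the variable names are illustrative and not part of any RZG tool):

```python
# Figures taken from the text; the names are illustrative only.
COMPUTE_NODES = 207
PROCESSORS_PER_NODE = 32
THREADS_PER_PROCESSOR = 2  # hardware threads per processor

# Only the compute nodes are counted, not the I/O or HSM nodes.
processors = COMPUTE_NODES * PROCESSORS_PER_NODE
logical_cpus = processors * THREADS_PER_PROCESSOR

print(processors)    # 6624
print(logical_cpus)  # 13248
```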

The operating system of the Power6 cluster is AIX 6.1 with the traditional parallel programming environment (MPI, ESSL/PESSL). The current compiler levels are Fortran xlf 12.1, C xlc 10.0 and C++ xlC 10.0. The batch system is LoadLeveler 3.5.

The Power6 processors can be used in 'Simultaneous Multithreading' (SMT) mode, which increases the performance of most applications significantly. SMT mode with 64 logical CPUs is the default on the Power6 nodes, but a Power6 node can also be used in 'Single Thread' (ST) mode with 32 CPUs. By now, about half of the batch jobs use SMT mode.

Programs can be tested and debugged interactively on a dedicated Power6 node, vip100. On the login node vip.rzg.mpg.de (vip001), interactive use of poe is not allowed, in order to avoid memory constraints.

The total disk space on the Power6 system is about 400 TB. There are 3 GPFS file systems available that are symmetrically accessible from all Power6 nodes:

/u
(60 TB) for permanent user data. The users' home directories are in /u.
/ptmp
(320 TB) for temporary job I/O. Files in /ptmp that have not been accessed for more than 14 days are removed automatically.
/r
for migrated data (with an online disk space of 30 TB).
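Because files in /ptmp are removed after 14 days without access, it can be useful to check which files are at risk. The following is a minimal sketch, assuming that the last access time reported by the file system (`st_atime`) is what the cleanup looks at; the actual mechanism on 'vip' may differ. The function name `stale_files` and its parameters are hypothetical.

```python
import os
import time

SECONDS_PER_DAY = 86400


def stale_files(root, max_age_days=14, now=None):
    """Yield paths under root whose last access is older than max_age_days.

    Assumption: the access time (st_atime) decides whether a file is stale.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                yield path
```

A user could loop over `stale_files(...)` with the path of their own directory below /ptmp and print each result. Note that reading a file's metadata in this way does not update its access time.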

The AFS file system is available only on the login node vip001 and on vip100. Please note that no system backups are made of either /u or /ptmp.

Like the former Regatta cluster, the Power6 cluster 'vip' is part of the European DEISA (Distributed European Infrastructure for Supercomputing Applications) project. Within the 'DEISA Extreme Computing Initiative' (DECI) the RZG provides computing resources for the most challenging applications in Material Science, Bio Sciences, Plasma Physics, Earth Sciences, and Engineering.

From the beginning, the Power6 machine has been very well utilized (about 90-95%), and a considerable number of applications are using 512 or more processors.

IBM Blue Gene/P system 'genius'

In October 2008, the Blue Gene/P system 'genius' was upgraded from 3 to 4 racks. There are now 4096 quad-core processors (i.e. 16384 cores) available with a total main memory of 8 TB. The peak performance is 55 TF/s.
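The core count, and a rough figure for memory per core, can be derived from the numbers above. This is a sketch under two assumptions that the text does not state: that the memory is distributed evenly over all processors, and that 1 TB is counted as 1024 GB. The variable names are illustrative.

```python
# Figures taken from the text; the names are illustrative only.
PROCESSORS = 4096          # quad-core processors in 4 racks
CORES_PER_PROCESSOR = 4
TOTAL_MEMORY_TB = 8

cores = PROCESSORS * CORES_PER_PROCESSOR

# Assumption: memory is spread evenly and 1 TB = 1024 GB.
memory_per_processor_gb = TOTAL_MEMORY_TB * 1024 / PROCESSORS
memory_per_core_mb = memory_per_processor_gb * 1024 / CORES_PER_PROCESSOR

print(cores)                    # 16384
print(memory_per_processor_gb)  # 2.0
print(memory_per_core_mb)       # 512.0
```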

Communication between the compute nodes uses a fast 3D torus network, while disk I/O goes through a 10 Gbit/s Ethernet network. The GPFS file systems /u and /ptmp are shared by the Blue Gene/P and the Power6 system. AFS is available only on the login node 'genius.rzg.mpg.de', not on the Blue Gene/P compute nodes.

The operating system of the Blue Gene/P is Linux (SLES10), but the programming environment includes the Fortran xlf 11.1 and C/C++ 9.0 compilers and the ESSL library from IBM. The batch system is LoadLeveler 3.5.

Most of the highly parallel applications on 'genius' use between 2048 and 8192 cores. The utilization of the machine is around 85-90%. The Blue Gene/P also participates in the DEISA project.