Gravitational Physics

Name of the cluster:
SAKURA
Institution:
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)

Access:


Configuration:

Login nodes sakura[01-02]:

  • CPU Model:  Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
  • 2 sockets
  • 20 cores per socket
  • no hyper-threading (1 thread per core)
  • 376 GB RAM

362 execution nodes sakura[001-362]:

  • CPU Model:  Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
  • 2 sockets
  • 20 cores per socket
  • no hyper-threading (1 thread per core)
  • 376 GB RAM

The node interconnect is based on Intel Omni-Path Fabric (speed: 100 Gb/s).

 

Filesystems:

  • /u : shared home filesystem; GPFS-based; user quotas (currently 400 GB, 1M files) enforced; quota can be checked with '/usr/lpp/mmfs/bin/mmlsquota'; NO BACKUPS yet
  • /sakura/ptmp : shared scratch filesystem (1.3 PB); GPFS-based; no quotas enforced; NO BACKUPS!
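For example, typical usage could look like the following sketch (the per-user directory under /sakura/ptmp is an assumption; adapt the paths to your own setup):

  # check your quota and current usage on the home filesystem /u
  /usr/lpp/mmfs/bin/mmlsquota
  # keep large temporary job data on the scratch filesystem (no backups!)
  mkdir -p /sakura/ptmp/$USER/my_run    # hypothetical per-user directory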

Compilers and Libraries:

The "module" subsystem is implemented on SAKURA. Please use 'module available' to see all available modules.

  • Intel compilers (-> 'module load intel'): icc, icpc, ifort
  • GNU compilers (-> 'module load gcc'): gcc, g++, gfortran
  • Intel MKL (-> 'module load mkl'): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64
  • Intel MPI (-> 'module load impi'): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, ...
  • Python (-> 'module load anaconda'): python
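For example, building an MPI code linked against MKL could look like the following sketch (the source file name is a placeholder, and the sequential MKL link line shown is one common choice; adapt it to your application):

  # load the Intel compiler, Intel MPI and MKL modules
  module load intel impi mkl
  # compile with the Intel MPI wrapper for icc and link against sequential MKL
  mpiicc -O2 -o my_code my_code.c \
      -L$MKL_HOME/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm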

 

Batch system based on Slurm:

The batch system on SAKURA is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, ...) can be found on the Draco home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be modified for the SAKURA cluster (the partition must be changed); a minimal example is sketched below the configuration list.

Current Slurm configuration on SAKURA:

  • default run time: 12 hours
  • current max. run time (wallclock): 1 day
  • only one partition: p.sakura
  • default memory per node for jobs: p.sakura (380000 MB)
  • nodes are exclusively allocated to jobs
  • max. number of nodes each user can use: 160
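A minimal sbatch script for SAKURA could look like the following sketch (job name, node count and executable are placeholders; note the p.sakura partition and the 24-hour wallclock limit):

  #!/bin/bash -l
  #SBATCH -J my_job                  # job name (placeholder)
  #SBATCH --partition=p.sakura       # the only partition on SAKURA
  #SBATCH --nodes=2                  # nodes are allocated exclusively
  #SBATCH --ntasks-per-node=40       # 40 cores per node, no hyper-threading
  #SBATCH --time=24:00:00            # max. wallclock time is 1 day

  module load intel impi
  srun ./my_code                     # placeholder executable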

 

Debugging of parallel codes:

1. on a login node, load the impi module: module load impi
2. unset two variables:

  • unset I_MPI_HYDRA_BOOTSTRAP
  • unset I_MPI_PMI_LIBRARY

3. compile your parallel code
4. launch your code: mpirun/mpiexec -n <number of processes, fewer than 10> <mpi_code>
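Put together, such a debugging session on a login node could look like the following sketch (the source file name, debug flags and process count are placeholders):

  # 1. load the Intel MPI module
  module load impi
  # 2. unset the two variables
  unset I_MPI_HYDRA_BOOTSTRAP
  unset I_MPI_PMI_LIBRARY
  # 3. compile the parallel code (here with typical debug flags)
  mpiicc -g -O0 -o my_code my_code.c
  # 4. launch with a small number of processes (fewer than 10)
  mpirun -n 4 ./my_code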

 

Useful tips:

The default run time limit for jobs that do not specify a value is 12 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, but not longer than 24 hours.

OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (an example is given on the help information page; see also the sketch below).
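A sketch of the relevant lines of an sbatch script for an OpenMP (here hybrid MPI/OpenMP) job; the task and thread counts are placeholders chosen to fill one 40-core node:

  #SBATCH --partition=p.sakura
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=4        # 4 MPI tasks per node (placeholder)
  #SBATCH --cpus-per-task=10         # 10 OpenMP threads per task (placeholder)
  #SBATCH --time=12:00:00

  # pass the Slurm setting on to OpenMP
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
  srun ./my_hybrid_code              # placeholder executable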

 

Support

For support, please create a trouble ticket at the MPCDF helpdesk.
