Spark on Draco

Everything you need to know to get up and running with Spark on Draco.

Introduction to Spark

Apache Spark is an open-source cluster-computing framework that supports big data and machine learning applications.

To allow projects to explore the use of Spark, the MPCDF now provides the ability to run a Spark application on Draco via the Slurm batch system. This page provides an introduction to running Spark applications on Draco.

But first here's a bit of background information about how this works.

As a user you create a standard Slurm batch script which is submitted to Slurm and managed like any normal batch job. Within the Slurm job, however, a stand-alone Spark cluster is created, with a master and several workers distributed across the nodes reserved for the Slurm job.

A Spark module has been created which provides the Spark distribution and some helper scripts for creating a Spark cluster. This makes it easier for you, as a user, to create a Spark cluster and submit an application to it.

Running Spark on Draco

The first step is to load the spark module and create a shared secret for your future cluster(s). This step only needs to be done once (on a Draco login node) and will ensure that only you can run applications on your Spark cluster.

$ module load jdk
$ module load spark
$ spark-create-secure-setup

The spark-create-secure-setup script creates a Spark config directory (~/.spark-config/) and generates the configuration and key files needed to ensure that your clusters are only accessible by you.

Creating the Slurm script

The Slurm script is used to

1. Request the resources required by the Spark Cluster/Application

2. Configure the Spark software and start the cluster

3. Submit the Spark application to the cluster

In some sense the resources are allocated twice. First, the Slurm batch script allocates the resources on the Slurm cluster (this is done by the directives at the start of the Slurm script). Second, the Spark application requests resources when the spark-submit command is called to submit a Spark job to the Spark cluster (see below). It is important that there is no large mismatch between these two resource allocations.

The following snippet shows the resource allocation for a 3-node batch job with 4 tasks per node and 5 cores per task. The memory and time limits are also specified, and a partition is selected (strictly speaking, the partition directive is optional).

#SBATCH --nodes 3
#SBATCH -t 00:30:00
#SBATCH --mem 20000
#SBATCH --ntasks-per-node 4
#SBATCH --cpus-per-task 5
#SBATCH -p express

Each of the "tasks" will be used to run a Spark executor process (5 cores per Spark executor is a good rule of thumb). To make best use of the cluster resources it is recommended that you use as many cores as possible on each node. However, some tuning may be needed to balance memory requirements against the number of cores per task.

Software setup and Spark cluster start

The next part of the Slurm script sets up the Spark software and starts the Spark cluster (via the spark-start helper script):

module load jdk
module load anaconda
module load spark

spark-start

echo $MASTER


The spark-start helper script starts the Spark Master and starts a Spark Slave per Slurm task on the reserved nodes. Note that one node will run both Slaves and the Master and this may be something you need to consider when allocating resources to your executors (in the next step).

Running the Spark application

The last part of the script is where the actual Spark application is submitted to the Spark cluster.

spark-submit --total-executor-cores 60 --executor-memory 5G \
/u/jkennedy/Spark/

spark-submit is a command-line tool for submitting an application to a Spark cluster. In this example the application is a simple word count written in Python which reads a file from GPFS. The example is an adaptation of one of the standard examples distributed with the Spark code itself.

When submitting a Spark application there are a few tuning parameters to consider: the number of executors and the executor memory. These must be matched to the resource allocation of the Slurm job. The number of Spark executors is defined by the Slurm --ntasks-per-node setting and the number of nodes (in this example, nodes * tasks-per-node = 3 * 4 = 12). By default, Spark allocates one core per executor. However, in our example we have set 5 cores per task, so the total number of executor cores for the Spark application is 12 * 5 = 60. It is important that the Spark application's request matches the Slurm resource reservation (or at least does not exceed it).
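The arithmetic above can be sketched in a few lines of shell. The values are hard-coded to match the example reservation on this page; inside a real job they could instead be read from the Slurm environment variables $SLURM_JOB_NUM_NODES, $SLURM_NTASKS_PER_NODE and $SLURM_CPUS_PER_TASK.

```shell
# Illustrative check that the Spark request matches the Slurm reservation.
NODES=3             # --nodes
TASKS_PER_NODE=4    # --ntasks-per-node
CPUS_PER_TASK=5     # --cpus-per-task

NUM_EXECUTORS=$((NODES * TASKS_PER_NODE))                # one executor per Slurm task
TOTAL_EXECUTOR_CORES=$((NUM_EXECUTORS * CPUS_PER_TASK))  # value for --total-executor-cores

echo "executors: ${NUM_EXECUTORS}"           # 12
echo "total cores: ${TOTAL_EXECUTOR_CORES}"  # 60
```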

Similarly, the executor memory needs to match the Slurm reservation (and cannot exceed the memory available to the Spark cluster). In this example we allocate 5 GB per executor, giving 4 * 5 = 20 GB per node, which fits the Slurm --mem reservation of 20000 MB per node (remember that some resources will also be required by the Spark Master).
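Putting the pieces together, a complete spark-wordcount.cmd could look like the following. This is a sketch assembled from the snippets above; the job name and output file name are illustrative additions, not site requirements.

```shell
#!/bin/bash -l
# spark-wordcount.cmd -- sketch of a full Slurm script for a Spark job.
#SBATCH -J spark-wordcount
#SBATCH -o spark-wordcount.%j.out
#SBATCH --nodes 3
#SBATCH -t 00:30:00
#SBATCH --mem 20000
#SBATCH --ntasks-per-node 4
#SBATCH --cpus-per-task 5
#SBATCH -p express

# Software setup and Spark cluster start
module load jdk
module load anaconda
module load spark

spark-start
echo $MASTER

# Submit the application to the Spark cluster
spark-submit --total-executor-cores 60 --executor-memory 5G \
    /u/jkennedy/Spark/
```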

Submitting the Spark application to Slurm

Now that we have the full Slurm script (and have considered the resource allocation) we can submit it to the Slurm batch system. This is achieved by the sbatch command.

$ sbatch spark-wordcount.cmd

And from this point on the Spark job will behave as any other Slurm batch job.
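For example, the usual Slurm commands can be used to monitor or cancel it (the job ID below is illustrative):

```shell
squeue -u $USER   # list your pending/running jobs, including the Spark job
scancel 1234567   # cancel the job (replace 1234567 with your job ID)
```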

Tips and Known Issues

Spark Driver Memory Tuning: The Spark driver memory is used to store accumulator variables as well as any outputs of the collect() operation. By default, the driver is allocated 1GB of memory. If the application demands it, it sometimes makes sense to increase this memory, particularly for long-running Spark jobs, which may accumulate data during their execution. The driver memory can be set using the --driver-memory Ng argument, where N is the number of gigabytes of memory to allocate for use by the driver. Here is an example:

$ spark-submit --master spark://${MASTER} --num-executors 10 \
 --executor-memory 10g --driver-memory 10g <application>

