Arrays

Overview

Teaching: 30 min
Exercises: 10 min
Questions
  • How do I submit many similar jobs?

Objectives
  • Be able to submit and run array jobs that run Python code

Many similar jobs

Writing job scripts isn’t exactly the most rewarding experience. This is particularly true when you are writing many almost identical job scripts.

Luckily Slurm has a solution for this: job arrays.

How it works:

You add an #SBATCH --array directive, listing a set of task IDs, to an otherwise ordinary job script. When you submit the script once with sbatch, Slurm creates a separate job for each task ID, and each of those jobs can read its own ID from the $SLURM_ARRAY_TASK_ID environment variable.

Job arrays are an excellent way to exploit a kind of parallelism without having to make your serial program parallel: since multiple jobs can run at the same time, the net effect is that your serial jobs run in parallel.

Here is a very basic example of how arrays work; try submitting it:

array-basic.sh

#!/bin/bash
#SBATCH --array=1,4,7
#SBATCH --time=00:10:00

echo "I am the job with array task ID $SLURM_ARRAY_TASK_ID"
sleep 60
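
Submit the script once with sbatch, and Slurm creates three jobs, one per task ID; while they run, squeue shows each of them with an ID of the form <jobid>_<taskid>:

sbatch array-basic.sh
squeue -u $USER

The --array directive also accepts ranges, e.g. --array=0-9 for ten tasks, and a % suffix such as --array=0-9%2 limits how many tasks run at the same time.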

How do I use $SLURM_ARRAY_TASK_ID with my Python program?

There are a number of ways: you can pass the task ID (or a value derived from it) to your program as a command-line argument, or read it from the environment inside Python.
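
Here is a minimal sketch of both approaches (my-script.py and its --input option are hypothetical stand-ins for your own program):

# Option 1: pass the task ID (or something built from it) on the command line
python my-script.py --input "data-${SLURM_ARRAY_TASK_ID}.csv"

# Option 2: read the environment variable inside Python
python -c 'import os; print(os.environ["SLURM_ARRAY_TASK_ID"])'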

Putting it together …

Let’s write a job script for an array job that does some machine learning, using different models on the classic Titanic data set.

First we download a script and some data:

wget https://raw.githubusercontent.com/ualberta-rcg/python-cluster/gh-pages/files/titanic.py
wget https://raw.githubusercontent.com/ualberta-rcg/python-cluster/gh-pages/files/titanic-train.csv

The titanic.py script gives an example of using argparse to work with command-line arguments. In particular, it has a required parameter --model to select the model to use. The available options are decision_tree, random_forest, and state_vector_machine. So, for example, we might choose to run the program with:

python titanic.py --model random_forest

This will train a model on the data (reserving 1/3 of the data for testing) and report the accuracy, precision, and recall of the model.

Your task is to write an array job that will run all three models. It should:

  • Load a Python module
  • Create (and activate!) a virtual environment on local disk ($SLURM_TMPDIR)
  • Upgrade pip and use it to install pandas, numpy, and scikit-learn
  • Add an #SBATCH directive for using a job array
  • Use a Bash array to translate task IDs to model names (see the sketch after this list)
  • Run the Python script: python titanic.py ...
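
If Bash arrays are new to you, here is a minimal sketch of how the translation works (the fruit names are just placeholders):

fruits=('apple' 'banana' 'cherry')
echo "${fruits[1]}"   # prints "banana": Bash array indices start at 0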

(Tip: copy/paste from the previous example, and the one in the ‘jobs’ section of this workshop.)

The jobs run pretty quickly, but you might be able to catch them in squeue. Use seff to check the performance of each sub-job in the array.
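
For example (the job ID below is hypothetical; use the one sbatch reports), seff accepts the combined <jobid>_<taskid> form, so each sub-job can be inspected individually:

squeue -u $USER
seff 12345678_0
seff 12345678_1
seff 12345678_2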

Solution

submit-titanic.sh

#!/bin/bash
#SBATCH --array=0-2
#SBATCH --time=00:10:00

module load python/3.11

# Map array task IDs 0-2 to model names
models=('decision_tree' 'random_forest' 'state_vector_machine')

# Build and activate a virtual environment on the node-local disk
virtualenv --no-download $SLURM_TMPDIR/venv
source $SLURM_TMPDIR/venv/bin/activate
pip install --no-index --upgrade pip
pip install --no-index pandas numpy scikit-learn

# Pick this task's model and run the script
model=${models[$SLURM_ARRAY_TASK_ID]}

python titanic.py --model "$model"
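
Submit it from the directory that holds titanic.py and titanic-train.csv:

sbatch submit-titanic.sh

Note that --array=0-2 lines up with the three entries of the models array: task IDs 0, 1, and 2 each select one model, so the three sub-jobs train the three models in parallel.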

Key Points

  • Array jobs allow you to run several jobs with a single job script