Slurm User Guide

Writing Slurm Job Scripts

A Slurm job script is a shell script (typically bash) that contains both Slurm directives and the commands you want to run on the cluster. Let's break down the components and syntax of a Slurm job script:

Basic Structure


#!/bin/bash
#SBATCH [options]
#SBATCH [more options]

# Your commands here

Shebang

The first line of your script should be the shebang:
#!/bin/bash
This tells the system to interpret the script using the bash shell.

Slurm Directives

Slurm directives are special comments that start with `#SBATCH`. They tell Slurm how to set up and run your job. Here are some common directives:
#SBATCH --job-name=my_job        # Name of the job 
#SBATCH --output=output_%j.log   # Standard output log file (%j is replaced by the job ID)
#SBATCH --error=error_%j.log     # Standard error log file
#SBATCH --time=01:00:00          # Time limit (HH:MM:SS)
#SBATCH --ntasks=1               # Number of tasks (processes)
#SBATCH --cpus-per-task=1        # Number of CPU cores per task
#SBATCH --mem=1G                 # Memory limit
#SBATCH --partition=general      # Partition (queue) name
#SBATCH --gres=gpu:2             # Request 2 GPUs

Common Slurm Directives

Here's a more comprehensive list of Slurm directives:
  • `--job-name=`: Set a name for the job
  • `--output=`: Specify the file for standard output
  • `--error=`: Specify the file for standard error
  • `--time=`: Set the time limit for the job (HH:MM:SS)
  • `--ntasks=`: Specify the number of tasks to run
  • `--cpus-per-task=`: Set the number of CPU cores per task
  • `--mem=`: Set the total memory required (e.g., 1G for 1 gigabyte)
  • `--partition=`: Specify the partition to run the job on
  • `--array=`: Create a job array (e.g., --array=1-10 for 10 array jobs)
  • `--mail-type=`: Specify email notification events (e.g., BEGIN, END, FAIL)
  • `--mail-user=`: Set the email address for notifications
  • `--nodes=`: Request a specific number of nodes
  • `--gres=`: Request generic consumable resources (e.g., GPUs)
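As a quick illustration of `--array`, the directive pairs naturally with the `%A`/`%a` filename patterns and the `$SLURM_ARRAY_TASK_ID` variable. A minimal sketch (the `input_N.txt` file names are hypothetical, and the default of 1 simply lets the script run outside Slurm for testing):

```shell
#!/bin/bash
#SBATCH --job-name=array_demo
#SBATCH --output=array_%A_%a.log   # %A = parent job ID, %a = array index
#SBATCH --array=1-10
#SBATCH --time=00:10:00

# Each array task picks its input file from its array index.
# Defaulting to 1 lets the script also run outside Slurm.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}
echo "Processing input_${TASK_ID}.txt"
```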
Environment Variables

Slurm sets several environment variables that you can use in your script:
  • `$SLURM_JOB_ID`: The ID of the job
  • `$SLURM_ARRAY_TASK_ID`: The array index for job arrays
  • `$SLURM_CPUS_PER_TASK`: Number of CPUs allocated per task
  • `$SLURM_NTASKS`: Total number of tasks in a job
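For example, these variables can give each job (and each array task) its own output location. A minimal sketch, where the `results/` directory layout is an assumption and the defaults of 0 let the script run outside Slurm for testing:

```shell
#!/bin/bash
# Build a per-job, per-task results directory from Slurm's variables.
# The defaults of 0 make the script runnable outside Slurm too.
JOB_ID=${SLURM_JOB_ID:-0}
TASK_ID=${SLURM_ARRAY_TASK_ID:-0}
RESULT_DIR="results/job_${JOB_ID}/task_${TASK_ID}"
mkdir -p "$RESULT_DIR"
echo "Writing results to $RESULT_DIR"
```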
Example Job Script

Here's an example of a more complex Slurm job script:

#!/bin/bash
#SBATCH --job-name=complex_job
#SBATCH --output=output_%A_%a.log
#SBATCH --error=error_%A_%a.log
#SBATCH --array=1-5
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --partition=general
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=your.email@example.com

# Load any necessary modules
module load python/3.8

# Run the main command
python my_script.py --input-file input.txt --output-file output.txt

# Optional: Run some post-processing
if [ $? -eq 0 ]; then
    echo "Job completed successfully"
    python post_process.py output.txt
else
    echo "Job failed"
fi
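Once the script is saved (say, as `job.sh` — the name is arbitrary), submit it with `sbatch` and monitor it with the standard Slurm commands:

```shell
sbatch job.sh        # submit the script; prints "Submitted batch job <jobid>"
squeue -u $USER      # show your queued and running jobs
scancel <jobid>      # cancel a job by its ID if needed
```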

Understanding --ntasks in Slurm

When you use the `--ntasks` option without other specifications, it's important to understand how Slurm interprets and applies this setting.

When you specify `--ntasks=4` on its own:

#SBATCH --ntasks=4

# This will run your command with 4 tasks
srun ./my_program

In this scenario, Slurm allocates resources for 4 tasks, and `srun` launches 4 copies of `./my_program` in parallel.

Important Considerations

1. CPU Allocation: Without specifying `--cpus-per-task`, each task gets 1 CPU by default.
2. Memory Allocation: The default memory allocation per task depends on the cluster's configuration. It's often a good practice to specify memory requirements explicitly.
3. Node Distribution: Tasks may be distributed across nodes unless you specify `--nodes` or use the `--ntasks-per-node` option.
4. Parallel Execution: This setting is particularly useful for MPI jobs where you want to run multiple parallel processes.

Examples with Additional Specifications

1. Specifying CPUs per Task

#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2

srun ./my_multi_threaded_program

This allocates 4 tasks, each with 2 CPUs, totaling 8 CPUs for the job.
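When each task is itself multi-threaded (for example with OpenMP), a common pattern is to size the thread pool from `$SLURM_CPUS_PER_TASK` rather than hard-coding it. A minimal sketch of lines one could place before the `srun` command (the default of 1 simply keeps the logic safe outside a Slurm allocation):

```shell
# Match the OpenMP thread count to the CPUs Slurm allocated per task;
# fall back to 1 when the variable is unset (e.g. outside Slurm).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "Each task will run with $OMP_NUM_THREADS OpenMP thread(s)"
```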

2. Constraining to a Single Node

#SBATCH --ntasks=4
#SBATCH --nodes=1

srun ./my_program

This ensures all 4 tasks run on the same node.

3. Specifying Tasks per Node

#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2

srun ./my_program

This distributes the 4 tasks across 2 nodes, with 2 tasks per node.

Best Practices