A Slurm job script is a shell script (typically bash) that contains both Slurm directives and the commands you want to run on the cluster.
Let's break down the components and syntax of a Slurm job script:
Basic Structure
#!/bin/bash
#SBATCH [options]
#SBATCH [more options]
# Your commands here
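For example, a minimal script following this structure might look like the following (the job name, time limit, and command are just placeholders):
#!/bin/bash
#SBATCH --job-name=hello_world
#SBATCH --time=00:05:00
#SBATCH --ntasks=1

echo "Hello from $(hostname)"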
Shebang
The first line of your script should be the shebang:
#!/bin/bash
This tells the system to interpret the script using the bash shell.
Slurm Directives
Slurm directives are special comments that start with `#SBATCH`. They tell Slurm how to set up and run your job. Here are some common directives:
#SBATCH --job-name=my_job # Name of the job
#SBATCH --output=output_%j.log # Standard output log file (%j is replaced by the job ID)
#SBATCH --error=error_%j.log # Standard error log file
#SBATCH --time=01:00:00 # Time limit (HH:MM:SS)
#SBATCH --ntasks=1 # Number of tasks (processes)
#SBATCH --cpus-per-task=1 # Number of CPU cores per task
#SBATCH --mem=1G # Memory limit (per node)
#SBATCH --partition=general # Partition (queue) name
#SBATCH --gres=gpu:2 # Request 2 GPUs
Common Slurm Directives
Here's a more comprehensive list of Slurm directives:
`--job-name=`: Set a name for the job
`--output=`: Specify the file for standard output
`--error=`: Specify the file for standard error
`--time=`: Set the time limit for the job (HH:MM:SS)
`--ntasks=`: Specify the number of tasks to run
`--cpus-per-task=`: Set the number of CPU cores per task
`--mem=`: Set the memory required per node (e.g., 1G for 1 gigabyte)
`--partition=`: Specify the partition to run the job on
`--gres=`: Request generic resources such as GPUs (e.g., --gres=gpu:2 for 2 GPUs)
`--array=`: Create a job array (e.g., --array=1-10 for 10 array jobs)
`--mail-type=`: Choose which job events trigger email notifications (e.g., BEGIN, END, FAIL)
`--mail-user=`: Set the email address for those notifications
Slurm Environment Variables
Slurm sets several environment variables that you can use in your script (see the example after this list):
- `$SLURM_JOB_ID`: The ID of the job
- `$SLURM_ARRAY_TASK_ID`: The array index for job arrays
- `$SLURM_CPUS_PER_TASK`: Number of CPUs allocated per task
- `$SLURM_NTASKS`: Total number of tasks in a job
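As a sketch of how these variables are typically used, the script below picks an input file based on the array index and matches the thread count to the allocated cores; the input file naming scheme and the use of OMP_NUM_THREADS are illustrative assumptions:
#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH --array=1-3
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00

# Match the number of threads to the CPUs Slurm allocated for this task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Use the array index to pick a different input file for each array task
INPUT_FILE="input_${SLURM_ARRAY_TASK_ID}.txt"

echo "Job $SLURM_JOB_ID, array task $SLURM_ARRAY_TASK_ID, using $SLURM_CPUS_PER_TASK CPUs"
python my_script.py --input-file "$INPUT_FILE"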
Example Job Script
Here's an example of a more complex Slurm job script:
#!/bin/bash
#SBATCH --job-name=complex_job
#SBATCH --output=output_%A_%a.log
#SBATCH --error=error_%A_%a.log
#SBATCH --array=1-5
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --partition=general
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=your.email@example.com
# Load any necessary modules
module load python/3.8
# Run the main command
python my_script.py --input-file input.txt --output-file output.txt
# Optional: Run some post-processing
if [ $? -eq 0 ]; then
    echo "Job completed successfully"
    python post_process.py output.txt
else
    echo "Job failed"
fi
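Assuming you save this script as, say, job_script.sh (the filename is arbitrary), you would submit and monitor it with the usual Slurm commands:
# Submit the job script; Slurm prints the assigned job ID
sbatch job_script.sh

# Check the status of your jobs in the queue
squeue -u $USER

# Cancel a job if necessary (replace 12345 with the real job ID)
scancel 12345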
Understanding --ntasks in Slurm
When you use the `--ntasks` option in Slurm without other specifications, it's important to understand how Slurm interprets and applies this setting.
When you specify `--ntasks=4` without other options:
Slurm will allocate resources for 4 tasks.
By default, each task is allocated 1 CPU (core).
The tasks may be distributed across multiple nodes, depending on the cluster's configuration and available resources.
#!/bin/bash
#SBATCH --ntasks=4

# This will run your command as 4 tasks
srun ./my_program
In this scenario:
Your job will be allocated 4 CPUs in total.
These 4 CPUs could be on a single node or spread across multiple nodes, depending on availability and the cluster's configuration.
Each task will have access to 1 CPU by default.
Important Considerations
CPU Allocation: Without specifying `--cpus-per-task`, each task gets 1 CPU by default.
Memory Allocation: The default memory allocation per task depends on the cluster's configuration. It's often a good practice to specify memory requirements explicitly.
Node Distribution: Tasks may be distributed across nodes unless you specify `--nodes` or use the `--ntasks-per-node` option.
Parallel Execution: This setting is particularly useful for MPI jobs where you want to run multiple parallel processes.
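To illustrate that last point, here is a minimal sketch of an MPI-style job; the program name, module name, and node layout are assumptions and will differ on your cluster:
#!/bin/bash
#SBATCH --job-name=mpi_demo
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2 # Place 2 tasks on each of 2 nodes
#SBATCH --time=00:30:00
#SBATCH --mem=4G
#SBATCH --partition=general

# Load an MPI implementation (module name varies by cluster)
module load openmpi

# srun launches one process per task, 4 processes in total
srun ./my_mpi_program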