Slurm User Guide

Using the Module System in Slurm Jobs

The module system is a software environment management tool widely used in HPC environments. It allows users to dynamically modify their shell environment to access different software packages and versions. Here's how to effectively use the module system in your Slurm jobs:

1. Basic Module Commands

Before diving into Slurm-specific usage, let's review some basic module commands:
  • `module avail`: List all available modules
  • `module list`: Show currently loaded modules
  • `module load <module>`: Load a specific module
  • `module unload <module>`: Unload a specific module
  • `module purge`: Unload all currently loaded modules
  • `module show <module>`: Display information about a module

2. Using Modules in Slurm Job Scripts

Here's an example of how to use modules in a Slurm job script:

    
    #!/bin/bash 
    #SBATCH --job-name=module_job
    #SBATCH --output=output_%j.log
    #SBATCH --error=error_%j.log
    #SBATCH --time=01:00:00
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G
    
    # Purge all loaded modules
    module purge
    
    # Load required modules
    module load gcc/9.3.0
    module load python/3.8.5
    module load openmpi/4.0.4
    
    # Your job commands here
    python my_script.py
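
One way to make a job stop early when a module fails to load is sketched below. `require_modules` is a hypothetical helper name, not part of any module system, and the sketch assumes that `module load` returns a non-zero exit status on failure, which may not hold for older module system versions. The module names are the ones used in the script above.

```shell
# Sketch: load modules one at a time and report the first one that fails.
# require_modules is a hypothetical helper, not a module system command.
require_modules() {
    local mod
    for mod in "$@"; do
        # Assumption: `module load` exits non-zero when the load fails.
        if ! module load "$mod"; then
            echo "ERROR: could not load module '$mod'" >&2
            return 1
        fi
    done
}

# Usage inside a job script, after the #SBATCH lines:
#   module purge
#   require_modules gcc/9.3.0 python/3.8.5 openmpi/4.0.4 || exit 1
```

Stopping at the first failed load keeps the job from running with a partially configured environment and puts the cause in the error log.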
        

3. Loading Software Stacks

Sometimes you need to load a complete software stack. Many HPC systems provide meta-modules for this purpose:
        
    #!/bin/bash
    #SBATCH --job-name=stack_job
    #SBATCH --output=output_%j.log
    #SBATCH --error=error_%j.log
    #SBATCH --time=01:00:00
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G
    
    # Load a complete software stack
    module load foss/2020a
    
    # Load additional modules as needed
    module load python/3.8.5
    
    # Your job commands here
    
    

4. Module Dependencies

Some modules have dependencies or conflicts. The module system often handles these automatically, but it's good to be aware of them:
        
    #!/bin/bash
    #SBATCH --job-name=dep_job
    #SBATCH --output=output_%j.log
    #SBATCH --error=error_%j.log
    #SBATCH --time=01:00:00
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G
    
    # Load a module with dependencies
    module load tensorflow/2.4.1-cuda11.0-python3
    
    # The above might automatically load CUDA, cuDNN, and Python modules
    
    # Your job commands here
    python my_tensorflow_script.py
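
To see which dependencies were actually pulled in, it can help to record the loaded modules in the job log before the main command runs. A minimal sketch follows. `log_loaded_modules` is a hypothetical helper name, and the sketch assumes that `module list` writes to standard error, as Lmod and Environment Modules commonly do.

```shell
# Sketch: write the list of loaded modules to the job's output log.
# log_loaded_modules is a hypothetical helper, not a module system command.
log_loaded_modules() {
    echo "=== Modules loaded on $(uname -n) ==="
    # Assumption: `module list` prints to stderr, so merge it into stdout.
    module list 2>&1
}

# Usage inside a job script, after the module load lines:
#   log_loaded_modules
```

With this in place, the file named by `--output` shows the resolved environment for each run, which makes it easier to compare a working job with a failing one.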
    
    

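5. Using Module Collections

If you frequently use the same set of modules, you can save them as a module collection. First load the modules in an interactive shell (for example on a login node), then save the collection:

    # Run once in an interactive shell, after loading the modules you need
    module save my_collection

Then restore the collection in your Slurm script:

    #!/bin/bash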
    #SBATCH --job-name=collection_job
    #SBATCH --output=output_%j.log
    #SBATCH --error=error_%j.log
    #SBATCH --time=01:00:00
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G
    
    # Load your module collection
    module restore my_collection
    
    # Your job commands here
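
A collection can be missing or stale, for example after a system software update. The sketch below falls back to loading an explicit module list in that case. `restore_or_fallback` is a hypothetical helper name, and the sketch assumes that `module restore` returns a non-zero exit status when the collection cannot be restored. The fallback module names in the usage comment are illustrative.

```shell
# Sketch: restore a collection, or load an explicit list if that fails.
# restore_or_fallback is a hypothetical helper, not a module system command.
restore_or_fallback() {
    local collection="$1"
    shift
    # Assumption: `module restore` exits non-zero if the collection is unusable.
    if module restore "$collection"; then
        return 0
    fi
    echo "WARNING: collection '$collection' not restored, loading modules explicitly" >&2
    module purge
    module load "$@"
}

# Usage inside a job script:
#   restore_or_fallback my_collection gcc/9.3.0 python/3.8.5 || exit 1
```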
    
    

6. Best Practices for Using Modules with Slurm

1. Purge before loading: Start your script with `module purge` to ensure a clean environment.
2. Be specific: Use full module names including versions to ensure reproducibility.
3. Check for conflicts: Use `module show <module>` to inspect a module's dependencies and conflicts before loading it.
4. Use module collections: For complex environments, create and use module collections.
5. Document your modules: Comment your Slurm script to explain why each module is needed.
6. Load modules in the job script: Don't rely on modules loaded in your login environment. By default Slurm passes the submission environment to the job, so a script can silently depend on whatever was loaded when you ran `sbatch`. Use explicit `module load` lines in the script instead.

7. Troubleshooting Module Issues in Slurm Jobs

If you encounter module-related issues:
  • Check your Slurm output and error logs for module-related errors.
  • Ensure the modules you're trying to load are available on the compute nodes (they might differ from login nodes).
  • Use `module show <module>` to verify module details and dependencies.
  • If a module isn't found, check if you need to load a specific compiler or MPI implementation first.
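
The checks above can be gathered into a small diagnostic that runs at the top of a failing job, so the log shows what the compute node sees. This is a sketch under stated assumptions: `module_diagnostics` is a hypothetical helper name, and it relies on the `MODULEPATH` environment variable, which Lmod and Environment Modules use to locate modulefiles.

```shell
# Sketch: print module-related diagnostics from inside a job.
# module_diagnostics is a hypothetical helper, not a module system command.
module_diagnostics() {
    local mod="$1"
    echo "Host: $(uname -n)"
    echo "MODULEPATH: ${MODULEPATH:-<unset>}"
    # The module command is normally a shell function set up at login.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found in this shell"
        return 1
    fi
    # Merge stderr into stdout so the details land in the output log.
    module show "$mod" 2>&1 || echo "module show failed for '$mod'"
}

# Usage inside a job script:
#   module_diagnostics python/3.8.5
```

Comparing this output between a login node and a compute node shows whether the two search different module paths.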
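
8. Advanced Module Usage

Some advanced module features:
  • Module versioning: `module load <module>/<version>`
  • Swapping modules: `module swap <old_module> <new_module>`
  • Module aliases: `module-alias my_python python/3.8.5` (on many systems this line goes in a modulerc file such as `~/.modulerc` rather than being run on the command line; check your site's documentation)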