Common Pain Points¶
This page covers recurring issues that Bodhi users encounter when migrating from LSF to SLURM. These aren't simple directive swaps — they're behavioral differences that catch people off guard.
Debugging OOM (Out-of-Memory) errors¶
How OOM kills look in SLURM¶
When a job exceeds its memory allocation, SLURM kills it immediately. The job state is set to OUT_OF_MEMORY:
$ sacct -j 12345 --format=JobID,JobName,State,ExitCode,MaxRSS
       JobID    JobName      State ExitCode     MaxRSS
------------ ---------- ---------- -------- ----------
12345          analysis OUT_OF_ME+    0:125
12345.batch       batch OUT_OF_ME+    0:125      15.8G
You can also see this with seff:
$ seff 12345
Job ID: 12345
State: OUT_OF_MEMORY (exit code 0)
Memory Utilized: 15.80 GB
Memory Efficiency: 98.75% of 16.00 GB
This is different from LSF
On Bodhi's LSF, memory limits were often soft limits — jobs could exceed their requested memory without being killed (as long as the node had memory available). In SLURM, --mem is a hard limit enforced by cgroups. If your job exceeds it, even briefly, it will be killed.
Diagnosing memory usage¶
For completed jobs, use sacct:
# Check peak memory usage
sacct -j <jobid> --format=JobID,JobName,MaxRSS,MaxVMSize,State
# For array jobs, check all tasks
sacct -j <jobid> --format=JobID%20,JobName,MaxRSS,State
For running jobs, use sstat:
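For a job that is still running, `sstat` reports live usage for a job step (note the `.batch` suffix to query the batch step):

```shell
# Peak memory so far for a running job's batch step
sstat -j <jobid>.batch --format=JobID,MaxRSS,MaxVMSize
```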
Use seff for quick checks
seff <jobid> gives a one-line summary of memory efficiency for completed jobs. It's the fastest way to check if your job was close to its memory limit.
Fixing OOM errors¶
- Check what your job actually used — run seff <jobid> on a similar completed job to see actual peak memory.
- Request more memory with headroom — add a 20–30% buffer above the observed peak.
- Use --mem-per-cpu for multi-threaded jobs — if your job scales memory with cores.
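A sketch of both approaches as job-script directives (the 20G figure assumes the ~15.8 GB peak from the seff example above; use one directive or the other, not both):

```shell
# Fixed request with ~25% headroom over an observed 15.8G peak
#SBATCH --mem=20G

# Alternative: scale memory with threads (8 CPUs × 2G = 16G total)
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=2G
```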
Don't just request the maximum
Requesting far more memory than you need reduces scheduling priority and wastes cluster resources. Right-size your requests based on actual usage from seff.
Understanding SLURM accounts¶
What is --account?¶
In SLURM, the --account flag associates your job with a resource allocation account. This is used for:
- Fair-share scheduling — accounts that have used fewer resources recently get higher priority
- Resource tracking — PIs and admins can see how allocations are consumed
- Access control — some partitions may be restricted to certain accounts
Why this matters on Bodhi
On LSF, the -P project flag was often optional or had a simple default. On SLURM, submitting with the wrong account (or no account) can result in job rejection or lower scheduling priority.
Finding your account(s)¶
# List your SLURM associations (accounts and partitions you can use)
sacctmgr show associations user=$USER format=Account,Partition,QOS
# Shorter version — just account names
sacctmgr show associations user=$USER format=Account --noheader | sort -u
Setting a default account¶
Rather than adding --account to every script, set a default:
# Set your default account (persists across sessions)
sacctmgr modify user $USER set DefaultAccount=<your_account>
You can also add it to your ~/.bashrc or a SLURM defaults file:
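For example, sbatch, salloc, and srun each read an input environment variable that supplies a default account, so exporting these in ~/.bashrc has the same effect (<your_account> is a placeholder):

```shell
# In ~/.bashrc — <your_account> is a placeholder
export SBATCH_ACCOUNT=<your_account>   # default for sbatch
export SALLOC_ACCOUNT=<your_account>   # default for salloc
export SLURM_ACCOUNT=<your_account>    # default for srun
```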
In your job scripts¶
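A minimal job-script header with the account set explicitly (the partition, time, and command are illustrative placeholders):

```shell
#!/bin/bash
#SBATCH --account=<your_account>
#SBATCH --partition=normal
#SBATCH --time=01:00:00

srun ./my_analysis   # placeholder for your actual command
```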
Paying attention to wall time¶
SLURM enforces --time strictly¶
In SLURM, the --time (wall time) limit is a hard cutoff. When your job hits the limit:
- SLURM sends SIGTERM to your job (giving it a chance to clean up)
- After a short grace period, SLURM sends SIGKILL
- The job state is set to TIMEOUT
$ sacct -j 12345 --format=JobID,JobName,Elapsed,Timelimit,State
       JobID    JobName    Elapsed  Timelimit      State
------------ ---------- ---------- ---------- ----------
12345           longrun   02:00:00   02:00:00    TIMEOUT
This is different from LSF
On Bodhi's LSF, wall-time limits were often loosely enforced — jobs could sometimes run past their -W limit. In SLURM, when your time is up, your job is killed. Period.
Checking remaining time¶
From outside the job:
# See time limit and elapsed time (%M = elapsed, %l = limit)
squeue -u $USER -o "%.10i %.20j %.10M %.10l %.6D %R"
# Detailed view
scontrol show job <jobid> | grep -E "RunTime|TimeLimit"
From inside the job (in your script):
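One way to read the remaining wall time from within a job script is squeue's %L field, which prints the time left for a job:

```shell
# Remaining wall time for this job (D-HH:MM:SS)
remaining=$(squeue -h -j "$SLURM_JOB_ID" -o "%L")
echo "Time left: $remaining"
```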
Consequences of TIMEOUT¶
- Your job output may be incomplete or corrupted
- Any files being written at kill time may be truncated
- Temporary files won't be cleaned up
Add cleanup traps
If your job writes large intermediate files, add a trap to handle SIGTERM:
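A sketch of such a trap — the --signal directive asks SLURM to deliver SIGTERM to the batch shell early, and SCRATCH_DIR and the file pattern are placeholders:

```shell
#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --signal=B:TERM@120   # send SIGTERM 120s before the limit

cleanup() {
    # Runs when SLURM sends SIGTERM, before it escalates to SIGKILL
    echo "SIGTERM received — removing intermediate files" >&2
    rm -f "$SCRATCH_DIR"/partial_*.tmp   # SCRATCH_DIR is a placeholder
    exit 1
}
trap cleanup TERM

# ... long-running work ...
```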
Bodhi partition time limits¶
| Partition | Max wall time | Default wall time | Notes |
|---|---|---|---|
| short | 4 hours | 1 hour | Quick jobs, higher priority |
| normal | 7 days | 1 hour | General-purpose |
| long | 30 days | 1 hour | Extended runs |
| gpu | 7 days | 1 hour | GPU jobs |
| interactive | 12 hours | 1 hour | Interactive sessions |
Check current limits
Partition limits can change. Verify the current limits with:
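For example, sinfo's %l field prints each partition's time limit:

```shell
# One line per partition: name and max wall time
sinfo -o "%P %l"

# Full details for one partition, including DefaultTime
scontrol show partition normal
```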
Tips for setting wall time¶
- Start with a generous estimate, then refine based on actual runtimes using seff or sacct.
- Shorter jobs schedule faster — SLURM's backfill scheduler can fit shorter jobs into gaps. Requesting 2 hours instead of 7 days can dramatically reduce queue wait time.
- Use sacct to check the runtimes of past jobs.
- SLURM accepts several formats for --time:

| Format | Meaning |
|---|---|
| MM | Minutes |
| HH:MM:SS | Hours, minutes, seconds |
| D-HH:MM:SS | Days, hours, minutes, seconds |
| D-HH | Days and hours |
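To check past runtimes as suggested above — a sketch that uses GNU date to set the start of the query window:

```shell
# Elapsed vs. requested time for your jobs from the past week
sacct -u $USER -S "$(date -d '7 days ago' +%F)" \
      --format=JobID,JobName,Elapsed,Timelimit,State
```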