SLURM Troubleshooting

Automagically Fix

As root on the slurm host, run:

root@slurm:~# gcs update
root@slurm:~# gcs fix

Manually Fix Compute Node

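Check whether munge and slurmd are running on the compute node, and restart any that are not. Restart munge first, since slurmd relies on it for authentication: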

# Check service
node# systemctl status munge
node# systemctl status slurmd
# Restart if needed
node# systemctl restart munge
node# systemctl restart slurmd
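The check-and-restart steps above can be scripted so that a service is only restarted when it is actually down. This is a minimal sketch, not an existing tool on our systems. The script name check-slurm-services.sh is hypothetical, and the script assumes the node uses systemd, as the commands above do. It handles services in the order given on the command line, so pass munge before slurmd.

```shell
#!/usr/bin/env bash
# Hypothetical helper (check-slurm-services.sh), a sketch only.
# Restarts each named systemd service only if it is not currently active.
# Services are processed in argument order, so pass munge before slurmd.
# Usage (as root on the compute node): ./check-slurm-services.sh munge slurmd
set -euo pipefail

ensure_running() {
    local svc="$1"
    if systemctl is-active --quiet "$svc"; then
        echo "$svc: running"
    else
        echo "$svc: not running, restarting"
        # With set -e, a failed restart aborts before later services are touched.
        systemctl restart "$svc"
    fi
}

for svc in "$@"; do
    ensure_running "$svc"
done
```

For example: node# ./check-slurm-services.sh munge slurmd. Being "active" does not prove MUNGE is issuing valid credentials. Running munge -n | unmunge on the node should report a Success status if it is.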

Jobs Owned by 'nobody'

This is probably caused by failing user lookups on one of the power nodes. If it happens, log in to that power node and restart sssd:

service sssd restart
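To confirm the diagnosis before and after the restart, you can check directly whether an affected user's account resolves. This is a minimal sketch. It assumes that user accounts on the power nodes are resolved through sssd via NSS, so getent passwd fails for them while sssd is misbehaving. The script name check-user-lookup.sh, the SSSD_WAIT variable, and the 5-second default wait are hypothetical choices, not existing site conventions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (check-user-lookup.sh), a sketch only.
# Assumes user accounts are resolved through sssd via NSS on this node.
# Usage (as root on the power node): ./check-user-lookup.sh <username>...
set -euo pipefail

# True if the account resolves through NSS (getent exits non-zero otherwise).
resolves() {
    getent passwd "$1" >/dev/null
}

check_user() {
    local user="$1"
    if resolves "$user"; then
        echo "$user: resolves"
        return 0
    fi
    echo "$user: lookup failed, restarting sssd"
    service sssd restart
    # Assumed grace period for sssd to reconnect; SSSD_WAIT is hypothetical.
    sleep "${SSSD_WAIT:-5}"
    if resolves "$user"; then
        echo "$user: resolves after restart"
    else
        echo "$user: still unresolved" >&2
        return 1
    fi
}

for user in "$@"; do
    check_user "$user"
done
```

For example: node# ./check-user-lookup.sh someuser, where someuser is the account that should own the affected job.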

staff-wiki/compute_slurm_troubleshooting.txt · Last modified: 2023/08/29 19:56 by 127.0.0.1