SLURM Troubleshooting
Automagically fix
root@slurm:~# gcs update
root@slurm:~# gcs fix
Manually Fix Compute Node
Check whether slurmd and munge are running, and restart them if they are not.
# Check service
node# systemctl status munge
node# systemctl status slurmd
# Restart if needed
node# systemctl restart munge
node# systemctl restart slurmd
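The check-then-restart steps above can be combined into one small helper. This is only a sketch, assuming the services are managed by systemd under the unit names munge and slurmd shown above; ensure_running is a hypothetical helper name, not part of SLURM or gcs.

```shell
#!/bin/sh
# Sketch: restart a service only if systemd reports it as not active.
# ensure_running is a hypothetical helper name (an assumption for this
# example), not a SLURM or gcs command.
ensure_running() {
    svc="$1"
    # "systemctl is-active --quiet" exits 0 only when the unit is active.
    if systemctl is-active --quiet "$svc"; then
        echo "$svc: running"
    else
        echo "$svc: not running, restarting"
        systemctl restart "$svc"
    fi
}
```

Run as root on the compute node, in the same order as the manual steps: ensure_running munge, then ensure_running slurmd.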
Jobs Owned by 'nobody'
This is probably caused by a problem on one of the power nodes. If it happens, log in to that power node and run:
service sssd restart
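After the restart, it may help to confirm that user names resolve again on the power node. The following is a minimal sketch, assuming the 'nobody' ownership comes from failed user lookups through sssd (this page does not state the cause). user_resolves is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether a user name resolves through the system's
# name service (which goes through sssd where sssd is configured).
# user_resolves is a hypothetical helper (an assumption for this
# example), not part of SLURM or sssd.
user_resolves() {
    if id -u "$1" >/dev/null 2>&1; then
        echo "$1: resolves"
    else
        echo "$1: does NOT resolve"
        return 1
    fi
}
```

For example, user_resolves alice, where alice is a placeholder for the real owner of the affected jobs. If the name still does not resolve, the problem is probably not fixed by the sssd restart alone.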
staff-wiki/compute_slurm_troubleshooting.txt · Last modified: 2023/08/29 19:56 by 127.0.0.1