- 1 Using the PBS Cluster
- 2 Logging on to the Cluster
- 3 Using PBS
- 4 PBS Resources
- 4.1 Queues
- 4.2 Requesting a Specific Resource
- 5 Debugging PBS Jobs
- 5.1 Normal Error Reporting
- 5.2 Running Interactively
- 5.3 Common Errors
- 5.3.1 Password-less SSH not properly set up
- 5.3.2 "bad UID"
- 5.3.3 SCP I/O errors
- 5.3.4 MPI communication errors
- 5.3.5 "bad interpreter" error
Using the PBS Cluster
Jobs are submitted to the cluster by creating a PBS job command file that specifies certain attributes of the job, such as how long the job is expected to run and how many nodes of the cluster are needed (e.g. for parallel programs). PBS then schedules when the job is to start running on the cluster (based in part on those attributes), runs and monitors the job at the scheduled time, and returns any output to the user once the job completes.
Logging on to the Cluster
Logins to the cluster are done via SSH to one of the head nodes; you should not log in directly to any of the compute nodes. You will need an SSH client on your computer: either OpenSSH for a *nix system or SecureCRT for a Windows PC.
Log onto power[1..6].cs.virginia.edu; these are the head nodes for the cluster, and act as the control console for the queues. These machines are appropriate for any interactive work such as source code editing, compilation, and submitting jobs through PBS.
Centurion001 is the PBS server node, and jobs are executed on the centurion[2..64] nodes by default. There are also the radio, generals, lava and realitytv cluster queues, but you should check with root first before using those queues.
The sections below will outline what you need to know to set up and run your jobs on the various clusters. "Screen Capture" examples are given in the gray boxes to show what you should expect to see as output from various commands.
Configuring Your Account
Use of the CS PBS system assumes some familiarity with the Unix/Solaris software environment. In order to use PBS for batch job submission, it may be necessary to configure some of your Unix account startup files. General information about the Unix operating system can be found here.
When a job is submitted to the cluster through PBS, a new login to your account is initiated, and any initialization commands in your startup files (.profile, .variables.ksh, .kshrc etc) are executed. In this case (running in batch mode) it is necessary to disable interactive commands such as tset and stty. If these precautions are not taken, error messages will be written to the batch job's error file and your program may not run. The recommended procedure is to test the environment variable PBS_ENVIRONMENT, which is set when PBS runs. If the variable has been set, meaning a PBS job has initiated the login, the interactive parts of the startup files should be skipped. CS Department profiles do not run any interactive commands by default, so unless you have modified them yourself, there is nothing to be concerned with.
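A minimal sketch of such a guard, assuming a Bourne-style startup file such as .profile. The helper name is_pbs_login and the stty setting are illustrative assumptions, not part of the department's default profiles:

```shell
# Fragment for a startup file such as ~/.profile.
# is_pbs_login is a hypothetical helper name, not a PBS command.
is_pbs_login() {
    # PBS sets PBS_ENVIRONMENT for the logins that it starts.
    [ -n "${PBS_ENVIRONMENT:-}" ]
}

if ! is_pbs_login && [ -t 0 ]; then
    # Terminal setup goes here. It is skipped for PBS jobs and for
    # any login that has no terminal attached to standard input.
    stty erase '^?'
fi
```

The extra `[ -t 0 ]` test is a precaution: it also skips the terminal commands for other non-interactive logins, such as scp.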
Your CS account is completely separate from your ITC account, even though it has the same 'username' and UID. Your password and home directory are not shared between the two systems, so please remember that this is a different password.
You will need to set up ssh for password-free login on the CS Departmental Systems. In order to do this, you will need to set up a public/private key pair for use with SSH. Log onto one of the interactive front ends (power[1..6]), and do the following:
1. Change directories to your .ssh directory:
jpr9c@power1 : /af13/jpr9c ; cd .ssh
jpr9c@power1 : /af13/jpr9c/.ssh ;
2. Generate an ssh key-pair using ssh-keygen:
jpr9c@power1 : /af13/jpr9c/.ssh ; ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/af13/jpr9c/.ssh/id_rsa):
Go ahead and press "enter" and use the default file name. Next, just leave the "passphrase" blank - remember, you want to be able to log onto other systems using SSH without having to type in anything:
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
The keys will be stored right where they need to be by default:
Your identification has been saved in /af13/jpr9c/.ssh/id_rsa.
Your public key has been saved in /af13/jpr9c/.ssh/id_rsa.pub.
The key fingerprint is:
a5:99:57:33:ee:d5:4f:b9:28:ff:91:4d:66:22:0e:5e jpr9c@power1
3. Next, we need to let the SSH client and server know that this key is to be used to allow you to log in, so add the public key to the "authorized_keys" file:
jpr9c@power1 : /af13/jpr9c/.ssh ; cat id_rsa.pub >> authorized_keys
Be sure the permissions are set correctly on this file:
jpr9c@power1 : /af13/jpr9c/.ssh ; chmod 644 authorized_keys
jpr9c@power1 : /af13/jpr9c/.ssh ; ls -l authorized_keys
-rw-r--r-- 1 jpr9c uucp 617 2008-07-14 13:39 authorized_keys
Please also verify that the permissions are set correctly on your .ssh directory. If they are too loose (i.e., a stranger can write to the directory), ssh will not trust the keys and will ignore them.
jpr9c@power1 : /af13/jpr9c ; ls -ld .ssh/
drwx------ 2 jpr9c staff 1024 2010-08-30 14:45 .ssh/
4. Because SSH uses asymmetrical keys for host as well as user identification, the first time you connect to a new host, you get a prompt back asking if you want to add the host's public key to your known_hosts file. Since you need to be able to log onto the different machines without any keyboard-interaction, you'll need to add these host keys to your known_hosts file. Those keys are available in the pbs_hosts file; download a copy (either use the command below or 'right-click' and 'save-as') and add it to your known_hosts file:
jpr9c@power1 : /af13/jpr9c/.ssh ; wget http://www.cs.virginia.edu/~csadmin/pbs/pbs_hosts
--17:10:56--  http://www.cs.virginia.edu/~csadmin/pbs/pbs_hosts
           => `pbs_hosts.1'
Resolving www.cs.virginia.edu... 22.214.171.124
Connecting to www.cs.virginia.edu|126.96.36.199|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 141,440 (138K) [text/plain]
100%[====================================>] 141,440 --.--K/s
17:10:56 (89.92 MB/s) - `pbs_hosts.1' saved [141440/141440]
jpr9c@power1 : /af13/jpr9c/.ssh ; cat pbs_hosts >> known_hosts
5. Test your key setup to be sure you can log onto another system without typing in a password:
jpr9c@power1 : /af13/jpr9c/.ssh ; ssh power2
Linux power2 2.6.24-19-generic #1 SMP Fri Jul 11 21:01:46 UTC 2008 x86_64
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/
Last login: Mon Aug 25 10:00:29 2008 from roark.cs.virginia.edu
No mail for jpr9c
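The checks in steps 3 and 5 can also be scripted. The following is only a sketch: both function names are hypothetical helpers, not part of PBS or OpenSSH. It relies on the ssh option BatchMode=yes, which makes ssh fail instead of prompting.

```shell
# Hypothetical helpers for checking the key setup.

# Succeeds if the directory is NOT writable by group or others.
ssh_dir_is_private() {
    perms=$(ls -ld "$1" | cut -c1-10)     # e.g. drwx------
    case "$perms" in
        ?????w????|????????w?) return 1 ;; # group or other write bit set
        *) return 0 ;;
    esac
}

# Succeeds only if a login to the host needs no keyboard input.
# BatchMode=yes disables password and passphrase prompts.
can_login_unattended() {
    ssh -o BatchMode=yes "$1" true
}
```

For example, `ssh_dir_is_private "$HOME/.ssh" && can_login_unattended power2 && echo "ssh setup looks fine"` prints the message only when both checks pass.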
It is absolutely essential that you have password-less SSH working for your PBS jobs to run! The PBS queue uses the SSH suite (scp, ssh) to move your files around the cluster and to copy your results back to you after your job has run. If you have to type anything to log onto one node from another, then review your steps, and see the trouble-shooting section below.
Using PBS
The PBS resource management system handles the management and monitoring of the computational workload on the Department clusters. Users submit "jobs" to the resource management system (PBS Server - Centurion001) where they are queued up until the system is ready to run them. PBS selects which jobs to run, when, and where, according to node attributes requested by the users. There are resource routing queues to try and ensure that users are handled as efficiently as possible and to maximize throughput.
To use PBS, you create a batch job command file which you submit to the PBS server to run on a PBS queue. A batch job file is simply a shell script containing the set of commands you want run on some set of cluster compute nodes. It also contains directives which specify the characteristics (attributes), and resource requirements (e.g. number of compute nodes and maximum runtime) that your job needs. Once you create your PBS job file, you can reuse it if you wish or modify it for subsequent runs.
PBS also provides a special kind of batch job called interactive-batch. An interactive-batch job is treated just like a regular batch job, in that it is placed into the queue and must wait for resources to become available before it can run. Once it is started, however, the user's terminal input and output are connected to the job in what appears to be an rlogin session to one of the compute nodes. Many users find this useful for debugging their applications or for computational steering.
PBS provides two user interfaces for batch job submission: a command line interface (CLI) and a graphical user interface (GUI). Both interfaces provide the same functionality. The CLI lets you type commands at the system prompt, and it is the only interface covered in the remainder of this tutorial. If you are an experienced *nix and X-windows user and prefer the GUI, information about how to configure and use the xpbs interface can be found in Chapter 5 of the PBS Pro User Guide, which also contains more detailed information about using PBS in general.
PBS Job Command File
To submit a job to run on the Centurion cluster, a PBS job command file must be created. The job command file is a shell script that contains PBS directives; these directives are preceded by #PBS. The following is an example of a PBS command file to run a serial job, which would require only 1 processor on 1 node.
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o output_filename
#PBS -j oe
#PBS -m bea
#PBS -M email@example.com
cd $PBS_O_WORKDIR
./your_executable
The first line identifies this file as a shell script. The next several lines are PBS directives that must precede any commands to be executed by the shell (e.g. the last two lines). The PBS directives are defined in the table below:
PBS Directive                             Function
#PBS -l nodes=1:ppn=1                     Specifies a PBS resource requirement of 1 compute node and 1 processor per node.
#PBS -l walltime=12:00:00                 Specifies a PBS resource requirement of 12 hours of wall clock time to run the job.
#PBS -o output_filename                   Specifies the name of the file where job output is to be saved. May be omitted to generate filename appended with jobid number.
#PBS -j oe                                Specifies that job output and error messages are to be joined in one file.
#PBS -m bea                               Specifies that PBS send email notification when the job begins (b), ends (e), or aborts (a).
#PBS -M firstname.lastname@example.org    Specifies an alternate email address where PBS notification is to be sent.
#PBS -V                                   Specifies that all environment variables are to be exported to the batch job.
The following is an example of a PBS email notification to the user at the end of the job:
Date: Tue, 2 Sep 2008 12:43:09 -0500
From: root
To: email@example.com
Subject: PBS JOB 9563.centurion001
PBS Job Id: 1187.centurion
Job Name: script.sh
Execution terminated
Exit_status=0
resources_used.cpupercent=02
resources_used.cput=00:00:01
resources_used.mem=64248kb
resources_used.ncpus=1
resources_used.vmem=81036kb
resources_used.walltime=00:00:02
Note that the walltime-used information in the email should be used to accurately estimate the walltime resource requirement in the PBS job command file for future job submissions so that PBS can more effectively schedule the job. When submitting a particular PBS job for the first time, the walltime requirement should be overestimated to prevent premature job termination.
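As a rough sketch of that advice, a measured walltime can be padded by a safety margin before it goes into the job command file. The helper name pad_walltime and the 50% default margin are assumptions chosen for illustration, not department policy:

```shell
# pad_walltime is a hypothetical helper, not a PBS command.
# Usage: pad_walltime HH:MM:SS [margin_percent]
pad_walltime() {
    echo "$1" | awk -F: -v pct="${2:-50}" '{
        used = $1 * 3600 + $2 * 60 + $3      # measured walltime in seconds
        t = int(used * (100 + pct) / 100)    # add the safety margin
        printf "%02d:%02d:%02d\n", int(t / 3600), int((t % 3600) / 60), t % 60
    }'
}
```

For instance, `pad_walltime 00:40:00` prints 01:00:00, which could then be used as `#PBS -l walltime=01:00:00`.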
After the PBS directives in the command file, the shell executes a change directory command to $PBS_O_WORKDIR, a PBS variable indicating the directory where the PBS job was submitted. Normally this will also be where the program executable is located. Other shell commands can be executed as well. In the last line, the executable itself is invoked.
Submitting a Job
The PBS qsub command is used to submit job command files for scheduling and execution. For example, to submit your job with a PBS command file called "pbs_test.sh", the syntax would be
jpr9c@power1 : /af13/jpr9c/work/pbs ; qsub pbs_test.sh
9563.centurion001
Notice that upon successful submission of a job, PBS returns a job identifier of the form <jobid>.centurion001, where <jobid> is an integer number assigned by PBS to that job. You'll need the job identifier for any actions involving the job, such as checking job status, deleting the job, or specifying job dependencies as described below.
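Because qsub prints the job identifier on standard output, a script can capture it for later use. A small sketch follows; both function names are hypothetical, and submit_and_show needs qsub and qstat, so it only works on a head node:

```shell
# job_number is a hypothetical helper: it strips the server suffix
# from an identifier such as 9563.centurion001.
job_number() {
    echo "${1%%.*}"
}

# submit_and_show is hypothetical as well.
submit_and_show() {
    jobid=$(qsub "$1") || return 1   # qsub prints the job identifier
    echo "submitted $jobid"
    qstat -f "$jobid"                # full status of that job
}
```

Here `job_number 9563.centurion001` prints 9563, the integer part that also appears in the names of the job's output files.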
There are many options to the qsub command as can be seen by typing man qsub at the command prompt on power[1..6].cs.virginia.edu or looking at the PBS Pro User Guide. Three of the more useful ones are the -W option for allowing specification of additional job attributes, the -I option, which declares that the job is to be run "interactively", and the -l option, which allows resource requirements to be listed as part of the qsub command. These are discussed below.
You may specify an alternate queue from the command line by using the -q <queue name> option:
jpr9c@power2 : /af13/jpr9c/work/pbs ; qsub -q sunfire mpitest
489850.centurion001
jpr9c@power2 : /af13/jpr9c/work/pbs ; qstat 489850
Job id            Name             User             Time Use S Queue
----------------  ---------------- ---------------- -------- - -----
489850.centurion0 mpitest          jpr9c                   0 Q sunfire
Displaying Job Status
The qstat -a command is used to obtain status information about jobs submitted to PBS.
jpr9c@power1 : /af13/jpr9c ; qstat -a
centurion001:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
9517.centurion0 bcs8d    centurio applu_1000  20134   1   1  512mb 10:00 R 02:04
9518.centurion0 bcs8d    centurio apsi_10000  29066   1   1  512mb 10:00 R 02:04
9520.centurion0 bcs8d    centurio bzip2graph   7502   1   1  512mb 10:00 R 02:03
9521.centurion0 bcs8d    centurio crafty_100   7515   1   1  512mb 10:00 R 02:03
9522.centurion0 bcs8d    centurio eoncook_10  26435   1   1  512mb 10:00 R 02:03
9523.centurion0 bcs8d    centurio equake_100  26444   1   1  512mb 10:00 E 02:01
9525.centurion0 bcs8d    centurio fma3d_1000  17953   1   1  512mb 10:00 R 02:03
9526.centurion0 bcs8d    centurio galgel_100  19263   1   1  512mb 10:00 R 02:03
9529.centurion0 bcs8d    centurio gzipgraphi    749   1   1  512mb 10:00 R 02:04
9530.centurion0 bcs8d    centurio lucas_1000  10537   1   1  512mb 10:00 R 02:03
9531.centurion0 bcs8d    centurio mcf_100000  10548   1   1  512mb 10:00 R 02:03
9532.centurion0 bcs8d    centurio mesa_10000  16103   1   1  512mb 10:00 R 02:03
9533.centurion0 bcs8d    centurio mgrid_1000  16113   1   1  512mb 10:00 R 02:03
9534.centurion0 bcs8d    centurio parser_100   4316   1   1  512mb 10:00 R 02:03
9538.centurion0 bcs8d    centurio twolf_1000  12837   1   1  512mb 10:00 R 02:03
9539.centurion0 bcs8d    centurio vortexone_  12846   1   1  512mb 10:00 R 02:03
9541.centurion0 bcs8d    centurio wupwise_10   8063   1   1  512mb 10:00 R 02:03
The first five fields of the display are self-explanatory. The sixth and seventh fields, titled NDS and TSK in the above display, indicate the total number of nodes and processors, respectively, required by each job. The eighth and ninth fields give the requested memory and the requested walltime (hours:minutes), and the last field shows the elapsed runtime. The tenth field, titled S, indicates the state of the job. The job state can have the following values:
State  Definition
E      Job is exiting after having run
H      Job is held
Q      Job is queued, eligible to run or be routed
R      Job is Running
T      Job is in transition (being moved to a new location)
W      Job is waiting for its requested execution time to be reached
S      Job is suspended
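When many jobs are queued, a per-state summary is easier to read than the full listing. The sketch below assumes the eleven-column qstat -a layout shown above, with the state in the tenth field and no spaces inside job names; count_job_states is a hypothetical helper name:

```shell
# count_job_states is a hypothetical helper. It reads `qstat -a` output on
# standard input and prints one line per state with the number of jobs in it.
count_job_states() {
    awk '$1 ~ /^[0-9]+\./ { n[$10]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

On a head node it would be used as `qstat -a | count_job_states`. Only lines whose first field starts with a numeric job ID followed by a dot are counted, so the header lines are ignored.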
To see more specific information on your particular job, you can run "qstat -f <jobid>.centurion001":
jpr9c@centurion001 : /af13/jpr9c/work/pbs ; qstat -f 10252.centurion001
Job Id: 10252.centurion001
    Job_Name = script.sh
    Job_Owner = firstname.lastname@example.org
    resources_used.cpupercent = 95
    resources_used.cput = 02:13:54
    resources_used.mem = 1351476kb
    resources_used.ncpus = 1
    resources_used.vmem = 2500800kb
    resources_used.walltime = 38:47:19
    job_state = R
    queue = centurion
    server = centurion001
    Checkpoint = u
    ctime = Wed Sep 10 17:46:04 2008
    Error_Path = centurion001.cs.virginia.edu:/uf8/jm6dg/fractal/driver/src/scr
        ipt.sh.e10252
    exec_host = centurion039/1
    exec_vnode = (centurion039:ncpus=1)
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Wed Sep 10 21:35:50 2008
    Output_Path = centurion001.cs.virginia.edu:/uf8/jm6dg/fractal/driver/src/sc
        ript.sh.o10252
    Priority = 0
    qtime = Wed Sep 10 17:46:04 2008
    Rerunable = True
    Resource_List.mem = 400mb
    Resource_List.ncpus = 1
    Resource_List.nodect = 1
    Resource_List.nodes = 1:ppn=1
    Resource_List.place = scatter
    Resource_List.select = 1:ncpus=1
    Resource_List.walltime = 100:00:00
    stime = 1221096950
    session_id = 29207
    job_dir = /uf8/jm6dg
    Variable_List = PBS_O_HOME=/uf8/jm6dg,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=jm6dg,
        PBS_O_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bi
        n:/usr/games:/usr/pbs/bin,PBS_O_MAIL=/var/mail/jm6dg,
        PBS_O_SHELL=/usr/cs/bin/bash,PBS_O_HOST=centurion001.cs.virginia.edu,
        PBS_O_WORKDIR=/uf8/jm6dg/fractal/driver/src,PBS_O_SYSTEM=Linux,
        PBS_O_QUEUE=centurion
    comment = Job run at Wed Sep 10 at 21:35 on (centurion039:ncpus=1)
    etime = Wed Sep 10 17:46:04 2008
PBS Resources
The CS department has a number of clusters, each of which currently has its own queue. You may use all the queues, but be mindful of which queues actually have the resources needed to run your job. For example, if you need 4 CPUs (cores) on the same node, please be sure to use a queue whose nodes actually have at least four cores.
A note on resource limits: we try to have as few user/resource limits as possible on the queues in order to maximize throughput. There are times when the clusters are unused and we want users to be able to grab as many resources as possible. A few notes on how these limits work:
Centurion (default queue)
This is an x86-64 architecture queue. It is the default queue; please submit most jobs here.
64 nodes
2 CPUs per node (AMD Opteron(tm) Processor 242)
2GB RAM per node
max_user_res_soft.ncpus = 32
Radio
This is another x86_64 queue, very similar to the Centurion queue.
24 nodes
2 CPUs/node (AMD Opteron(tm) Processor 246)
Radio[1..5] 4GB RAM
Radio[6..24] 2GB RAM
max_user_res_soft.ncpus = 48
max_user_run_soft = 24
This is the general purpose i686 queue - these machines have the 32-bit kernel. These are ideal for smaller jobs which don't require 64bit memory.
22 nodes
4 CPUs/node (Intel(R) Xeon(TM) CPU 2.80GHz stepping 07)
3 GB RAM/node
This is the high-memory queue; big memory jobs should go here. Smaller jobs should be spread out across the other queues.
8 CPUs/node (Intel(R) Xeon(R) CPU - a mixture, all dual-quad-core)
48GB - Camillus, Sulla, Titus
32GB - Cray[1..4]
24GB - Romulus, Radio[25,26]
max_user_res_soft.mem = 48gb
max_user_res_soft.ncpus = 24
Requesting a Specific Resource
Generally we do not apply limits to users or queues or partition nodes among many different queues - the fewer limits, the better the aggregate throughput for everyone. However, because the hardware in the queue is heterogeneous, it is necessary to request a specific resource in order to guarantee that your job gets placed on a node that has that resource!
In the simplest case, you can simply add the resource to the command line resource list when submitting your job; this example is run interactively (-I flag) for illustration purposes. The special hardware resource in this case is a CUDA-capable GPU - we just use the -l cuda_gpu=1 flag:
jpr9c@cuda1 : /af13/jpr9c ; qsub -I -l cuda_gpu=1
qsub: waiting for job 1364543.centurion001 to start
qsub: job 1364543.centurion001 ready
No mail for jpr9c
jpr9c@cray1 : /af13/jpr9c ; ./work/pbs/cuda/testjob/deviceQuery
There is 1 device supporting CUDA
Device 0: "Tesla C2050"
  Major revision number: 2
  Minor revision number: 0
  Total amount of global memory: 2817720320 bytes
  Number of multiprocessors: 0
  Number of cores: 0
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 32768
  Warp size: 32
  Maximum number of threads per block: 1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Clock rate: 1.15 GHz
  Concurrent copy and execution: Yes
Test PASSED
Press ENTER to exit...
Running regular jobs
Ordinarily we do not want to run interactive jobs, since the resource isn't freed until the user interactively logs off. A batch job, by contrast, releases the resource as soon as it finishes, so the resource can be used by the next job. You should do your job debugging on one of the interactive nodes (for the CUDA hardware, these are cuda[1..3].cs.virginia.edu) until you have a job which will run without any user interaction (no touching the keyboard!).
Our job in this case is very simple. Here's the job script; this time we ask for the resource and the queue inside the script file:
#!/bin/bash
#PBS -q generals
#PBS -l walltime=00:02:00
#PBS -l mem=1GB
#PBS -l cuda_gpu=1
cd $PBS_O_WORKDIR
hostname
./deviceQuery -noprompt
And then submit to PBS, where it gets appropriately scheduled:
jpr9c@cuda1 : /af13/jpr9c/work/pbs/cuda/testjob ; qsub test_cuda.sh
1364546.centurion001
jpr9c@cuda1 : /af13/jpr9c/work/pbs/cuda/testjob ; qstat -f 1364546
Job Id: 1364546.centurion001
    Job_Name = test_cuda.sh
    Job_Owner = email@example.com
    job_state = Q
    queue = generals
    server = centurion001
    Checkpoint = u
    ctime = Fri Dec 20 11:45:03 2013
    Error_Path = cuda1.cs.virginia.edu:/af13/jpr9c/work/pbs/cuda/testjob/test_c
        uda.sh.e1364546
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Fri Dec 20 11:45:03 2013
    Output_Path = cuda1.cs.virginia.edu:/af13/jpr9c/work/pbs/cuda/testjob/test_
        cuda.sh.o1364546
    Priority = 0
    qtime = Fri Dec 20 11:45:03 2013
    Rerunable = True
    Resource_List.cput = 168:00:00
    Resource_List.cuda_gpu = 1
    Resource_List.mem = 1gb
    Resource_List.ncpus = 1
    Resource_List.nodect = 1
    Resource_List.place = pack
    Resource_List.select = 1:cuda_gpu=1:mem=1gb:ncpus=1
    Resource_List.walltime = 00:02:00
    substate = 10
    Variable_List = PBS_O_HOME=/af13/jpr9c,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=jpr9c,
        PBS_O_PATH=/af13/jpr9c/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/u
        sr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-5.5/bin:/usr/pbs/bin:.:/af
        13/jpr9c/simics,PBS_O_MAIL=/var/mail/jpr9c,PBS_O_SHELL=/usr/cs/bin/bash,
        PBS_O_WORKDIR=/af13/jpr9c/work/pbs/cuda/testjob,PBS_O_SYSTEM=Linux,
        PBS_O_QUEUE=generals,PBS_O_HOST=cuda1.cs.virginia.edu
    etime = Fri Dec 20 11:45:03 2013
    Submit_arguments = test_cuda.sh
The job's output file shows that it ran on a machine with the correct resource:
jpr9c@cuda1 : /af13/jpr9c/work/pbs/cuda/testjob ; cat test_cuda.sh.o1364546
cray1
There is 1 device supporting CUDA
Device 0: "Tesla C2050"
  Major revision number: 2
  Minor revision number: 0
  Total amount of global memory: 2817720320 bytes
  Number of multiprocessors: 0
  Number of cores: 0
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 32768
  Warp size: 32
  Maximum number of threads per block: 1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Clock rate: 1.15 GHz
  Concurrent copy and execution: Yes
Test PASSED
You should be able to use the above files as templates for creating your own jobs.
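One way to reuse the script as a template is to generate it from a few parameters. This is only a sketch: write_job_script is a hypothetical helper, and the queue, walltime and resource values in the usage line are simply the ones from the example above.

```shell
# write_job_script is a hypothetical helper, not a PBS command.
# Usage: write_job_script FILE QUEUE WALLTIME RESOURCE COMMAND
write_job_script() {
    cat > "$1" <<EOF
#!/bin/bash
#PBS -q $2
#PBS -l walltime=$3
#PBS -l $4
cd "\$PBS_O_WORKDIR"
$5
EOF
}
```

For example, `write_job_script test_cuda.sh generals 00:02:00 cuda_gpu=1 "./deviceQuery -noprompt"` writes a script that can then be submitted with `qsub test_cuda.sh`. The backslash keeps $PBS_O_WORKDIR unexpanded until the job actually runs.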
Debugging PBS Jobs
Ordinarily, you should compile your program on one of the head/interactive nodes (power[1..6].cs.virginia.edu); the software and operating system environment (eg, filesystem layout, libraries, etc.) on the head nodes is identical to the PBS compute nodes. If your job will run correctly on the head node, it should run correctly when submitted through PBS. Please remember: to run in batch mode, your program should not require user interaction! Your test program should not require interactive input from the user (mouse or keyboard); stdin should come from a file or socket. To run a program interactively, see below.
Most PBS users have selected PBS because they need to run very large numbers of jobs; either multiple runs of slightly altered jobs, or the same job run against multiple data sets. Because you will likely be queuing a very large number of jobs, and because we try to minimize the user limits on PBS resources, you can potentially take all the department PBS resources. Please remember to check your jobs to see if they are running as expected. If you feel a job is hung, please qdel it and do not submit large batches until you are confident they are running as expected.
Normal Error Reporting
PBS will give you two types of error reporting: batch processing errors are reported via email, and the stderr output of jobs is captured and returned to you in a file named <script_name>.e<jobid>.
To get PBS to report more extensively on batch processing errors, please use the "-m bae" script option so that PBS will report when it starts your job, when it aborts your job (and why), and when the job ends. PBS will not report jobs which are switched to the "HELD" state because of errors in batch-processing the job. If your job is "HELD" (status "H" in qstat) then it will stay that way indefinitely (or until the administrator cancels it), so you need to investigate why there were errors! To obtain more detailed information on why the job was held, please use:
qstat -f <jobid>.centurion001
If your job has a system hold, please send mail to firstname.lastname@example.org, with the output of that command in the body of your email; we will need to see which host is causing your job to fail.
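When investigating a held job, individual attributes such as job_state, Hold_Types or comment can be pulled out of the qstat -f listing. The sketch below uses a hypothetical helper, job_attr. It reads only the first line of an attribute, so long values that qstat wraps onto continuation lines come back truncated.

```shell
# job_attr is a hypothetical helper. It reads `qstat -f` output on standard
# input and prints the value of the named attribute (first line only).
job_attr() {
    awk -v key="$1" '
        { line = $0; sub(/^[ \t]+/, "", line) }   # drop leading indentation
        index(line, key " = ") == 1 {
            print substr(line, length(key) + 4)    # text after "key = "
            exit
        }
    '
}
```

On a head node it would be used as, for example, `qstat -f <jobid>.centurion001 | job_attr Hold_Types`.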
If your job is being run through the batch system successfully, you will get notification that PBS has run your job and you will get two files back, stdout and stderr from your job. These are returned to the CWD context of your qsub, unless you specify an alternate $PBS_WRK_DIR or "-o <dir/filename>" option in your PBS script. If your results are not what you expect, please examine the std error file <scriptname>.e<jobid>.
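The default file names can be derived from the script name and the job identifier, following the <scriptname>.o<jobid> and <scriptname>.e<jobid> pattern seen in the listings above (for example script.sh.e10252). The two helper names below are hypothetical:

```shell
# Hypothetical helpers that build the default output file names for a job.
# Arguments: the job script and the identifier printed by qsub.
pbs_stdout_file() {
    echo "$(basename "$1").o${2%%.*}"
}
pbs_stderr_file() {
    echo "$(basename "$1").e${2%%.*}"
}
```

After a job finishes, `cat "$(pbs_stderr_file pbs_test.sh 9563.centurion001)"` run in the submission directory would show its error output, assuming no "-o" or "-j" option redirected it elsewhere.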
Interactive Console Shell
The -I option of qsub declares that a job must be run "interactively". The job will be queued and scheduled as any PBS batch job, but when executed, the standard input, output, and error streams of the job are connected through qsub to the terminal session in which qsub is running. To acquire a node in interactive mode, you can construct a trivial PBS script such as the following, which for purposes of this example will be called debug.sh:
#!/bin/sh
#PBS -l walltime=10:00:00
Now submit debug.sh with
qsub -I debug.sh
Once the PBS interactive job is executed, the terminal session will be logged into one of the compute nodes allocated by PBS (you will have an interactive bash session on the compute node). The executable can then be invoked manually from the command prompt. After you have completed your interactive session, be sure to exit from the shell on the compute node so that the node can be returned to PBS for other jobs. Exiting the shell terminates your interactive PBS job.
It is best to avoid the need for a graphical user interface when using PBS interactively. If you must use one, e.g. for developing a Matlab script, it is best to use it on the frontend. Keep in mind that an X server must be installed on your system; the dept. Linux systems already have XFree86 built in, and the Windows systems are deployed with Exceed. If there is an absolute need for using a graphical interface on the compute nodes, a more complex process is required. With eXceed under Windows: open a console window and at the prompt, type
This will return your IP address. For the purposes of this example, suppose it is 188.8.131.52
Once on the node assigned by PBS, you must set your DISPLAY environment variable; type:
export DISPLAY=cordelia.cs.virginia.edu:0
or
export DISPLAY=184.108.40.206:0
depending on whether you know the name or only the IP address of your local machine.
If you use tcsh, type:
setenv DISPLAY jellybean.itc.virginia.edu:0
and similarly for an IP address. You should now be able to run X applications on the compute node assigned by PBS.
Password-less SSH not properly set up
If your password-free ssh keys are not set up properly, you will get debugging output back which looks like the message below. Please step through the procedures outlined above. Note: if you have old host keys for any of the PBS nodes in your ~<user>/.ssh/known_hosts file, it can cause problems; please be sure to remove all old keys, and append the keys in the download above to your known_hosts file.
"bad UID"
This generally happens when you are not logged into one of the authorized front ends; you must submit your jobs from one of the power nodes. If you are submitting from a power node, then it is likely your job is being scheduled on a compute node which is having authentication problems. Please report this to email@example.com, with the full output of the email you got back from PBS so we can track down the problem node.
SCP I/O errors
"scp: ambiguous target"
If you have spaces in your path you may get this error back from PBS when it tries to write the output of your job back to your home directory. You will get an email indicating that the job executed but that the copy of results was left in the 'undelivered' directory on the compute node; specifically, you will get the message above in the output (an extract is shown below). If this happens, please check your path to be sure there are no white spaces in it! Unlike GUIs, command-line shells (and other binaries) use white spaces as inter-field separators (to separate the ARGV elements), and spaces in the path can confuse them. In general, when working on Unix systems, it's just a bad habit to use spaces; use underscores "_" as non-whitespace placeholders.
In the example below, renaming the path from "CS 644" to "CS_644" fixes the problem.
debug1: Next authentication method: publickey
debug1: Trying private key: /af21/vm9u/.ssh/identity
debug1: Offering public key: /af21/vm9u/.ssh/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env LANG = en_US.UTF-8
debug1: Sending command: scp -v -d -t /af21/vm9u/wrk/CS 644/
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
scp: ambiguous target
vm9u@centurion010:/var/spool/PBS/undelivered$ debug1: channel 0: free: client-session, nchannels 1
debug1: fd 0 clearing O_NONBLOCK
debug1: fd 1 clearing O_NONBLOCK
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 0.0 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 1
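A simple guard can catch this before a job is submitted. The sketch below is an illustration only: has_space and safe_qsub are hypothetical helper names, and safe_qsub needs qsub, so it only works on a head node.

```shell
# has_space is a hypothetical helper: it succeeds if the given path
# contains a space character.
has_space() {
    case "$1" in
        *" "*) return 0 ;;
        *) return 1 ;;
    esac
}

# safe_qsub is hypothetical too: it refuses to submit from a directory
# whose path contains a space, and otherwise passes its arguments to qsub.
safe_qsub() {
    if has_space "$PWD"; then
        echo "refusing to submit: '$PWD' contains a space" >&2
        return 1
    fi
    qsub "$@"
}
```

The check looks only at the submission directory, since that is where PBS copies the job output by default.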
Random Host-Key Exchange errors: "Post job file processing error"
From time to time, some users are affected by a bug in OpenSSH regarding vulnerable host keys; we are not sure why this affects only some users and not others and have not been able to find the differences between those users. The problem results in PBS leaving your job output (STDOUT and STDERR) on the node where your job executed, and sending you email to that effect. You will see output similar to:
Unable to copy file 9680.centurion001.OU to power1.cs.virginia.edu: /uf13/ag2dx/cs644/hw0/script.sh.o9680
>>> error from copy
power1.cs.Virginia.EDU: Connection refused
.cs.virginia.edu, user ag2dx, command scp -v -r -p -t /uf13/ag2dx/cs644/hw0/script.sh.o9680
OpenSSH_4.7p1 Debian-8ubuntu1.2, OpenSSL 0.9.8g 19 Oct 2007
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to power1.cs.virginia.edu [220.127.116.11] port 22.
debug1: Connection established.
debug1: identity file /uf13/ag2dx/.ssh/identity type -1
debug1: identity file /uf13/ag2dx/.ssh/id_rsa type 1
debug1: identity file /uf13/ag2dx/.ssh/id_dsa type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_4.7p1 Debian-8ubuntu1.2
debug1: match: OpenSSH_4.7p1 Debian-8ubuntu1.2 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_4.7p1 Debian-8ubuntu1.2
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-cbc hmac-md5 none
debug1: kex: client->server aes128-cbc hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
Host key verification failed.
lost connection
>>> end error output
Output retained on that host in: /var/spool/PBS/undelivered/9680.centurion001.OU
The workaround for this situation is to disable strict host-key-checking in your user ssh configuration file (~<user>/.ssh/config):
# This is the ssh client system-wide configuration file. See
# ssh_config(5) for more information. This file provides defaults for
# users, and the values can be changed in per-user configuration files
# or on the command line.

# Configuration data is parsed as follows:
# 1. command line options
# 2. user-specific file
# 3. system-wide file
# Any configuration value is only changed the first time it is set.
# Thus, host-specific definitions should be at the beginning of the
# configuration file, and defaults at the end.

# Site-wide defaults for some commonly used options. For a comprehensive
# list of available options, their meanings and defaults, please see the
# ssh_config(5) man page.

Host *
#   ForwardAgent no
    ForwardX11 yes
    ForwardX11Trusted yes
#   RhostsRSAAuthentication no
#   RSAAuthentication yes
#   PasswordAuthentication yes
#   HostbasedAuthentication no
    GSSAPIAuthentication no
#   GSSAPIDelegateCredentials no
#   GSSAPIKeyExchange no
#   GSSAPITrustDNS no
#   BatchMode no
#   CheckHostIP yes
#   AddressFamily any
#   ConnectTimeout 0
    StrictHostKeyChecking no
If you don't have this file, you can copy it from /etc/ssh/ssh_config on any department unix system, and then edit to customize your SSH settings:
jpr9c@cordelia:~$ cp /etc/ssh/ssh_config .ssh/config
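Copying the whole system-wide file is not strictly necessary, since the per-user file only needs the one option. The following is a minimal sketch in POSIX shell of adding just that option; the function name ensure_host_key_workaround is hypothetical and not part of any department tooling, and the sketch assumes the option belongs in a catch-all "Host *" block.

```shell
# Hypothetical helper (not a department tool): add the workaround to a
# per-user ssh config file without copying the whole system-wide file.
ensure_host_key_workaround() {
    cfg="$1"
    # Create the containing directory if needed and keep it private.
    mkdir -p "$(dirname "$cfg")" && chmod 700 "$(dirname "$cfg")"
    # If the option is already set explicitly, leave the file alone
    # (inspect it by hand in that case: it may be set to "yes").
    if [ -f "$cfg" ] && grep -Eiq '^[[:space:]]*StrictHostKeyChecking[[:space:]]' "$cfg"; then
        return 0
    fi
    # Append a catch-all Host block carrying only the workaround option.
    printf '\nHost *\n    StrictHostKeyChecking no\n' >> "$cfg"
    chmod 600 "$cfg"
}
```

It would be called once from a login shell as: ensure_host_key_workaround "$HOME/.ssh/config"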
If you have this problem and would like root to retrieve the results from your job, please just forward the error email to root and we'll copy the file off the node in question back to where you expected the results to go.
MPI communication errors
There is presently a bug in the p4 (default) communication device of the x86_64 mpich library when shared memory is used. If you run a job with mpiexec using the default device (mpich-p4) on an SMP node (all our nodes are SMP nodes), you may have a shared memory problem. The problem is most likely to occur if you compile and run an MPI job with the default options, request more than one CPU, and are given multiple CPUs on the same node (i.e., the most likely scenario). If you instead request as many nodes as the number of CPUs you want, so that each node runs a single process, your job will likely run without this error.
The error reported in your error file will look like:
bm_slave_1_478: (0.035156) process not in process table; my_unix_id = 478 my_host=centurion002
bm_slave_1_478: (0.035156) Probable cause: local slave on uniprocessor without shared memory
bm_slave_1_478: (0.035156) Probable fix: ensure only one process on centurion002
bm_slave_1_478: (0.035156) (on master process this means 'local 0' in the procgroup file)
bm_slave_1_478: (0.035156) You can also remake p4 with SYSV_IPC set in the OPTIONS file
bm_slave_1_478: (0.035156) Alternate cause: Using localhost as a machine name in the progroup
bm_slave_1_478: (0.035156) file. The names used should match the external network names.
bm_slave_1_478: p4_error: p4_get_my_id_from_proc: 0
p0_477: (98.054688) net_send: could not write to fd=4, errno = 32
rm_14580: (-) net_recv failed for fd = 3
rm_26216: (-) net_recv failed for fd = 3
rm_31238: (-) net_recv failed for fd = 3
If you see output like this, pass mpiexec the option that disables shared memory:
mpiexec -mpich-p4-no-shmem ./my_mpi_prog
When the next release of the OSC mpiexec for PBS becomes available, we will upgrade to MPICH2 which should resolve this problem.
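A job script can also check for itself whether it was given several CPUs on one node before choosing the mpiexec option. The sketch below is an illustration only: the function name nodes_are_shared is hypothetical, and it assumes that $PBS_NODEFILE contains one line per assigned process, so that a host name appearing more than once means two processes share a node.

```shell
# Hypothetical helper: succeed (exit status 0) if any host name appears
# more than once in the node file given as the first argument.
nodes_are_shared() {
    # uniq -d prints only the lines that are repeated in sorted input.
    [ -n "$(sort "$1" | uniq -d)" ]
}

# Inside a PBS job script one might then write:
#   if nodes_are_shared "$PBS_NODEFILE"; then
#       mpiexec -mpich-p4-no-shmem ./my_mpi_prog
#   else
#       mpiexec ./my_mpi_prog
#   fi
```

Always passing -mpich-p4-no-shmem is simpler; the check is only useful if you want shared memory to stay enabled when each node runs a single process.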
IPC errors "p4_error: semget failed for setnum:"
Several users have reported that MPI initialization (the MPI_Init call) fails because a previous MPI job crashed on a node and left the MPI semaphores locked. The typical error looks like:
rm_13010: p4_error: semget failed for setnum: 0
p0_15425: p4_error: net_recv read: probable EOF on socket: 1
p0_15425: (6.359375) net_send: could not write to fd=4, errno = 32
rm_6498: p4_error: net_recv read: probable EOF on socket: 3
rm_12042: p4_error: net_recv read: probable EOF on socket: 3
MPI includes a small utility, called (suggestively) "cleanipcs", that cleans up any semaphores left behind for a given user; it is located in /usr/sbin on all the MPI-enabled systems and will be part of your default environment. There are two ways to invoke it: either before your job runs or after it, the latter being a very simple cleanup.
To clean up any old semaphores left behind by your previous (crashed) jobs, insert the following loop in your PBS script file before you call mpiexec; it loops through all the nodes you are assigned and calls the cleanipcs command on each:
## Loop through nodes before executing my job to be sure there aren't any MPI shm semaphores left
for i in $(sort -u "$PBS_NODEFILE"); do
    echo "removing IPC shm segments on $i"
    ssh "$i" "/usr/sbin/cleanipcs"
done
After this runs, you should see something similar to this in your output file:
removing IPC shm segments on centurion004
removing IPC shm segments on centurion005
removing IPC shm segments on centurion006
removing IPC shm segments on centurion007
removing IPC shm segments on centurion008
removing IPC shm segments on centurion009
removing IPC shm segments on centurion010
removing IPC shm segments on centurion011
removing IPC shm segments on centurion015
removing IPC shm segments on centurion016
removing IPC shm segments on centurion017
removing IPC shm segments on centurion018
removing IPC shm segments on centurion019
Alternatively, you can simply "source" the utility at the end of your script; this is generally good practice anyway, as it cleans up after your job completes, even if the MPI program itself crashed. Just place the following line at the bottom of your script:
# Cleanup any leftover IPC semaphores after execution
. /usr/sbin/cleanipcs
Please note the little "." at the beginning of that line; it is the shell command that tells the interpreter to "source" the file name that follows, i.e. to read the file and execute its commands in the current shell. If you are using bash, there is also a builtin 'source' command which can be used instead of the '.'.
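A line at the bottom of the script only runs if the script gets that far. As a hedged alternative, the sketch below registers the cleanup with the shell's trap builtin so that it also runs when the script stops early at an exit command. Note that it runs cleanipcs as a command rather than sourcing it; the variable name CLEANIPCS and the function name cleanup_ipcs are hypothetical, and the default path is the one given above.

```shell
# Path from the text above; CLEANIPCS is a hypothetical override variable.
CLEANIPCS="${CLEANIPCS:-/usr/sbin/cleanipcs}"

# Hypothetical helper: run the cleanup utility if it is present.
cleanup_ipcs() {
    if [ -x "$CLEANIPCS" ]; then
        "$CLEANIPCS"
    else
        echo "cleanipcs not found at $CLEANIPCS; skipping cleanup" >&2
    fi
}

# Run the cleanup when the script exits, whether it reaches the last
# line or leaves early through an exit command.
trap cleanup_ipcs EXIT

# ... the rest of the job script follows, for example:
# mpiexec -mpich-p4-no-shmem ./my_mpi_prog
```

This only cleans the node on which the script itself runs; the ssh loop shown earlier is still needed to reach the other assigned nodes.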
"bad interpreter" error
If you create your script file in a text editor on a Windows PC, the file will be saved as "DOS" text. A DOS/Windows text file ends each line with two characters, a carriage return followed by a linefeed, whereas a *nix system uses only a linefeed. Many *nix applications ignore the extra carriage return: they display the file normally and "hide" the character. An application that does show the character displays it as ^M (CTRL+M), shown here in /bin/vi on Solaris:
#!/bin/bash^M
#PBS -l walltime=00:02:00^M
#PBS -l select=4:mpiprocs=1^M
#PBS -m a^M
#PBS -o bigtest.mpich-p4-no-shmem^M
#PBS -j oe^M
^M
^M
# show my node file^M
cat $PBS_NODEFILE^M
^M
cd $PBS_O_WORKDIR^M
mpiexec -mpich-p4-no-shmem ./testMPI^M
Although most editors on the dept. Ubuntu systems will hide this extra character, the *nix shell (bash) and PBS will not ignore it. You will get an error message back from the PBS server that looks like this:
jpr9c@power1 : /af13/jpr9c/work/mpi ; more bigtest.mpich-p4-no-shmem
No mail for jpr9c
/bin/stty: standard input: Invalid argument
-bash: /var/spool/PBS/mom_priv/jobs/92840.centurion001.SC: /bin/bash^M: bad interpreter: No such file or directory
The system believes that the file name for the interpreter is "bash^M" not "bash".
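Since most editors hide the character, it can help to test a script for it before submitting. A minimal sketch in POSIX shell; the function name has_dos_line_endings is hypothetical:

```shell
# Hypothetical helper: succeed (exit status 0) if the file named by the
# first argument contains any carriage-return characters.
has_dos_line_endings() {
    # printf produces a literal carriage return to use as the search pattern.
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}
```

For example: has_dos_line_endings script.sh && echo "script.sh has DOS line endings"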
To work around this, there is a handy utility for stripping off this extra character: "dos2unix", which is installed on the dept. systems. On the Ubuntu systems, all you need to do is run the program once:
jpr9c@power1 : /af13/jpr9c/work/mpi ; dos2unix test.dos.sh
jpr9c@power1 : /af13/jpr9c/work/mpi ;
That's all! Then you should be able to successfully submit your job.
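On a system where dos2unix is not installed, the standard tr utility can do the same job. The following is a sketch rather than a replacement for dos2unix: the function name strip_carriage_returns is hypothetical, and it deletes every carriage return in the file, not only those at the ends of lines.

```shell
# Hypothetical helper: rewrite the file named by the first argument in
# place with all carriage-return characters removed.
strip_carriage_returns() {
    tmp=$(mktemp) || return 1
    # Write the cleaned text to a temporary file first, then copy it back
    # over the original so the original keeps its permissions.
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example: strip_carriage_returns test.dos.sh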