From CS Support Wiki
Revision as of 21:25, 6 November 2014 by Jpr9c (Talk | contribs)


Using the PBS Cluster

Jobs are submitted to the cluster by creating a PBS job command file that specifies certain attributes of the job, such as how long the job is expected to run and how many nodes of the cluster are needed (e.g. for parallel programs). PBS then schedules when the job is to start running on the cluster (based in part on those attributes), runs and monitors the job at the scheduled time, and returns any output to the user once the job completes.

Logging on to the Cluster

Logins to the cluster are done via SSH to one of the head nodes; you should not log in directly on any of the compute nodes. You will need an SSH client on your computer - either OpenSSH on a *nix system or SecureCRT on a Windows PC.

Log onto power[1..6].cs.virginia.edu; these are the head nodes for the cluster, and act as the control console for the queues. These machines are appropriate for any interactive work such as source code editing, compilation, and submitting jobs through PBS.

Centurion001 is the PBS server node, and jobs are executed on the centurion[2..64] nodes by default. There are also the radio, generals, lava and realitytv cluster queues, but you should check with root first before using those queues.

The sections below will outline what you need to know to set up and run your jobs on the various clusters. "Screen Capture" examples are given in the gray boxes to show what you should expect to see as output from various commands.

Configuring Your Account

Use of the CS PBS system assumes some familiarity with the Unix/Solaris software environment. In order to use PBS for batch job submission, it may be necessary to configure some of your Unix account startup files. General information about the Unix operating system can be found here.

When a job is submitted to the cluster through PBS, a new login to your account is initiated, and any initialization commands in your startup files (.profile, .variables.ksh, .kshrc, etc.) are executed. In this case (running in batch mode) it is necessary to disable interactive commands such as tset and stty. If these precautions are not taken, error messages will be written to the batch job's error file and your program may not run. The recommended way to disable the interactive sections of the startup files is to test the environment variable PBS_ENVIRONMENT, which is set whenever PBS runs a job. If the variable is set, meaning a PBS job initiated the login, the interactive parts of the startup files are skipped. CS Department profiles do not run any commands interactively by default, so unless you have modified this yourself, there is nothing to be concerned about.
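A guard along these lines in a Bourne-style startup file (a sketch - adapt it to your own .profile or .kshrc) skips terminal setup whenever PBS_ENVIRONMENT is set:

```shell
# Sketch for ~/.profile or ~/.kshrc: run terminal-setup commands only for
# interactive logins, never for PBS batch logins (PBS_ENVIRONMENT is set).
if [ -z "$PBS_ENVIRONMENT" ] && [ -t 0 ]; then
    # interactive login on a real terminal: safe to run stty/tset
    stty erase '^?' 2>/dev/null
    tset -Q 2>/dev/null || true
fi
```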

Your CS account is completely separate from your ITC account, even though it has the same 'username' and UID. Your password and home directory are not shared between the two systems, so please remember that this is a different password.

Password-less Login

You will need to set up ssh for password-free login on the CS Departmental Systems. In order to do this, you will need to set up a public/private key pair for use with SSH. Log onto one of the interactive front ends (power[1..6]), and do the following:

1. Change directories to your .ssh directory:

: /af13/jpr9c ; cd .ssh

: /af13/jpr9c/.ssh ; 

2. Generate an ssh key-pair using ssh-keygen:

: /af13/jpr9c/.ssh ; ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/af13/jpr9c/.ssh/id_rsa): 

Go ahead and press "enter" and use the default file name. Next, just leave the "passphrase" blank - remember, you want to be able to log onto other systems using SSH without having to type in anything:

Enter passphrase (empty for no passphrase):
Enter same passphrase again:

The keys will be stored right where they need to be by default:

Your identification has been saved in /af13/jpr9c/.ssh/id_rsa.
Your public key has been saved in /af13/jpr9c/.ssh/id_rsa.pub.
The key fingerprint is:
a5:99:57:33:ee:d5:4f:b9:28:ff:91:4d:66:22:0e:5e jpr9c@power1

3. Next, we need to let the SSH client and server know that this key is to be used to allow you to log in, so add the public key to the "authorized_keys" file:

: /af13/jpr9c/.ssh ; cat id_rsa.pub >> authorized_keys

Be sure the permissions are set correctly on this file:

: /af13/jpr9c/.ssh ; chmod 644 authorized_keys

: /af13/jpr9c/.ssh ; ls -l authorized_keys 
-rw-r--r-- 1 jpr9c uucp 617 2008-07-14 13:39 authorized_keys

Please also verify that the permissions are set correctly on your .ssh directory - if the permissions are too loose (e.g., if others can write to the directory), ssh will not trust the keys and will ignore them.

: /af13/jpr9c ; ls -ld .ssh/
drwx------ 2 jpr9c staff 1024 2010-08-30 14:45 .ssh/

4. Because SSH uses asymmetrical keys for host as well as user identification, the first time you connect to a new host, you get a prompt back asking if you want to add the host's public key to your known_hosts file. Since you need to be able to log onto the different machines without any keyboard-interaction, you'll need to add these host keys to your known_hosts file. Those keys are available in the pbs_hosts file; download a copy (either use the command below or 'right-click' and 'save-as') and add it to your known_hosts file:

: /af13/jpr9c/.ssh ; wget http://www.cs.virginia.edu/~csadmin/pbs/pbs_hosts 
--17:10:56--  http://www.cs.virginia.edu/~csadmin/pbs/pbs_hosts
           => `pbs_hosts.1'
Resolving www.cs.virginia.edu...
Connecting to www.cs.virginia.edu||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 141,440 (138K) [text/plain]

100%[====================================>] 141,440       --.--K/s             

17:10:56 (89.92 MB/s) - `pbs_hosts.1' saved [141440/141440]

: /af13/jpr9c/.ssh ; cat pbs_hosts >> known_hosts

5. Test your key setup to be sure you can log onto another system without typing in a password:

: /af13/jpr9c/.ssh ; ssh power2
Linux power2 2.6.24-19-generic #1 SMP Fri Jul 11 21:01:46 UTC 2008 x86_64

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To access official Ubuntu documentation, please visit:
Last login: Mon Aug 25 10:00:29 2008 from roark.cs.virginia.edu
No mail for jpr9c

It is absolutely essential that you have password-less SSH working for your PBS jobs to run! The PBS queue uses the SSH suite (scp, ssh) to move your files around the cluster and to copy your results back to you after your job has run. If you have to type anything to log onto one node from another, then review your steps, and see the trouble-shooting section below.
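One way to verify this non-interactively (a sketch, with power2 as an example target): ssh's BatchMode option forbids any password prompt, so the command succeeds only if key-based login actually works.

```shell
# Succeeds only if password-less (key-based) login works; with
# BatchMode=yes, ssh fails instead of prompting for a password.
if ssh -o BatchMode=yes -o ConnectTimeout=5 power2 true 2>/dev/null; then
    echo "password-less SSH OK"
else
    echo "key login failed - review the setup steps above"
fi
```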

Using PBS

The PBS resource management system handles the management and monitoring of the computational workload on the Department clusters. Users submit "jobs" to the resource management system (PBS Server - Centurion001) where they are queued up until the system is ready to run them. PBS selects which jobs to run, when, and where, according to node attributes requested by the users. There are resource routing queues to try and ensure that users are handled as efficiently as possible and to maximize throughput.

To use PBS, you create a batch job command file which you submit to the PBS server to run on a PBS queue. A batch job file is simply a shell script containing the set of commands you want run on some set of cluster compute nodes. It also contains directives which specify the characteristics (attributes), and resource requirements (e.g. number of compute nodes and maximum runtime) that your job needs. Once you create your PBS job file, you can reuse it if you wish or modify it for subsequent runs.

PBS also provides a special kind of batch job called interactive-batch. An interactive-batch job is treated just like a regular batch job, in that it is placed into the queue and must wait for resources to become available before it can run. Once it is started, however, the user's terminal input and output are connected to the job in what appears to be an rlogin session to one of the compute nodes. Many users find this useful for debugging their applications or for computational steering.

PBS provides two user interfaces for batch job submission: a command line interface (CLI) and a graphical user interface (GUI). Both provide the same functionality. This guide covers only the CLI; if you are an experienced *nix and X-windows user and prefer the GUI, information about configuring and using the xpbs interface can be found in Chapter 5 of the PBS Pro User Guide. More detailed information about using PBS in general can also be found in the PBS Pro User Guide.

PBS Job Command File

To submit a job to run on the Centurion cluster, a PBS job command file must be created. The job command file is a shell script that contains PBS directives; these directives are preceded by #PBS. The following is an example of a PBS command file to run a serial job, which would require only 1 processor on 1 node.

#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o output_filename
#PBS -j oe
#PBS -m bea
#PBS -M userid@virginia.edu

cd $PBS_O_WORKDIR
./myprogram

The first line identifies this file as a shell script. The next several lines are PBS directives that must precede any commands to be executed by the shell (e.g. the last two lines). The PBS directives are defined in the table below:

PBS Directive                         Function
#PBS -l nodes=1:ppn=1          Specifies a PBS resource requirement of
                               1 compute node and 1 processor per node.
#PBS -l walltime=12:00:00      Specifies a PBS resource requirement of
                               12 hours of wall clock time to run the job.
#PBS -o output_filename        Specifies the name of the file where job
                               output is to be saved. May be omitted to
                               generate filename appended with jobid number.
#PBS -j oe                     Specifies that job output and error messages
                               are to be joined in one file.
#PBS -m bea                    Specifies that PBS send email notification
                               when the job begins (b), ends (e), or
                               aborts (a).
#PBS -M userid@virginia.edu    Specifies an alternate email address where PBS
                               notification is to be sent.
#PBS -V                        Specifies that all environment variables
                               are to be exported to the batch job.

The following is an example of a PBS email notification to the user at the end of the job:

Date: Tue, 2 Sep 2008 12:43:09 -0500
From: root 
To: jpr9c@cs.virginia.edu
Subject: PBS JOB 9563.centurion001
PBS Job Id: 9563.centurion001
Job Name:   script.sh
Execution terminated

Note that the walltime actually used by the job should be used to more accurately estimate the walltime resource requirement in the PBS job command file for future submissions, so that PBS can schedule the job more effectively. When submitting a particular PBS job for the first time, overestimate the walltime requirement to prevent premature job termination.
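For instance (illustrative numbers only), you might start with a generous request and tighten it once you know the walltime the job actually used:

```
# First submission: overestimate so the job isn't killed prematurely
#PBS -l walltime=24:00:00

# Later submissions, after observing roughly 8 hours actually used:
#PBS -l walltime=09:00:00
```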

After the PBS directives in the command file, the shell executes a change directory command to $PBS_O_WORKDIR, a PBS variable indicating the directory where the PBS job was submitted. Normally this will also be where the program executable is located. Other shell commands can be executed as well. In the last line, the executable itself is invoked.

Submitting a Job

The PBS qsub command is used to submit job command files for scheduling and execution. For example, to submit your job with a PBS command file called "pbs_test.sh", the syntax would be

: /af13/jpr9c/work/pbs ; qsub pbs_test.sh 
9563.centurion001

Notice that upon successful submission of a job, PBS returns a job identifier of the form <jobid>.centurion001, where <jobid> is an integer number assigned by PBS to that job. You'll need the job identifier for any actions involving the job, such as checking job status, deleting the job, or specifying job dependencies as described below.
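A sketch of reusing the returned identifier (qsub, qstat, and qdel exist only on the cluster, so the commands are guarded here; the script name is illustrative):

```shell
# Capture the identifier qsub prints and reuse it for later actions.
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub pbs_test.sh)   # e.g. 9563.centurion001
    qstat "$jobid"              # check the job's status
    qdel "$jobid"               # delete the job if necessary
fi
```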

There are many options to the qsub command as can be seen by typing man qsub at the command prompt on power[1..6].cs.virginia.edu or looking at the PBS Pro User Guide. Three of the more useful ones are the -W option for allowing specification of additional job attributes, the -I option, which declares that the job is to be run "interactively", and the -l option, which allows resource requirements to be listed as part of the qsub command. These are discussed below.

qsub Options

You may specify an alternate queue from the command line by using the -q <queue name> option:

: /af13/jpr9c/work/pbs ; qsub -q sunfire mpitest

: /af13/jpr9c/work/pbs ; qstat 489850
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
489850.centurion0 mpitest          jpr9c                    0 Q sunfire

Displaying Job Status

The qstat -a command is used to obtain status information about jobs submitted to PBS.

: /af13/jpr9c ; qstat -a

                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
9517.centurion0 bcs8d    centurio applu_1000  20134   1   1  512mb 10:00 R 02:04
9518.centurion0 bcs8d    centurio apsi_10000  29066   1   1  512mb 10:00 R 02:04
9520.centurion0 bcs8d    centurio bzip2graph   7502   1   1  512mb 10:00 R 02:03
9521.centurion0 bcs8d    centurio crafty_100   7515   1   1  512mb 10:00 R 02:03
9522.centurion0 bcs8d    centurio eoncook_10  26435   1   1  512mb 10:00 R 02:03
9523.centurion0 bcs8d    centurio equake_100  26444   1   1  512mb 10:00 E 02:01
9525.centurion0 bcs8d    centurio fma3d_1000  17953   1   1  512mb 10:00 R 02:03
9526.centurion0 bcs8d    centurio galgel_100  19263   1   1  512mb 10:00 R 02:03
9529.centurion0 bcs8d    centurio gzipgraphi    749   1   1  512mb 10:00 R 02:04
9530.centurion0 bcs8d    centurio lucas_1000  10537   1   1  512mb 10:00 R 02:03
9531.centurion0 bcs8d    centurio mcf_100000  10548   1   1  512mb 10:00 R 02:03
9532.centurion0 bcs8d    centurio mesa_10000  16103   1   1  512mb 10:00 R 02:03
9533.centurion0 bcs8d    centurio mgrid_1000  16113   1   1  512mb 10:00 R 02:03
9534.centurion0 bcs8d    centurio parser_100   4316   1   1  512mb 10:00 R 02:03
9538.centurion0 bcs8d    centurio twolf_1000  12837   1   1  512mb 10:00 R 02:03
9539.centurion0 bcs8d    centurio vortexone_  12846   1   1  512mb 10:00 R 02:03
9541.centurion0 bcs8d    centurio wupwise_10   8063   1   1  512mb 10:00 R 02:03

The first five fields of the display are self-explanatory. The sixth and seventh fields, titled NDS and TSK in the above display, indicate the total number of nodes and processors, respectively, required by each job. The ninth field indicates the requested walltime (hh:mm) and the last field shows the elapsed runtime. The tenth field, titled S, indicates the state of the job. The job state can have the following values:

State              Definition
E          Job is exiting after having run
H          Job is held
Q          Job is queued, eligible to run or be routed
R          Job is Running
T          Job is in transition (being moved to a new location)
W          Job is waiting for its requested execution time to be reached
S          Job is suspended

To see more specific information on your particular job, you can run "qstat -f <jobid>.centurion001":

: /af13/jpr9c/work/pbs ; qstat -f 10252.centurion001
Job Id: 10252.centurion001
    Job_Name = script.sh
    Job_Owner = jm6dg@centurion001.cs.virginia.edu
    resources_used.cpupercent = 95
    resources_used.cput = 02:13:54
    resources_used.mem = 1351476kb
    resources_used.ncpus = 1
    resources_used.vmem = 2500800kb
    resources_used.walltime = 38:47:19
    job_state = R
    queue = centurion
    server = centurion001
    Checkpoint = u
    ctime = Wed Sep 10 17:46:04 2008
    Error_Path = centurion001.cs.virginia.edu:/uf8/jm6dg/fractal/driver/src/scr
    exec_host = centurion039/1
    exec_vnode = (centurion039:ncpus=1)
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Wed Sep 10 21:35:50 2008
    Output_Path = centurion001.cs.virginia.edu:/uf8/jm6dg/fractal/driver/src/sc
    Priority = 0
    qtime = Wed Sep 10 17:46:04 2008
    Rerunable = True
    Resource_List.mem = 400mb
    Resource_List.ncpus = 1
    Resource_List.nodect = 1
    Resource_List.nodes = 1:ppn=1
    Resource_List.place = scatter
    Resource_List.select = 1:ncpus=1
    Resource_List.walltime = 100:00:00
    stime = 1221096950
    session_id = 29207
    job_dir = /uf8/jm6dg
    Variable_List = PBS_O_HOME=/uf8/jm6dg,PBS_O_LANG=en_US.UTF-8,
    comment = Job run at Wed Sep 10 at 21:35 on (centurion039:ncpus=1)
    etime = Wed Sep 10 17:46:04 2008

PBS Resources


The CS department has a number of clusters, each of which currently has its own queue. You may use all the queues, but be mindful of which queues actually have the resources needed to run your job. For example, if you need 4 CPUs (cores) on the same node, be sure to use a queue whose nodes actually have at least four cores.

A note on resource limits: we try to have as few user/resource limits as possible on the queues in order to maximize throughput. There are times when the clusters are unused and we want users to be able to grab as many resources as possible. A few notes on how these limits work:

  • We use "soft" limits - these are limits that only apply when there are users queued and waiting on resources, otherwise a user is free to use everything.
  • We do not preempt running jobs; if a user is "first" and fills all the available nodes (over any soft limits), and another user gets queued waiting behind them, they will wait until the first user's jobs start to drain. Any jobs the first user also has queued will take a lower priority, until they are below the soft limits.
  • The scheduler does do some backfill; that is, jobs are not necessarily processed in strict FIFO order, but a close approximation. Large resource request jobs are given a higher priority (so that they don't starve behind backfill jobs). We are continually adjusting these limits and the scheduler configuration; if you find your jobs are starving (waiting for days, while others who come into the queue behind/after you are getting run), then please contact root ASAP so that we can get an accurate snapshot of what is going on. There is a strong desire to make the sharing as equitable as possible.

    Centurion (default queue)

    This is an x86-64 architecture queue. This is the default queue; please submit most jobs here.


    64 nodes, 2 CPUs per node (AMD Opteron(tm) Processor 242), 2GB RAM per node


    max_user_res_soft.ncpus = 32


    Radio

    This is another x86_64 queue, very similar to the Centurion queue.


    24 nodes, 2 CPUs/node (AMD Opteron(tm) Processor 246); Radio[1..5] have 4GB RAM, Radio[6..24] have 2GB RAM


    max_user_res_soft.ncpus = 48
    max_user_run_soft = 24


    Generals

    This is the general purpose i686 queue - these machines run the 32-bit kernel. They are ideal for smaller jobs which don't require a 64-bit address space.


    22 nodes, 4 CPUs/node (Intel(R) Xeon(TM) CPU 2.80GHz stepping 07), 3 GB RAM/node




    This is the high-memory queue; big memory jobs should go here. Smaller jobs should be spread out across the other queues.


    8 CPUs/node (Intel(R) Xeon(R) CPU - a mixture, all dual-quad-core)

    48GB - Camillus, Sulla, Titus
    32GB - Cray[1..4]
    24GB - Romulus, Radio[25,26]


    max_user_res_soft.mem = 48gb
    max_user_res_soft.ncpus = 24

    Requesting a Specific Resource

    Generally we do not apply limits to users or queues, or partition nodes among many different queues - the fewer the limits, the better the aggregate throughput for everyone. However, because the hardware in the queue is heterogeneous, it is necessary to request a specific resource in order to guarantee your job gets placed on a node that has that resource!

    Simple Case

    In the simplest case, you can simply add the resource to the command line resource list when submitting your job; this example is run interactively (-I flag) for illustration purposes. The special hardware resource in this case is a CUDA-capable GPU - we just use the -l cuda_gpu=1 flag:

    : /af13/jpr9c ; qsub -I -l cuda_gpu=1
    qsub: waiting for job 1364543.centurion001 to start
    qsub: job 1364543.centurion001 ready
    No mail for jpr9c
    : /af13/jpr9c ; ./work/pbs/cuda/testjob/deviceQuery 
    There is 1 device supporting CUDA
    Device 0: "Tesla C2050"
      Major revision number:                         2
      Minor revision number:                         0
      Total amount of global memory:                 2817720320 bytes
      Number of multiprocessors:                     0
      Number of cores:                               0
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 32768
      Warp size:                                     32
      Maximum number of threads per block:           1024
      Maximum sizes of each dimension of a block:    1024 x 1024 x 64
      Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Clock rate:                                    1.15 GHz
      Concurrent copy and execution:                 Yes
    Test PASSED
    Press ENTER to exit...

    Running regular jobs

    Ordinarily we do not want to run interactive jobs, since the resource isn't freed until the user logs off and so cannot be used by the next job. You should do your job debugging on one of the interactive nodes (for the CUDA hardware, these are cuda[1..3].cs.virginia.edu) until you have a job which will run without any user interaction (no touching the keyboard!).

    Our job in this case is very simple. Here is the job script; this time we ask for the resource and the queue inside the script file:

    #PBS -q generals
    #PBS -l walltime=00:02:00
    #PBS -l mem=1GB
    #PBS -l cuda_gpu=1
    ./deviceQuery -noprompt

    And then submit to PBS, where it gets appropriately scheduled:

    : /af13/jpr9c/work/pbs/cuda/testjob ; qsub test_cuda.sh 
    : /af13/jpr9c/work/pbs/cuda/testjob ; qstat -f 1364546
    Job Id: 1364546.centurion001
        Job_Name = test_cuda.sh
        Job_Owner = jpr9c@cuda1.cs.virginia.edu
        job_state = Q
        queue = generals
        server = centurion001
        Checkpoint = u
        ctime = Fri Dec 20 11:45:03 2013
        Error_Path = cuda1.cs.virginia.edu:/af13/jpr9c/work/pbs/cuda/testjob/test_c
        Hold_Types = n
        Join_Path = n
        Keep_Files = n
        Mail_Points = a
        mtime = Fri Dec 20 11:45:03 2013
        Output_Path = cuda1.cs.virginia.edu:/af13/jpr9c/work/pbs/cuda/testjob/test_
        Priority = 0
        qtime = Fri Dec 20 11:45:03 2013
        Rerunable = True
        Resource_List.cput = 168:00:00
        Resource_List.cuda_gpu = 1
        Resource_List.mem = 1gb
        Resource_List.ncpus = 1
        Resource_List.nodect = 1
        Resource_List.place = pack
        Resource_List.select = 1:cuda_gpu=1:mem=1gb:ncpus=1
        Resource_List.walltime = 00:02:00
        substate = 10
        Variable_List = PBS_O_HOME=/af13/jpr9c,PBS_O_LANG=en_US.UTF-8,
        etime = Fri Dec 20 11:45:03 2013
        Submit_arguments = test_cuda.sh

    You can see where it was run on a machine with the correct resource:

    : /af13/jpr9c/work/pbs/cuda/testjob ; cat test_cuda.sh.o1364546 
    There is 1 device supporting CUDA
    Device 0: "Tesla C2050"
      Major revision number:                         2
      Minor revision number:                         0
      Total amount of global memory:                 2817720320 bytes
      Number of multiprocessors:                     0
      Number of cores:                               0
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 32768
      Warp size:                                     32
      Maximum number of threads per block:           1024
      Maximum sizes of each dimension of a block:    1024 x 1024 x 64
      Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Clock rate:                                    1.15 GHz
      Concurrent copy and execution:                 Yes
    Test PASSED

    You should be able to use the above files as templates for creating your own jobs.

    Debugging PBS Jobs

    Ordinarily, you should compile your program on one of the head/interactive nodes (power[1..6].cs.virginia.edu); the software and operating system environment (e.g., filesystem layout, libraries) on the head nodes is identical to that on the PBS compute nodes. If your job runs correctly on a head node, it should run correctly when submitted through PBS. Please remember: to run in batch mode, your program must not require user interaction (mouse or keyboard); stdin should come from a file or socket. To run a program interactively, see below.

    Most PBS users have selected PBS because they need to run very large numbers of jobs; either multiple runs of slightly altered jobs, or the same job run against multiple data sets. Because you will likely be queuing a very large number of jobs, and because we try to minimize the user limits on PBS resources, you can potentially take all the department PBS resources. Please remember to check your jobs to see if they are running as expected. If you feel a job is hung, please qdel it and do not submit large batches until you are confident they are running as expected.

    Normal Error Reporting

    PBS gives you two types of error reporting: batch processing errors are reported via email, and the stderr output of jobs is captured and returned to you in a file named <script_name>.e<jobid>.
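    The error-file name can be derived from the job identifier, since only the numeric part is used (a sketch; the script name and job id are illustrative):

```shell
# Build the expected stderr-file name from a qsub job identifier.
jobid="9563.centurion001"
script="script.sh"
errfile="$script.e${jobid%%.*}"   # keep only the numeric part of the id
echo "$errfile"                   # prints: script.sh.e9563
```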

    To get PBS to report more extensively on batch processing errors, use the "-m bea" script option so that PBS reports when it starts your job, when it aborts your job (and why), and when the job ends. PBS will not notify you of jobs that are switched to the held state because of errors in batch-processing the job itself. If your job is held (status "H" in qstat), it will stay that way indefinitely (or until an administrator cancels it), so you need to investigate why there were errors! To obtain more detailed information on why the job was held, use:

    qstat -f <jobid>.centurion001
    If your job has a system hold, please send mail to root@cs.virginia.edu, with the output of that command in the contents of your email; we will need to see which host is causing your job to fail.

    If your job runs through the batch system successfully, you will get notification that PBS has run your job, and two files will be returned to you: the stdout and stderr of your job. These are returned to the directory from which you ran qsub, unless you specify an alternate location with the "-o <dir/filename>" option in your PBS script. If your results are not what you expect, examine the standard error file <scriptname>.e<jobid>.

    Running Interactively

    Interactive Console Shell

    The -I option of qsub declares that a job must be run "interactively". The job will be queued and scheduled as any PBS batch job, but when executed, the standard input, output, and error streams of the job are connected through qsub to the terminal session in which qsub is running. To acquire a node in interactive mode, you can construct a trivial PBS script such as the following, which for purposes of this example will be called debug.sh:

    #PBS -l walltime=10:00:00

    Now submit debug.sh with

    qsub -I debug.sh
    Once the PBS interactive job is executed, the terminal session will be logged into one of the compute nodes allocated by PBS (you will have an interactive bash session on the compute node). The executable can then be invoked manually from the command prompt. After you have completed your interactive session, be sure to exit from the shell on the compute node so that the node can be returned to PBS for other jobs. Exiting the shell terminates your interactive PBS job.

    Interactive GUI

    It is best to avoid the need for a graphical user interface when using PBS interactively. If you must use one, e.g. for developing a Matlab script, it is best to use it on the frontend. Keep in mind that an X server must be installed on your local system; the dept. Linux systems already have XFree86 built in, and the Windows systems are deployed with Exceed. If there is an absolute need for a graphical interface on the compute nodes, a more complex process is required. With Exceed under Windows: open a console window and at the prompt, type

    ipconfig
    This will return your IP address; make a note of it, as you will need it on the compute node.

    Once on the node assigned by PBS, you must set the DISPLAY environment variable; type:

          export DISPLAY=cordelia.cs.virginia.edu:0
          export DISPLAY=
    depending on whether you know the name or only the IP address of your local machine.

    If you use tcsh, type:

          setenv DISPLAY jellybean.itc.virginia.edu:0
    and similarly for an IP address. You should now be able to run X applications on the compute node assigned by PBS.

    Common Errors

    Password-less SSH not properly set up

    If your password-free ssh keys are not set up properly, you will get debugging output back which looks like the message below. Please step through the procedures outlined above. Note: if you have old host keys for any of the PBS nodes in your ~<user>/.ssh/known_hosts file, it can cause problems; please be sure to remove all old keys, and append the keys in the download above to your known_hosts file.

    "bad UID"

    This generally happens when you are not logged into one of the authorized front ends; you must submit your jobs from one of the power nodes. If you are submitting from a power node, then it is likely your job is being scheduled on a compute node which is having authentication problems. Please report this to root@cs.virginia.edu with the full output of the email you got back from PBS so we can track down the problem node.

    SCP I/O errors

    "scp: ambiguous target"

    If you have spaces in your path you may get this error back from PBS when it tries to write the output of your job back to your home directory. You will get an email indicating that the job executed but that the copy of results was left in the 'undelivered' directory on the compute node; specifically, you will get the message above in the output (an extract is shown below). If this happens, please check your path to be sure there are no white spaces in it! Unlike GUIs, command-line shells (and other binaries) use white spaces as inter-field separators (to separate the ARGV elements), and spaces in the path can confuse them. In general, when working on Unix systems, it's just a bad habit to use spaces; use underscores "_" as non-whitespace placeholders.

    In the example below, renaming the path from "CS 644" to "CS_644" fixes the problem.

    debug1: Next authentication method: publickey
    debug1: Trying private key: /af21/vm9u/.ssh/identity
    debug1: Offering public key: /af21/vm9u/.ssh/id_rsa
    debug1: Server accepts key: pkalg ssh-rsa blen 277
    debug1: read PEM private key done: type RSA
    debug1: Authentication succeeded (publickey).
    debug1: channel 0: new [client-session]
    debug1: Entering interactive session.
    debug1: Sending environment.
    debug1: Sending env LANG = en_US.UTF-8
    debug1: Sending command: scp -v -d -t /af21/vm9u/wrk/CS 644/
    debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
    scp: ambiguous target
    vm9u@centurion010:/var/spool/PBS/undelivered$ debug1: channel 0: free: client-session, nchannels 1
    debug1: fd 0 clearing O_NONBLOCK
    debug1: fd 1 clearing O_NONBLOCK
    debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 0.0 seconds
    debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
    debug1: Exit status 1
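
    The underlying failure is easy to reproduce directly in a shell. This sketch (with hypothetical directory names) shows how an unquoted path containing a space is split into two ARGV elements:

```shell
# Create two directories, one with a space in its name
mkdir -p "/tmp/CS 654" /tmp/CS_654

# Unquoted, the shell splits "/tmp/CS 654" into two arguments, so ls
# is asked to list "/tmp/CS" and "654" -- neither of which exists:
ls /tmp/CS 654 2>/dev/null || echo "path was split into two arguments"

# Quoting keeps the path as one argument; PBS's scp invocation does not quote:
ls -d "/tmp/CS 654"
```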

    Random Host-Key Exchange errors: "Post job file processing error"

    From time to time, some users are affected by a bug in OpenSSH regarding vulnerable host keys; we are not sure why it affects only some users, and we have not been able to find what distinguishes them. The problem results in PBS leaving your job output (STDOUT and STDERR) on the node where your job executed and sending you email to that effect. You will see output similar to:

    Unable to copy file 9680.centurion001.OU to power1.cs.virginia.edu:
    >>> error from copy
    power1.cs.Virginia.EDU: Connection refused
    .cs.virginia.edu, user ag2dx, command scp -v -r -p -t
    OpenSSH_4.7p1 Debian-8ubuntu1.2, OpenSSL 0.9.8g 19 Oct 2007
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug1: Applying options for *
    debug1: Connecting to power1.cs.virginia.edu [] port 22.
    debug1: Connection established.
    debug1: identity file /uf13/ag2dx/.ssh/identity type -1
    debug1: identity file /uf13/ag2dx/.ssh/id_rsa type 1
    debug1: identity file /uf13/ag2dx/.ssh/id_dsa type -1
    debug1: Remote protocol version 2.0, remote software version OpenSSH_4.7p1
    debug1: match: OpenSSH_4.7p1 Debian-8ubuntu1.2 pat OpenSSH*
    debug1: Enabling compatibility mode for protocol 2.0
    debug1: Local version string SSH-2.0-OpenSSH_4.7p1 Debian-8ubuntu1.2
    debug1: SSH2_MSG_KEXINIT sent
    debug1: SSH2_MSG_KEXINIT received
    debug1: kex: server->client aes128-cbc hmac-md5 none
    debug1: kex: client->server aes128-cbc hmac-md5 none
    debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
    debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
    Host key verification failed.
    lost connection
    >>> end error output
    Output retained on that host in:

    The workaround for this situation is to disable strict host-key-checking in your user ssh configuration file (~<user>/.ssh/config):

    # This is the ssh client system-wide configuration file.  See
    # ssh_config(5) for more information.  This file provides defaults for
    # users, and the values can be changed in per-user configuration files
    # or on the command line.
    # Configuration data is parsed as follows:
    #  1. command line options
    #  2. user-specific file
    #  3. system-wide file
    # Any configuration value is only changed the first time it is set.
    # Thus, host-specific definitions should be at the beginning of the
    # configuration file, and defaults at the end.
    # Site-wide defaults for some commonly used options.  For a comprehensive
    # list of available options, their meanings and defaults, please see the
    # ssh_config(5) man page.
    Host *
    #   ForwardAgent no
    ForwardX11 yes
    ForwardX11Trusted yes
    #   RhostsRSAAuthentication no
    #   RSAAuthentication yes
    #   PasswordAuthentication yes
    #   HostbasedAuthentication no
       GSSAPIAuthentication no
    #   GSSAPIDelegateCredentials no
    #   GSSAPIKeyExchange no
    #   GSSAPITrustDNS no
    #   BatchMode no
    #   CheckHostIP yes
    #   AddressFamily any
    #   ConnectTimeout 0
       StrictHostKeyChecking no

    If you don't have this file, you can copy it from /etc/ssh/ssh_config on any department Unix system and then edit it to customize your SSH settings:

    jpr9c@cordelia:~$ cp /etc/ssh/ssh_config .ssh/config

    If you have this problem and would like root to retrieve the results from your job, please just forward the error email to root and we'll copy the file off the node in question back to where you expected the results to go.

    MPI communication errors

    Shared Memory Bug

    There is presently a bug involving shared memory in the p4 (default) comm device of the x86_64 mpich library. If you run a job using mpiexec with the default comm device (mpich-p4) on an SMP node (all our nodes are SMP nodes), you may hit this shared-memory problem. It is most likely to occur when you compile and run an MPI job with the default options, request more than one CPU, and are given multiple CPUs on the same node (i.e., the most likely scenario). If you request as many nodes as the number of CPUs you want, your job will likely not see the error.

    The error reported in your error file will look like:

    bm_slave_1_478: (0.035156) process not in process table; my_unix_id = 478 my_host=centurion002
    bm_slave_1_478: (0.035156) Probable cause:  local slave on uniprocessor without shared memory
    bm_slave_1_478: (0.035156) Probable fix:  ensure only one process on centurion002
    bm_slave_1_478: (0.035156) (on master process this means 'local 0' in the procgroup file)
    bm_slave_1_478: (0.035156) You can also remake p4 with SYSV_IPC set in the OPTIONS file
    bm_slave_1_478: (0.035156) Alternate cause:  Using localhost as a machine name in the progroup
    bm_slave_1_478: (0.035156) file.  The names used should match the external network names.
    bm_slave_1_478:  p4_error: p4_get_my_id_from_proc: 0
    p0_477: (98.054688) net_send: could not write to fd=4, errno = 32
    rm_14580: (-) net_recv failed for fd = 3
    rm_26216: (-) net_recv failed for fd = 3
    rm_31238: (-) net_recv failed for fd = 3

    If you see something like that, please specify the mpiexec option to not use shared memory:

    mpiexec -mpich-p4-no-shmem ./my_mpi_prog

    When the next release of the OSC mpiexec for PBS becomes available, we will upgrade to MPICH2 which should resolve this problem.
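
    Until then, a minimal job script using the workaround might look like the sketch below; the program name and resource requests are placeholders, not taken from this page:

```shell
#!/bin/bash
#PBS -l walltime=00:10:00
#PBS -l select=4:mpiprocs=1
#PBS -j oe

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# Work around the p4 shared-memory bug by disabling shared memory
mpiexec -mpich-p4-no-shmem ./my_mpi_prog
```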

    IPC errors "p4_error: semget failed for setnum:"

    Several users have reported that MPI initialization (MPI_Init()) fails because a previous MPI job crashed on a node and left the MPI semaphores locked. The typical error looks like:

    rm_13010:  p4_error: semget failed for setnum: 0
    p0_15425:  p4_error: net_recv read:  probable EOF on socket: 1
    p0_15425: (6.359375) net_send: could not write to fd=4, errno = 32
    rm_6498:  p4_error: net_recv read:  probable EOF on socket: 3
    rm_12042:  p4_error: net_recv read:  probable EOF on socket: 3

    There is a small helper script included with MPI which cleans up any semaphores left behind for a given user, called (suggestively) "cleanipcs"; it is located in /usr/sbin on all the MPI-enabled systems and is part of your default environment. There are two ways to invoke this simple command: either before your script's main work executes, or after it, the latter being a very simple cleanup.

    To clean up any old semaphores left behind by your previous (crashed) jobs, insert the following loop in your PBS script file before you call mpiexec; it will loop through all the nodes you are assigned and call the cleanipcs command on each:

    ## Loop through the assigned nodes before executing my job to be sure there aren't any MPI shm semaphores left
    for i in `cat $PBS_NODEFILE | sort -u` ; do
         echo "removing IPC shm segments on $i"
         ssh $i "/usr/sbin/cleanipcs"
    done

    After this runs, you should see something similar to this in your output file:

    removing IPC shm segments on centurion004
    removing IPC shm segments on centurion005
    removing IPC shm segments on centurion006
    removing IPC shm segments on centurion007
    removing IPC shm segments on centurion008
    removing IPC shm segments on centurion009
    removing IPC shm segments on centurion010
    removing IPC shm segments on centurion011
    removing IPC shm segments on centurion015
    removing IPC shm segments on centurion016
    removing IPC shm segments on centurion017
    removing IPC shm segments on centurion018
    removing IPC shm segments on centurion019
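
    Note that $PBS_NODEFILE contains one line per allocated CPU, so a node granted several CPUs appears several times; the sort -u in the loop above is what keeps each node from being cleaned repeatedly. A small demo with a hypothetical node file:

```shell
# Fake node file: centurion004 was allocated two CPUs, centurion005 one
printf 'centurion004\ncenturion004\ncenturion005\n' > /tmp/fake_nodefile

# sort -u collapses the duplicates to one line per node
sort -u /tmp/fake_nodefile
```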

    Alternately, you can simply "source" the script at the end of your job script; this is generally a good practice anyway, as it ensures proper cleanup once your job's work completes, even after a crash of your program. Just place the following line at the bottom of your script:

    # Cleanup any leftover IPC semaphores after execution
    . /usr/sbin/cleanipcs

    Please make note of the little "." at the beginning of that line: it is the shell operator that tells the interpreter to "source" the file which follows, i.e. read and execute it in the current shell. If you are using bash, there is also a builtin 'source' command which can be used instead of the '.'.
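
    The practical difference is that a sourced file runs in the current shell process, so anything it changes persists there, while an executed file runs in a child process. A tiny demo (with a hypothetical file name):

```shell
# A throwaway script that just sets a variable
printf 'MARKER=sourced\n' > /tmp/demo.sh

# Sourced: runs inside the invoking shell, so MARKER survives
bash -c '. /tmp/demo.sh; echo "after sourcing: $MARKER"'

# Executed: runs in a child process, so MARKER is lost
bash -c 'bash /tmp/demo.sh; echo "after executing: $MARKER"'
```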

    "bad interpreter" error

    If you create your script file with a text editor on a Windows PC, the file will be saved as "DOS" text. The end-of-line marker in DOS/Windows text is actually a carriage-return followed by a linefeed, whereas a *nix system uses only the linefeed as the "newline" symbol. Many *nix applications ignore the extra character: they display the file normally and "hide" it. An application that does show the character renders it as CTRL+M (shown here in /bin/vi on Solaris):

    #PBS -l walltime=00:02:00^M
    #PBS -l select=4:mpiprocs=1^M
    #PBS -m a^M
    #PBS -o bigtest.mpich-p4-no-shmem^M
    #PBS -j oe^M
    # show my node file^M
    mpiexec -mpich-p4-no-shmem ./testMPI^M

    Although most editors on the dept. Ubuntu systems will hide this extra character, the *nix shell (bash) and PBS will not ignore it. You will get an error message back from the PBS server that looks like this:

    : /af13/jpr9c/work/mpi ; more bigtest.mpich-p4-no-shmem 
    No mail for jpr9c
    /bin/stty: standard input: Invalid argument
    -bash: /var/spool/PBS/mom_priv/jobs/92840.centurion001.SC: /bin/bash^M: bad interpreter: No such file or directory

    The system believes that the file name for the interpreter is "bash^M" not "bash".

    To work around this, there is a handy utility for stripping off this extra character: "dos2unix", which is installed on the dept. systems. On the Ubuntu systems, all you need to do is run the program once:

    : /af13/jpr9c/work/mpi ; dos2unix test.dos.sh
    : /af13/jpr9c/work/mpi ; 

    That's all! Then you should be able to successfully submit your job.
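
    If dos2unix happens not to be available on the machine you are using, tr can strip the carriage returns just as well; a sketch with a hypothetical file name:

```shell
# Create a sample script with DOS (CRLF) line endings
printf '#!/bin/bash\r\necho hello\r\n' > /tmp/test.dos.sh

# Both lines end in CR+LF; count the lines carrying a CR
grep -c $'\r' /tmp/test.dos.sh

# Strip every carriage return, leaving plain LF endings
tr -d '\r' < /tmp/test.dos.sh > /tmp/test.unix.sh
grep -c $'\r' /tmp/test.unix.sh || true   # now 0
```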