UVACSE
From VCGR Wiki
Revised by Sarnowska; previous revision dated 14:10, 27 October 2009
Printing Code
enscript -2G -e4 -r2 -T 4 --mark-wrapped-lines=arrow --file-align=2 -o <outputfile> <filename>
ps2pdf <outputfile>
- only works on a double-sided printer
Getting STDOUT/STDERR
1. Add <Output> and <Error> elements to the <POSIXApplication> element, after the <Executable> element and before the <Argument> elements.
<POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <Executable>Sample.ext</Executable>
  <Output>stdout</Output>
  <Error>stderr</Error>
  <Argument>arg</Argument>
</POSIXApplication>
2. Stage the stdout and stderr files out like any other output file.
<DataStaging>
  <FileName>stdout</FileName>
  <CreationFlag>overwrite</CreationFlag>
  <DeleteOnTermination>true</DeleteOnTermination>
  <Target><URI>rns:/GRID_PATH_WHERE_TO_SAVE_FILE/DESIRED_FILE_NAME</URI></Target>
</DataStaging>
<DataStaging>
  <FileName>stderr</FileName>
  <CreationFlag>overwrite</CreationFlag>
  <DeleteOnTermination>true</DeleteOnTermination>
  <Target><URI>rns:/GRID_PATH_WHERE_TO_SAVE_FILE/DESIRED_FILE_NAME</URI></Target>
</DataStaging>
Maintaining Reza's Jobs
Reza's 100 Windows Jobs (grid user: sarsara)
Login as gbg@camillus
The grid export contains job files located at /localtmp/gbg/rezaExport/convrg
- 50 jobs in composite directory
- 50 jobs in quadruple directory
- Files located in these directories are accessible via the grid for staging in and out.
These are month-long jobs that have been partitioned into one- to two-hour chunks. When a job chunk runs successfully, it writes two files to an Action directory indicating that the next job chunk can be started.
- *.act file indicating the next job chunk to start
- *.act.done file indicating the job was indeed completed (sometimes the *.act file is created even if the job did not actually run successfully)
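Since an *.act file alone does not prove the chunk ran, one quick check is to list the *.act files that have no matching *.act.done. This is a minimal sketch, assuming the confirmation for CHUNK.act is named CHUNK.act.done in the same Action directory; list_unconfirmed_chunks is a hypothetical helper, not one of the existing jobDispatcher scripts.

```shell
#!/bin/sh
# Print the chunk names whose *.act file exists but whose *.act.done does not.
# ASSUMPTION: CHUNK.act is confirmed by CHUNK.act.done in the same directory.
list_unconfirmed_chunks() {
    action_dir=$1
    for act in "$action_dir"/*.act; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$act" ] || continue
        if [ ! -e "$act.done" ]; then
            basename "$act" .act
        fi
    done
}
```

Usage: list_unconfirmed_chunks /localtmp/gbg/jobDispatcher/Action. A chunk listed here is only suspect; compare it against output.txt before restarting anything.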
To be Done Regularly
Sanity-check script to run every couple of days to check whether any jobs are stuck
>cd /localtmp/gbg/jobDispatcher
>./progressFix.2.sh
Job Scheduling Infrastructure
- located at /localtmp/gbg/jobDispatcher
- Action directory contains the files of jobs to be started as described above
- submitter.sh is the job dispatching script.
- Every couple of hours, submitter.sh appends the current iteration of all jobs to output.txt.
Output files created during Job Dispatching:
- /localtmp/gbg/jobDispatcher/avg.out - contains averages of each job set's current iteration, recorded every two hours
- /localtmp/gbg/jobDispatcher/output.txt - contains the current iteration for each job
- /localtmp/gbg/jobDispatcher/proress.out - contains output for which jobs were found to be stuck
Stopping & Restarting from current iteration
1. Stop submitter.sh script
Get the script's process ID(s):
>ps aux | grep submitter.sh
>kill <pid> (there may be multiple IDs)
2. Delete/complete sarsara's jobs in the queue (check with Reza whether he has any other jobs) via the qmgr tool in the grid, if you have access to trajan
start grid.sh
>qmgr /queues/grid-queue
If you have no access to trajan, start the grid client on camillus, save the qstat output (which lists all of Reza's jobs) to a file, then write a bash script that kills or completes each job:
>~/XCG/GenesisII/grid
>qstat /queues/grid-queue > qstat.file
for each job id in file,
if running or requeued
>qkill /queues/grid-queue <job id>
>qcomplete /queues/grid-queue <job id>
if finished
>qcomplete /queues/grid-queue <job id>
3. Clean the Action directory once no more files are trickling into it (which means all jobs have been deleted)
>cd /localtmp/gbg/jobDispatcher/Action
>rm *
4. Restart jobs
a. regenerate jsdl and action files for current iterations of jobs
>cd /localtmp/gbg/jobDispatcher/Scripts
>./restartAll.sh
b. start submitter.sh to get jobs submitted to grid
>cd /localtmp/gbg/jobDispatcher
>nohup submitter.sh > sub.out 2> sub.err < /dev/null &
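The qstat loop in step 2 can be sketched as a small script that turns a saved qstat.file into grid-shell commands you can review before running. It assumes each job line has the job ticket in the first column and the job state in the last column; check that against a real qstat.file first. qstat_to_commands is a hypothetical name, and jobs in any other state (e.g. queued) are skipped, as in the steps above.

```shell
#!/bin/sh
# Turn a saved qstat listing into qkill/qcomplete commands for the grid shell.
# ASSUMPTION: job ticket in column 1, job state in the last column; lines with
# any other state (including the header line) are ignored.
QUEUE=/queues/grid-queue

qstat_to_commands() {
    awk -v q="$QUEUE" '
        {
            state = toupper($NF)
            if (state == "RUNNING" || state == "REQUEUED") {
                # Running or requeued jobs are killed, then completed.
                print "qkill " q " " $1
                print "qcomplete " q " " $1
            } else if (state == "FINISHED") {
                # Finished jobs only need to be completed.
                print "qcomplete " q " " $1
            }
        }' "$1"
}
```

Usage: qstat_to_commands qstat.file > kill-jobs.txt, inspect kill-jobs.txt, then run its commands from the grid shell (~/XCG/GenesisII/grid).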
Stopping & Restarting from zero
1. Stop submitter.sh script
Get the script's process ID(s):
>ps aux | grep submitter.sh
>kill <pid> (there may be multiple IDs)
2. Delete/complete sarsara's jobs in the queue (check with Reza whether he has any other jobs) as above
3. Clean the Action directory once no more files are trickling into it (which means all jobs have been deleted)
4. Save the old output and then delete the files:
>cd /localtmp/gbg/rezaExport/convrg
>tar -czf composite.DATE.tar.gz composite
>tar -czf quadruple.DATE.tar.gz quadruple
>cd /localtmp/gbg/jobDispatcher/Scripts
>./rmResults.sh /localtmp/gbg/rezaExport/convrg/composite
>./rmResults.sh /localtmp/gbg/rezaExport/convrg/quadruple
5. Delete old JSDL XML files
>cd /localtmp/gbg/jobDispatcher/Scripts
>./cleanXML.sh quad
>./cleanXML.sh comp
6. Regenerate initial input files
>cd /localtmp/gbg/jobDispatcher/Scripts
>./addOuts.sh /localtmp/gbg/rezaExport/convrg/composite
>./addOuts.sh /localtmp/gbg/rezaExport/convrg/quadruple
7. Generate JSDL XML files and start jobs from zero
a. generate jsdl and action files from zero
>cd /localtmp/gbg/jobDispatcher/Scripts
>./startAll.sh
b. start submitter.sh to get jobs submitted to grid
>cd /localtmp/gbg/jobDispatcher
>nohup submitter.sh > sub.out 2> sub.err < /dev/null &
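Step 4 of restarting from zero archives the old output under a DATE placeholder. A minimal sketch that fills in the date automatically (YYYYMMDD is our choice of format) and checks that the archive can be listed before you run rmResults.sh; archive_results is a hypothetical helper, not an existing script in jobDispatcher/Scripts, and it deletes nothing.

```shell
#!/bin/sh
# Archive a result directory as NAME.YYYYMMDD.tar.gz in its parent directory
# and print the archive path on success.
archive_results() {
    parent=$(dirname "$1")
    name=$(basename "$1")
    archive="$parent/$name.$(date +%Y%m%d).tar.gz"
    # -C makes the archive hold "composite/..." rather than the full path.
    tar -czf "$archive" -C "$parent" "$name" || return 1
    # Listing the archive is a cheap check that it was written completely.
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

Usage: archive_results /localtmp/gbg/rezaExport/convrg/composite, then the same for quadruple.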
XCG Install Issues
- Security attribute is not yet valid
- clock skew; make sure the machine time is up to date
- Container.log only contains two lines. No services start.
- The wrapper.log will contain an "info" message about the port being in use. This is a symptom of another Java XCGContainer process already running and occupying the port, so the new container cannot start.
Restarting Reza's Jobs
- Stop jobs via qmgr tool and kill job submitter script
- Save old results and post link
- Clean up the Actions folder once all jobs stop running
- Remove old result files
./rmResults.sh <location>
- Cleanup XML
./cleanXML.sh <location>
- Create XML and scripts for new set of jobs
./startAll.sh
- Check job count in Actions
- Start submitter script
Adding Desktop BESes to Grid
1. Follow the Installation guide to
- Sign up for XCG account at http://www.cs.virginia.edu/~vcgr/userrequest/
- An XCG identity is needed so you can install your grid container as your own
- Install Genesis II from http://www.cs.virginia.edu/~vcgr/wiki/index.php/Genesis_II_Downloads
- Choose the correct installer for your system
- download the installer from the website
- choose the full container option and leave the remaining options at their defaults
2. Send name of your newly installed container to uvacse for initialization.
To confirm the name of your machine as perceived by the grid:
- From the grid shell, list the containers under the uninitialized-containers directory
$> login ...
$> ls /uninitialized-containers
- determine which is your machine and send name to uvacse@virginia.edu
3. Once your container is initialized, you will need to create the BES resource that will enable jobs to be run on your container
- From the grid shell, run the besmanager tool
$> besmanager
- send chosen BES resource location to admin
- your resource will be added to the queue
- configure BES options
- choose to have jobs run on your machine anytime or only when you are NOT logged in
- From the grid shell, run the besmanager tool
Creating updated BES Resources with correct memory specifications
Update Properties File
- get sample pbs bes properties files from repository
- uncomment:
edu.virginia.vcgr.genii.native-q.resource-override.physical-memory
- set to memory value in bytes (these values use 1 gigabyte = 1024^3 bytes):
2 gigabytes = 2.147483648E9 bytes
3 gigabytes = 3.221225472E9 bytes
4 gigabytes = 4.294967296E9 bytes
24 gigabytes = 2.5769803776E10 bytes
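Each value above is the number of gigabytes times 1024^3. A one-line helper for other sizes (gib_to_bytes is a hypothetical name). It prints a plain integer, whereas the values above are in Java-style scientific notation; whether the properties file accepts the plain integer form is an assumption to check.

```shell
#!/bin/sh
# Convert whole gigabytes to bytes, with 1 gigabyte = 1024^3 bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}
```

Usage: gib_to_bytes 24 prints 25769803776, the same number as 2.5769803776E10.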
Create BES Resources
- unlink bes resources from queue
cd /queues/grid-queue/resources
unlink itc-smallmem-pbs.from_elder
unlink generals-pbs.from-camillus
unlink centurion-pbs.from-camillus
unlink sunfire-pbs.from-camillus
unlink radio-pbs.from-camillus
- rm bes resources from path (not successful - instead created differently named updated resources)
cd /bes-containers/UVA/CS/GBG/Generals
cd /bes-containers/UVA/ITC/Cluster/
- create new bes resources using updated files
cd /containers/UVA/CS/GBG/Generals/camillus/Services
create-resource --creation-properties=props/centurion-pbs.properties --rns GeniiBESPortType /bes-containers/UVA/CS/GBG/Generals/pbs-centurion.from-camillus
create-resource --creation-properties=props/generals-pbs.properties --rns GeniiBESPortType /bes-containers/UVA/CS/GBG/Generals/pbs-generals.from-camillus
create-resource --creation-properties=props/radio-pbs.properties --rns GeniiBESPortType /bes-containers/UVA/CS/GBG/Generals/pbs-radio.from-camillus
create-resource --creation-properties=props/sunfire-pbs.properties --rns GeniiBESPortType /bes-containers/UVA/CS/GBG/Generals/pbs-sunfire.from-camillus
cd /containers/UVA/ITC/Cluster/elder.itc.virginia.edu/Services
create-resource --creation-properties=props/smallmem-pbs.properties --rns GeniiBESPortType /bes-containers/UVA/ITC/Cluster/pbs-smallmem.from-elder
create-resource --creation-properties=props/lc4-pbs.properties --rns GeniiBESPortType /bes-containers/UVA/ITC/Cluster/pbs-lc4.from-elder
- Add permission so all can read and submit jobs to bes resources
chmod /bes-containers/UVA/CS/GBG/Generals/pbs-centurion.from-camillus +r --everyone
chmod /bes-containers/UVA/CS/GBG/Generals/pbs-centurion.from-camillus +x /groups/uva-idp-group
chmod /bes-containers/UVA/CS/GBG/Generals/pbs-centurion.from-camillus +x /groups/uva-standard-assurance-new
...
- add updated resources back to queue
ln /bes-containers/UVA/CS/GBG/Generals/pbs-radio.from-camillus /queues/grid-queue/resources/pbs-radio.from-camillus
ln /bes-containers/UVA/CS/GBG/Generals/pbs-sunfire.from-camillus /queues/grid-queue/resources/pbs-sunfire.from-camillus
ln /bes-containers/UVA/CS/GBG/Generals/pbs-centurion.from-camillus /queues/grid-queue/resources/pbs-centurion.from-camillus
ln /bes-containers/UVA/CS/GBG/Generals/pbs-generals.from-camillus /queues/grid-queue/resources/pbs-generals.from-camillus
ln /bes-containers/UVA/ITC/Cluster/pbs-smallmem /queues/grid-queue/resources/pbs-smallmem.from-elder
- configure resources slots
qconfigure /queues/grid-queue pbs-radio.from-camillus 32
qconfigure /queues/grid-queue pbs-sunfire.from-camillus 32
qconfigure /queues/grid-queue pbs-centurion.from-camillus 64
qconfigure /queues/grid-queue pbs-generals.from-camillus 24
qconfigure /queues/grid-queue pbs-smallmem.from-elder 24
How to Test a Port is Open
http://www.wfu.edu/~borwicjh/howto/port-test.html
telnet <server-name> <port-number>
telnet modular.math.jmu.edu 18443
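Where telnet is not installed, bash can open the TCP connection itself through its /dev/tcp pseudo-device. A minimal sketch, assuming bash and the coreutils timeout command are available; port_open is a hypothetical name, and a zero exit status means the connection was accepted.

```shell
#!/bin/sh
# Return 0 if HOST accepts a TCP connection on PORT, non-zero otherwise.
# /dev/tcp is a bash feature, hence the explicit "bash -c"; timeout keeps
# a firewalled (silently dropped) port from hanging the check.
port_open() {
    host=$1
    port=$2
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Usage: port_open modular.math.jmu.edu 18443 && echo open || echo closed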
UNIX system info
- /proc/meminfo
- /proc/cpuinfo
- uname -a
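/proc/meminfo is also a quick source for the physical-memory override value in the BES properties file (see Creating updated BES Resources above). A minimal sketch; mem_total_bytes is a hypothetical helper, and note that the kernel's "kB" in /proc/meminfo means 1024 bytes.

```shell
#!/bin/sh
# Print MemTotal from a meminfo-format file (default /proc/meminfo) in bytes.
mem_total_bytes() {
    kib=$(awk '$1 == "MemTotal:" { print $2 }' "${1:-/proc/meminfo}")
    [ -n "$kib" ] || return 1
    # The "kB" unit in /proc/meminfo is KiB, i.e. 1024 bytes.
    echo $(( kib * 1024 ))
}
```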
Installing MPICH2
- acquired source from
http://www.mcs.anl.gov/research/projects/mpich2/
- acquired script for configuring with options from Bill Pemberton
#!/bin/sh
# options used on our cluster with mpich2-1.0.8p1.tar.gz
# change the --prefix option in the configure line
OPTIMIZATION_FLAG="-O3 -fno-strict-aliasing"
CONFIG_ENABLE_F77="--enable-f77"
CONFIG_ENABLE_F90="--enable-f90"
export CC=gcc
export CXX=g++
export FC=gfortran
export F77=gfortran
export F90=gfortran
export CFLAGS="-Wall"
export FFLAGS="-fPIC"
export CXXFLAGS="-fPIC"
export F90FLAGS=""
export CONFIG_FLAGS="--enable-debuginfo --enable-totalview"
export CFLAGS="$CFLAGS $OPTIMIZATION_FLAG -fPIC -g"
export USER_CFLAGS
export MPE_OPTS
export MPE_CFLAGS
export LDFLAGS
# COMPILER_CONFIG and EXTRA_CFLAG are never set in this script, so they expand to nothing
./configure --enable-sharedlib --prefix=/home/griduser/mpich2 $CONFIG_ENABLE_F77 $CONFIG_ENABLE_F90 $COMPILER_CONFIG $EXTRA_CFLAG $CONFIG_FLAGS
make
- followed README file to install
- followed the CS wiki to set up passwordless login to the nodes
https://www.cs.virginia.edu/~csadmin/wiki/index.php/PBS#Logging_on_to_the_Cluster
Updating JSDL Tool
- Checkout from svn at
svn://camillus.cs.virginia.edu:9002/JSDLTool/trunk
- Commit changes in svn and get jar file
- Place updated jar file in vcgr account at
public_html/XCG/2.0/downloads/JSDLTool.jar
Headnode Issues
mpdcheck -l reveals:
**********
Your unqualified hostname resolves to 127.0.0.1, which is the IP address reserved for localhost. This likely means that you have a line similar to this one in your /etc/hosts file:
127.0.0.1 $uqhn
This should perhaps be changed to the following:
127.0.0.1 localhost.localdomain localhost
**********
Since /etc/hosts is not a file I can change, should the headnode be left out of the ring?
SVN Commands
Creating Branches
- svn copy SRC DST
- Copies the source to the destination path. This is not a simple file copy but a repository copy (a special, cheap copy that preserves history). This is the main way of creating branches and tags.
To create the XsPrivateCopy branch, you would issue the command:
svn copy svn://camillus.cs.virginia.edu:9002/GenesisII/trunk svn://camillus.cs.virginia.edu:9002/GenesisII/branches/XsPrivateCopy
Another example, creating the karolina/JSDLupdate4Mem branch:
svn copy svn://camillus.cs.virginia.edu:9002/GenesisII/trunk svn://camillus.cs.virginia.edu:9002/GenesisII/branches/karolina/JSDLupdate4Mem
Checking out
Check out the main trunk of the Genesis II repository into a local directory called GenII
svn checkout svn://camillus.cs.virginia.edu:9002/GenesisII/trunk GenII
Check out branch XsPrivateCopy into a local directory called genii
svn checkout svn://camillus.cs.virginia.edu:9002/GenesisII/branches/XsPrivateCopy genii
Merging Source in SVN
Merging a branch with the trunk:
- checkout trunk
svn checkout svn://camillus.cs.virginia.edu:9002/GenesisII/trunk trunk
- checkout branch
svn checkout svn://camillus.cs.virginia.edu:9002/GenesisII/branches/<branchPath> branch
- get the revision at which the branch was created (this is the last r<NUM> in the output)
svn log --stop-on-copy -v svn://camillus:9002/GenesisII/branches/<branchPath>
- get latest changes from repository in trunk checkout and note change number
svn update
- merge branch into trunk (called from trunk checkout)
svn merge -r <branchNum>:<repoNum> svn://camillus.cs.virginia.edu:9002/GenesisII/branches/<branchPath>
Example: merging after implementing updates that propagate the memory request to the PBS script.
svn merge -r 1377:1383 svn://camillus.cs.virginia.edu:9002/GenesisII/branches/karolina/JSDLupdate4MemPBS
- see updated files that will be checked in on commit in trunk checkout
svn status
- to keep a file from being checked in, revert it in the trunk checkout (this discards the merged changes to that file)
svn revert <filename>
- run tests to confirm all is working
ant clean
ant
- finalize changes into repository in trunk checkout
svn commit
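Picking <branchNum> out of the svn log output by eye is error-prone. svn log lists the newest revision first, so with --stop-on-copy the last revision header is the one that created the branch. A minimal sketch that extracts it; branch_copy_revision is a hypothetical name, and it assumes no commit-message line itself starts with "rNNN |".

```shell
#!/bin/sh
# Read "svn log --stop-on-copy" output on stdin and print the revision
# number of the oldest entry, i.e. the copy that created the branch.
branch_copy_revision() {
    # Header lines look like "r1377 | author | date | 1 line".
    grep -E '^r[0-9]+ \|' | tail -n 1 | sed -E 's/^r([0-9]+).*/\1/'
}
```

Usage: svn log --stop-on-copy -v svn://camillus:9002/GenesisII/branches/<branchPath> | branch_copy_revision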
Testing BES/JSDL Changes for Memory Based Scheduling
Bootstrap test grid
script deployments/default/configuration/bootstrap.xml
Login and confirm
login deployments/default/security/keys.pfx
Create queue
cd /containers/BootstrapContainer/Services
create-resource --rns QueuePortType /testq
ln /bes-containers/BootstrapContainer /testq/resources/bs
Copy test files into grid
cp --local-src me/javaHW.xml /
cp --local-src me/hw.java /home/
cp --local-src me/hw.class /home/
cp --local-src me/run.bat /home/
cp --local-src javaHW.xml /
cp --local-src hw.java /home/
cp --local-src hw.class /home/
cp --local-src run.sh /home/
Run job directly on resource
run --jsdl=me/javaHW.xml /bes-containers/BootstrapContainer
run --jsdl=me/javaHW2.xml /bes-containers/BootstrapContainer
run --jsdl=javaHW.xml /bes-containers/BootstrapContainer
Run job via queue
cp javaHW.xml /testq/submission-point
cp javaHW2.xml /testq/submission-point
Check results
cat /home/stdout
cat /home/stderr
Memory request values
1 MB = 1000000
~105 GB = 112345678901 (about 104.6 GiB, or 112.3 GB in decimal units)
<TotalPhysicalMemory>
<UpperBoundedRange>1000000</UpperBoundedRange>
</TotalPhysicalMemory>
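To produce this fragment for other sizes, a small helper can do the arithmetic. A sketch assuming decimal megabytes, as in the 1 MB = 1000000 line above; jsdl_memory_mb is hypothetical, and per the JSDL specification the element belongs inside the <Resources> element of the job description.

```shell
#!/bin/sh
# Print a TotalPhysicalMemory element capping memory at N decimal megabytes.
jsdl_memory_mb() {
    bytes=$(( $1 * 1000000 ))
    printf '<TotalPhysicalMemory>\n  <UpperBoundedRange>%s</UpperBoundedRange>\n</TotalPhysicalMemory>\n' "$bytes"
}
```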
Testing BES/JSDL Changes
Bootstrap test grid
script deployments/default/configuration/bootstrap.xml
Login and confirm
login deployments/default/security/keys.pfx
Create PBS BES resource
cd /containers/BootstrapContainer/Services
create-resource --creation-properties=mpi-test.properties --rns GeniiBESPortType /bes-containers/power2-pbs
create-resource --creation-properties=me/pbs.properties --rns GeniiBESPortType /bes-containers/pbs
ln /bes-containers/pbs /testq/resources/pbs
chmod /bes-containers/pbs +r+x --everyone
Copy test file into grid
cp --local-src mpiTest.o /home/mpiTest.o
Run test
run --jsdl=mpiTestPPN.xml /bes-containers/power2-pbs
run --jsdl=javaHW.xml /bes-containers/pbs
run --jsdl=me/javaHW2.xml /bes-containers/pbs
Check results
cat /home/stdout
cat /home/stderr
PBS Info
Command examples:
- qsub -I - submit an interactive job
- qsub -q queue - submit a job directly to a specified queue
- qstat - list information about queues and jobs
- qstat -q - list all queues on the system
- qstat -Q - list queue limits for all queues
- qstat -a - list all jobs on the system
- qstat -au userid - list all jobs owned by userid
- qstat -s - list all jobs with status comments
- qstat -r - list all running jobs
- qstat -f jobid - list all information known about the specified job
- qstat -Qf queue - list all information known about the specified queue
- qstat -B - list summary information about the PBS server
- qdel jobid - delete the specified job
- qalter jobid - modify the attributes of the job or jobs specified by jobid
- pbsnodes -a - list all nodes and some of their characteristics
- pbsnodes -l - list all nodes that are currently offline
man PBS_Resources
- nodes - Number and/or type of nodes to be reserved for exclusive use by the job.
VI Replacing
Replace all occurrences of ten with 10
:1,$s/ten/10/g
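Note that this pattern also matches inside words ("often" becomes "of10"). Anchoring it to whole words with \< and \> avoids that; in vi the command becomes :%s/\<ten\>/10/g (% is shorthand for 1,$). The same edit outside vi, as a sketch assuming GNU sed (BSD/macOS sed lacks \< and \>); replace_word is a hypothetical name, and its arguments are used as raw sed pattern and replacement, so avoid "/" and other special characters in them.

```shell
#!/bin/sh
# Replace whole-word occurrences of $1 with $2 on stdin, writing to stdout.
replace_word() {
    sed "s/\\<$1\\>/$2/g"
}
```

Usage: replace_word ten 10 < notes.txt > notes.new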
Setting Up Hess@VT
Config Steps
- Installed grid
- Compiled R and added to PATH
- Configured container
- Created bes resource for pbs queue
- Ran test R job on BES resource
- Added BES resource to grid queue and configured slots
script configure-containers.xml hess.arc.vt.edu /containers/VT hess.arc.vt.edu
create-resource --creation-properties=hess.properties --rns GeniiBESPortType /bes-containers/VT/hess.arc.vt.edu
chmod /bes-containers/VT/hess.arc.vt.edu +r --everyone
chmod /bes-containers/VT/hess.arc.vt.edu +x /groups/uva-idp-group
chmod /bes-containers/VT/hess.arc.vt.edu +x /groups/uva-standard-assurance-new
run --jsdl=singleTestL.xml /bes-containers/VT/hess.arc.vt.edu
ln /bes-containers/VT/hess.arc.vt.edu /queues/grid-queue/resources/vt-pbs.from-hess
qconfigure /queues/grid-queue vt-pbs.from-hess 16
PBS Node info for hess as of 6/30/09
- 96 total nodes
- 53 free nodes
- 25 busy nodes
- 12 offline nodes
Adding Matching Params
- Syntax for adding matching parameter for scheduling
matching-params /bes-containers/<path> "add(<param>, <value>)"
matching-params /bes-containers/UVA/ITC/Cluster/elder-smallmem "add(supports-mpi, true)"
Grid-Queue Resource Status
Slots vs Available CPUs as of 6/29/2009
- Radio-pbs 32/52
- Centurion-pbs 64/128
- Sunfire-pbs 32/44
- Generals-pbs 24/32
- THNA233 2/2
- trajan 8/8
- pompey 8/8
Persistent Datastaging
Changes to JSDL:
0. Stage in the persistent file:
<DataStaging>
  <FileName>R-2.9.1-win32.exe</FileName>
  <CreationFlag>dontOverwrite</CreationFlag>
  <DeleteOnTermination>false</DeleteOnTermination>
  <Source><URI>rns:/home/sarsara/csExport/R-2.9.1-win32.exe</URI></Source>
</DataStaging>
1. After </JobName>, add:
<JobAnnotation>R-2.9.1</JobAnnotation>
2. Before <OperatingSystem>, add:
<FileSystem name="SCRATCH"/>
3. Before </POSIXApplication>, add:
<Argument filesystemName="SCRATCH">R-2.9.1-win32.exe</Argument>
4. Before <CreationFlag>dontOverwrite, add:
<FilesystemName>SCRATCH</FilesystemName>
Declaring Persistent Datastaging in JSDL
1. Choose a global name for jobs that will be using the persistent files. After the <JobName> element, insert the following element declaring this name. Every job that uses the persistent files should include this element.
<JobAnnotation>GLOBAL_JOB_NAME</JobAnnotation>
2. In the <DataStaging> element that declares the staging-in of a file that is to persist, add a <FilesystemName> element and set the <CreationFlag> element to dontOverwrite and the <DeleteOnTermination> element to false. These changes indicate that the file should not be downloaded if a file with that name already exists on disk.
<DataStaging>
  <FileName>FILE_NAME</FileName>
  <FilesystemName>SCRATCH</FilesystemName>
  <CreationFlag>dontOverwrite</CreationFlag>
  <DeleteOnTermination>false</DeleteOnTermination>
  <Source><URI>rns:PATH_TO_FILE_IN_GRID</URI></Source>
</DataStaging>
3. Before the <OperatingSystem> element under the <Resources> element, insert the following element declaring that we are using the filesystem.
<FileSystem name="SCRATCH"/>
4. If you are passing the name of a persistent file as an argument to your program, add filesystemName="SCRATCH" to that argument's XML tag to indicate it should be taken from the filesystem if possible.
<Argument filesystemName="SCRATCH">FILE_NAME</Argument>
Creating Lightweight Export with Security for User
export --create /containers/camillus/Services/LightWeightExportPortType /localtmp/gbg/rezaExport /home/sarsara/tmpExport
vcgr:$>chmod /containers/camillus +rx /users/sarsara
vcgr:$>chmod /containers/camillus/Services +rx /users/sarsara
vcgr:$>chmod /containers/camillus/Services/LightWeightExportPortType +rx /users/sarsara
chmod <export dir> +rwx <user>
export --create /containers/UVA/CS/GBG/Generals/sulla/Services/LightWeightExportPortType /localtmp/gbg/tester /home/tester/sullaExport
export --create /containers/UVA/CS/GBG/Generals/camillus/Services/LightWeightExportPortType /localtmp/gbg/rezaExport /home/sarsara/csExport
Container Log Locations
/GeniiNet/GeniiNetInstall/ext/JavaServiceWrapper/bin/container.log or /GeniiNet/GeniiNetInstall/container.log
Poster Printing
Printer Properties>Options>ConfigName>Portrait D 24 x 36
- Ensure paper size is set to 24 x 36.
- Saving to JPEG/image formats can flatten layers.
- Always look at print preview mode to check on margins.
Nohup
nohup myprogram > foo.out 2> foo.err < /dev/null &
nohup submitter.sh > sub.out 2> sub.err < /dev/null &
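Stopping the dispatcher currently means hunting for its PID with ps aux | grep. Recording $! at start time avoids that. A minimal sketch; start_detached, stop_detached, and any PID file location you pass are hypothetical, and if submitter.sh starts child processes, those may still need to be found and killed separately.

```shell
#!/bin/sh
# Start COMMAND detached (same redirections as the nohup lines above) and
# save its PID.  Arguments: PIDFILE OUTFILE ERRFILE COMMAND [ARGS...]
start_detached() {
    pidfile=$1; out=$2; err=$3
    shift 3
    nohup "$@" > "$out" 2> "$err" < /dev/null &
    echo $! > "$pidfile"
}

# Send SIGTERM to the PID recorded in PIDFILE, then remove the PID file.
stop_detached() {
    kill "$(cat "$1")" && rm -f "$1"
}
```

Usage: start_detached submitter.pid sub.out sub.err ./submitter.sh, later stop_detached submitter.pid.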
SVN CheckUp
If SVN is not responding in Eclipse, look for the SVN server running on Camillus on port 9002 via ps aux. If it is not running, restart SVN using the startSVN script located in the gbg home directory. Run only on Camillus.
Connecting to XCG Grid
connect http://www.cs.virginia.edu/~vcgr/GeniiNet/root.xml GeniiNetFullContainer
Installing Genii containers that point to pbs queue on cluster front ends
Setting up PBS Queue on Elder, Cedar, & Dogwood
Installing Genii
- Removed clean.sh if applicable
- Created local directory for genii state and linked to .genesisII where applicable
~/.genesisII-2.0 -> /export/vanet/genesisII-state
- Created local directory for install and created link
~/XCG/GenesisII -> /export/vanet/GeniiNet2_0Install
- downloaded newest installer from website and ran on commandline to install under ~/XCG/GenesisII
./XCG-installer.sh -c
- Checked that container is running
ps aux | grep java
Configuring Containers
Examined old script (configure-containers.xml) and picked out the functions such that the itc cluster nodes will have their security initialized and will be linked at the correct permanent location. New script is located on cicero at ~/XCG/GenesisII/KarsScripts/configure-containers.xml.
Functions for the new configuration script:
- <function name="makeDirectories"> - Creates directory with passed in path under /containers ; path may already exist
- <function name="configureContainerService"> - Gives read rights to everyone; gives read and execute rights to all under groups
- rmUndo, unlinkUndo, attempUndo - remove/unlink/undo functions
- <function name="configureContainer"> - sets read rights for everyone on uninitialized containers and their services; unlinks from uninitialized containers and links to /containers (relativePath - path where the container should appear under /containers)
- <function name="configureContainers"> (containerPattern -arg1, relativePath - arg2) - calls makeDirectories for relative path; configures each container; If 3 arguments, configureContainers function is called.
Commandline arguments:
<container-pattern> <relative-path-name> <container-link-name>
- <container-pattern> - name of container under /uninitialized-containers
- /uninitialized-containers/<container-pattern>
- <relative-path-name> - where to place container under /containers
- /containers/<relative-path-name>
- <container-link-name> - what to name new container
- /containers/<relative-path-name>/<container-link-name>
Note: It took over 10 minutes for each container to be configured. lc3 (aka dogwood) kept failing while security was being set for its container services; reinstalling fixed the problem.
Creating PBS BES Resources
A pbs.properties file needs to be created for each new BES resource. Example files under SVN in example-nativeq directory. Also under ~/XCG/GenesisII/KarsScripts on Cicero.
For ITC clusters, copied the file as /itc-pbs.properties and changed the following, per email with Katherine.
- qname = workq
- mpich1 commandline-args=-comm mpich-p4
Under each new containers services, created resource as follows:
>create-resource --creation-properties=example-nativeq/itc-pbs.properties --rns GeniiBESPortType /bes-containers/UVA/ITC/<node_name>-workq
Command syntax:
create-resource [--creation-properties=<properties-file>] [--rns | --url] <service-path-or-url> [<new-rns-path>]
- Add permission so all can read and submit jobs to bes resource.
>chmod /bes-containers/UVA/ITC/<node_name>-workq +r --everyone
>chmod /bes-containers/UVA/ITC/<node_name>-workq +x /groups/uva-idp-group
>chmod /bes-containers/UVA/ITC/<node_name>-workq +x /groups/uva-standard-assurance-new
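The same three permission commands recur for every new BES resource and queue on this page. A tiny sketch that prints them for a given grid path so they can be pasted into the grid shell; grid_open_perms is a hypothetical name, and the printed lines are Genesis II grid-shell commands, not Unix chmod.

```shell
#!/bin/sh
# Print the grid-shell chmod commands that let everyone read PATH and let
# the two UVA groups submit (+x) to it.
grid_open_perms() {
    printf 'chmod %s +r --everyone\n' "$1"
    printf 'chmod %s +x /groups/uva-idp-group\n' "$1"
    printf 'chmod %s +x /groups/uva-standard-assurance-new\n' "$1"
}
```

Usage: grid_open_perms /bes-containers/UVA/ITC/<node_name>-workq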
Testing new PBS BES Resources
Ran sample job to confirm newly added queues are working.
>run --jsdl=example-jsdl/example-posix-jsdl.xml <queue_name>-workq
- cedar: success
- dogwood: success
- elder: success
Create New "Grid-Queue" Queue
8/7/9 - Creating new queue to resolve queue issues.
1. Under the titus container services, create a new queue resource. There is no requirement to do this on titus; any container that provides a QueuePortType will do. We chose titus because it hosted the previous grid-queue.
Command syntax: create-resource [--creation-properties=<properties-file>] [--rns | --url] <service-path-or-url> [<new-rns-path>]
>create-resource --rns QueuePortType /queues/grid-queue
2. Add the resources to which the queue will submit jobs by linking in the desired bes-containers. In this case we want the grid-queue to submit to all available resources, so we link the bes-containers in under the queue's resources.
Command Syntax: ln <source-path> [<target-path>]
>ln /bes-containers/<resource-path> /queues/grid-queue/resources/<resource-name>
3. Add permission so all can read and submit jobs to queue.
>chmod /queues/grid-queue +r --everyone
>chmod /queues/grid-queue +x /groups/uva-idp-group
>chmod /queues/grid-queue +x /groups/uva-standard-assurance-new
4. Configure the number of slots the queue may use on each resource:
>qconfigure /queues/grid-queue <resource> <# of slots>
vt-pbs.from-hess(24) generals-pbs.from-camillus(24) centurion-pbs.from-camillus(64) trajan-win(8) pompey-win(8) sunfire-pbs.from-camillus(32) radio-pbs.from-camillus(32) THN...(2)
5. Time to completion: 1 hour
Create New "Windows Short" Queue
Create a queue for short windows jobs that submits to trajan.
1. Under the existing trajan container services, create a new queue resource. There is no requirement to do this on trajan; any container that provides a QueuePortType will do.
Command syntax: create-resource [--creation-properties=<properties-file>] [--rns | --url] <service-path-or-url> [<new-rns-path>]
>create-resource --rns QueuePortType /queues/windows-short-queue
2. Add resource to which queue will be submitting jobs by linking in desired bes-container. In this case we want the windows-short-queue to submit all jobs to trajan so we link in trajan's bes-container under the queue's resources.
Command Syntax: ln <source-path> [<target-path>]
>ln /bes-containers/trajan.cs.virginia.edu /queues/windows-short-queue/resources/trajan
3. Add permission so all can read and submit jobs to queue.
>chmod /queues/windows-short-queue +r --everyone
>chmod /queues/windows-short-queue +x /groups/uva-idp-group
>chmod /queues/windows-short-queue +x /groups/uva-standard-assurance-new
4. Configure the queue to use only a set number (in this case 8) of slots on trajan
>qconfigure /queues/windows-short-queue trajan 8
5. Time to completion: 1 hour
Restarting Genii Containers
After restarting a container, connect to the Grid and ensure you can ping the container. If not, consult container log (container.log) for debugging info. This is located in the installation directory.
Check Status/Restart Windows Container
Pull up the Services dialog under Control Panel and locate the Genesis II Container Service; its status will be shown. If it is not running, you can click it and choose the restart option.
Restart Linux Container
Log into the machine and cd into install directory. Run the manage script with the status or start option:
>./Genesis-manage-script status|start
Queue Info
Bes-Containers
vcgr:$>ls /bes-containers bes-containers:
- BootstrapContainer
- UVA
- camillus
- centurion001-pbs
- pompey.cs.virginia.edu
- romulus.cs.virginia.edu
- sulla
- titus
- trajan.cs.virginia.edu
Linux Queue
vcgr:$>ls /queues/linux-queue/resources resources:
- titus
- dogwood-workq
- camillus
- sulla
- cedar-workq
- centurion001-pbs
Windows R Queue
Hosted on titus
Memory Configs
A currently working scheme for heap memory limits:
- Cicero container with Linux-queue: 1024MB
- Centurion container with PBS BES: 1024MB
- Camillus container with Test-queue: 1024MB
- Dogwood(lc3)/Cedar(lc2)/Elder(lc4) with PBS BES: 1024MB
Login Info
- Default login time of 1 week instituted 3/23.
- Get verbose login info via whoami and verbosity flag.
Setting Permissions for New Users
vcgr:$>chmod /groups/uva-idp-group null +x /home/<user>
Blas Package Installing
Ways to list what's installed
- locate blas
- rpm -qa | grep -i blas
- yum list installed | grep -i lapack
Debugging Reza's Jobs & Linux Queue
03/23 - job resubmission after two new patches to resolve bulk qstats and default login times; installing patches with Jes
Creating Resolver for Replicated Directories
Locating Files
Differences between unix cmds:
- which - shows the full path of (shell) commands
- locate - provides a way to index and quickly search for files on your system
- find - search for files in a directory hierarchy
Running MPI on ITC Clusters
The following information is taken from the ITC site.
Compiling
To compile parallel programs, the open source MPI libraries OpenMPI and MPICH2 have been provided. A module corresponding to the compiler you wish to use must be loaded in order to set up the correct environment.
module load ompi-intel
Once the module is loaded the following commands should be used to compile programs that use MPI code:
- mpicc [options] file.c (C)
- mpiCC [options] file.C (C++)
- mpif77 [options] file.f (Fortran 77)
- mpif90 [options] file.f90 (Fortran 90)
Sample PBS Scripts
OpenMPI Job
#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=12:00:00
#PBS -m abe
#PBS -M mst3k@virginia.edu
cd "$PBS_O_WORKDIR"
# "." instead of "source": /bin/sh is not necessarily bash
. /opt/Modules/default/init/sh
module add ompi-pgi
NP=$(wc -l < "$PBS_NODEFILE")
mpiexec --prefix "$OMPI_MPIHOME" -np "$NP" -machinefile "$PBS_NODEFILE" myexec
MPICH2 Job
#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=12:00:00
#PBS -m abe
#PBS -M mst3k@virginia.edu
cd "$PBS_O_WORKDIR"
mpiexec -comm mpich2-pmi myexec
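The NP line in the first script relies on PBS writing one line per allocated processor to $PBS_NODEFILE, so nodes=4:ppn=2 yields 8 lines naming 4 distinct hosts. A small sketch for checking a node file inside a job; nodefile_summary is a hypothetical helper.

```shell
#!/bin/sh
# Summarize a PBS node file: total lines = processor slots, unique = hosts.
nodefile_summary() {
    slots=$(wc -l < "$1")
    hosts=$(sort -u "$1" | wc -l)
    # Arithmetic expansion strips any padding some wc versions add.
    echo "$(( slots )) slots on $(( hosts )) hosts"
}
```

Usage (inside a job script): nodefile_summary "$PBS_NODEFILE"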
