Cross Campus Grid (XCG)
From VCGR Wiki
The Virginia Cross Campus Grid (XCG)
The Virginia Cross Campus Grid (hereafter the XCG) is up and running after a summer of intense testing. If you need faster computation or need to share data securely and easily with colleagues, we invite you to use the XCG. The objectives of the XCG are to
• provide computational resources for science and engineering research,
• provide a secure data sharing platform for members of the University community and their colleagues at other institutions, and
• serve as a test-bed for emerging Grid standards.
The XCG is implemented using Genesis II, developed at the University of Virginia. Genesis II is the first integrated implementation of the standards and profiles coming out of the Open Grid Forum (OGF) Open Grid Service Architecture (OGSA) Working Group. The OGF is the standards organization for Grids. Genesis II is a complete set of Grid services for users and applications. It follows our maxim “by default the user should not have to think”, and it is a from-scratch implementation of the standards and profiles, not a wrapping of existing artifacts. Genesis II is open source under the Apache license.
What is a Grid? [1]
What Can I Do With The XCG?
You can do two things with the XCG: use it to access Grid compute resources, or use it to export and access data.
Need speed?
Suppose you have an application that you need to run for your research, and that you need to execute it many times: for example, many executions with different parameters or input files, or simply a large number of runs to establish a statistical property. In this case, you have a high-throughput problem. If each execution takes many minutes or hours, and you have hundreds or thousands of jobs, this could take days or weeks to run on your desktop. The XCG compute queues can be used to solve this sort of problem. Descriptions of the jobs to be executed are submitted to the XCG, and the jobs are distributed throughout the XCG and executed concurrently. Data management and movement into and out of your local compute environment is handled by the XCG. As a result, you can get your results tens to hundreds of times faster.
Similarly, if you already have an MPI application you can use the XCG to select a cluster for your job and run it.
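The high-throughput speedup described above can be estimated with simple arithmetic. The following is a minimal sketch, not part of Genesis II: the function name, the workload (1000 runs of 30 minutes), and the figure of 100 concurrent slots are all assumptions chosen for illustration, and the model ignores queueing and data-transfer time.

```python
import math


def estimated_wall_time_minutes(num_jobs, minutes_per_job, concurrent_slots):
    """Rough wall-clock time when jobs run in waves of `concurrent_slots`."""
    if concurrent_slots < 1:
        raise ValueError("need at least one slot")
    # Jobs that do not fit in the current wave wait for the next one.
    waves = math.ceil(num_jobs / concurrent_slots)
    return waves * minutes_per_job


# Hypothetical workload: 1000 independent runs of 30 minutes each.
sequential = estimated_wall_time_minutes(1000, 30, 1)    # one desktop core
concurrent = estimated_wall_time_minutes(1000, 30, 100)  # 100 grid slots (assumed)
speedup = sequential / concurrent

print(f"desktop: {sequential / 60:.0f} h, grid: {concurrent / 60:.0f} h, "
      f"speedup: {speedup:.0f}x")
```

Under these assumptions the desktop run takes about 500 hours and the concurrent run about 5 hours. The real speedup depends on how many slots are free when the jobs are submitted.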
Need to share data?
Suppose you are working on a project with a colleague in another department or at another institution, and having direct access to each other's files would accelerate your research. For example, you have a colleague with an instrument that writes its data directly onto a hard disk in their lab, and you would like to be able to read those files as if they were local to your machine. You might post-process those results and store them locally, and your colleagues may in turn want to read, modify, or view the results. This is a data sharing problem.
To solve this sort of problem using the XCG, the person who wants to share their data “exports” the data into the XCG, and the person(s) who want to access that data mount the Grid into their local Linux or Windows file system and then access the data as if it were in a local file system. Data access is fully secure using the latest Web Services security standards.
What does it mean to “share” or “export” data? The basic idea is really simple. Suppose you have a directory (also called a “folder” in Windows) on your hard drive that you want to share with a friend or colleague. You can use the Genesis II export Grid Shell command. Invoked without any command-line options, the export command starts up a simple GUI (which requires X-windows in Linux environments). The GUI allows you to specify a local directory path that you want to export, and the directory path where you want to place it in the XCG name space. The process is similar to mapping/mounting a drive, except that you are mounting a portion of your local file system into the Grid namespace.
Once exported, the directory and its contents can be manipulated by both local users and XCG users. Updates made by local users are visible to remote users, and vice versa. Access by remote users is governed by access-control lists that you establish. The identity (or identities) of the user who performed the export is given initial, exclusive Grid access to the exported resources. After performing an export, you will typically modify the access-control policies for the newly exported resources to specify which users and groups can read, write, and execute (i.e., make subdirectories in) these exported resources.
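The access-control behaviour described above can be pictured with a small model. This is only an illustrative sketch and not the Genesis II implementation or its API: the class `ExportedResource`, its methods, the identities `alice` and `bob`, and the Grid path are all hypothetical.

```python
from dataclasses import dataclass, field

PERMISSIONS = {"read", "write", "execute"}


@dataclass
class ExportedResource:
    """Hypothetical model of an exported directory and its ACL."""
    grid_path: str
    owner: str
    acl: dict = field(init=False)  # identity -> set of permissions

    def __post_init__(self):
        # The exporting identity starts with exclusive access.
        self.acl = {self.owner: set(PERMISSIONS)}

    def grant(self, identity, *perms):
        unknown = set(perms) - PERMISSIONS
        if unknown:
            raise ValueError(f"unknown permissions: {sorted(unknown)}")
        self.acl.setdefault(identity, set()).update(perms)

    def allowed(self, identity, perm):
        return perm in self.acl.get(identity, set())


shared = ExportedResource("/home/alice/instrument-data", owner="alice")
print(shared.allowed("bob", "read"))   # False: only the exporter has access
shared.grant("bob", "read")
print(shared.allowed("bob", "read"))   # True after the ACL is modified
print(shared.allowed("bob", "write"))  # False: write was never granted
```

The point of the model is the default: nobody but the exporter can touch the resource until the exporter explicitly widens the access-control list.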
What resources are available on the XCG?
There are two broad classes of computational resources available on the XCG: clusters running Linux and desktop PCs running Windows. There are over 1000 Linux cores (think CPUs) in the Grounds Grid and currently 500 Windows cores (more than 2000 cores are available but have not yet been included). The Windows machines can be used as a high-throughput computing resource, and the Linux machines can be used either for high-throughput computing or to run true parallel jobs using MPI, Matlab, or Mathematica.
The data resources available are those that you and other users share or export into the XCG.
How Can I Use the XCG?
To use the Grounds Grid, send email to the UVA computational science and engineering help group (uva-cse@virginia.edu) to set up a meeting to determine how the Grounds Grid can best meet your needs. We can create a Tiger Team to provide training and assistance to get your project up and running on the Grid.
Alternatively, you may apply for an account via our XCG user application form located at http://www.cs.virginia.edu/~vcgr/userrequest. Once your account has been created, you will be able to log in to the XCG by issuing the login command from the grid shell. The syntax for this command is:
login --username=<your username> rns:/users/<your username>
So, for example, if your login id is mmm2a, you would log in using the command:
login --username=mmm2a rns:/users/mmm2a
The command will prompt you for your password. Please keep in mind that when you log in, you stay logged in for one month. If a job you submit is expected to take longer than this to complete, you will need to log in for a longer period using the --validDuration login option.
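Because the user name appears twice in the login command, it is easy to mistype one of the two copies. The following minimal sketch builds the command string from the syntax shown above. The helper name `build_login_command` is hypothetical, and the sketch only formats text to paste into the grid shell; it does not invoke Genesis II.

```python
def build_login_command(username):
    """Format the grid shell login command for the given XCG user name."""
    if not username or any(ch.isspace() for ch in username):
        raise ValueError("username must be non-empty and contain no whitespace")
    # Same pattern as the documented syntax:
    #   login --username=<your username> rns:/users/<your username>
    return f"login --username={username} rns:/users/{username}"


print(build_login_command("mmm2a"))
# prints: login --username=mmm2a rns:/users/mmm2a
```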
Demonstration of XCG
Clicking on one of the links below will take you to an MP4 movie that demonstrates some of the features of the XCG.
[XCG/Background on the standards used in Genesis II]
[XCG/GenesisII Accessing the Grid via the File System Demonstration]
[XCG/Genesis II Installation on a Windows system]
[XCG/Genesis II Export Data Demonstration]
[XCG/Genesis II Create an Execution Service on your machine]
[XCG/GenesisII High Throughput Computing with JSDL-tool]
Currently Available Computational Resources
Linux
There are currently 214 Linux job slots available via the XCG. These slots are composed of the following resources.
- UVA ITC Cluster: Cedar & Dogwood (24 slots) - 120 nodes in Cedar, dual-cpu 1.8GHz AMD Opteron 244, 2GB RAM/node; 200 nodes in Dogwood, dual-cpu 3GHz Intel Xeon, 3GB RAM/node
- UVA ITC Cluster: Elder (16 slots) - 12 nodes, dual-core 3GHz Intel Xeon 5160, 32GB RAM/node; 32 nodes, two quad-core 2.66GHz Intel Xeon L5430, 32GB RAM/node
- UVA CS Cluster: Generals (24 slots) - 6 nodes, 8-way 2.33GHz Intel Xeon, 24GB RAM/node
- UVA CS Cluster: Centurions (64 slots) - 64 nodes, dual-cpu 1.6GHz AMD Opteron 242, 2GB RAM/node
- UVA CS Cluster: Sunfires (32 slots) - 22 nodes, dual-cpu 2.8GHz Intel Xeon, 3GB RAM/node
- UVA CS Cluster: Radios (32 slots) - 26 nodes, dual-cpu 2GHz AMD Opteron 246, 2GB RAM/node
- VTech Cluster: Hess (24 slots) - 96 nodes, 4-way 2.66GHz Intel Xeon, 11.7GB RAM/node
Windows
There are currently 330 Windows job slots available via the XCG. These slots are composed of the following resources:
- UVA ITC Labs: THNA233, BRN235, CLM401 - 157 dual-cpu machines
- UVA CS Workstations - 2 8-way machines
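The Windows slot total can be checked against the resource list with a short calculation. This sketch assumes one job slot per CPU, so a dual-cpu machine contributes 2 slots and an 8-way machine contributes 8; that mapping is an assumption, not something the list states.

```python
# (resource, number of machines, CPUs per machine), taken from the list above.
windows_resources = [
    ("UVA ITC Labs", 157, 2),
    ("UVA CS Workstations", 2, 8),
]

# Assumption: each CPU provides exactly one job slot.
total_slots = sum(machines * cpus for _, machines, cpus in windows_resources)
print(total_slots)  # 157 * 2 + 2 * 8 = 330
```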
FAQ
(1) Can Genesis II ignore/coexist with Globus WS containers and stuff?
Absolutely. The only constraint is that they must not use the same port numbers, which is very unlikely. By default we use port 18443, though like everything else that can be changed by editing the configuration file.
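One way to rule out a port clash before starting a container is to test whether the default port can still be bound. The following is a minimal sketch using only the Python standard library. It is not part of Genesis II, the function name `port_is_free` is hypothetical, and it checks only the loopback address at the moment it runs.

```python
import socket

DEFAULT_CONTAINER_PORT = 18443  # default port named in the answer above


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another service holds the port.
            return False
    return True


if __name__ == "__main__":
    state = "free" if port_is_free(DEFAULT_CONTAINER_PORT) else "already in use"
    print(f"TCP port {DEFAULT_CONTAINER_PORT} is {state}")
```

If the port is reported as in use, the answer above applies: choose a different port in the configuration file.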
(2) Must the local filesystems for the HPC system that you're submitting to be locally mounted on the Genesis II container host too?
There must be at least one shared file system that the container host can write files to, that the compute nodes can read from, and that we can read from and write to, e.g., /shared/XCG_test_account/.
(3) So you'll need 18443 TCP (and/or UDP?) allowed inbound to the box, right? Any specific outbound port permissions/restrictions?
The container will be making web service connections over HTTPS to servers wherever client data resides and to a global queue (think meta-scheduler). In general, though, any restrictions will limit the data sources and sinks.
(4) So all users authenticating to your service need to have a local account on the container box?
There are really several questions here.
First, authentication/authorization. Our services use TLS for transport-level data integrity and authentication, and WS-I BSP SOAP secure token authentication. We support two WS-I BSP authentication styles in the code: username/password (not recommended), and a SAML token signed with an X.509 certificate. The SAML token contains a chain of X.509 signatures. These tokens are used in the access control policy.
Our authorization (AuthZ) is based on access control lists (ACLs). One can grant access to a particular set of username/password tokens, but in practice we add the signed X.509 .CER file of each identity that should have access to the resource, whether it is an individual or a group identity.
Our current implementation multiplexes all requests onto a single account. There are other implementations of BES that do not do that because it requires root-level permission on the machine, and it is on our roadmap to optionally support a usermap file. (We did this in the past with Legion.)
Note that every job execution is logged, both with the activity performed and with the caller's signed credentials as delegated to the BES service. So we know who is doing what at all times.
(5) Does a Globus-style /etc/grid-mapfile suffice for X.509 authentication or do you need something else?
See the above answer.
