The use of high-performance computing (HPC) clusters continues to increase among companies, universities and research institutions. Several software products manage these clusters: among them, the Portable Batch System (PBS), the Sun Grid Engine (SGE), Platform's LSF and, recently, Microsoft Windows HPC Server 2008. Each product uses its own format for job description, submission and management. Thus, an organization with more than one cluster must often support multiple, idiosyncratic interfaces to largely similar backend capabilities.
The HPC Basic Profile is a Web-services-based specification that aims to create a common interface to computational clusters by focusing on the basic use case of an HPC system. The specification is applicable to any cluster management product because it relies only on a basic set of functions that every such product provides, despite the differences in interfaces and data formats. It is based on the OGSA Basic Execution Service (BES) and the Job Submission Description Language (JSDL), and it offers an abstraction of the most basic interactions with a computational cluster: create a job, check the status and attributes of a job, delete a job and check the status of the cluster.
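As a rough illustration of the four interactions listed above, the following minimal Python sketch models them as an abstract interface with a toy in-memory backend. It is not the HPC Basic Profile's actual Web-services interface, nor code from any implementation: the names ClusterInterface, InMemoryCluster, JobDescription and JobState, the job-id format and the fields returned by cluster_status are all hypothetical and chosen only for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
import itertools


class JobState(Enum):
    # Illustrative job states; the names are assumptions for this sketch.
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    CANCELLED = "Cancelled"
    FAILED = "Failed"


@dataclass(frozen=True)
class JobDescription:
    # Hypothetical, heavily simplified stand-in for a job description document.
    executable: str
    arguments: tuple = ()


class ClusterInterface(ABC):
    """The four basic interactions, independent of the queuing system behind them."""

    @abstractmethod
    def create_job(self, description: JobDescription) -> str: ...

    @abstractmethod
    def job_status(self, job_id: str) -> JobState: ...

    @abstractmethod
    def delete_job(self, job_id: str) -> None: ...

    @abstractmethod
    def cluster_status(self) -> dict: ...


class InMemoryCluster(ClusterInterface):
    """Toy backend that only records job states; a real backend would call a queuing system."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def create_job(self, description):
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = JobState.PENDING
        return job_id

    def job_status(self, job_id):
        return self._jobs[job_id]

    def delete_job(self, job_id):
        # Mark the job as cancelled rather than forgetting it, so its status stays queryable.
        self._jobs[job_id] = JobState.CANCELLED

    def cluster_status(self):
        active = sum(
            1 for state in self._jobs.values()
            if state in (JobState.PENDING, JobState.RUNNING)
        )
        return {"accepting_jobs": True, "active_jobs": active}


if __name__ == "__main__":
    cluster = InMemoryCluster()
    job = cluster.create_job(JobDescription("/bin/hostname"))
    print(job, cluster.job_status(job).value, cluster.cluster_status())
    cluster.delete_job(job)
    print(job, cluster.job_status(job).value, cluster.cluster_status())
```

Under this design, client code depends only on the abstract interface, so one backend class per queuing system could be substituted without changing the client.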
In this talk, I present BES++, an open-source project that we have created with Platform Computing (LSF). BES++ implements the HPC Basic Profile and is architected to layer on any existing queuing system. It currently supports PBS, LSF and SGE clusters. Beyond the core HPC Basic Profile features, BES++ supports several extensions, such as File Staging and Advanced Filter. We have also added job metascheduling and support for legacy client tools such as PBS's qsub. We present a performance evaluation and an analysis of our implementation.