NSF Facilities Statement


This page can be cut and pasted into a Word document, modified as needed to reflect your own additional lab resources, and attached to proposals. If you have suggested changes, please let us (root) know.

Facilities and Equipment

The Computer Science Department, which recently moved into the state-of-the-art Rice Hall for Information Technology Engineering, provides extensive computing and communications resources in support of its activities. We provide a shared compute infrastructure of (primarily) Linux x86_64 and Solaris Niagara systems made up of interactive servers and batch-scheduled (PBS) compute clusters. The department also provides a transparent desktop environment (matching the shared infrastructure) and instructional labs, and employs a systems group to help support this infrastructure. Students and faculty also have access to central university computing resources and network facilities.

Computation Resources:

The department provides a large shared infrastructure for research and instruction:

Cluster Resources:

The department maintains a number of clusters which are accessible via batch scheduling (PBS) for "big iron" computation jobs. The following table lists the cluster nodes by CPU model, per-node core count, memory, node count, and total cores:

CPU model               Cores per node   RAM per node   Nodes   Total cores
Intel Xeon E5-2670v3    12               256GB          3       36
Intel Xeon E5-2670v3    24               512GB          3       72
AMD Opteron 6276        64               256GB          4       256
AMD Opteron 6276        32               128GB          5       160
Intel Xeon E5345        8                64GB           9       72
Intel Xeon E5462        8                32GB           12      96
Intel Xeon E5405        8                8GB            18      144
AMD Opteron 240         2                4GB            62      124
AMD Opteron 242         2                4GB            10      20
Sparc Niagara           32               48GB           3       96

Additionally, we have a number of GPU nodes in our clusters:

GPU             RAM per GPU   Number available
Nvidia K40      12GB          3
Nvidia K20      6GB           5
Nvidia C2070    6GB           1
Nvidia C2050    6GB           4

Finally, twelve of our cluster nodes have an InfiniBand interconnect for jobs which require extremely fast IPC.

All of the cluster resources are accessed via a group of front-end interactive servers for job compilation and debugging; there are six general-purpose systems (8-core x86_64 with 16GB of RAM) and three CUDA-capable front ends.
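As a rough illustration of how a batch job might be submitted from one of the front-end servers, the sketch below (in Python) writes a small PBS job script and submits it with qsub. The job name, resource requests, and program invocation are placeholders rather than actual departmental settings; adjust them to fit your job.

  # Sketch: build and submit a PBS job from a front-end server.
  # Resource requests and the program invocation are placeholders.
  import subprocess

  job_script = """#!/bin/bash
  #PBS -N example_job
  #PBS -l nodes=1:ppn=8
  #PBS -l mem=16gb
  #PBS -l walltime=04:00:00
  cd $PBS_O_WORKDIR
  ./my_program --input data.in --output results.out
  """

  with open("example_job.pbs", "w") as f:
      f.write(job_script)

  # qsub prints the assigned job identifier on success.
  result = subprocess.run(["qsub", "example_job.pbs"], capture_output=True, text=True)
  print(result.stdout.strip())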

Desktop Resources:

The department has a special-purpose instructional lab with 17 dual-boot Windows/Linux desktops with CUDA-capable GPUs (6 C2070, 11 GTX 580), i5 CPUs, and 8GB of RAM. In addition, the department operates Windows 2008 servers to support PC software applications. Desktop facilities and machines available for general student use are a mixture of Intel PCs, ranging from the E6550 to quad-core i7, with 4-8GB of RAM, running Windows 7 and Ubuntu Linux.

Storage:

The department provides over 250TB of SAN/NAS RAID storage, exported to users via NFSv4 and SMB, along with large, fast local disk storage on compute servers.

- 148TB of NAS RAID home directory storage (with backups)
- 80TB of NAS RAID scratch storage
- >1TB of high-speed local disk on computational nodes
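A common pattern with this layout is to do heavy I/O on the fast node-local disk during a job and copy results to NAS storage afterwards. The short Python sketch below illustrates the idea; the mount points and file-naming convention are placeholders, not the department's actual paths.

  # Sketch: work on fast node-local disk, then copy results out to NAS storage.
  # The mount points and naming convention below are placeholders.
  import os, shutil

  local_work = "/localtmp/myuser/run1"                  # fast local disk on the compute node
  scratch = "/net/scratch/myuser/run1"                  # NAS scratch space (no backups)
  home_results = os.path.expanduser("~/results/run1")   # home directory storage (backed up)

  os.makedirs(scratch, exist_ok=True)
  os.makedirs(home_results, exist_ok=True)

  for name in os.listdir(local_work):
      src = os.path.join(local_work, name)
      if not os.path.isfile(src):
          continue
      if name.endswith(".tmp"):
          shutil.copy2(src, scratch)        # bulky intermediates go to scratch
      else:
          shutil.copy2(src, home_results)   # final results go to backed-up home storage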

Network:

Forty-two powerful servers (Sun SPARC, Niagara and Sunfire, IBM, Dell PowerEdge) are interconnected by a mixture of switched gigabit and 10GigE Ethernet. The departmental backbone switch is a Foundry FastIron Edge 12GCF, which connects the department's internal switch stacks and the University backbone at 10GigE.

Physical facilities:

The Departments of Computer Science and Electrical & Computer Engineering have office space for all faculty and all graduate students. Laboratory space is available for projects requiring special equipment. The Computer Science Department has a central machine room for large clusters and provides two end-user machine rooms where individual research groups can house project-specific equipment.

Data Management Plans

These need to be reasonably tailored to each individual project, so boilerplate is difficult to provide. However, if the data associated with the project are small enough (less than the standard faculty storage allocation), then the following elements are fully addressed by regular departmental access and storage methods. The following plan is developed around the framework suggested by the NSF Engineering Directorate Guidelines.

Primary Data Management

Data may be stored on departmental storage (which provides archival backup) and accessed by researchers (project personnel) using departmental authentication (logins), with access controls that reflect the roles of those accessing the data.
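For example, role-based access to file data is typically realized with departmental Unix groups and filesystem permissions. The Python sketch below shows the general idea; the group name and directory path are placeholders and assume the relevant group already exists in the departmental authentication system.

  # Sketch: limit a shared project directory to members of a departmental
  # Unix group. The group name and path are placeholders.
  import os, shutil

  project_dir = "/p/myproject/data"
  group = "myproject"

  for root, dirs, files in os.walk(project_dir):
      for d in dirs:
          path = os.path.join(root, d)
          shutil.chown(path, group=group)
          os.chmod(path, 0o770)             # group members: full access; others: none
      for f in files:
          path = os.path.join(root, f)
          shutil.chown(path, group=group)
          os.chmod(path, 0o660)             # group members: read/write; others: none

  shutil.chown(project_dir, group=group)
  os.chmod(project_dir, 0o770)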

Files

The main storage format for data is digital files; these are stored on centralized storage and accessed through a variety of protocols based on individual user and group permissions. Data may be provided publicly (published) via HTTP on the web. Very large file collections (greater than 3TB) will require some special accommodations.

Repositories

Version control repositories (Git, SVN) are provided by the department and backed by filesystem storage that is archived as described above. Access to repositories can be limited in fine-grained ways (read-only, write, append, etc.) using OS- or repository-specific controls. Anonymous public access to repositories may also be provided (e.g., WebDAV access to SVN).

Databases

The department maintains a production MySQL server, with backups, in which data may be stored. Working access to the database system is provided via database-specific users and may also be fine-grained; public access is generally not provided.

This storage is not archival or long-term, but is aimed at disaster recovery. Any database which needs to be stored long term can be dumped to a human-readable SQL-statement text file and maintained on archival storage as regular files are.
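For instance, a project database might be exported with mysqldump, which produces exactly this kind of SQL-statement text file. In the Python sketch below the database name and server hostname are placeholders, and credentials are assumed to come from the user's ~/.my.cnf.

  # Sketch: dump a project database to a human-readable SQL text file that can
  # then be kept on regular (archived) file storage. Names are placeholders;
  # credentials are assumed to be configured in ~/.my.cnf.
  import datetime
  import subprocess

  db_name = "myproject_db"
  outfile = "%s_%s.sql" % (db_name, datetime.date.today().isoformat())

  with open(outfile, "w") as f:
      subprocess.run(
          ["mysqldump", "--host=dbserver.example.edu", db_name],
          stdout=f, check=True)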

Expected Data

From the NSF:

The DMP should describe the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project. It should then describe the expected types of data to be retained.

The data may be maintained in one of the forms described above (or, for metadata, stored in files). It is not necessary to specify each type and format of data in advance.

Data Retention

The standard for retention is at least three years after conclusion of the award or three years after public release, whichever is later. Data stored on the departmental systems will be retained for as long as the associated researcher retains an affiliation with the department and their account is active. In the general case, maintaining project files on the departmental systems for three years is adequate retention.

Longer-term archival storage can be accomplished by making archives (e.g., DVD-ROM, USB storage) of the file storage.
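One straightforward way to prepare such an archive is to bundle the project directory into a compressed tarball with a checksum before copying it to external media. The Python sketch below shows this; the directory name is a placeholder.

  # Sketch: create a compressed archive of a project directory plus a SHA-256
  # checksum, suitable for copying to DVD-ROM or USB media. The directory
  # name is a placeholder.
  import hashlib
  import tarfile

  project_dir = "myproject_data"
  archive = project_dir + ".tar.gz"

  with tarfile.open(archive, "w:gz") as tar:
      tar.add(project_dir)

  sha256 = hashlib.sha256()
  with open(archive, "rb") as f:
      for chunk in iter(lambda: f.read(1 << 20), b""):
          sha256.update(chunk)

  # Keep the checksum next to the archive so integrity can be verified later.
  with open(archive + ".sha256", "w") as f:
      f.write("%s  %s\n" % (sha256.hexdigest(), archive))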

In the special case of hardware-specific results, the department has some very limited long-term physical storage available for retaining specific equipment needed to replicate results. Plans for physical "collections" should be clearly stated.

Data formats & dissemination

NSF guidelines suggest that these formats should be fully documented; some successful sample plans suggest they do not need to be fully specified in advance. Some effort to make data available in non-proprietary formats (which are less likely to become obsolete and inaccessible), in addition to common commercial formats, is encouraged for public dissemination. Dissemination can generally be accomplished via web publication, or some form of "sneaker-net" for very large data sets.

Data storage and preservation of access

See files, repositories and databases above.

The primary departmental digital storage is maintained on large RAID disk arrays, which are in turn replicated at an offsite location for geographic redundancy. Limited storage for physical "collections" (assets required to replicate results) is available in Rice Hall.


Sample Plan Text

This is the text of a data management plan from a successful NSF proposal; it is relatively vague, but it gives an idea of the minimum level of detail needed. The general departmental infrastructure is functionally a drop-in replacement for "the server" in this proposal.

Data Management Plan

This proposal, being interdisciplinary in nature, will necessarily involve substantial data gathering, analysis, and sharing. To handle this, the following data management plan will be implemented:

Primary data management

A centralized server is being maintained, with the goal of providing archival media for all data generated and collected. All researchers will be given authenticated access to the server. Furthermore, for every paper or publication that results from work within the center, underlying data will be further required to be placed on the server in conjunction with the manuscript and final PDF of the work in question. Through this mechanism, therefore, it will be possible to ensure that all primary data is placed on the main center server. Compliance will be managed by the PI and a designated graduate student.

Expected data

It is expected that primary data from all research activities will be included in the plan, including material characterization results, device characterization results and process flows, system design documents (circuit simulations, layouts, etc.), and system characterization results. Similarly, for education and outreach activities, documents, presentations, etc., will be archived in an identical manner. Results will generally be held exclusively in electronic form, since the systems herein are all characterized in such a manner. However, within the investigators' facilities, room will be allocated to store select physical collections (samples, etc.) for demonstration and archival reasons.

Data formats

Data will generally be stored in multiple formats based on the software preferences of the data generator. It will be required that all public data be made available in commercial software formats (e.g., MS Excel, JMP, etc.).

Period of data retention

Given the low cost of data storage, the project will store and back up all data for three years past the entire duration of the award. After the project terminates, backups will be retained, since the server will be maintained and upgraded regularly.

Data Access and Sharing

In general, data will be made available to all researchers on the project. Usage will be monitored and controlled through authenticated server access.