NSF Facilities Statement
This page can be copied into a Word document, modified as needed to reflect your own additional lab resources, and attached to proposals. If you have suggested changes, please let us (root) know.
Facilities and Equipment
The Computer Science Department, which recently moved into the state-of-the-art Rice Hall for Information Technology Engineering, provides extensive computing and communications resources in support of its activities. We provide a shared compute infrastructure of Linux x86_64 (primarily) and Solaris Niagara systems made up of interactive servers and batch-scheduled (PBS) compute clusters. The department provides a transparent desktop environment (matching the shared infrastructure) and instructional labs as well. The department employs a systems group to support this infrastructure. Students and faculty also have access to central university computing resources and network facilities.
The department provides a large shared infrastructure for research and instruction:
The department maintains a number of clusters which are accessible via batch scheduling (PBS) for "big iron" computation jobs. The following table lists the cluster nodes by CPU model, cores per node, memory per node, node count, and total cores:

| CPU Model | Cores/Node | Memory/Node | Nodes | Total Cores |
|---|---|---|---|---|
| Intel Xeon E5-2670v3 | 12 | 256GB | 3 | 36 |
| Intel Xeon E5-2670v3 | 24 | 512GB | 3 | 72 |
| AMD Opteron 6276 | 64 | 256GB | 4 | 256 |
| AMD Opteron 6276 | 32 | 128GB | 5 | 160 |
| Intel Xeon E5345 | 8 | 64GB | 9 | 72 |
| Intel Xeon E5462 | 8 | 32GB | 12 | 96 |
| Intel Xeon E5405 | 8 | 8GB | 18 | 144 |
| AMD Opteron 240 | 2 | 4GB | 62 | 124 |
| AMD Opteron 242 | 2 | 4GB | 10 | 20 |
Additionally, we have a number of GPU nodes in our clusters:
Finally, twelve of our cluster nodes have an InfiniBand interconnect for jobs which require extremely fast IPC.
All of the cluster resources are accessed via a group of front-end interactive servers for job compilation and debugging; there are six general-purpose systems (8-core x86_64 with 16GB of RAM) and three CUDA-capable front ends.
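As a sketch of how the batch-scheduled clusters are used, a minimal PBS job script might look like the following. The job name, resource requests, and workload command are illustrative assumptions, not departmental defaults:

```shell
#!/bin/bash
# Hypothetical PBS job script -- all values below are examples only.
#PBS -N example_job          # job name
#PBS -l nodes=1:ppn=8        # one node, 8 cores
#PBS -l mem=16gb             # memory request
#PBS -l walltime=02:00:00    # two-hour run-time limit
#PBS -j oe                   # merge stdout and stderr into one log

cd "$PBS_O_WORKDIR"          # run from the directory the job was submitted from
./my_simulation --threads 8  # placeholder for the actual workload
```

Jobs of this form are submitted from a front-end server with `qsub job.pbs` and monitored with `qstat`.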
The department has a special-purpose instructional lab with 17 dual-boot Windows/Linux desktops with CUDA-capable GPUs (6 C2070, 11 GTX580), i5 CPUs, and 8GB of RAM. In addition, the department operates Windows Server 2008 systems to support PC software applications. Desktop facilities and machines available for general student use are a mixture of Intel PCs ranging from the E6550 to quad-core i7 with 4-8GB of RAM, running Windows 7 and Ubuntu Linux.
The department provides over 250TB of SAN/NAS RAID storage to users exported via NFSv4 and SMB, along with large local fast disk storage on compute servers.
- 148TB of NAS RAID home directory storage (with backups)
- 80TB of NAS RAID scratch storage
- >1TB of high-speed local disk on computational nodes
Forty-two powerful servers (Sun SPARC, Niagara and Sunfire, IBM, Dell PowerEdge) are interconnected by a mixture of switched gigabit and 10GigE Ethernet. The departmental backbone switch is a Foundry FastIron Edge 12GCF, which connects the department's internal switch stacks and the University Backbone at 10GigE.
The Departments of Computer Science and Electrical & Computer Engineering have office space for all faculty and all graduate students. Laboratory space is available for projects requiring special equipment. The computer science department has a central machine room for large clusters and provides two end user machine rooms for individual research groups to house project specific equipment.
Data Management Plans
These need to be reasonably tailored to each individual project, so boilerplate is difficult to provide. However, if the data associated with the project are small enough (less than the standard faculty storage allocation), then the following elements are fully addressed by regular departmental access and storage methods. The plan below is developed around the framework suggested by the NSF Engineering Directorate Guidelines.
Primary Data Management
Data may be stored on departmental storage (which provides archival backup) and accessed by researchers (project personnel) through departmental authentication (logins), with access controls that reflect the roles of those accessing the data.
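As a sketch (the path is hypothetical, and in practice the directory would also be assigned to a project-specific Unix group), role-based access control of this kind is implemented with ordinary filesystem permissions:

```shell
# Hypothetical shared project directory; the path is an example only.
mkdir -p /tmp/proj-data

# In practice: chgrp proj-group /tmp/proj-data (group name is site-specific).
# The setgid bit keeps new files in the project group; "others" get no access.
chmod 2770 /tmp/proj-data

stat -c '%a' /tmp/proj-data   # prints 2770
```

Finer-grained roles (e.g., read-only collaborators) can be layered on top with per-user ACLs.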
The main storage format for data is digital files; these are stored on centralized storage and accessed through a variety of protocols based on individual user and group permissions. Data may be provided publicly (published) via HTTP on the web. Very large file collections (greater than 3TB) will require some special accommodations.
Version control repositories (Git, SVN) are provided by the department and backed by filesystem storage which is archived as above. Access to repositories can be limited in fine-grained ways (read-only, write, append, etc.) using OS- or repository-specific controls. Anonymous public access to repositories may also be provided (e.g., WebDAV access to SVN).
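For illustration (the repository path is hypothetical), a Git repository restricted to a Unix group can be created on the shared filesystem like so:

```shell
# Create a bare, group-shared repository; access is then governed by
# group membership and filesystem permissions (path is an example only).
git init --bare --shared=group /tmp/project.git

# A bare repository is identified by its HEAD file and objects directory.
ls /tmp/project.git/HEAD
```

The `--shared=group` flag makes the repository group-writable, so adding or removing a collaborator is a matter of group membership rather than repository reconfiguration.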
The department maintains a production MySQL server, with backup, in which data may be stored. Working access to the database system is provided via database-specific users and may also be fine-grained; public access is generally not provided.
This backup storage is not archival or long-term, but aimed at disaster recovery. Any databases which need to be stored long-term can be dumped into a (human-readable) SQL-statement text file format and maintained on archival storage as regular files are.
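As a command sketch (the user and database names are placeholders), such a dump is produced with the standard `mysqldump` client and can later be restored with `mysql`:

```shell
# Dump the database to a human-readable SQL text file
# (user and database names are placeholders).
mysqldump --user=projuser --password projdb > projdb-archive.sql

# Restore later into a (pre-created) database from the same file.
mysql --user=projuser --password projdb_restored < projdb-archive.sql
```

The resulting `.sql` file is plain text, so it can be archived, versioned, and inspected like any other project file.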
From the NSF:
The DMP should describe the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project. It should then describe the expected types of data to be retained.
The data may be maintained in any of the forms above (metadata can likewise be recorded in files). It is not necessary to specify each type and format of data in advance.
The standard for retention is at least three years after conclusion of the award or three years after public release, whichever is later. Data stored on the departmental systems will be retained for as long as the associated researcher retains an affiliation with the department and their account is active. In the general case, maintaining project files on the departmental systems for three years is adequate retention.
Longer-term archival storage can be accomplished by making archives (e.g., DVD-ROM, USB storage) of the file storage.
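For example (the paths are illustrative), a project's file storage can be bundled into a single compressed archive suitable for writing to DVD-ROM or USB media:

```shell
# Create some sample project files (illustrative paths only).
mkdir -p /tmp/proj/results
echo "run 1 output" > /tmp/proj/results/run1.txt

# Bundle and compress the directory into one archive file.
tar czf /tmp/proj-archive.tar.gz -C /tmp proj

# List the archive contents to verify the files were captured.
tar tzf /tmp/proj-archive.tar.gz
```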
In the special case of hardware-specific results, the department has some very limited long-term physical storage available for retaining specific equipment needed to replicate results. Plans for physical "collections" should be clearly stated.
Data formats & dissemination
NSF guidelines suggest that these formats should be fully documented; some successful sample plans suggest they do not need to be fully specified in advance. Some effort to make data available in non-proprietary formats (which are less likely to become quickly obsolete and inaccessible), in addition to common commercial formats for public dissemination, is encouraged. Dissemination can generally be accomplished via web publication or some form of "sneaker-net" for very large data sets.
Data storage and preservation of access
See files, repositories and databases above.
The primary departmental digital storage is maintained on large RAID disk arrays, which are in turn replicated at an offsite location for geographic redundancy. Limited storage for physical "collections" (assets required to replicate results) is available in Rice Hall.
Sample Plan Text
This is the text of a data management plan from a successful NSF proposal; it is relatively vague but will give an idea of how much detail is needed at a minimum. The general department infrastructure is a drop-in replacement functionally for "the server" in this proposal.
Data Management Plan
This proposal, being interdisciplinary in nature, will necessarily involve substantial data gathering, analysis, and sharing. To handle this, the following data management plan will be implemented:
Primary data management
A centralized server is being maintained, with the goal of providing archival media for all data generated and collected. All researchers will be given authenticated access to the server. Furthermore, for every paper or publication that results from work within the center, underlying data will be further required to be placed on the server in conjunction with the manuscript and final PDF of the work in question. Through this mechanism, therefore, it will be possible to ensure that all primary data is placed on the main center server. Compliance will be managed by the PI and a designated graduate student.
It is expected that primary data from all research activities will be included in the plan, including material characterization results, device characterization results and process flows, system design documents (circuit simulations, layouts, etc.), and system characterization results. Similarly, for education and outreach activities, documents, presentations, etc., will be archived in an identical manner. Results will generally be held exclusively in electronic form, since the systems herein are all characterized in such a manner. However, within the investigators' facilities, room will be allocated to store select physical collections (samples, etc.) for demonstration and archival reasons.
Data will generally be stored in multiple formats based on the software preferences of the data generator. It will be required that all public data be made available in commercial software formats (e.g., MS Excel, JMP, etc.).
Period of data retention
Given the low cost of data storage, the project will store and backup all data for three years past the entire duration of the award. After the project terminates, backups will be retained, since the server will be maintained and upgraded regularly.
Data Access and Sharing
In general, data will be made available to all researchers on the project. Usage will be monitored and controlled through authenticated server access.