Computer Science Colloquia
Tuesday, May 29, 2012
Advisor: Sudhanva Gurumurthi
Attending Faculty: Kevin Skadron, chair; Paul Reynolds, Marty Humphrey, and Mircea Stan
10:00 AM, Rice Hall, Rm. 242
PhD Proposal Presentation
Impact of Datacenter Infrastructure on Server Availability: Characterization, Management and Optimization
As cloud computing, online services, and user storage needs grow, enterprises continue to build large datacenter facilities to serve end-user requirements. These large datacenter facilities are warehouse-scale computers in their own right, and their cost efficiency is critical for cloud profit margins. Datacenter infrastructure can be partitioned logically into server, network, power, cooling, and management infrastructures. Although the server component of the datacenter is crucial for applications to run, almost one-third of the total cost of ownership of a datacenter is spent on building and operating the other allied infrastructures. Our work focuses on identifying the interdependency between the different infrastructures and server operation, specifically on characterizing the impact of infrastructure knobs on server availability. We also optimize the management infrastructure that coordinates all the other infrastructures by proposing a novel wireless-based management solution for datacenters.
As part of our characterization effort, we first focus on the cooling infrastructure and characterize the impact of temperature on hard disk failures through a case study of a large-scale datacenter hosting tens of thousands of hard disk drives. We will then characterize power quality events in three large datacenters hosting hundreds of thousands of servers and identify their impact on datacenter and server availability. We will establish that handling infrastructure failures by providing redundancy for each infrastructure component would be cost-prohibitive for cloud-scale datacenters. We will then evaluate state-of-the-art management infrastructure architectures currently used in large datacenters and address their limitations, including their fixed topology and increased deployment and maintenance costs. We will then propose a novel wireless-based management architecture that is cost-efficient and reliable compared to current solutions. We propose to evaluate the research challenges in implementing a wireless management solution for cloud infrastructures, including protocol changes, partitioning, signal interference, and packet loss reliability. We will support our conclusions with a real prototype in an experimental datacenter facility. Finally, we will present a cost analysis based on datacenter total cost of ownership to highlight the need for this novel management solution.