Visions

architecting future warehouse scale computers

The class of datacenters termed “warehouse scale computers” (WSCs) houses large-scale, data-intensive web services such as web search, maps, social networking, docs, video sharing, etc. Companies like Google, Microsoft, Yahoo, and Amazon spend tens to hundreds of millions to construct and operate WSCs to provide these services. Maximizing the efficiency of this class of computing reduces this cost and has energy implications for a greener planet. However, WSC design and architecture remain in their relative infancy.

Research insight: WSCs are built using commodity processor architectures (Intel/AMD) and software components (Linux, GCC, JVM, etc.) that have been engineered and optimized for traditional computing environments and workloads, such as those you’d find in desktop, laptop, and HPC environments. A WSC computing environment has many characteristics, assumptions, and requirements of its own, and these affect many design decisions.

The goals: Rethink how WSCs are designed and architected in both the underlying hardware and the system software platform. Identify sources of inefficiency and develop solutions to improve WSCs. Imagine a line graph where the y-axis is the size of the WSC required to do some fixed amount of work, and the x-axis is time as research progresses. My vision is a line that decreases monotonically with a steep slope: over time, an increasingly smaller WSC is needed for the same fixed amount of work.

Impact: Today, advances in computing are moving in two directions: into the mobile space and into the cloud. The impact of this vision is to reduce the environmental footprint and cost of providing the platform that is the cloud.

My related publications:

  1. “Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations” at MICRO 2011
  2. “Heterogeneity in “Homogeneous” Warehouse-Scale Computers: A Performance Opportunity” in CAL 2011
  3. “The Impact of Memory Subsystem Resource Sharing on Datacenter Applications” at ISCA 2011
  4. “Contention Aware Execution: Online Contention Detection and Response” at CGO 2010

addressing contention on multicore processors

Multicore processors allow multiple streams of execution to occur in parallel. However, on-chip caches, the bus, the memory controller, and other components of the memory subsystem are shared among processing cores. Contention for these resources occurs when the working sets of co-running processes or threads exceed the size of the private caches and rely on the shared memory subsystem. This contention can result in a significant amount of performance interference across cores, counteracting the ability to achieve the parallelism promised by multicore processors. When an application suffers a performance degradation due to contention for shared resources with an application on a separate processing core, we call this cross-core performance interference.
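As a minimal sketch, one way to put a number on cross-core performance interference is the relative slowdown of an application when it co-runs with another application, compared with running alone. The function name, the choice of wall-clock time as the metric, and the sample timings are assumptions made for illustration, not measurements or definitions taken from the publications listed below.

```python
def cross_core_interference(solo_seconds: float, corun_seconds: float) -> float:
    """Fractional slowdown of an application when co-running versus running alone.

    solo_seconds:  execution time with the neighboring core idle (hypothetical input).
    corun_seconds: execution time with a co-runner on a separate core (hypothetical input).
    A result of 0.25 means the application ran 25% slower under contention.
    """
    if solo_seconds <= 0:
        raise ValueError("solo_seconds must be positive")
    return (corun_seconds - solo_seconds) / solo_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 10.0 s alone, 12.5 s next to a co-runner.
    slowdown = cross_core_interference(10.0, 12.5)
    print(f"cross-core interference: {slowdown:.0%}")
```

A value near zero suggests the co-runner does not pressure the shared memory subsystem much; larger values indicate heavier contention.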

Research insight: Contention for shared resources is severely limiting our ability to realize the promised parallelism of multicore architectures. We must address this challenge by creating new system software solutions that facilitate the interaction between workloads and the underlying microarchitecture.

The goals: Fully understand the nature of contention as it occurs in real commodity multicore processors, and exploit this understanding to devise innovative approaches and mechanisms for mitigating its negative impact on those processors. This impact has had, and continues to have, negative implications for system performance, utilization, throughput, quality of service, etc.

Impact: Significant progress in this direction will enable the continuation of Moore’s law by allocating transistors to higher degrees of parallel processing (at the very least for multi-programmed workloads).

My related publications:

  1. “The Impact of Memory Subsystem Resource Sharing on Datacenter Applications” at ISCA 2011
  2. “Directly Characterizing Cross Core Interference Through Contention Synthesis” at HiPEAC 2011
  3. “Contention Aware Execution: Online Contention Detection and Response” at CGO 2010
  4. “Synthesizing Contention” at WBIA 2009 (workshop @ MICRO 2009)

practical online application restructuring

The online restructuring of native applications enables an entire class of application optimizations that can only be performed dynamically, because they require information that is only available at runtime. However, dynamically restructuring native application code layout demands a high level of complexity, and traditionally the benefit has not justified this cost enough to motivate practical and commercial adoption. The difficulty of applying such optimizations to natively compiled binary applications can be attributed to two factors: the lack of source-level information in binary-to-binary restructuring, and the added complexity and overhead of online monitoring and code rewriting.

Research insight: The ‘rewriting’ doesn’t have to occur dynamically to enable dynamic restructuring. Why not identify the dynamic situations you wish to accommodate and statically specialize a number of instances (or versions) of the targeted application code for these situations?
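The idea of preparing versions ahead of time and only selecting among them at runtime can be sketched roughly as follows. This is an illustration under stated assumptions, not the SBO framework itself: the scenario names, the `cache_miss_rate` input (a stand-in for a reading from performance monitoring hardware), the threshold, and both function versions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def sum_default(values: List[int]) -> int:
    """General-purpose version, used when no special scenario is detected."""
    total = 0
    for v in values:
        total += v
    return total


def sum_contended(values: List[int]) -> int:
    """Stand-in for a version specialized ahead of time for a hypothetical
    'contended' scenario. It computes the same result as sum_default."""
    return sum(values)


# All versions exist before the program runs; nothing is rewritten online.
VERSIONS: Dict[str, Callable[[List[int]], int]] = {
    "default": sum_default,
    "contended": sum_contended,
}


def detect_scenario(cache_miss_rate: float, threshold: float = 0.2) -> str:
    """Map a runtime observation to one of the anticipated scenarios.
    The metric and threshold are assumptions made for this sketch."""
    return "contended" if cache_miss_rate > threshold else "default"


def run(values: List[int], cache_miss_rate: float) -> Tuple[str, int]:
    """Pick the pre-built version matching the current scenario and execute it."""
    scenario = detect_scenario(cache_miss_rate)
    return scenario, VERSIONS[scenario](values)


if __name__ == "__main__":
    print(run([1, 2, 3], cache_miss_rate=0.05))
    print(run([1, 2, 3], cache_miss_rate=0.50))
```

The only work done at runtime is the scenario check and a table lookup, which is the property that avoids the cost of online code rewriting.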

The goals: We must understand the implications of specializing at varying granularities to understand their trade-offs. We must also demonstrate the unique capabilities of this “scenario based optimization” (SBO) and its practical applications.

Impact: I expect SBO-style optimizations to one day be commonly applied. The only challenge currently is the lack of standardized semantics for performance monitoring hardware and its ABI (application binary interface), and for how it is to be interfaced by each layer of the software stack. As our community further establishes these semantics, SBO can be easily adopted in numerous application domains.

My related publications:

  1. “Loaf: A Framework and Infrastructure for Creating Online Adaptive Solutions” at Exadapt 2011 (co-located with PLDI 2011)
  2. “Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations” at CGO 2009
  3. “Evaluating Indirect Branch Handling Mechanisms in Software Dynamic Translation Systems” at CGO 2007

general online optimization analyses for a new microarchitectural environment

Online and dynamic optimization for native binary applications is not yet popular commercially; however, it is proving to be one of the most promising research directions for continuing to realize performance optimization opportunities at the binary level. Online optimizers use information that is only available dynamically to predict future application behavior and microarchitectural events, and exploit these predictions to allow the executing application to adapt to its execution environment, or to allow the environment to adapt to the application. However, the required online analyses impose overhead on the application, and traditionally, in many cases, this overhead outweighs the benefits of the optimizations themselves. As a result, effective online optimization has proved quite challenging, especially at the binary level, since traditional dynamic binary optimizers often limit themselves to only the least costly, lightweight online analyses.

Research insight: A new microarchitectural environment is emerging, and traditional online binary optimization techniques and analyses may no longer be relevant. In this new environment we have sophisticated multicore processors and lightweight hardware performance monitoring capabilities. By leveraging these capabilities intelligently, the cost of online monitoring and analysis begins to melt away.

The goals: Demonstrate the opportunity to unobtrusively perform sophisticated and much more beneficial online analyses for dynamic optimizations.

Impact: Showing that the complexity of harnessing binary-level online optimization can be reduced, and demonstrating the benefits of leveraging the latest microarchitectural advances to re-approach online optimization problems, should provide evidence of the commercial and practical viability of online optimization for native application binaries.

My related publications:

  1. “Exploiting Hardware Advances for Software Testing and Debugging” at ICSE 2011 (NIER)
  2. “A Reactive Unobtrusive Prefetcher for Multicore and Manycore Architectures” at SHCMP 2008 (workshop @ ISCA 2008)
  3. “MATS: Multicore Adaptive Trace Selection” at STMCS 2008 (workshop @ CGO 2008)