Marty Humphrey

Research

Cancer Biomedical Informatics Grid (caBIG)

Since August 2008, my research group has been participating in a cancer Biomedical Informatics Grid (caBIG) Architecture Workspace Working Group to create a design that supports caBIG and caGrid on .NET. More specifically, we have prototyped .NET-based clients and services that provide a proof of concept for future .NET-based caGrid services, including support for the Web Services Resource Framework (WSRF) on .NET. More importantly, this project is producing a comprehensive plan for how to use .NET in the future of caGrid: a .NET-based implementation of caGrid would greatly expand the options available to caGrid developers and deployers. We are making our tutorials and source code freely available and open source. We have recently produced the first caGrid Data Service that runs on the Microsoft Azure cloud computing platform.

You can find more information about this project here: caBIG Architecture Workspace .NET Working Group

Marty Humphrey receives the 2009 caBIG Innovation Award on behalf of his research group. This award was presented at the 2009 caBIG Annual Meeting. From left: Dr. Ken Buetow (Associate Director of Bioinformatics and Information Technology, NCI), Marty Humphrey, Dr. John E. Niederhuber (Director, NCI)

Collaborative Platforms for eScience

Scientists are increasingly required to navigate and integrate large data sets. A single scientist operating alone often cannot find the relevant data, sometimes simply because of its sheer volume. The goal of an eScience collaboration is for participants to help one another in their individual and collective discoveries. However, when participants cannot explain their steps and advice clearly enough for others in the collaboration to find and understand them, the collaboration can degrade into a collection of researchers acting alone. The challenge of an eScience collaboration is to enhance the productivity of everyone participating.

In late September 2007, my research group, collaborating primarily with Catharine Van Ingen of Microsoft Research and Deb Agarwal of Lawrence Berkeley National Laboratory, went live with a collaboration space for FluxNet (aka FluxData). We invite you to explore the public area of this collaboration space. We have also recently expanded this functionality to support the USDA’s National Soil Carbon Network.

Using the Cloud to Model and Manage Large Watershed Systems

Understanding hydrologic systems at the scale of large watersheds is of critical importance to a society faced with extreme events such as floods and droughts, and with minimizing human impacts on water quality. Climate change and increasing population further complicate watershed-scale prediction by placing additional stress and uncertainty on future hydrologic system conditions. New data collection and management approaches are allowing models to capture water flow through built and natural environments at an increasing level of detail. A significant barrier to advancing hydrologic science and water resource management is insufficient computational infrastructure for leveraging these existing and future data resources within simulation models.

We have recently been awarded a National Science Foundation (NSF) “Computing in the Cloud” grant to advance hydrologic science and water resource management by leveraging cloud computing to model large watershed systems. We will use Windows Azure in three ways. First, we will create a cloud-enabled hydrologic model. Second, we will improve the process of hydrologic model parameterization by creating cloud-based data processing workflows. Third, we will apply the model and data processing tools in Windows Azure to a large watershed, addressing a hydrologic research question: quantifying the impacts of climate change on water resources.

This project is collaborative with Prof. Jon Goodall, University of South Carolina.
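The model parameterization workflow described above is, at its core, a large parameter sweep: many candidate parameterizations are evaluated in parallel against observed data. The following is a minimal sketch under stated assumptions — `run_model` is a hypothetical stand-in for a real watershed model, and the local thread pool stands in for fan-out to Windows Azure worker instances.

```python
# Sketch of a cloud-style parameter sweep for hydrologic model calibration.
# `run_model` is a hypothetical placeholder, not the project's actual model;
# the thread pool stands in for distribution to Azure worker instances.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_model(params):
    """Hypothetical model run: return a goodness-of-fit score (lower is
    better) for one candidate parameterization. A real run would simulate
    the watershed and compare simulated to observed streamflow."""
    infiltration, roughness = params
    return (infiltration - 0.32) ** 2 + (roughness - 0.045) ** 2

def calibrate(infiltration_rates, roughness_coeffs):
    """Evaluate every parameter combination in parallel; return the best."""
    candidates = list(product(infiltration_rates, roughness_coeffs))
    with ThreadPoolExecutor(max_workers=8) as pool:
        scores = list(pool.map(run_model, candidates))
    score, params = min(zip(scores, candidates))
    return params, score

if __name__ == "__main__":
    best, score = calibrate([0.1, 0.2, 0.3, 0.4], [0.02, 0.04, 0.06])
    print("best parameters:", best)
```

Because each candidate run is independent, this pattern scales naturally to cloud workers: the cost of sweeping a larger parameter space is bounded by the number of instances rented rather than by local hardware.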

Feedback-Controlled Management of Virtualized Resources for Predictable eScience

The use of virtual machines for eScience has been advocated both within the enterprise, to replace aging machines, and as the underlying technology of cloud computing, whereby scientific researchers can “rent” servers on demand. However, both scenarios can lead to inadequate performance. Within the enterprise, with incorrect planning or under unexpectedly heavy (or even moderate) load, there might not be enough physical capacity for every virtual machine to achieve reasonable performance. In cloud-computing scenarios, the “renters” are largely subject to the informal service promises of the cloud provider, made at a granularity that can be too coarse or at the wrong level of abstraction.

This project pursues a novel unified framework to ensure predictable eScience across these two dominant emerging uses of virtualized resources. The foundation of the approach is to wrap an eScience application in a performance-container framework and dynamically regulate the application’s performance through the application of formal feedback control theory. The application’s progress is monitored and regulated so that the job meets its performance goals (e.g., a deadline) without requiring exclusive access to physical resources, even in the presence of a wide class of unexpected disturbances. This project extends this foundation and early results in three important dimensions: creating support for non-specialists to use the framework; implementing these techniques in Eucalyptus, one of the major open-source cloud computing frameworks; and applying the techniques to Software-as-a-Service (SaaS), in which applications in the cloud are regulated to provide predictable performance.
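The feedback loop described above can be illustrated with a small simulation. This is a deliberately simplified sketch, not the project’s controller: each step, the measured progress of the job is fed back to recompute the CPU share needed to finish by the deadline, and a disturbance (a co-located VM stealing capacity) is absorbed automatically. All names, speeds, and constants are hypothetical.

```python
# Illustrative sketch only (not the project's actual controller): a
# progress-feedback loop that adjusts a job's CPU share each step so it
# meets its deadline despite an unseen disturbance. All constants are
# hypothetical.

FULL_SPEED = 3.0  # work units completed per step at 100% CPU share

def run(total_work=100.0, deadline=50, controlled=True):
    progress, cpu_share = 0.0, 2.0 / 3.0
    for t in range(deadline):
        if controlled:
            # Feedback: measure remaining work, recompute the CPU share
            # needed to finish exactly at the deadline, clamp to [0.05, 1].
            needed_rate = (total_work - progress) / (deadline - t)
            cpu_share = min(1.0, max(0.05, needed_rate / FULL_SPEED))
        # "Plant": a co-located VM halves effective speed during steps
        # 20-29 -- a disturbance the controller never observes directly.
        disturbance = 0.5 if 20 <= t < 30 else 1.0
        progress += cpu_share * FULL_SPEED * disturbance
    return progress

print("with feedback control:", round(run(controlled=True), 2))
print("fixed allocation:     ", round(run(controlled=False), 2))
```

With feedback the job completes all 100 work units by the deadline, because the controller raises the CPU share after the disturbance erodes progress; the fixed allocation, sized correctly for the undisturbed case, falls short. The project’s actual framework applies formal control theory to this same monitor-and-actuate loop on real virtualized resources.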