Computer Science Colloquia
Wednesday, September 5, 2012
Advisor: Andrew Grimshaw
Attending Faculty: Malathi Veeraraghavan (Chair)
Rice Hall, Rm. 242, 9:30 AM
Master's Project Presentation
Efficient Data Management in the Global Federated Filesystem
Scientific collaborations and the increasing computational needs of applications have motivated the development of grids. By providing a larger resource pool than is available at any single site, grids have helped solve problems previously believed to be intractable. At the same time, the size of the data used by these computations keeps growing: the amount of data generated each year exceeds the sum over all previous years. Managing such large amounts of data in computational grids has therefore become important. Researchers want both to analyze this data and to harness the vast computational resources available to them through the grid. As a result, data and computation are often not collocated, which calls for efficient data management techniques. In this project, we develop a caching system and a new file staging technique called coordinated staging in the context of the Global Federated File System (GFFS). Coordinated staging treats the GFFS from a computational grid's perspective, while caching treats it from the perspectives of both computational grids and data grids. Coordinated staging is specifically designed for High Throughput Computing (HTC) applications, a common use case for grids. It is modeled on the BitTorrent protocol: execution sites download the file not only from its source but also in chunks from other execution sites.
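To illustrate why BitTorrent-style staging relieves the source, here is a minimal toy simulation. It is not the GFFS implementation. Everything in it is a hypothetical assumption for illustration: the function name `simulate_staging`, time advancing in discrete rounds, each site fetching at most one chunk per round, the source uploading at most `source_capacity` chunks per round, and peers having unlimited upload capacity. Peer chunk selection uses rarest-first, a common BitTorrent heuristic that the abstract does not specify.

```python
from collections import Counter


def simulate_staging(num_sites, num_chunks, source_capacity=1):
    """Toy model of BitTorrent-style staging of one file to many sites.

    Hypothetical assumptions (not taken from the GFFS design):
    - time advances in rounds; each site downloads at most one chunk per round;
    - the source uploads at most `source_capacity` chunks per round;
    - peers have unlimited upload capacity.
    Returns (rounds, chunks_sent_by_source, chunks_sent_by_peers).
    """
    all_chunks = set(range(num_chunks))
    have = [set() for _ in range(num_sites)]
    rounds = source_sent = peer_sent = 0

    while any(h != all_chunks for h in have):
        rounds += 1
        # Holdings at the start of the round; chunks fetched during this
        # round only become shareable in the next one.
        held_by = Counter(c for h in have for c in h)
        budget = source_capacity
        requested_from_source = set()
        new_chunks = []

        for i, h in enumerate(have):
            missing = all_chunks - h
            if not missing:
                continue
            # Chunks that no site holds yet can only come from the source;
            # never ask the source twice for the same chunk.
            unseen = {c for c in missing if held_by[c] == 0} - requested_from_source
            from_peers = [c for c in missing if held_by[c] > 0]
            if budget > 0 and unseen:
                chunk = min(unseen)
                requested_from_source.add(chunk)
                budget -= 1
                source_sent += 1
            elif from_peers:
                # Rarest-first: take the chunk held by the fewest other sites.
                chunk = min(from_peers, key=lambda c: (held_by[c], c))
                peer_sent += 1
            else:
                continue  # nothing obtainable this round
            new_chunks.append((i, chunk))

        for i, chunk in new_chunks:
            have[i].add(chunk)

    return rounds, source_sent, peer_sent


if __name__ == "__main__":
    sites, chunks = 8, 16
    rounds, from_source, from_peers = simulate_staging(sites, chunks)
    print(f"rounds={rounds} source={from_source} peers={from_peers}")
    print(f"source-only staging would send {sites * chunks} chunks")
```

Under these assumptions the source uploads each chunk exactly once, so it sends `num_chunks` chunks instead of `num_sites * num_chunks`; the remaining transfers are served by other execution sites.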