Computer Science Colloquia
Wednesday, September 5, 2012
Avinash Kalyanaraman
Advisor: Andrew Grimshaw
Attending Faculty: Malathi Veeraraghavan (Chair)
Rice Hall, Rm. 242, 9:30 AM
Master's Project Presentation
Efficient Data Management in the Global Federated Filesystem
ABSTRACT
Scientific collaborations and the increasing computational needs of
applications have warranted the development of grids. The ability of grids
to provide a larger resource pool than what is available at a single site
has helped solve problems previously believed to be intractable. On the
other hand, the size of the data used by these computations has been increasing,
with the amount of data generated each year being greater than the sum
over all previous years. Thus, managing such large amounts of data in
computational grids becomes important. Researchers are interested in both
analyzing this data and harnessing the vast computational resources
available to them via the grid. As a result, the data and the computation
are not collocated, which calls for efficient data management techniques.
In this project, we develop a caching system
and a new file staging technique called coordinated staging in the context
of the Global Federated File System (GFFS). Coordinated staging treats the
GFFS from a computational grid's perspective, while caching treats the
GFFS from the perspective of both a computational grid and a data grid.
Coordinated staging is specifically designed for High Throughput Computing
(HTC) applications, a common use-case for grids. Coordinated staging is
based on the BitTorrent protocol model, in which execution sites not only
download the file from its source but also download chunks of the file
from other execution sites.
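
The chunk-exchange idea can be illustrated with a toy, round-based
simulation. This is only a sketch under stated assumptions, not the
project's implementation: the function name, the one-chunk-per-round rule,
and the peer-selection policy are all hypothetical and chosen for
illustration. It counts how many chunk transfers are served by the source
and how many by other execution sites.

```python
def simulate_coordinated_staging(num_sites, num_chunks):
    """Toy model: return (source_uploads, peer_uploads).

    Assumptions (hypothetical, for illustration only):
    - every execution site needs the whole file of num_chunks chunks;
    - each site fetches exactly one missing chunk per round;
    - a site prefers a chunk that some other site held at the start
      of the round, and falls back to the source otherwise.
    """
    have = [set() for _ in range(num_sites)]
    source_uploads = 0
    peer_uploads = 0

    while any(len(h) < num_chunks for h in have):
        # Freeze the state so all sites decide on the same view of the round.
        snapshot = [set(h) for h in have]
        for site in range(num_sites):
            missing = [c for c in range(num_chunks) if c not in snapshot[site]]
            if not missing:
                continue
            # Look for a missing chunk that another execution site holds.
            peer_chunk = next(
                (c for c in missing
                 if any(c in snapshot[p] for p in range(num_sites) if p != site)),
                None,
            )
            if peer_chunk is not None:
                have[site].add(peer_chunk)
                peer_uploads += 1
            else:
                # Stagger source requests so sites start with different chunks.
                have[site].add(missing[site % len(missing)])
                source_uploads += 1

    return source_uploads, peer_uploads


if __name__ == "__main__":
    src, peers = simulate_coordinated_staging(num_sites=4, num_chunks=4)
    print(f"chunks served by source: {src}, by other sites: {peers}")
```

In this toy model with four sites and four chunks, the source serves each
chunk once and the remaining transfers are served by other execution sites,
whereas plain staging would have the source serve all sixteen transfers.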