
introductory note:
When we speak of policies in the context of data management in grids, caution is required. In the literature, the term "policy" is used to refer to a number of different things, and many other things that could be considered policies are not always referred to as such. Thus, it is important to be clear about what is meant when using this term.
many uses:
The authors of 2, 3, and 4 refer to policies as goals of the virtual organization (VO), upon which data placement algorithms may be based. The primary policies they refer to in 2 are related to file replication and dissemination. They also consider the relationship between data management and workflow execution and discuss how this may influence policy. They discuss two examples of real systems currently in use in the scientific community. The first system is PhEDEx, a data management system for the high-energy physics community used by the Compact Muon Solenoid (CMS) experiment at CERN (see 5). This system distributes data throughout the VO after initial publication at CERN based on a tiered dissemination model. The second system is the Lightweight Data Replicator (LDR), used by gravitational wave physicists at the Laser Interferometer Gravitational Wave Observatory (LIGO) for data distribution throughout the sites in the VO based on metadata queries by scientists at these sites (see 10 and 11). In these papers, the authors also classify policies into three groups: policies for staging data in to computational nodes during workflow execution, policies for staging data out after workflow execution, and policies for the purpose of data reliability and durability in general, not necessarily during workflow execution.

In 15, the authors introduce a data management system for the CEDPS, called the Managed Object Placement Service (MOPS), which is intended to place data intelligently according to the data management policies of the VO or workflow manager. The authors use policies in much the same way as in 2 and provide similar policy examples that involve data replication and dissemination, for instance, with reference to 1.

In 8, 13, and 14, the authors present a data management system that enforces policies, known as the integrated Rule-Oriented Data System (iRODS). This system uses a rule engine for policy expression and enforcement. The authors speak of management policies for data replication, pre- and post-processing, metadata extraction and assignment, administration, authorization, auditing, and accounting, as well as policies to enforce integrity, access restrictions, and data placement and presentation. iRODS is also referred to by the authors in 12, but there the focus is on digital curation and preservation rather than on the types of policies mentioned in 2. They also discuss other mechanisms for digital curation and preservation.

The authors in 6, 7, 9, 19, 20, and 21 also mention policies in the context of grid computing. However, they focus on resource usage and management policies rather than data management policies. Nevertheless, we mention this work here because resource usage policies do have repercussions for data management, so it is not possible to fully separate the two concerns.

Policies in grid computing are also mentioned in 17 and 18. However, these policies primarily relate to security and authorization, so again, these are not related to data management policies.

Lastly, in 16, the authors describe a system used in the European DataGrid project for replica management. While they do not explicitly use the term "policy", the goals of their replica management system are more or less like the replication policies described in 2, 3, and 4.
current context:
In this work, when we refer to policies, we mean data management policies similar to those described in 2, 3, and 4. Primarily, we focus on data dissemination and replication policies. In the future, we will examine more complex policies that may incorporate other goals, such as policies that relate to data placement to improve workflow execution.

Up to now, we have implemented two practical policies that are enforced by our rule engine-based Policy-Driven Data Placement application. The first policy specifies a hierarchical or tier-based pattern for data dissemination to sites in a VO upon initial data publication. This policy is modeled after the PhEDEx system used by CMS at CERN, as mentioned above and in 5. The second policy enforces a rule that every data file have at least a minimum number of copies at various storage elements within the system, subject to certain constraints. For instance, it stipulates that no file should have two copies on the same storage element and that copies should only be transferred to a storage element if it has a certain number of bytes free. This policy was implemented with the goals of availability and reliability in mind.
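To make the two policies more concrete, the following is a minimal Python sketch of how such decisions might be computed. It is not the rule-engine implementation used by the Policy-Driven Data Placement application. All names (`StorageElement`, `plan_tiered_dissemination`, `plan_replication`, the site names, and the thresholds) are hypothetical, and the choice to prefer storage elements with the most free space is an assumption made only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StorageElement:
    """Hypothetical model of a storage element at a site in the VO."""
    name: str
    free_bytes: int
    files: set = field(default_factory=set)


def plan_tiered_dissemination(root, children):
    """Policy 1 (sketch): list (source, destination) transfers, tier by tier.

    `children` maps each site to the sites in the next tier that it feeds.
    Sites are visited breadth-first, so a site only forwards data after it
    has received the data from its own parent.
    """
    transfers = []
    queue = deque([root])
    while queue:
        src = queue.popleft()
        for dst in children.get(src, []):
            transfers.append((src, dst))
            queue.append(dst)
    return transfers


def plan_replication(file_name, elements, min_copies, min_free_bytes):
    """Policy 2 (sketch): pick storage elements that should get a new copy.

    Constraints from the text: a storage element never holds two copies of
    the same file, and it only receives a copy if it has at least
    `min_free_bytes` free.
    """
    holders = [se for se in elements if file_name in se.files]
    missing = min_copies - len(holders)
    if missing <= 0:
        return []  # the file already has enough copies
    candidates = [
        se for se in elements
        if file_name not in se.files and se.free_bytes >= min_free_bytes
    ]
    # Assumption: prefer the elements with the most free space.
    candidates.sort(key=lambda se: se.free_bytes, reverse=True)
    return [se.name for se in candidates[:missing]]


if __name__ == "__main__":
    tiers = {"T0": ["T1a", "T1b"], "T1a": ["T2a"], "T1b": ["T2b", "T2c"]}
    print(plan_tiered_dissemination("T0", tiers))

    sites = [
        StorageElement("A", 100, {"f.dat"}),
        StorageElement("B", 500),
        StorageElement("C", 50),
        StorageElement("D", 300),
    ]
    print(plan_replication("f.dat", sites, min_copies=3, min_free_bytes=100))
```

In this sketch the replication planner may return fewer targets than requested when too few storage elements satisfy the constraints; how the real system handles that case is not specified here.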
references:
  1. Bernholdt, D., et al. "The Earth System Grid: Supporting the Next Generation of Climate Modeling Research." Proceedings of the IEEE 93.3 (2005): 485-95.


  2. Chervenak, A., et al. "Data Placement for Scientific Applications in Distributed Environments." Grid Computing, 2007 8th IEEE/ACM International Conference on (2007): 267-74.


  3. Chervenak, A., et al. "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets." Journal of Network and Computer Applications 23.3 (2000): 187-200.


  4. Chervenak, A. L., and R. Schuler. "A Data Placement Service for Petascale Applications." Proceedings of the 2nd international workshop on Petascale data storage: held in conjunction with Supercomputing'07 (2007): 63-8.


  5. "The Compact Muon Solenoid, an Experiment for the Large Hadron Collider at CERN." CMS Project. 2005. http://cms.cern.ch/.


  6. Feng, J., et al. "Policy-Directed Data Movement in Grids." Proceedings of 12th International Conference on Parallel and Distributed Systems (ICPADS 2006): 12-5.


  7. Feng, J., G. Wasson, and M. Humphrey. "Resource Usage Policy Expression and Enforcement in Grid Computing." Grid Computing, 2007 8th IEEE/ACM International Conference on (2007): 66-73.


  8. Hedges, M., A. Hasan, and T. Blanke. "Management and Preservation of Research Data with iRODS." Proceedings of the ACM first workshop on CyberInfrastructure: information management in eScience (2007): 17-22.


  9. Humphrey, M., D. Agarwal, and C. van Ingen. "Publication and Curation of Large-Scale Shared Scientific Data."


  10. "Lightweight Data Replicator." LIGO Project. 2004. http://www.lsc-group.phys.uwm.edu/LDR/.


  11. "LIGO - Laser Interferometer Gravitational Wave Observatory." LIGO Project. 2004. http://www.ligo.caltech.edu/.


  12. Moore, R., R. Arcot, and R. Marciano. "Implementing Trusted Digital Repositories." Retrieved December 4 (2007): 2007.


  13. Moore, R., et al. Constraint-based knowledge systems for grids, digital libraries, and persistent archives (2006).


  14. Rajasekar, A., et al. "A Prototype Rule-Based Distributed Data Management System." High Performance Distributed Computing workshop on "Next Generation Distributed Data Management", Paris, France, May (2006).


  15. Schopf, J. M., et al. "End-to-End Data Solutions for Distributed Petascale Science."


  16. Stockinger, H., et al. "Grid Data Management in Action: Experience in Running and Supporting Data Management Services in the EU DataGrid Project." Arxiv preprint cs.DC/0306011 (2003).


  17. Sundaram, B., and B. M. Chapman. "Policy Engine: A Framework for Authorization, Accounting Policy Specification and Evaluation in Grids." 2nd International Conference on Grid Computing, Nov 1.9 (2001).


  18. Sundaram, B., and B. Chapman. "XML-Based Policy Engine Framework for Usage Policy Management in Grids." Proceedings of the Third International Workshop on Grid Computing (Grid 2002), Baltimore, MD, November (2002).


  19. Verma, D., et al. "A Policy Service for GRID Computing." Proceedings of the Third International Workshop on Grid Computing (2002): 243-55.


  20. Wasson, G., and M. Humphrey. "Policy and Enforcement in Virtual Organizations." Proceedings of the Fourth International Workshop on Grid Computing (2003): 125-32.


  21. Wasson, G., and M. Humphrey. "Toward Explicit Policy Management for Virtual Organizations." Proceedings of the IEEE 4th International Workshop on Policies for Distributed Systems and Networks (POLICY 2003): 173-82.




current as of September 2008