Fermilab Computing Division

CS Document 1438-v1

Filecules in High-Energy Physics: Characteristics and Impact on Resource Management

Submitted by:
Gabriele Garzoglio
Updated by:
Gabriele Garzoglio
Document Created:
24 Apr 2006, 11:00
Contents Revised:
24 Apr 2006, 11:00
Metadata Revised:
19 Oct 2006, 17:28

Grid computing has reached the stage where deployments are mature and many collaborations run in production mode. Mature Grid deployments offer an opportunity to revisit, and perhaps update, traditional beliefs about workload models, which in turn prompts a reevaluation of traditional resource management techniques.
This paper analyzes usage patterns in a typical Grid community: a large-scale, data-intensive scientific collaboration in high-energy physics. We focus mainly on data usage, since data is the major resource for this class of applications. Our observations led us to propose a new abstraction for resource management in scientific data analysis applications: we define a filecule as a group of files that are always used together. We show that filecules exist and present their characteristics. The existence of filecules suggests a new granularity for data management which, if incorporated in design, can lead to solutions that significantly outperform traditional single-file-granularity approaches to data caching, replication, and placement. We reason about the impact of filecules on resource management and present compelling evidence for using this abstraction when designing data management services.
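As an illustration of the filecule definition above, the following minimal sketch groups files into filecules from a job trace. It assumes that "always used together" means "requested by exactly the same set of jobs"; the function name `find_filecules`, the trace layout (a mapping from job ID to requested file names), and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def find_filecules(job_inputs):
    """Group files into filecules.

    Assumption (not from the paper): two files belong to the same
    filecule when they are requested by exactly the same set of jobs.

    job_inputs: dict mapping a job ID to an iterable of file names.
    Returns a sorted list of filecules, each a sorted list of file names.
    """
    # Record, for every file, the set of jobs that requested it.
    jobs_per_file = defaultdict(set)
    for job_id, files in job_inputs.items():
        for name in files:
            jobs_per_file[name].add(job_id)

    # Files with identical job sets fall into the same group.
    groups = defaultdict(set)
    for name, jobs in jobs_per_file.items():
        groups[frozenset(jobs)].add(name)

    return sorted(sorted(group) for group in groups.values())


# Hypothetical trace: three jobs and the files each one reads.
trace = {
    "job1": ["a.root", "b.root", "c.root"],
    "job2": ["a.root", "b.root"],
    "job3": ["d.root"],
}

filecules = find_filecules(trace)
print(filecules)
```

In this hypothetical trace, `a.root` and `b.root` are requested by the same two jobs and so form one filecule, while `c.root` and `d.root` each form a single-file filecule. Under this assumption the resulting groups are disjoint and cover every file in the trace, so a cache or replication service could treat each group as one unit.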