An Analysis of Bulk Data Movement Patterns in Large-scale Scientific Collaborations

Wenji Wu
07 Jan 2012, 07:44
Large-scale research efforts such as the LHC experiments, ITER, and climate modelling are built upon large, globally distributed collaborations. For reasons of scalability and agility, and to make effective use of existing computing resources, data processing and analysis for these projects are based on distributed computing models. Such projects thus depend on predictable and efficient bulk data movement between collaboration sites. However, the computing and networking resources available to different collaboration sites vary greatly. Large collaboration sites (such as Fermilab and CERN) have created data centres comprising hundreds, and even thousands, of computation nodes to develop massively scaled, highly distributed cluster-computing platforms. These sites are usually well connected to the outside world through high-speed networks with bandwidth greater than 10 Gbps. Some small collaboration sites, on the other hand, have limited computing resources or poor network connectivity. As a result, bulk data movement varies greatly across collaboration sites. Fermilab is the US-CMS Tier-1 Centre and the main data centre for a few other large-scale research collaborations. Scientific traffic (e.g., CMS) dominates Fermilab's off-site traffic volume in both the inbound and outbound directions. Fermilab has deployed a flow-based network traffic collection and analysis system to monitor and analyze the status and patterns of bulk data movement between the Laboratory and its collaboration sites. In this paper, we discuss the current status and patterns of bulk data movement between Fermilab and its collaboration sites.
Journal of Physics: Conference Series


