Run II Computing: Draft of September 2004 Review Charge

 

Context

 

Run II Computing is now fully operational and seems to be working quite well. Each experiment’s software is now fairly mature, as are the tools it relies on, such as C++ libraries, GEANT3, ROOT, frameworks, Event Data Model and I/O packages, and software development and distribution environments. There is a growing number of active sites outside of Fermilab with computing resources available for one or more of MC production, processing of data, or analysis of data. The “service level” that such off-site resources provide to the experiment may vary greatly: from a highly available, professionally managed computing center, to a physics department cluster that is well run when needed (e.g. for reprocessing), to a site able to support only its local community, and even to a site whose net contribution might be negative because of the large support burden it places on others.

 

Over the course of the past year, the luminosity of the Tevatron has grown such that in one recent store it exceeded 10^32 cm^-2 sec^-1. We might expect the resulting increase in the mean number of interactions per crossing, and hence in the occupancies, to stress the reconstruction and to produce other effects.

 

Computing and Funding Model

 

The computing and funding models for Run II computing no longer assure adequate computing resources based at Fermilab for all processing and reprocessing of data and for the generous level of computing sought for analysis. This is due to expanded experiment requirements in one or more of the following categories: data rate/volume, amount of reprocessing, CPU usage for reconstruction programs, CPU usage for analysis programs, or I/O needs for analysis. In particular, the ability to write data at an increased rate has created windows of opportunity to enhance the physics output of the experiments. In each case, the Physics Advisory Committee encouraged the experiments to seek increased resources beyond those available from Fermilab to analyze these extra data.

 

Challenges

 

Although Run II computing and software activities are in an “Operations” phase, there is clearly still much work to do, and a number of challenges are present:

 

  1. Scalability of the software with respect to incident luminosity.
  2. Scalability and performance of computing and data handling systems to meet the demands that more data and greater numbers of active users, at Fermilab and worldwide, will place on these systems.
  3. Scalability and reliability of systems to support the potentially large demands for data movement into the Fermilab site and out of the Fermilab site.
  4. The need to adapt to a computing model that relies on common shared Grid facilities at Fermilab and at many off-site locations. This brings with it new requirements for portability of the entire computing environment in which work is done, in order to:
    1. Run on several versions of Red Hat Linux, or potentially even other operating systems, and move away from the KAI compiler and associated debugging tools.
    2. Run on several underlying batch systems.
    3. Migrate to new compilers and debugging tools.
    4. Ensure data handling, job handling and storage services are portable and are available at all sites.
    5. Deal with a complex computer security environment, such as restrictive firewalls, PKI certificates and Kerberos credentials.
    6. Manage and account for how resources are used (e.g. Physics Group work, individual investigator, Official MC, private MC, etc.)
    7. Completely phase out reliance on legacy SGI systems.
  5. The need to manage all of the available resources, both on-site and off-site, in a way that assures physics output is maximized.  The reliance on off-site computing resources is necessary, but also involves risks if pledges for resources do not materialize or if requirements and expectations expand further.

 

Preparation for Review and Charge

 

The Experiments and the Computing Division have been asked to organize a series of presentations that will help the review committee respond to the following charge.

 

Consider and comment on:

 

  1. The status of each experiment in meeting the above challenges
  2. The status of the Computing Division planning, support, and infrastructure in helping to meet the challenges
  3. The adequacy of the anticipated funded resources to meet these challenges and the adequacy of the new computing model on which this is based
  4. The status of the planning process for ongoing resources from Fermilab and experiment institutions for Run II computing and software infrastructure support, leading to MOUs
  5. Whether there are likely to be major paradigm shifts in any area that could lead to significant modifications to the computing approach during the rest of the lives of the experiments (data taking to mid-2009, analysis for some time thereafter)

 

The committee is asked to present its findings, comments, and, where necessary, its recommendations, in order to help both the experiments and the Laboratory meet the challenges above, and to note any other challenges or concerns that it uncovers in the course of the review.