Fermilab Computing Division

CS Document 5629-v1

Preparing HEP reconstruction and analysis software for exascale-era computing

Document #:
CS-doc-5629-v1
Document type:
Technical Note
Submitted by:
Marc Paterno
Updated by:
Marc Paterno
Document Created:
19 Oct 2015, 10:17
Contents Revised:
19 Oct 2015, 10:17
Metadata Revised:
19 Oct 2015, 10:17
Viewable by:
  • Public document

Abstract:
All HEP experiments rely on event processing software systems to manage
algorithms and data from detectors and simulations. The current
generation is designed to run well on commodity compute clusters such as
FermiGRID. The Office of Science is investing heavily in very
large-scale computing centers and has been pushing hard for all branches
of science to make use of these systems in the future. They are asking
us to consider how software must change, assuming that CPU cycles are free and
file I/O is costly. The future will be dominated by cores with little
high-speed memory and with several tiers of slower memory, by
ultra-high-speed networking, and by specialized resources for file I/O.
Our software systems are not ready for this shift. The purpose of this
research project is to produce a prototype software system for HEP event
processing that demonstrates an architecture and a design that will be
capable of scaling to more than 100K cores and of efficiently moving event data
through these new high-performance computing platforms in forms
necessary for algorithmic work. Fermilab's HEP framework developers do not have a record of
research in this DOE advanced computing community. Success in this
project will give us the foundation we need to participate in future DOE
grant calls.

While porting exercises are underway for performing specialized subsets
of HEP processing on current systems, no overall re-architecting has
been started to address the imminent larger structural changes. We
propose moving to a distributed-memory, multi-process design to reclaim
memory across the application and to eliminate all I/O to physical
storage except at specialized aggregation points. We will leverage
state-of-the-art R&D efforts focused on exascale computing, such as
Legion\cite{legion}, Charm++\cite{charm}, or MPI for
distributed programming, and HDF5\cite{hdf} for parallel storage.
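
The data flow proposed above can be illustrated with a minimal sketch. It is
not the project's design: it uses Python's standard multiprocessing module in
place of Legion, Charm++, or MPI, and a plain JSON-lines file in place of
HDF5. The names reconstruct, worker, and run_pipeline, and the event layout
(an "id" plus a list of "hits"), are hypothetical and exist only for this
illustration. It assumes a Unix-like system where the "fork" start method is
available. Worker processes each hold only their own share of the events and
never touch storage; a single aggregation point performs all file I/O.

```python
import json
import multiprocessing as mp


def reconstruct(event):
    """Hypothetical stand-in for a reconstruction algorithm: sum the hits."""
    return {"id": event["id"], "energy": sum(event["hits"])}


def worker(my_events, results):
    """Process this worker's share of events; send results onward, no file I/O."""
    for event in my_events:
        results.put(reconstruct(event))
    results.put(None)  # sentinel: this worker has finished


def run_pipeline(events, n_workers, out_path):
    """Fan events out to workers; write everything at one aggregation point."""
    # Assumption: a Unix-like system where the "fork" start method exists.
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    procs = [
        # Round-robin split: worker r receives events r, r + n_workers, ...
        ctx.Process(target=worker, args=(events[r::n_workers], results))
        for r in range(n_workers)
    ]
    for p in procs:
        p.start()

    # The parent acts as the aggregation point: it drains the queue until
    # every worker has sent its sentinel, and only then joins the workers.
    collected = []
    finished = 0
    while finished < n_workers:
        item = results.get()
        if item is None:
            finished += 1
        else:
            collected.append(item)
    for p in procs:
        p.join()

    # The only write to storage in the whole pipeline happens here.
    collected.sort(key=lambda record: record["id"])
    with open(out_path, "w") as out:
        for record in collected:
            out.write(json.dumps(record) + "\n")
    return len(collected)
```

In a real system the workers would be ranks on separate nodes, each reading
or receiving only its own events, and there could be several aggregation
points writing in parallel; the sketch keeps a single aggregator for brevity.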

A successful demonstration will enable a path towards using the enormous
compute power of the DOE-funded exascale facilities. Access to these
facilities will be predicated on their efficient use, and our current
software systems do not fit their constraints. The neutrino and muon
programs could directly benefit from these available cycles, seeing
significant reductions in turn-around time for large-scale processing
tasks.
