CS Document 5706-v1
- Submitted by: Gabriele Garzoglio
- Updated by: Marcia A Teckenbrock
- Document Created: 31 Mar 2016, 11:18
- Contents Revised: 31 Mar 2016, 11:18
- Metadata Revised: 14 Mar 2019, 14:55
Throughout any given year, the High-Energy Physics community's demand for computing resources follows cycles of peaks and valleys driven by holiday schedules, conference dates and other factors. Because of this, the classical approach of provisioning these resources statically at the facilities that provide them has drawbacks, such as potential overprovisioning. As the appetite for computing grows, so does the need to maximize cost efficiency by developing a model that provisions resources dynamically, only when they are needed.
To address this issue, the Fermilab Scientific Computing Division launched the HEP Cloud project in June 2015. Its goal is to develop a virtual facility that provides a common interface to a variety of physical computing resources, including local clusters, grids, high-performance computers, and community and commercial clouds.
The HEP Cloud project team successfully demonstrated the elastic provisioning model offered by the cloud in January and February 2016, when team members used Amazon Web Services to add 58,000 cores to the pool of computing resources of the Compact Muon Solenoid (CMS) experiment, an impressive 25 percent increase. Costs were contained by using the Spot Instance market and the HEP Cloud decision engine, the brain of the facility, which monitors market price fluctuations, looks at all available resources across the cloud and ensures that resources are provisioned optimally.
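The text does not describe how the decision engine chooses resources, so the following is only a toy illustration of the idea of cost-aware provisioning, not the HEP Cloud algorithm. It assumes each candidate resource can be summarized as a hypothetical `Offer` with a per-core hourly price and a number of available cores, and it greedily fills a core demand from the cheapest offers under a price cap. The names `Offer` and `plan_provisioning`, and all prices, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical resource offer: a pool of cores at a given hourly price."""

    name: str
    price_per_core_hour: float  # e.g. a spot price normalized per core (USD)
    available_cores: int


def plan_provisioning(offers, cores_needed, max_price_per_core_hour):
    """Greedily fill the core demand from the cheapest acceptable offers.

    Returns a list of (offer name, cores) allocations. Offers priced above
    max_price_per_core_hour are never used, so demand may be only partly met.
    """
    plan = []
    remaining = cores_needed
    for offer in sorted(offers, key=lambda o: o.price_per_core_hour):
        if remaining <= 0:
            break
        if offer.price_per_core_hour > max_price_per_core_hour:
            break  # sorted by price, so every later offer is too expensive too
        take = min(offer.available_cores, remaining)
        if take > 0:
            plan.append((offer.name, take))
            remaining -= take
    return plan


if __name__ == "__main__":
    offers = [
        Offer("site-a", 0.02, 500),
        Offer("site-b", 0.01, 300),
        Offer("site-c", 0.05, 1000),
    ]
    # site-c is above the price cap, so only 800 of the 1000 cores are planned.
    print(plan_provisioning(offers, 1000, 0.03))
```

A real decision engine would also weigh factors such as spot-price volatility, instance interruption, data locality and job requirements; the greedy loop only shows the core trade-off between demand and price.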
Fermilab seeks three student interns for summer 2016 to contribute to the HEP Cloud project. Areas of focus include the following:
1) Integrate High-Performance Computers (HPC) with the HEPCloud facility. Learn how to execute a simple scientific workflow at a remote HPC site, then work with HEPCloud staff to submit and execute it through the HEPCloud infrastructure.
2) Work on functional monitoring of the HEPCloud on-demand services, both in the AWS cloud and at FNAL. Write check_mk monitoring scripts to check the health of the services and integrate the results with the Fermilab site monitoring service. Investigate forwarding service logs, such as those of squid or the glideinWMS pilot services, to a central logging server in the cloud and, from there, to Fermilab.
3) Participate in integrating bulk provisioning systems, such as condor_annex and AWS Spot Fleet, with the HEPCloud Decision Engine. Add instrumentation to the existing monitoring to record the number of running instances and their price. Investigate bulk submission of virtual machines to Google Compute Engine via HTCondor.
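As an illustration of the kind of health check described in focus area 2, here is a minimal sketch of a Checkmk "local check": a script, typically placed in the agent's local-checks directory, whose standard output lines have the form `<status> <service name> <metrics> <detail>`, with status 0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN and `-` meaning no metrics. The sketch assumes a service is healthy if it accepts TCP connections quickly; the service name `Squid_proxy`, the target `localhost:3128` (Squid's default port) and the 1-second warning threshold are all hypothetical choices, not the HEPCloud configuration.

```python
import socket
import time

# Checkmk local-check status codes.
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3


def local_check_line(status, service, metrics, detail):
    """Format one Checkmk local-check line: <status> <service> <metrics> <detail>."""
    return f"{status} {service} {metrics} {detail}"


def check_tcp_service(service, host, port, timeout=5.0, warn_seconds=1.0):
    """Report whether host:port accepts a TCP connection, and how quickly."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:  # refused, unreachable or timed out
        return local_check_line(
            CRIT, service, "-", f"cannot connect to {host}:{port}: {exc}"
        )
    status = OK if elapsed < warn_seconds else WARN
    return local_check_line(
        status,
        service,
        f"connect_time={elapsed:.3f}",
        f"{host}:{port} accepted a connection in {elapsed:.3f}s",
    )


if __name__ == "__main__":
    # Hypothetical target: a Squid proxy listening on its default port 3128.
    print(check_tcp_service("Squid_proxy", "localhost", 3128))
```

A TCP connect only shows that the port is open; a fuller functional check would, for example, fetch a known URL through the proxy and verify the response.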