The Metrics Correlation and Analysis Service
|Full Title:||The Metrics Correlation and Analysis Service|
|Date & Time:||18 Mar 2010 at 14:00|
|Event Location:||FCC1 Conference Room|
|Event Info:||Speaker: Andrew Baranovski, MCAS Project Manager, Fermilab Computing Division, Grid Dept.
The Metrics Correlation and Analysis Service (MCAS) project was started at the end of 2008 by the Fermilab Computing Division in response to three main considerations from its stakeholders:
1) Summaries of metrics that can be viewed at a glance would improve status checks of monitored services. This entails the development of efficient mechanisms to reduce large datasets to snapshots of the most relevant information.
2) When communities need to add monitoring for a component of an infrastructure, the effort typically starts from scratch, with an end-to-end solution built from basic tools (shell, Python, RRDtool, R, ...). This duplicates effort, as infrastructure is developed over and over again to address common problems (data gathering, warehousing, analysis, and presentation).
3) Grid monitoring and troubleshooting are generally considered lacking in Grid systems.
MCAS was set up to address these issues by providing a common infrastructure, together with plug-ins tailored to common use cases, for the gathering, organization, storage, analysis, and display of data. The system supports a series of well-understood use cases, including a single-page view of images from multiple sources, display of data in table format, bar graph representation of alarm-related metrics, and time series plots. It also supports administrator-friendly interfaces to manage data sources, operational tools for database backup, reduction, and restoration, and easy infrastructure (re-)deployment. The current prototype offers its services to various communities, including CMS T1, DZero, and Minos.
In this talk, we will discuss the MCAS system and the status of the project, and present options for future development and operations. Stakeholders are invited to give feedback on their future needs.