Fermilab Computing Division

CS Document 2231-v1


Submitted by:
Selitha Raja
Updated by:
Selitha Raja
Document Created:
21 Jun 2007, 16:04
Contents Revised:
21 Jun 2007, 16:04
Metadata Revised:
21 Jun 2007, 16:04
Viewable by:
  • Public document

A distributed monitoring system, NGOP, that scales to Run II computing has been developed at Fermilab. It provides active monitoring of software and hardware, customizable service-level reporting, early error detection, and problem prevention. NGOP stores collected data persistently and can execute corrective actions and send notifications. NGOP is also a framework for developing Monitoring Agents that monitor the overall state of computers and the software running on them. Several Monitoring Agents are available within NGOP; they can analyze log files and check for the existence of system daemons, CPU and memory utilization, the availability of web pages, etc. NGOP currently monitors about 1500 nodes and 35000 objects. It has proved to be a useful tool: it has detected multiple problems, such as node resets, offline CPUs, hard drive errors, NFS problems, and dead system daemons, and it has given system administrators the information required for better system tuning and configuration. The NGOP architecture and the current state of deployment will be presented.
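
The log-analysis role described for a Monitoring Agent can be illustrated with a minimal Python sketch. It is not NGOP code and does not use the NGOP API: the rule patterns, the severity names, and the `Event`, `scan_log`, and `node_state` names are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical rules (pattern -> severity); not taken from NGOP.
RULES = [
    (re.compile(r"I/O error", re.IGNORECASE), "error"),
    (re.compile(r"nfs: server .* not responding", re.IGNORECASE), "error"),
    (re.compile(r"CPU\d+ .*offline", re.IGNORECASE), "warning"),
]

# Hypothetical severity ranking used to pick the worst state.
SEVERITY_ORDER = {"ok": 0, "warning": 1, "error": 2}


@dataclass
class Event:
    node: str
    severity: str
    message: str


def scan_log(node, lines):
    """Return one Event per log line that matches a rule (first match wins)."""
    events = []
    for line in lines:
        for pattern, severity in RULES:
            if pattern.search(line):
                events.append(Event(node, severity, line.strip()))
                break
    return events


def node_state(events):
    """Collapse a list of events into the worst severity seen, or "ok"."""
    worst = "ok"
    for event in events:
        if SEVERITY_ORDER[event.severity] > SEVERITY_ORDER[worst]:
            worst = event.severity
    return worst
```

In a sketch like this, a central service would collect the events from many nodes, store them, and trigger notifications or corrective actions based on each node's state; those parts are omitted here.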
Associated with Events:
CHEP2003 held on 24 Mar 2003 in La Jolla, California