Fermilab Computing Division

CS Document 5508-v2

Root Cause Analysis for PRB000000001088 (SeaQuest Data Taking Interrupt)

Document #:
CS-doc-5508-v2
Document type:
Technical Note
Submitted by:
Stephan Lammel
Updated by:
Stephan Lammel
Document Created:
23 Jan 2015, 16:48
Contents Revised:
20 Feb 2015, 17:17
Metadata Revised:
20 Feb 2015, 17:17
Viewable by:
  • Public document

Other Versions:
CS-doc-5508-v1
23 Jan 2015, 19:04
CS-doc-5508-v0
23 Jan 2015, 16:56
Abstract:
On Friday, November 7th, 2014, node e906-gat3.fnal.gov started to
contact the Fermilab Kerberos key distribution center (KDC) for
unknown clients/servers at an excessive rate. Within days the rate
increased to around 400k requests a day. On Wednesday, December 3rd,
the rate more than doubled and Computer Security issued a network
block for the machine. The data acquisition (DAQ) application on
the machine could no longer access target data information, and
data taking of the SeaQuest experiment came to a stop. Members of
the experiment traced the stop to the network block and notified
Computer Security. The block was lifted 41 minutes after it was
applied, and data taking resumed. The excessive KDC contacts were
debugged and resolved that evening.
While the technical issue causing the excessive KDC contacts is
well understood, the interruption of data taking at an active
experiment triggered this investigation.