(These minutes are being sent asap after the meeting and may be really
rough. Any major problems, please e-mail mailto:email@example.com
Those who wish really accurate recording of their operation status are
invited to send
text to me >before< the meeting in the future.)
Computing Division Operations Meeting
Minutes of CD Operations Meeting - 2004/02/02
ES&H (Amy Pavnica)
755 days = 990,004 hours. Please be careful.
Next week will have a million hours.
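A rough check of the "million hours next week" estimate, as a sketch only. It assumes hours keep accruing at the average rate implied by the two figures reported above; the variable names are illustrative and nothing here comes from ES&H.

```python
# Figures from the ES&H report above.
DAYS = 755
HOURS = 990_004
TARGET = 1_000_000

# Assumption: hours keep accruing at the average rate seen so far.
hours_per_day = HOURS / DAYS
days_to_target = (TARGET - HOURS) / hours_per_day

print(f"average rate: {hours_per_day:.0f} hours/day")
print(f"days until {TARGET:,} hours: {days_to_target:.1f}")
```

At that average rate the remaining 9,996 hours take a little over a week, which is consistent with the estimate in the report.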
Walkthrough from Vince (ESE).
ITNA's are looking much better - thanks
to CMS. At 9% left.
morning (1st day in Feb) 43 people expired past due on GERT. A lot
(40) can take it online.
ISM Tripartite underway. Interviews will
continue thru this month.
CDF (Rick Snider)
dcache 32 Tb/day - 21 Mevents - 9.5 to go - hope to finish this week.
Wed. Feb 11th 6 am - 11 am down time. Discussion about handoff between
CDF and Jack
about ok to power down. Athlons run at full power;
Intels run at about half.
CMS (Michael Ernst)
Event production going well. More than 85 Mev
What's not going well is mass storage environment. Lots of
esp. during the weekend.
Drives getting locked up in
silos. Surprising because db mounted
Very important to fix. ME going around to CMS community to make sure
one writing to pnfs.
(DP) Does someone have this problem in
hand? (ME) No.
D0 (Amber Boehnlein)
Collected 10 M ev. Processed 12 M.
Working to set up
grid test bench. Using some old DZero nodes.
consumption is normal.
The production farm:
Total events collected: 10M.
events reconstructed: 12.5M
Reprocessing completed, nice
break not to have to run the farms all out for
Working with Stan
N and Steve T. to set up a grid test bench on fixed
Situation on D0ora2 is being monitored.
Projects Data Analyzed Events Analyzed Data
M 1.4 TB
M 7.2 TB
M 0.01 TB
M 0.2 TB
M 0.4 TB
fnal-cabsrv1 is the new cab station. Last week consumed 22 TB!
Running sPBS or whatever it's called today on Cabsrv1--running smooth,
continued problems with PBS on CAB.
209.7 TB in mezosilo,
222.2 TB in 9940b, 140.4 TB in LTO. More data on
9940bs than 9940a.
EAG (Chris S.)
ended -- being shipped here.
Drill file made.
Working with Enstore
people re direct use.
Review coming up.
Visitors into talk about
Supernovae -- extension of Sloan.
EXP (Liz Buckley-Geer)
Interesting User Unblocking.
Visitor. Laptop - infected - Beams cleaned up - Unblocked - Came to
WH. Blocked. Why?
Questions and Discussion.
Relationship to telling FCIRT. Basis was
diff between DHCP
CCF (Don Petravick)
- Data Comm
Preparing for move that AB discussion (old fixed target
Upgrades to Portakamps.
CMS - expecting nodes in April time frame
- need details
BEG building 70 offices
Working on ???
room installs at DZero. Trouble getting in. AB will clear
New Muon Lab - Planning CDF materials. Submitted to Rick Columbo.
Acnet installed from XG in FCC - need some more materials.
about numbers of fibers, etc.
Administrative matters - participated in ISM
reviews (with Chuck) last week.
- Networking (Phil DeMar)
Postponed border router main from last week to this - users
Fixed target farms down
DHCP installation - will try again tomorrow (out at
related to getting DHCP onto DZero.
CDF CAF. 10 Gig links. Removed attenuators. Lab 8
problems - new bridge firmware now stable
Upgr CD LAN from WH to FCC to 1
No apparent "MyDoom" traffic.
LB - WH rollout? PD - probably
will be last to do.
Discussion about "outside the lan" capability here --
Brookhaven) -- Can there be a portion outside the rules ?
- Security (Matt Crawford)
Computer Center Operations (Mike Stolz)
Lots of pages over the weekend.
- Storage (Bonnie Alcorn)
All 4 DLT drives timed out
(STKen) over the weekend. Could not find
any reason. All kinds
of problems on STKen this weekend.
ADIC problem - logged to ADIC to fix.
CSS (Mark Kaletka)
Utilization Meeting - The two vans we have need to average 100 miles
a month. We are at 80 miles a month. Encouraging people to make
use of the vans if you are a typical user.
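A small sketch of what the shortfall means. It assumes the 100-mile minimum is averaged over a 3-month window (as the CSS e-mail submission below says) and that January's 80 miles is the first month of that window. The notes do not say whether the 80 miles is per van or for both vans together, and the function name and window alignment are hypothetical.

```python
def miles_needed(logged, minimum_avg=100.0, window=3):
    """Return (miles still required, months left) for an averaging window.

    `logged` holds the monthly mileage recorded so far in the window.
    """
    remaining_months = window - len(logged)
    shortfall = minimum_avg * window - sum(logged)
    return max(shortfall, 0.0), remaining_months


needed, months_left = miles_needed([80.0])
print(f"{needed:.0f} miles needed over the next {months_left} months")
if months_left > 0:
    print(f"that is {needed / months_left:.0f} miles per month")
```

Under those assumptions, one 80-mile month means the next two months must average 110 miles to bring the 3-month average back to 100.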
SVX parts review
Two InFocus projectors. Still in prep but also four Dell projectors
that one can check out. Want to make some available at WH.
BSS may open access to Sunflower so may have local reps trained to be
able to change.
d0ora2 problems -- hit 12 day mark with no problems, but then early
last week had problems. EMC and Park Place both in to work on it.
Three more disk drives replaced. Now things running flawlessly.
Continue to watch. May be out of the woods.
RAID array working better but RMAN
backup taking 12 hours instead of 2.
Linux support - officially released version based on RH open
Ready to have FUE source tested. People have been
testing anti-spam and
seem to like it.
Ansys not running
properly on fnalu. Working on this. Farms
struggling with upgrade.
BTEVsrv 1 and 2 now on 24 by 7
Patty at CERN. Vince just back from vacation.
Problem with CDF Silicon. Added a rate limiter for the L1 trigger
rate. Resulted in occurrence of hangs - 5 events that have occurred
hang the whole DAQ. Working the problem.
Customer Support (Steve Wolbers)
Have loaded budget. Will have meeting later this week. Project
meeting this week so can come to see Tevatron BPM.
WPAS - numbers
finished. Mike and Vicky worked on this. Words will be
Record luminosity this morning-- 55 (in the usual units)
Projects (Joy Hathaway)
The Project Status
Meetings scheduled for this week are:
Wednesday, February 4th at 9:00 in
FCC1: Accelerator Support
Thursday, February 5th at 9:00 in
For more information and future meetings see:
(AB) Lots of meetings in the CD. Discussion
over the number of
meetings. Whether or not one can go to all of
(JH) Making efforts to have talks linked into status
meeting page before
it starts. (DP) Important to go through all projects
once a quarter.
Longer list of projects. (DP) Allocate the time by
Roots of WBS's is
Operations (Gerry Bellendir)
Also a DZero Farms outage tomorrow morning 6 am - 11
am. Will be
powered down by 6 am (per Joe Boyd). (Rick Theis)
Help desk not
informed about this. Not finalized until last Friday.
Minutes - Send comments to mailto:firstname.lastname@example.org
Submissions by e-mail:
CSS Ops Report 2004-02-02
2. Fleet Usage
Committee, Mike Behnke reported that the usage
logs for the CD vans
are at approx. 80 miles for the month of January.
The 2 CD vans are
assigned and listed as "Non-Discretion" vehicles,
which must maintain
a minimum usage of 100 miles per month, averaged
over a 3 month
3.1 Run II Support
-SVX spare parts review
is near completion, will compile and forward
our findings to CDF
& D0 this week.
-D0 has fabricated (10) new VRB
(VME Read Out Board). ESS will program
FPGA's and performance
-Repaired &/or tested;
(1) Fermi: 1553 Rack Monitor
-Repaired &/or tested;
MVME2400-0363 SBC (Single Board Computer)
(1) ACDC: EL750B Electronic
-(4) New Dell
projectors available for CD short term check out at
the PREP counter.
Proposing Bill Finstrom, Shirley Knauf, Mike Behnke for "DPR" training:
>> (BSS) plan on rolling out a division/section property
>> function across the laboratory. At a
high level, a person in this role
>> will be responsible for custodial
changes, location changes, and property
>> passes for
inventory/trackable and non-inventory assets. Also, we will be
>> creating a role that has the ability to just maintain/create
>> passes in the system.
d0ora2 crashed Tuesday a.m. EMC
installed fbi (addt'l monitoring) on Tuesday p.m. & we have been sending
them logs daily, changed out lcc's, fiber channels, BUT the change that seemed
to make the difference this time was 3 drives on Friday. We have had 0 trespasses
(controller failovers) since then.
We continue with ZERO trespasses
since 11:15 a.m. on Jan 30.
We are still very cautious and Park
Place has sent 2 technicians
that are meeting with Steve now.
have Drive 5-7 go bad at 6:03 p.m. Saturday & Steve
was in Sunday and
replaced disk 5-7 in the array. This has brought
us back to normal by
freeing up the hot spare.
We are investigating slow access to our
backup disk/tape area(s).
Interruption of ~15 minutes to cdf offline db. Fcdfora4 machine. Machine
rebooted by accident. (same rack as d0ora2)
- LTS 3.0.1 (Long Term Support) officially released! This
is our first build
of Linux enterprise. We are now ready to begin FUE
with CMS regarding their backup requirements
- Informed users at pc
manager and unix users meeting about hepa testing.
Already getting positive
- Stress testing boxes. Pumping 9k per hour and they seem to be
D0ORA2 - Rick VanConant was
paged for hw support on D0ORA2. RickT and AdamW
were also paged yesterday.
(from Sue Winter)
Decision One Traditional - PO 542268 assigned but not yet issued.
Decision One Labor - PO 542269 assigned but not yet issued.
Decision One PerEvent - PO 542310 issued on 1/23.
4. Decision One
Upgrade/Install - PO 542303 issued on 1/21.
Dell HW T&M
- $5K. CD93716. New Blanket contract. Department
LSI Logic Storage Systems - CD93516. New support for Dave Fagan.
SGI HW & SW - MMS #169099 assigned to
Clark. No change from last week.
Received quote from Tricia; going
Cisco HW & SW - Have responses from BS and
D. Wohlt. Working on
Agilent SW -
Received vendor quote. D. Tang says OK to renew.
IBM Rational SW - MMS
#169650 assigned to Clark.
Matrix One SW - Waiting for J. Trumbo to return
on 2/2 for OK. Have
Wind River SW- MMS #169652
assigned to Clark.
During the week there were 166 new Remedy tickets created with 37 of
those still open. Overall there are 213 open Remedy tickets.
Remedy automation created 9 tickets during the week.
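The ticket counts above can be broken down a little further. This is only arithmetic on the three reported numbers, and the variable names are illustrative.

```python
# Figures from the Remedy report above.
new_tickets = 166      # created during the week
new_still_open = 37    # of those, still open
total_open = 213       # all open tickets
automated = 9          # created by Remedy automation

closed_this_week = new_tickets - new_still_open
closure_rate = closed_this_week / new_tickets
older_open = total_open - new_still_open
automated_share = automated / new_tickets

print(f"new tickets already closed: {closed_this_week} ({closure_rate:.0%})")
print(f"open tickets older than this week: {older_open}")
print(f"share created by automation: {automated_share:.1%}")
```

So most of the open backlog (176 of 213) predates this week's tickets.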
* The Ansys issue of not being able to run on an SGI using AFS is
being worked on by
Candies; she is the product maintainer of Ansys.
The issue is using multiple CPUs.
While the application runs in AFS
space, it does a file lock which fails and causes
the application to
die. This only happens when using multiple CPUs; it does not fail
if you only request one CPU. This is how we are running at the present time.
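For anyone wanting a quick look at whether file locking works at all in a given directory, a minimal probe might look like the sketch below. It is not how Ansys takes its lock (that is unknown here), it only tries a single-process advisory lock, and the helper name `can_lock` is made up. A directory can pass this probe and still fail under the multi-CPU case described above.

```python
import fcntl
import os
import tempfile


def can_lock(directory):
    """Return True if an exclusive advisory lock succeeds on a scratch file
    created in `directory` (Unix only)."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Non-blocking exclusive lock; raises OSError if it cannot be taken.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.unlink(path)


print(can_lock(tempfile.gettempdir()))
```

Pointing `can_lock` at a directory in AFS space and at a local disk gives a first comparison before digging into the application itself.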
* Reinstalled fsun01 and all things work again. It looks like a patch
install failed somewhere
along the way and broke the serial consoles.
* Upgrade os on fnsfo/h,
found a problem with exporting drives and then found a patch
* Working to put together numbers for broken CDF nodes that we
want Atipa to fix in one big chunk.
These are repeat offenders, or
have been broken for a long time.
* Final cleanup of Angst issues. There is a rumor
afoot to move these 96 nodes into the CAB
instead of Reco. I guess
it's D0's call.
and 2 are now 7x24. Web info is available.
* Tuning GigE interfaces.
KTeV has decided to add four nodes with tape drives
to read tapes to
Enstore. Bonnie's group is doing this.