(These minutes are being sent asap after the meeting and may be really
rough.  Any major problems, please e-mail mailto:ritchie@fnal.gov

Persons who wish really accurate recording of their operation status are
invited to send text to me >before< the meeting in the future.

--David Ritchie)

Computing Division Operations Meeting
Minutes of CD Operations Meeting - 2004/02/02

ES&H (Amy Pavnica)

755 days.  =990,004 hours.  Please be careful.
By this time next week we will have a million hours.
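
As a rough cross-check of the million-hour estimate, the sketch below extrapolates from the two figures quoted above. It assumes hours keep accruing at the average rate of the 755 days so far; that rate is an assumption, not something stated at the meeting, and the names are illustrative.

```python
# Rough extrapolation of the ES&H hours counter.
# Assumption: hours keep accruing at the average rate of the 755 days so far.
DAYS_SO_FAR = 755
HOURS_SO_FAR = 990_004
TARGET_HOURS = 1_000_000


def days_until_target(days_so_far: int, hours_so_far: int, target: int) -> float:
    """Return the estimated number of days until `target` hours is reached."""
    hours_per_day = hours_so_far / days_so_far
    return (target - hours_so_far) / hours_per_day


remaining_hours = TARGET_HOURS - HOURS_SO_FAR
estimate = days_until_target(DAYS_SO_FAR, HOURS_SO_FAR, TARGET_HOURS)
print(f"{remaining_hours} hours to go, about {estimate:.1f} days at the average rate")
```

At the average rate the mark falls a little over a week out, so "this time next week" is about right.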

Walkthrough from Vince (ESE).  ITNA's are looking much better - thanks
to CMS.  At 9% left.
This morning (1st day in Feb) 43 people expired past due on GERT.  A lot
(40) can take it online.
ISM Tripartite underway. Interviews will continue thru this month.

CDF (Rick Snider)

dcache 32 Tb/day - 21 Mevents - 9.5 to go - hope to finish this week.
Wed. Feb 11th 6 am - 11 am down time.  Discussion about handoff between
CDF and Jack about ok to power down.  Athlons run at full power despite
load.  Intels run at about half.

CMS (Michael Ernst)

Event production going well.  More than 85 Mev collected.
What's not going well is the mass storage environment.  Lots of problems
esp. during the weekend.
Drives getting locked up in silos.  Surprising because db mounted
read-only.  Discussion.
Very important to fix.  ME going around to CMS community to make sure no
one writing to pnfs.
(DP) Does someone have this problem in hand?  (ME) No.

D0 (Amber Boehnlein)

Collected 10 M ev.  Processed 12 M.
Working to set up grid test bench.  Using some old DZero nodes.

tape consumption is normal.


The production farm:
   Total events collected:  10M.
   Total events reconstructed:   12.5M

Reprocessing completed, nice break not to have to run the farms all out for
Working with Stan N and Steve T. to set up a grid test bench on fixed

Situation on D0ora2 is being monitored.

Analysis Stations:
             Projects   Data Analyzed   Events Analyzed   Data transferred
D0mino         414           9.7 TB          324 M           1.4 TB
CAB            583          26.7 TB          431 M           7.2 TB
fnal-cabsrv1   333           8.5 TB          317 M           0.01 TB
Clued0         220           0.3 TB           13 M           0.2 TB
D0Karlsruhe     29           1.6 TB           73 M           0.4 TB
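
For a quick overall picture, the sketch below sums the columns of the table above and derives an average data volume per analyzed event. The figures are copied from the table; the function name and the decimal unit conversion (1 TB = 1e9 kB) are assumptions made for illustration.

```python
# Per-station figures copied from the Analysis Stations table.
# Columns: projects, data analyzed (TB), events analyzed (M), data transferred (TB).
STATIONS = {
    "D0mino":       (414, 9.7, 324, 1.4),
    "CAB":          (583, 26.7, 431, 7.2),
    "fnal-cabsrv1": (333, 8.5, 317, 0.01),
    "Clued0":       (220, 0.3, 13, 0.2),
    "D0Karlsruhe":  (29, 1.6, 73, 0.4),
}


def column_totals(stations):
    """Sum each column over all stations."""
    return tuple(sum(col) for col in zip(*stations.values()))


projects, data_tb, events_m, transferred_tb = column_totals(STATIONS)
print(f"{projects} projects, {data_tb:.1f} TB analyzed, "
      f"{events_m} M events, {transferred_tb:.2f} TB transferred")

# Average data volume per analyzed event, in kB (assumes 1 TB = 1e9 kB).
kb_per_event = data_tb * 1e9 / (events_m * 1e6)
print(f"about {kb_per_event:.0f} kB per analyzed event")
```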

fnal-cabsrv1 is the new cab station.  Last week consumed 22 TB!
Running sPBS or whatever it's called today on Cabsrv1--running smooth,
continued problems with PBS on CAB.

209.7 TB in mezosilo, 222.2 TB in 9940b, 140.4 TB in LTO.  More data on
9940bs than 9940a.

EAG (Chris S.)
Dark Run ended -- being shipped here.
Drill file made.
Working with Enstore people re direct use.
Review coming up.
Visitors in to talk about Supernovae -- extension of Sloan.

EXP (Liz Buckley-Geer)

Interesting user unblocking.
Visitor.  Laptop - infected - Beams cleaned up - Unblocked - Came to
WH.  Blocked.  Why?
Questions and Discussion.  Relationship to telling FCIRT.    Basis was
diff between DHCP and Static.

CCF (Don Petravick)

- Data Comm (Ron Cudzewicz)
Preparing for move that AB discussed (old fixed target nodes)
Upgrades to Portakamps.
CMS - expecting nodes in April time frame - need details

BEG building 70 offices
Working on ???
Conference room installs at DZero.  Trouble getting in.  AB will clear
the way.
New Muon Lab - Planning CDF materials.  Submitted to Rick Columbo.
Acnet installed from XG in FCC - need some more materials.
Discussion about numbers of fibers, etc.
Administrative matters - participated in ISM reviews (with Chuck) last week.

- Networking (Phil DeMar)

Postponed border router maintenance from last week to this - users should
not notice
DZero maintenance
Fixed target farms down on Thursday
DHCP installation - will try again tomorrow (out at DZero).  Struggle
related to getting DHCP onto DZero.
Problems with CDF CAF.  10 Gig links.  Removed attenuators.  Lab 8
wireless problems - new bridge firmware now stable
Upgrade CD LAN from WH to FCC to 1 Gbit.
No apparent "MyDoom" traffic.
LB - WH rollout?  PD - probably will be last to do.
Discussion about "outside the lan" capability here -- (have at
Brookhaven) -- Can there be a portion outside the rules ?  Being discussed.

- Security (Matt Crawford)


- Computer Center Operations (Mike Stolz)

Lots of pages over the weekend.

- Storage (Bonnie Alcorn)

All 4 DLT drives timed out (STKen) over the weekend.  Could not find
any reason.  All kinds of problems on STKen this weekend.
ADIC problem - logged to ADIC to fix.

CSS (Mark Kaletka)

Fleet Utilization Meeting - The two vans we have need to average 100
miles a month.  We are at 80 miles a month.  Typical users are
encouraged to make use of the vans.

SVX parts review

Two InFocus projectors.  Still in PREP but also four Dell projectors
that one can check out.  Want to make some available at WH.
BSS may open access to Sunflower so may have local reps trained to be
able to change.

d0ora2 problems -- hit 12 day mark with no problems, but then early last
week had problems - EMC and Park Place both in to work on it.  Three more
disk drives replaced.  Now, things running flawlessly.  Continue to
watch.  May be out of the woods.  RAID array working better but RMAN
backup taking 12 hours instead of 2 hours.

Linux support - officially released version based on RH open source.
Ready to have FUE source tested.  People have been testing anti-spam and
seem to like it.

Ansys not wanting to run properly on FNALU. Working on this. Farms
struggling with upgrade.  Patches made.

BTEVsrv 1 and 2 now on 24 by 7

CEPA (Vince)
Patty at CERN.  Vince just back from vacation.
Problem with CDF Silicon.  Added a rate limiter for the L1 trigger
rate.  Resulted in occurrence of hangs - 5 events that have occurred that
hang the whole DAQ.  Working the problem.

Planning and Customer Support (Steve Wolbers)

Have loaded budget.  Will have meeting later this week. Project status
meeting this week so can come to see Tevatron BPM.
WPAS - numbers finished.  Mike and Vicky worked on this.  Words will be

Record luminosity this morning-- 55 (in the usual units)

Projects (Joy Hathaway)

The Project Status Meetings scheduled for this week are:

Wednesday, February 4th at 9:00 in FCC1:  Accelerator Support

Thursday, February 5th at 9:00 in FCC1:  ROOT

For more information and future meetings see:

Joy and Ruth

(AB) Lots of meetings  in the CD.  Discussion over the number of
meetings.  Whether or not one can go to all of them.  (Not).
(JH) Making efforts to have talks linked into status meeting page before
it starts. (DP) Important to go through all projects once a quarter.
Longer list of projects.  (DP) Allocate the time by Roots of WBS's is

Operations (Gerry Bellendir)
Also a DZero Farms outage tomorrow morning 6 am - 11 am.  Will be
powered down by 6 am (per Joe Boyd).  (Rick Theis) Help desk not
informed about this.  Not finalized until last Friday.

Minutes - Send comments to mailto:ritchie@fnal.gov

Submissions by e-mail:

CSS Ops Report 2004-02-02
1.      ES&H
2.      Fleet Usage

Fleet Utilization Committee: Mike Behnke reported that the usage
  logs for the CD vans are at approx. 80 miles for the month of January.
  The 2 CD vans are assigned and listed as "Non-Discretion" vehicles,
  which must maintain a minimum usage of 100 miles per month, averaged
  over a 3 month period.
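
To make the averaging rule concrete, the sketch below works out how many miles are still needed to reach the 100-mile monthly average. It assumes the 3-month window runs January through March and that the roughly 80 miles logged for January is the figure the rule applies to; both are assumptions, and the function name is illustrative.

```python
# Sketch of the fleet-usage rule: 100 miles per month, averaged over 3 months.
# Assumptions: the averaging window is January through March, and the ~80 miles
# logged for January is the figure the rule applies to.
REQUIRED_AVERAGE = 100  # miles per month
WINDOW_MONTHS = 3


def miles_still_needed(logged, required_average=REQUIRED_AVERAGE, window=WINDOW_MONTHS):
    """Return the total miles still needed in the window to meet the average."""
    return max(0, required_average * window - sum(logged))


logged_so_far = [80]  # January, from the usage logs
needed = miles_still_needed(logged_so_far)
months_left = WINDOW_MONTHS - len(logged_so_far)
print(f"{needed} miles needed over {months_left} months "
      f"({needed / months_left:.0f} per month)")
```

Under these assumptions, the remaining two months would each need about 110 miles to bring the average up to 100.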

3.      ESS
3.1     Run II Support

-SVX spare parts review is near completion, will compile and forward
  our findings to CDF & D0 this week.

Dzero Support:
-D0 has fabricated (10) new VRB (VME Read Out Board).  ESS will program
  FPGA's and performance test.

-Repaired &/or tested;
  (1) Fermi: 1553 Rack Monitor

CDF Support:
-Repaired &/or tested;
  (1) Motorola: MVME2400-0363 SBC (Single Board Computer)
  (1) ACDC: EL750B Electronic Load

3.2     Logistics

-(4) New Dell projectors available for CD short term check out at
  the PREP counter.

Proposing Bill Finstrom, Shirley Knauf, Mike Behnke for "DPR" training:

>> (BSS) plan on rolling out a division/section property representative (DPR)
>> function across the laboratory.  At a high level, a person in this role
>> will be responsible for custodial changes, location changes, and property
>> passes for inventory/trackable and non-inventory assets.  Also, we will be
>> creating a role that has the ability to just maintain/create property
>> passes in the system.


4.      DSG
4.1     D0

d0ora2 crashed Tuesday a.m.  EMC installed fbi (addt'l monitoring) on
Tuesday p.m. & we have been sending them logs daily, changed out lcc's,
fiber channels, BUT the change that seemed to make the difference this
time was replacing 3 drives on Friday.  We have had 0 trespasses
(controller failovers) since then.
We continue with ZERO trespasses since 11:15 a.m. on Jan 30.

We are still very cautious and Park Place has sent 2 technicians
that are meeting with Steve now.

We did have Drive 5-7 go bad at 6:03 p.m. Saturday & Steve
was in Sunday and replaced disk 5-7 in the array.  This has brought
us back to normal by freeing up the hot spare.

We are investigating slow access to our backup disk/tape area(s).

4.2     CDF

Interruption of ~15 minutes to cdf offline db. Fcdfora4 machine. Machine rebooted by accident. (same rack as d0ora2)

5.      CSI
5.1     Linux Support

- LTS 3.0.1 (Long Term Support) officially released! This is our first build
of Linux enterprise. We are now ready to begin FUE certification.

5.2     Backups

- Meeting with CMS regarding their backup requirements

5.3     SPAM

- Informed users at pc manager and unix users meeting about hepa testing.
Already getting positive feedback!
- Stress testing boxes. Pumping 9k per hour and they seem to be holding up!

6.      CSG
6.1     Off-Hours

D0ORA2 - Rick VanConant was paged for hw support on D0ORA2. RickT and AdamW
were also paged yesterday.

6.2     Contracts

Contract Maintenance (from Sue Winter)
1.  Decision One Traditional - PO 542268 assigned but not yet issued.
2.  Decision One Labor - PO 542269 assigned but not yet issued.
3.  Decision One PerEvent - PO 542310 issued on 1/23.
4.  Decision One Upgrade/Install - PO 542303 issued on 1/21.

Dell HW T&M - $5K.  CD93716.  New Blanket contract.  Department
LSI Logic Storage Systems - CD93516. New support for Dave Fagan.
Division Checkout.
SGI HW & SW - MMS #169099 assigned to Clark.  No change from last week.
Received quote from Tricia; going through this.

Cisco HW & SW - Have responses from BS and D. Wohlt.  Working on
Darryl's list.

Agilent SW - Received vendor quote. D. Tang says OK to renew.
IBM Rational SW - MMS #169650 assigned to Clark.
Matrix One SW - Waiting for J. Trumbo to return on 2/2 for OK.  Have
vendor quote.
Wind River SW- MMS #169652 assigned to Clark.

6.3     HelpDesk

During the week there were 166 new Remedy tickets created with 37 of
those still open.  Overall there are 213 open Remedy tickets.

Remedy automation created 9 tickets during the week.
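
The ticket counts above imply two further figures, worked out in the short sketch below. It assumes every new ticket that is no longer open was closed during the week; the variable names are illustrative.

```python
# Figures from the HelpDesk report for the week.
NEW_THIS_WEEK = 166
NEW_STILL_OPEN = 37
TOTAL_OPEN = 213

# Assumption: every new ticket that is no longer open was closed during the week.
closed_from_this_week = NEW_THIS_WEEK - NEW_STILL_OPEN
# Open tickets that were created before this week.
open_from_earlier_weeks = TOTAL_OPEN - NEW_STILL_OPEN
closure_rate = closed_from_this_week / NEW_THIS_WEEK

print(f"{closed_from_this_week} of this week's tickets closed ({closure_rate:.0%}); "
      f"{open_from_earlier_weeks} open tickets predate this week")
```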

7.      SCS
7.1     FNALU

* The Ansys issue of not being able to run on an SGI using AFS is being worked on by
  Candies, the product maintainer of Ansys. The issue arises when using multiple CPUs.
  While the application runs in AFS space, it does a file lock which fails and causes
  the application to die. This only happens when using multiple CPUs; it does not fail
  if you only request one CPU. This is how we are running at the present time.

* Reinstalled fsun01 and all things work again. It looks like a patch install failed somewhere
  along the way and broke the serial consoles.

7.2     Farms

* Upgraded OS on fnsfo/h, found a problem with exporting drives and then found a patch
  that fixed it.
* Working to put together numbers for broken CDF nodes that we want Atipa to fix in one big chunk.
  These are repeat offenders, or have been broken for a long time.
* Final cleanup of Angst issues. There is a rumor afoot to move these 96 nodes into the CAB
  instead of Reco. I guess it's D0's call.

7.3     BTeV

* Btevsrv1 and 2 are now 7x24. Web info is available.

7.4     KTeV

* Tuning GigE interfaces. KTeV has decided to add four nodes with tape drives
  to read tapes to Enstore. Bonnie's group is doing this.