Minutes of Mar 3, 2003 CD Operations
- 419 days without a lost time injury.
- 9 people past due for GERT, 4 qualified for online class.
- dcache: Errors down at the 1/10,000 level now. Staging tests of 2 TB
did not reveal any problems. System usage still rather low: 1-2 TB/day.
- CAF fileservers.
- WD released the official version of its patched firmware
which will be re-installed this week on 18 CAF fileservers. More
fileservers will receive the patch beginning on March 10. WD will install.
- Ron Rechenmacher fixed a significant problem with the OS on the
CAF fileservers. Ron determined that version 1.1 of xfs was causing
hangs, and upgraded to xfs 1.2 which worked fine. The fileservers on
the dCache test stand will all receive this fix as well.
- Last week upgraded to RH7.3. Fixing a few problems that were observed.
Going pretty well and happy with Rocks as an install tool.
- GRID: transforming grid testbed into a production testbed. Small test
involving tier 2 sites this weekend.
- Understanding performance of Aztera fileservers. Back to 150 MB/s writes,
250 MB/s reads with a few hundred clients. Fileserver was on loan and
will be returned to the company.
- 5.5 M events collected, 21 M processed.
Reprocessing data with improved calorimeter corrections
- Central systems:
Next scheduled downtime is March 4. Downtime is extensive and
affects all systems
- Networks: Brief outage that affects the robots
- D0mino: Downtime 8 hours to go to 128 cpus.
- D02ka: Fix some hardware problems--should be invisible since D0mino
and CLuED0 will be down.
- CLuED0: down from 8am - 12 noon for reboot and hopefully moving the
cluster master ldap server to DAB2.
- Enstore: ADIC (MC and recent tmbs) robot down for several hours to
upgrade a PC
- DB: Downtime 8 hours to remove bad MC parameters. Contingent on
finishing testing. Will also make some minor schema changes.
- FARMS: Down to change automount configuration.
Less than 100 tapes in mezsilo, will need to add more.
- Analysis SAM stations
221 projects, 0.7TB consumed, 500 GB transferred in, and 50 GB out
D0Karlsruhe 24 projects, xx TB consumed, 0.0TB transferred in
CAB 221 projects, 7 TB consumed, 2.5 TB transferred in
D0mino: 636 projects run, ~10 TB consumed (5.0/0.5 TB in/out)
- No report.
- Friday STKen disk cache upgraded. Caused some problems for transfers.
- ADIC cannot come out tomorrow. D0 downtime to upgrade PC cancelled until
1st Tuesday of next month. Developers want to upgrade mover code and
restart movers, around 10 a.m. tomorrow.
- New tapes went into D0 robot this morning.
- CDF migration. Finished reading CDF tapes a second time. Will recycle
the tapes. Waiting for CDF list of additional tapes to copy.
- Thursday 3/6, 6:30 a.m. reverse path check re-enabled on fcc core router
interfaces. No disruption anticipated.
- Migrating OSPF config on routers to more secure (MD5) authentication.
CD LAN interface & WH/border router link this week.
No disruption anticipated.
- Lot of electrical work on FCC2.
- SMTP gateways behind a load balancing switch running new code.
- Listserver slow response issue is back. Scheduled to upgrade software
- Sitewide NETbios block April 1.
Exceptions for servers that need offsite access.
- Farm evals ramping up this week. Lots of interaction with vendors.
- Farms installed a new rsync product.
- Downtime on D0bbin Tuesday morning to improve performance.
- Maintenance contracts: getting ready for charge out process in early May.
Another round in June.
- CDserver down from 3-6 on Saturday for upgrade to w2000.
- Two fess servers upgraded to win2k and moved into fermi domain.
- Four latest security patches applied to production machines. Done.
- Another disk added to rman backups for d0ofprd1. Last of 17 GB disks
allotted for backups. When it fills, plans to send backups to another disk
- Designer work for final ERD for CDF SAM changes.
- Project account work coming up: shutdown 3 p.m. Friday March 28 - Monday
March 31 for miscomp.
- Oracle freeware taking stock meeting coming up.
- Troubleshooting boards for far detector electronics for Minos.
- Testbeam board for BTeV finished.
- Scheduled moves on FCC3 completed.
- 3 technicians will receive iron man certificates for no sick leaves taken.
- Progress on WPAS.
- Julie showed metrics for databases
Average of 10 licenses used at CDF, and 10 at D0.
- Developments in work on the D0 event model. Phillipe will have
to make some changes to ROOT. Two projects to work
on ROOT at D0.
- Project with D0 on level 2 trigger and alpha processor board.
Has delivered project to D0.
- At CDF there has been a 12 kHz limit
imposed by silicon. Limit is while they investigate problems that
occur at high trigger rate. On Friday there was a shift from
investigation to implementation phase: a monitor will be implemented
and the limit will be taken off.
Planning and Customer Support
- WPAS due Wednesday.
- Travel budget allocation will be sent out soon.
- Congress approved budget, but we do not have budget for directorate yet.
Then will spread it out over budget codes.
- Visual media services will have a statement about the use of the
- Send your news items to gotnews.
- Office moves went well last week.
- D0 silo cleaning.
Expecting to receive final report from Chemir.
PO prepared to clean the empty silo. Vendor has experience with clean rooms.
- PC farm heat problem
Modified information to vendors.
Received report from Bob Forster.
- Get your WPAS in on time.
- Don't panic over effort reporting. We're working on improving it.
- P5 review may cause us some extra effort.
Open presentations on March 26 in 1W.