Diagnosis of RD027 processing


Subject: Diagnosis of RD027 processing
From: Erik Katsavounidis (Erik.Katsavounidis@lngs.infn.it)
Date: Sat Feb 12 2000 - 07:48:09 EST


This is the summary for RD027

TAPE JOB --- Downloaded 30 files in 2 hours (20:07-22:07)
Analysis --- Started with first RUN (11543) at 22:08
Analysis --- Ended with last RUN (90108) at 04:56, i.e. about 7 hrs of analysis
Calibrations --- found 4 calibration runs -- I checked that they were indeed
                    saved in the [MACRODATA.RAW.CAL] area.
                    CALIB: RD027 RUN090103 CAL1 (HIST)/1 (STAT)
                    CALIB: RD027 RUN090104 CAL1 (HIST)/1 (STAT)
                    CALIB: RD027 RUN090107 CAL1 (HIST)/1 (STAT)
                    CALIB: RD027 RUN090108 CAL1 (HIST)/1 (STAT)

Errors --- There were 8 runs that gave ERRORs:

Subj: MYSTATS LOG ERRORS RUN11543 RD027 SIZE= 1-NO
CPU time limit expired

Subj: MYSTATS LOG ERRORS RUN11544 RD027 SIZE= 1-NO
CPU time limit expired

Subj: MYSTATS LOG ERRORS RUN11545 RD027 SIZE= 1-NO
CPU time limit expired

Subj: MYSTATS LOG ERRORS RUN11546 RD027 SIZE= 1-NO
CPU time limit expired + SAERP1 sanity errors

Subj: MYSTATS LOG ERRORS RUN11550 RD027 SIZE= 1-NO
Sanity check in SAERP1 yielded ERROR --- nothing to do about it

Subj: MYSTATS LOG ERRORS RUN11553 RD027 SIZE= 1-NO
CPU time limit expired

Subj: MYSTATS LOG ERRORS RUN11563 RD027 SIZE= 653-NO
CPU time limit expired + ERP MUON buffer yielded ERROR

Subj: MYSTATS LOG ERRORS RUN11566 RD027 SIZE= 1-NO
Sanity check in ERP MUON buffer yielded ERROR --- nothing to do about it

All of the jobs whose CPU time limit expired were running on AXPGS4. I have
manually relaunched the 6 jobs (MY_MONITOR). We have a problem here with AXPGS4.
It is the only FAST queue that may lead to CPU time limit expiration
for the MY_MONITOR job. I'm trying to think of an effective way of solving
this problem besides simply launching the jobs on ALPHA$BATCH. It is true,
though, that when the cluster is not used by other users, ALPHA$BATCH should
run as fast as ALPHA$FAST.

After relaunching the jobs that had the CPU time limit error, four errors
remained, about which nothing can be done.
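
For what it's worth, the triage above (which runs to relaunch, which errors
are left over) could be done mechanically. What follows is only a rough Python
sketch, not the actual MYSTATS or MY_MONITOR tooling: the ERROR_MAILS list is
a hand-typed copy of the eight error mails above, and the triage() function
and the " + " splitting rule are my own assumptions about how the messages
are laid out.

```python
import re

# Hand-copied (subject, body) pairs from the eight error mails above.
# This list is illustrative input, not the output of any real tool.
ERROR_MAILS = [
    ("MYSTATS LOG ERRORS RUN11543 RD027 SIZE= 1-NO",
     "CPU time limit expired"),
    ("MYSTATS LOG ERRORS RUN11544 RD027 SIZE= 1-NO",
     "CPU time limit expired"),
    ("MYSTATS LOG ERRORS RUN11545 RD027 SIZE= 1-NO",
     "CPU time limit expired"),
    ("MYSTATS LOG ERRORS RUN11546 RD027 SIZE= 1-NO",
     "CPU time limit expired + SAERP1 sanity errors"),
    ("MYSTATS LOG ERRORS RUN11550 RD027 SIZE= 1-NO",
     "Sanity check in SAERP1 yielded ERROR --- nothing to do about it"),
    ("MYSTATS LOG ERRORS RUN11553 RD027 SIZE= 1-NO",
     "CPU time limit expired"),
    ("MYSTATS LOG ERRORS RUN11563 RD027 SIZE= 653-NO",
     "CPU time limit expired + ERP MUON buffer yielded ERROR"),
    ("MYSTATS LOG ERRORS RUN11566 RD027 SIZE= 1-NO",
     "Sanity check in ERP MUON buffer yielded ERROR --- nothing to do about it"),
]

RUN_RE = re.compile(r"RUN(\d+)")


def triage(mails):
    """Split error mails into runs to relaunch and residual errors.

    A run is marked for relaunch when one of its messages reports an
    expired CPU time limit; every other message is kept as a residual
    error that a relaunch is not expected to fix.
    """
    relaunch = []
    residual = []
    for subject, body in mails:
        run = int(RUN_RE.search(subject).group(1))
        # Assumed convention: several problems in one mail are joined by " + ".
        for message in (part.strip() for part in body.split(" + ")):
            if message.startswith("CPU time limit expired"):
                relaunch.append(run)
            else:
                residual.append((run, message))
    return relaunch, residual


relaunch, residual = triage(ERROR_MAILS)
print("Relaunch:", relaunch)
for run, message in residual:
    print("Residual: RUN%d: %s" % (run, message))
```

With the mails above this gives the 6 runs that were relaunched and the four
residual sanity errors (runs 11546, 11550, 11563 and 11566).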

RD027 is now complete. It took 2+7=9 hours running in batch mode.
It took me another 30 minutes to follow up on the relaunch of the
jobs that crashed due to the CPU limit and also to compile this e-mail.
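
The batch timing can be cross-checked from the clock times quoted in the
summary. This is just a small Python sketch; the elapsed() helper is a
hypothetical name, and it assumes each step crosses midnight at most once.

```python
from datetime import datetime, timedelta


def elapsed(start, end):
    """Elapsed time between two HH:MM clock readings.

    If the end time is earlier than the start time, the step is assumed
    to have run past midnight exactly once.
    """
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        t1 += timedelta(days=1)
    return t1 - t0


tape = elapsed("20:07", "22:07")      # tape job
analysis = elapsed("22:08", "04:56")  # analysis, runs past midnight
total = tape + analysis

print("Tape job:", tape)
print("Analysis:", analysis)
print("Total:   ", total)
```

The analysis step comes out at 6 h 48 min, so the exact batch total is
8 h 48 min, which rounds to the 9 hours quoted above.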

--Erik


