Subject: Re: Mr_monitor strange coincidence
From: Christopher Orth (corth@budoe.bu.edu)
Date: Thu Sep 23 1999 - 09:41:36 EDT
Hey,
Would the use of a "lockfile" be more robust? Have the program check for
the existence of a file, say in the MACROUSA home directory. If the file
is already there, the job exits; otherwise it creates the file as a sign
to other jobs not to run. At the end of the job, *or on errors*, the file
should be deleted. As a safeguard you could have the lock file be ignored
(deleted/replaced) if it is more than 4 hours old.
Presumably this is only a problem during crashes and on Wednesdays.
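
A minimal sketch of the lockfile idea, in Python, just to make it concrete.
It is not how MR_MONITOR is actually written. The lock file name, its
location in the home directory, and the function names are all made up for
illustration. Since the two jobs start in the same second, a separate
"check, then create" could let both pass the check, so the sketch creates
the file with O_CREAT | O_EXCL, which does the check and the creation in one
step.

```python
import os
import time

# Hypothetical lock location; the real path would be chosen by the group.
LOCK_PATH = os.path.join(os.path.expanduser("~"), "mr_monitor.lock")
MAX_AGE_SECONDS = 4 * 60 * 60  # ignore locks older than 4 hours


def acquire_lock(path=LOCK_PATH, max_age=MAX_AGE_SECONDS):
    """Return True if this job now holds the lock, False if another job does."""
    for _ in range(2):  # a second pass happens only after a stale lock is removed
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so the
            # existence check and the creation are a single atomic step.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                age = time.time() - os.path.getmtime(path)
            except FileNotFoundError:
                continue  # the holder released it in the meantime; retry
            if age <= max_age:
                return False  # a fresh lock: another job is running
            try:
                os.remove(path)  # stale lock, e.g. left behind by a crash
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for debugging
        return True
    return False


def release_lock(path=LOCK_PATH):
    """Delete the lock file; do nothing if it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def run_exclusively(job, path=LOCK_PATH, max_age=MAX_AGE_SECONDS):
    """Run job() only if the lock can be taken; always release it afterwards."""
    if not acquire_lock(path, max_age):
        return False
    try:
        job()
    finally:
        release_lock(path)  # runs at the end of the job *and* on errors
    return True
```

The try/finally covers the "or on errors" case, but not a hard kill or a
machine crash, which is what the 4 hour limit is for. Note that removing a
stale lock is not itself race free: two jobs that both find the same stale
file at the same moment could still collide.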
Chris
On Thu, 23 Sep 1999, LNGS US MACRO group wrote:
>
> Ciao a tutti,
>
> This is the second time that 2 MR_MONITOR_RUNxxxxxx jobs are "competing"
> with each other and thus blocking the rare-dst processing. It happens when
> *both* are launched at exactly the same time (hh:mm:ss) on the same queue.
> In this way, they are executed in parallel, they both find another job
> MR_MONITOR running, exit and are launched again (at exactly the same time)
> and stay like that until you kill one of them... This time, it happened
> for runs 18299 and 91542. I took care of it - just keep an eye on it.
>
> Ioannis
>