shift report - page 2

Subject: shift report - page 2
From: James Musser (musser@blackhole.astro.indiana.edu)
Date: Mon Dec 06 1999 - 14:24:33 EST


Shift report - addendum

Taking up where I left off after hitting the wrong button on my email
program and sending off the shortest shift report in the history of
MACRO: the most important news from the last couple of weeks at the
Gran Sasso is a problem in the ERP readout, which Ioannis detected
during a calibration run a couple of weeks ago. In a laser run performed
at that time, a box was known to be off for repair. Mysteriously, its
next-door neighbor was also absent from the data. For reasons known only
to God, this occurred the week prior to my arrival at the lab. The next
Wednesday, Ioannis, Erik, and I worked to track this down and discovered
a previously unknown problem in the ERP readout. To make a long story
short, for roughly 15% of the S/H modules, in events in which channels 2
and 3 fire, only channel 2 is seen in the data. If channels 2, 3, and 4
fire, all are seen. Any one channel firing is also OK, as is all 4
firing. The problem seems not to be associated with particular
hardware, and more work will be required to isolate it.
The next day, Erik did a scan of the data, looking at the number of
times tank n was seen in the same event as tank n+1. The problem is
quite easy to identify in this distribution. Rather than paraphrase
him, I reproduce his summary below, which includes a more complete
description of the pathology.
The primary event topology that would be affected is events in which
the muon passes through two counters in a plane; in some cases, only
one of these will be seen in the data. Needless to say, analysers should
verify that they are not, for whatever reason, rejecting this class of
events.
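
The adjacent-tank scan described above can be sketched roughly as
follows. This is only an illustration, not Erik's actual code: the
function name, the event format (one list of integer tank indices per
event, all within one plane) and the toy data are assumptions.

```python
from collections import Counter

def adjacent_pair_counts(events):
    """Count, for each tank n, the events containing both tank n and tank n+1.

    `events` is a hypothetical input format: an iterable of events, each
    given as a list of integer tank indices within a single plane.
    """
    counts = Counter()
    for hits in events:
        hit_set = set(hits)          # ignore duplicate hits in one event
        for n in hit_set:
            if n + 1 in hit_set:     # tank n and its neighbor both present
                counts[n] += 1
    return counts

# Toy events, made up for illustration only.
toy_events = [[1, 2], [2, 3], [5, 6], [1, 2, 3]]
counts = adjacent_pair_counts(toy_events)
for n in sorted(counts):
    print(f"tank {n} with tank {n + 1}: {counts[n]} events")
```

In a distribution like this, a pair of neighbors that is almost never
seen together, while the pairs around it are populated normally, would
be the signature to look for.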

*** This Begins Erik's Summary *****

We have recently discovered a problem with the ERP system (but maybe
*not* *within* the ERP system) that is relevant to muon analyses: the
ERP does not seem to respond as expected in situations that involve
MORE THAN ONE COUNTER per ERP module, namely, it MISSES AT LEAST ONE OF
THE COUNTERS INVOLVED. This seems to be present in ~20/124 ERP modules
in the detector.
The symptom can be described as follows:

Let us name C1, C2, C3, C4 the four counters (ordered from the top of
the module to the bottom) that are processed by an ERP Sample and Hold
module (+ Trigger Processor). By "ON" here I mean "active", which for a
work bench test implies inputting pulse signals of 1-3 volts:

           When simulating:       We read:
           C1   C2   C3   C4      C1   C2   C3   C4
case 1)    OFF  ON   ON   OFF     OFF  ON   OFF  OFF
case 2)    ON   ON   ON   OFF     ON   ON   OFF  OFF
case 3)    OFF  ON   ON   ON      OFF  ON   ON   ON
case 4)    ON   ON   ON   ON      ON   ON   ON   ON

i.e., it is C3 that is missing from the registers for certain box
patterns. In all these cases, the trigger LED on the Trigger Processor
fires correctly, suggesting that the problem is readout related.

The hardware behaves as expected when only ONE counter fires at a time.
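
As a starting point for putting this into a simulation, the bench-test
table can be encoded as a simple lookup. This is a minimal sketch that
covers only the patterns reported here (the four cases above plus single
counters); the names `OBSERVED`, `read_back` and `lost_counters` are
made up for illustration, and untested patterns return None instead of
a guess.

```python
# Bench-test results for an affected card, copied from the table above.
# Keys are the fired pattern, values the pattern read back.
# Tuples are (C1, C2, C3, C4) with True = ON.
OBSERVED = {
    (False, True, True, False): (False, True, False, False),  # case 1
    (True, True, True, False): (True, True, False, False),    # case 2
    (False, True, True, True): (False, True, True, True),     # case 3
    (True, True, True, True): (True, True, True, True),       # case 4
}

NAMES = ("C1", "C2", "C3", "C4")

def read_back(fired):
    """Return the pattern an affected card reads, or None if untested."""
    fired = tuple(bool(x) for x in fired)
    if sum(fired) == 1:
        return fired             # one counter at a time behaves as expected
    return OBSERVED.get(fired)   # None for patterns not covered by the tests

def lost_counters(fired):
    """Names of counters that fired but are absent from the readout."""
    seen = read_back(fired)
    if seen is None:
        return None
    return [name for name, f, s in zip(NAMES, fired, seen) if f and not s]

print(lost_counters((0, 1, 1, 0)))  # case 1
print(lost_counters((0, 1, 1, 1)))  # case 3
```

Keeping the untested patterns explicit makes it easy to fill the table
in as further tests cover more combinations.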

The problem was first identified by Ioannis during a calibration run.
On 18-NOV-1999, while looking at the event display during a calibration
run (during which ALL counters are lit up), he observed that while
3C16 was turned off and no hits were expected from it, its adjacent
counter 3C15 was also not firing in the ERP. This triggered a direct
test on the hardware, which Ioannis and Roberto performed the same
day, verifying that with 3C16 off, 3C13, 3C14 and 3C15 could not fire
at the same time when triggered; instead only 3C13 and 3C14 were
present.
At that time they also saw this behavior in other SM3 ERP cards.

Jim Musser has been on shift at GS and had the opportunity to take
a direct look at the problem. Last Wednesday, together with Jim, we went
ahead and checked for this behavior on all ERP cards in the detector (a
total of 124 cards, each servicing 4 counters). The symptom described
above is present in at least 20 out of the 124 ERP cards. This was
determined by injecting logic pulses into the system and reading them
back with the mini-acq system. The problem cards are:

SM1 card #21 servicing 1N01-1N04
     " #22 " 1N05-1N07
SM3 card #01 servicing 3B01-3B04
     " #02 " 3B05-3B08
     " #03 " 3B09-3B12
     " #04 " 3B13-3B16
     " #05 " 3C01-3C04
     " #07 " 3C09-3C12
     " #08 " 3C13-3C16
     " #09 " 3W01-3W07
     " #13 " 3W08-3W11
SM5 card #17 servicing 5T01-5T04
     " #19 " 5T09-5T12
     " #20 " 5T13-5T16
SM6 card #01 servicing 6B01-6B04 work bench test not reliable
     " #02 " 6B05-6B08
     " #04 " 6B13-6B16
     " #05 " 6C01-6C04 work bench test not reliable
     " #07 " 6C09-6C12 work bench test not reliable
     " #19 " 6T09-6T12
     " #20 " 6T13-6T16
     " #21 " 6S01-6S04
     " #22 " 6S05-6S07

We tried to understand whether a specific part of the ERP hardware is
responsible for this behavior. We swapped S/H, T/P and Supervisors (one
at a time) between channels that do and do not display this symptom.
This did not change the behavior, i.e., hardware that did not display
this symptom before was actually buggy when moved to a problematic
station and vice versa. Jim has brought up other potential sources that
were largely ruled out: length of bus cables, termination of modules
within a crate. The current working hypothesis is that this behavior has
to do with the camac/vme bus or some other faulty module that
"contaminates" the ERP crate.

The key questions behind this problem are:
1) how does it affect box efficiencies and analysis cuts,
2) was it *always* present in the above cards or did it move around
the detector over the years,
3) can we insert it in the detector simulations?

I think this behavior needs to be simulated in our detector Monte
Carlos, starting with a run-by-run monitoring of its presence (see
below). Its final effect might well fall within the quoted detector
acceptance uncertainty, but it might not. It seems to be a very subtle
and mean failure mode, so I'm not ready to bet one way or the other.

I have looked first of all at the real data in order to:
1) verify that this behavior is there as seen in our tests,
2) see if it has always been localized to specific channels in the
detector.

Indeed, in the cosmic ray data we can see this behavior at:
SM1 cards 21,22
SM3 cards 1,2,3,4,5,7,8,9,13
SM5 cards 17,19,20
SM6 cards 1?,2,4,5,7?,19,20,(no data to check 21,22)

I went back to the June 1995 data and this behavior was present at:
SM1 cards no data
SM3 cards 1,2,3,4,5,7,8,9,13,14,20
SM5 cards 17,18?,19,20?
SM6 cards 1?,2,4,5,6?,7,19,20,21

i.e., at first look, the cards that display this behavior seem to have
changed slightly over time.
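
The comparison between the two lists above can be written out as a
set difference per supermodule. This is only a sketch: cards marked
with "?" are treated as showing the symptom (an assumption), a
supermodule with no data in either period is skipped, and cards that
could not be checked in the recent data are not counted as changes.
The variable and function names are made up for illustration.

```python
# Cards showing the symptom, per supermodule, copied from the two lists above.
# Entries marked "?" in the lists are treated as present (an assumption).
# None means no data for that supermodule.
cards_1999 = {
    "SM1": {21, 22},
    "SM3": {1, 2, 3, 4, 5, 7, 8, 9, 13},
    "SM5": {17, 19, 20},
    "SM6": {1, 2, 4, 5, 7, 19, 20},
}
cards_1995 = {
    "SM1": None,
    "SM3": {1, 2, 3, 4, 5, 7, 8, 9, 13, 14, 20},
    "SM5": {17, 18, 19, 20},
    "SM6": {1, 2, 4, 5, 6, 7, 19, 20, 21},
}
# Cards that could not be checked in the recent cosmic ray data.
unchecked_1999 = {"SM6": {21, 22}}

def compare(old, new, unchecked_new):
    """Per supermodule, return (cards only in old, cards only in new)."""
    changes = {}
    for sm in sorted(old):
        if old[sm] is None or new.get(sm) is None:
            continue                      # no data in one of the periods
        skip = unchecked_new.get(sm, set())
        gone = old[sm] - new[sm] - skip   # do not count unchecked cards
        appeared = new[sm] - old[sm]
        changes[sm] = (gone, appeared)
    return changes

changes = compare(cards_1995, cards_1999, unchecked_1999)
for sm, (gone, appeared) in changes.items():
    print(sm, "only in 1995:", sorted(gone), "only in 1999:", sorted(appeared))
```

With the lists as given, the cards that differ are SM3 cards 14 and 20,
SM5 card 18 and SM6 card 6, the last two being "?" entries in the 1995
list.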

In the lists above, the question mark (?) indicates that the behavior
of the card was not fully consistent with the general symptom I outlined
above, i.e., displayed one but not the other. It is a bit tricky
to use the real data for the above study. You are certainly statistics
limited, you have to keep track of the dead boxes in the detector,
and you have to group together runs with the same configuration.
I can elaborate on how I did the pattern analysis if you are interested.

I have put at Caltech's /macro10/kats/erpdbg/
some plots that you can look at; let me know though so that I can
provide you with details.

Ioannis had the idea to use the LED system to map all 2- and 3-box
combinations within a supermodule (i.e. ERP crate) and see what the
system reads out. This will indeed be conclusive. He is currently
working to prepare the necessary LED code, and we tentatively plan to
proceed with this next Wednesday.

In the meantime, if you have any idea about the possible cause of this
behavior, insight into the possible effect on our analyses, or any
specific test that you'd like us to perform, just let me know.

Talk to you later,
--Erik

PS#1: Both in the work bench tests and in the study of the real data we
      were time limited by the constraints of the maintenance day. We
      have not tested all fifteen patterns possible given the 4 counters
      per ERP card. It is actually almost impossible to manually exhaust
      all the hit patterns in all ERP cards. The special LED RUN will
      address a good chunk of them, while high statistics real data can
      probe even more subtle box combinations.
PS#2: Jim, by "work bench test not reliable" I refer to the counter pairs
      that showed up missing in maybe ~half of the events. This is most
      likely related to the fact that we were firing the LED too fast
      while manually changing ERP S/H. However, we need to check these
      cards again next Wednesday.
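
For reference, the fifteen patterns mentioned in PS#1 are the non-empty
ON/OFF combinations of the four counters on a card. A minimal sketch
that enumerates them and counts how many multi-counter tests a full
manual scan of all 124 cards would need (the variable names are
illustrative):

```python
from itertools import product

COUNTERS = ("C1", "C2", "C3", "C4")
N_CARDS = 124  # total number of ERP cards in the detector

# All ON/OFF combinations of four counters, excluding "nothing fires".
patterns = [p for p in product((False, True), repeat=4) if any(p)]

# Patterns with at least two counters ON, i.e. those where the symptom
# could show up (single counters behave as expected).
multi = [p for p in patterns if sum(p) >= 2]

for p in multi:
    print(" ".join(name for name, on in zip(COUNTERS, p) if on))

print(len(patterns), "patterns per card,", len(multi), "with two or more counters")
print(len(multi) * N_CARDS, "multi-counter tests for all cards")
```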

