Chris W. has really done a good job on this. The fact that he's down to
such a small number of unknown problems is a good sign. I have written a little
blurb after most of the events below, but I suspect hardware readout failure in
most of them, and would only suggest going after the cards if they repeat in the
same place. I mostly worry about the bad events associated with reaching the
byte limit... the big events could quite conceivably cause lots of nasty and
subtle readout problems. Plus they may be associated with interesting physics.
I suggest looking into the possibility of raising the byte limit, but I don't
know the derivative d(Total Data)/d(Byte Limit).
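
A rough way to estimate that derivative, sketched under the assumption that the byte limit simply truncates each channel's readout: the slope is then just the number of channels that would have read out more than the limit. The sizes below are made-up illustrative numbers, not real data, and the function names are mine.

```python
# Hypothetical per-channel readout sizes in bytes (made-up numbers, not real data).
sizes = [1200, 3500, 8000, 8000, 20000, 64000]

def total_data(sizes, byte_limit):
    """Bytes written if each channel's readout is truncated at byte_limit (assumed behaviour)."""
    return sum(min(s, byte_limit) for s in sizes)

def slope(sizes, byte_limit):
    """d(Total Data)/d(Byte Limit): the number of channels that exceed the limit."""
    return sum(1 for s in sizes if s > byte_limit)

limit = 8000
step = 1000
# Finite-difference check of the slope at this limit.
finite_diff = (total_data(sizes, limit + step) - total_data(sizes, limit)) / step
print(total_data(sizes, limit), slope(sizes, limit), finite_diff)
```

If this picture is right, the cost of raising the limit could be read off a histogram of readout sizes from existing runs, at least for the channels that did not already hit the current limit.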
Note: there will always be a place where the readout breaks down... I
think Chris may be stuck with this, because even if the WFD works perfectly
the rest of the detector doesn't. Certainly the readout ain't going to make
it for a 350-muon event, or for a 10^3 TeV shower from a single muon; as long
as Chris can prove that it works well for a reasonable range, though, he is
fine. We have always in the past ignored the fact that the detector may not
survive the largest events; in particular, we ignore this for the multiple muon
analysis, which strikes me as a little dangerous. For most of Chris' search,
this is not a big deal, but for the part which looks inside the high energy
showers, it may have to be considered. I am going to look again at the
high-multiplicity showers that come without streamer tube data, and try to
find out where we have true failures (as opposed to events in which the streamer
tubes just weren't working anyway).
- Nat
>Event 43:
>************************************************************************
>Tank 3T03 chan 0x321:
>The readout for this tank stopped too early. I have no explanation for
>why. Because of it we are missing all of the muon data.
>
>Tank 3C03 chan 0x309:
>There looks like there is a problem with the time on this channel.
>However, that is only because I turned off my "first-sample" fix which
>corrects for what happens if the first sample is not separated by four
>time words from the second(Which can happen if the system is
>digitizing during a stop).
I'm not sure what Chris means by "digitizing during a stop." Possibly the same
cause for both problems? If these do not represent much of the data, maybe the
fix can be replaced by something which tosses out the events... I would have to
know a little more about what "digitizing during a stop" meant, though.
>Event 1308:
>************************************************************************
>Tank 3W8/10/12/14 chan 0x319:
>
>This channel has the muon showing up too early. It appears that this
>box missed a roll-over word. Some boxes (especially verticals) seem to
>sometimes miss a roll-over word. Perhaps the rate is not high enough
>in these boxes. The effect of this is that the muon seems to show up
>328 uSec too early(328uSec = 65536*5ns).
>
>Tank 3B12/14 chan 0x307:
>
>This chan overflowed(hit the byte readout limit). However all of the
>ADC data is 0 and all of the dBits are FFFF!
>
>Tank 3B09/13 chan 0x304 0x305:
>
>These overflowed but because of a very large pulse. The WFDs are OK
Again, I'm suspicious that the "not understood" problem occurred on the same
event as an "understood" problem. How are we on the byte limit? Can we move
this up without affecting the data too much? We may not be able to trust the
events that hit the limit... but it would sure kill some of the big showers if
we just tossed them out. This may be a problem for a readout guru. Do we often
miss rollover words in the same events where we have hit the byte limit?
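
For reference, the 328 uSec shift Chris quotes is one full rollover period of a 16-bit time word at 5 ns per count. A minimal sketch of the arithmetic and of the obvious correction follows; the function name is hypothetical, and the sign of the correction assumes that a missed roll-over makes the reconstructed time come out too early, as in Chris's description.

```python
TICK_NS = 5                   # from Chris's note: 328 uSec = 65536 * 5 ns
COUNTS_PER_ROLLOVER = 65536   # 16-bit time word
ROLLOVER_NS = TICK_NS * COUNTS_PER_ROLLOVER   # 327680 ns, i.e. 327.68 uSec

def correct_missed_rollovers(time_ns, n_missed=1):
    """Shift a time that came out too early by n_missed rollover periods (assumed sign)."""
    return time_ns + n_missed * ROLLOVER_NS

print(ROLLOVER_NS / 1000.0)            # rollover period in microseconds
print(correct_missed_rollovers(1000))  # a 1000 ns time pushed one period later
```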
>Event 4269:
>************************************************************************
>All of these channels seem to have the muon offset by 2.5 usec. I
>actually reported this quite some time ago. Everything on SM4 is
>offset by this amount. This one happened to be so close to the edge
>of the timing window that part of the muon got cut off. At some point
>I seem to have commented my fix for this out of the code. So this is
>really a STOP-master problem. However the STOP master thinks it only
>counted for 1 ms. So it looks like the clock chip itself was/is
>slightly off.
>
As I understood this, Chris has already fixed it in software.
>Event 4767:
>************************************************************************
>Tank 3E04 chan 0x317:
>missed roll-over word
Can we find out which tanks most often miss rollover words and try swapping
cards?
>Event 4986:
>************************************************************************
>Tank 2W08 chan 0x218:
>In this event the time-word "glitched" and went up instead of going
>down. Because of this the readout routine got confused(thinking a
>roll-over happened) and stopped the readout thinking it had read
>enough data:
>
>Here is the section of the data:
>Time Timewd dBits AD0 AD1 AD2 AD3
>999345 51396 1111 27 26 28 27 0.2 1.0 -0.6 0.2
>999365 51392 1111 28 31 34 38 -0.6 -3.0 -5.4 -8.7
>1326745 51452 1100 37 33 29 26 -7.8 -4.6 -1.4
If this only happens once, I suppose it can be chucked.
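
The numbers in Chris's table are consistent with the readout treating the upward step as a roll-over: a down-counting 16-bit time word at 5 ns per count reproduces the jump in the Time column exactly. The sketch below shows that arithmetic plus one possible guard. The guard and its threshold are my assumptions, not Chris's code, and a real roll-over after a long gap between samples could also show up as a small upward step, so it is only a heuristic.

```python
TICK_NS = 5       # ns per count, from the 65536 * 5 ns figure quoted earlier
COUNTS = 65536    # 16-bit time word

def step_ns(prev_tw, cur_tw):
    """Elapsed time between two samples, assuming a down-counting time word
    and treating any upward step as a single roll-over."""
    return ((prev_tw - cur_tw) % COUNTS) * TICK_NS

def looks_like_glitch(prev_tw, cur_tw, max_up=1024):
    """Flag a small upward step as a possible glitch rather than a roll-over.
    max_up is a hypothetical threshold."""
    up = cur_tw - prev_tw
    return 0 < up <= max_up

# Time words from the event 4986 table
print(step_ns(51396, 51392))            # normal step between the first two rows
print(999365 + step_ns(51392, 51452))   # reproduces the third row's Time value
print(looks_like_glitch(51392, 51452))
```

With a check like this the readout routine could flag the sample instead of stopping early, though whether that is worth doing depends on how often it happens.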
>Event 5577:
>************************************************************************
>Tank 6B12 chan 0x607
>The data is missing in the relevant section of the buffer for this
>event. However it is clear something went wrong during the stop as
>the ADC and dBit values in the first sample are nonsensical:
>
>Time Timewd dBits AD0 AD1 AD2 AD3
>0 8478 1D1A 242 81 0 0 -8697.8 -54.7 0.0 0.0
>59905 62033 0000 26 26 26 26 0.2 0.2 0.2 0.2
Again, it just looks like bad readout bits... could be anything... CAMAC, bad
disk write... if it's rare enough (i.e., one time only), can it be chucked?
>Event 7414:
>************************************************************************
>Tank 6E04/06 chan 0x617:
>Missed roll-over
>Tank 6C08 chan 0x60B:
>The second ADC value is stuck at the value 9
Both sound like hardware. If they happen again, we could look at the cards.
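
To see whether anything repeats in the same place, and whether missed roll-overs tend to show up in events that also hit the byte limit, a simple tally over Chris's reports would do. Here is a minimal sketch using only the roll-over and byte-limit entries from this message; the problem labels are my own shorthand.

```python
from collections import Counter

# (event, channel, problem) taken from the reports quoted in this message
reports = [
    (1308, 0x319, "missed_rollover"),
    (1308, 0x307, "byte_limit"),
    (1308, 0x304, "byte_limit"),
    (1308, 0x305, "byte_limit"),
    (4767, 0x317, "missed_rollover"),
    (7414, 0x617, "missed_rollover"),
]

# How often each channel missed a roll-over word
rollover_counts = Counter(ch for _, ch, prob in reports if prob == "missed_rollover")
repeat_offenders = sorted(ch for ch, n in rollover_counts.items() if n > 1)

# Events that have both a missed roll-over and a byte-limit overflow
rollover_events = {ev for ev, _, prob in reports if prob == "missed_rollover"}
limit_events = {ev for ev, _, prob in reports if prob == "byte_limit"}
both = rollover_events & limit_events

print([hex(ch) for ch in repeat_offenders], sorted(both))
```

On this small sample no channel repeats, and only event 1308 has both problems, so it would take a longer list before swapping cards is justified.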