wfd summary as of Texas meeting (minutes of pre-mtg)

Ed Kearns (kearns@budoe.bu.edu)
Mon, 16 May 1994 03:31:44 -0400

13-May-94
Ed Kearns

Minutes of the
WFD Working Group Meeting
Corpus Christi, 5-May-94

This message contains meeting notes from the WFD pre-meeting May 1st
prior to the general collaboration meeting in Texas. To a large extent,
these are expanded transcriptions of transparencies.

------------------------------------------------------------------------------

At the Bari Meeting (no pre-meeting held), we listed these accomplishments:

o Power supplies, Crates, Fans, Cables delivered to Gran Sasso (TAMU)
o 5 prototype WFDs at BU, Gran Sasso and TAMU
o frontend daughtercard design progress
o first readout with Gran Sasso mini-DAQ (Surdo, Walter, Hong, Lu)
o STOP/START/CLOCK fanout prototyped (BU)
o STOP manager designed (R.Liu)
o VME power <-> fanout noise studied (Sanzgiri)

Since the Bari meeting, we have made the following progress:

I. WFD Motherboard
---------------
o design changes incorporated into new PCB (delivered Feb 1994)
o Production Prototype built and tested OK (April 1994)
o all IC's ordered (kudos to A. David)
  - PM7226GP due June 3, MAX901 "mid-June".
  - we may socket these to start assembly and testing
o new information on rollover problem (see below)

II. WFD frontend daughtercard
-------------------------
o discriminating at low threshold solved (add amplifier)
o more post-samples solved by stretching comparator output
  - tried _SUP input to ASIC. Doesn't work.
o output pulse shape improved (tweaked PCB layout)
  - but is it good enough? See C.Walter info.
o production prototype pcb delivered (while at meeting)
*** update - has now been assembled at BU and is being tested ***
o nearly all IC's ordered
o calibration and stability ... no new work!

III. WFD System Issues
-----------------
o realistic readout feasible
o small # of wfd's in acquisition? _NO_
o brief mini-DAQ test with LIP trigger (Walter/Sanzgiri/Kearns)
o benchmark readout speed and data size (see EK MACRO memo)
o noise (PMT->WFD ok) (PMT->fanout->WFD bad)
  - progress made (4 mV... not 1 mV yet) (TAMU, esp. A. Sanzgiri)
o START/STOP/CLOCK fanout 6 boards + 1 spare being built
o 200 MHz clock being built
o STOP manager initial tests at GS. Back at CIT for fixes.
o DAQ buffer size problem (F. Ronga) not addressed or understood yet
o for budgetary reasons, we will commission the system with 80 cards
  (4:1 fan-in). We will stockpile those parts that we have for 110
  cards (2:1) and consider that as a potential future expansion of
  the system.

------------------------------------------------------------------------------

New information on rollover problem:

Summary: the rollover problem is caused by borderline logic timing
inside the ASIC. When the internal counter reaches 0xFFFF, the ASIC is
supposed to generate a cycle (64 bits of data). However, by the time the
logic recognizes the 0xFFFF pattern, it may generate the internal
signal to shift in the current data too late. The result is that one or
more bytes written to memory are duplicates of the previous cycle. If
the byte belongs to the timeword part of the cycle, it may be difficult
to recognize that this cycle was generated by timeword rollover. See
Bari meeting transparencies for an example illustrated in wfd raw data.

The proposed solution was to put regular pulses into the _SUP input of
the ASIC, briefly turning off zero suppression. The regular pulses would
be divided off the 200 MHz clock at the START/STOP/CLOCK fanout board
and fed into the NIM_CYCLE input.

1) NIM_CYCLE -> _SUP doesn't work reliably. :-(
2) The rollover problem is still not a problem if:
   a) occupancy high + rollovers off
   b) rollovers on + algorithm more sophisticated
3) We have learned that HOT ASIC => problem worse, COOL ASIC => problem better
4) Results of a study of the *onset* of bad data as a function of MSPS:
   conditions: 13 different ASICs in warm lab without forced air
   (i.e. hotter than in Gran Sasso. See 3.)

not ok <- | -> ok
| XXX
| XXX XXX XXX
XXX | XXX XXX XXX XXX XXX XXX XXX XXX
-+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--
195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213
f_onset (MSPS)

Each histogram entry was made by running a channel at a given sampling
rate and noting the highest frequency that contained 100% good rollover
timewords in the 64K memory buffer in ten repetitions of the test. The
number plotted above is that frequency plus 1 MSPS, or in other words,
the frequency at which failures started to show up. Note that under
adverse conditions (hot ASIC), 12 out of 13 ASICs nominally pass this
test.
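
The bookkeeping for one histogram entry can be sketched in Python as
follows. This is only an illustration of the definition given above: the
function names, the dictionary layout and the sample numbers are
assumptions, not the actual test software or measured data. Treating
"ok" as an onset strictly above the 200 MSPS running rate is also my
reading of the plot.

```python
def onset_frequency(results, step_msps=1):
    """Return f_onset in MSPS for one channel, or None if it never passed.

    results maps a sampling rate (MSPS) to a list of booleans, one per
    repetition: True if every rollover timeword in the 64K buffer was good.
    """
    # Rates at which all repetitions (ten in the real test) were 100% good.
    good = [f for f, reps in results.items() if reps and all(reps)]
    if not good:
        return None
    # Onset = highest fully good rate plus one step (1 MSPS in the text).
    return max(good) + step_msps


def passes(f_onset, f_run=200):
    """Nominal pass: failures only start above the running rate."""
    return f_onset is not None and f_onset > f_run


# Hypothetical channel: clean up to 203 MSPS, one bad repetition at 204.
example = {
    200: [True] * 10,
    203: [True] * 10,
    204: [True] * 9 + [False],
}
print(onset_frequency(example), passes(onset_frequency(example)))
```

Replacing ASICs that "fail some criterion" would then amount to choosing
a margin, i.e. a value of f_run somewhat above 200.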

Despite this, I still label this a "problem" because: a) it isn't 100%
of the ASICs, b) there are ASIC's showing the problem only slightly
above 200 MSPS, c) hand checking the gate delays in the ASIC design
confirms the existence of this effect, d) we have no data about whether
the ASICs that pass will continue to pass with >99% reliability, e) we
have no data whether the problem will get worse with time.

One solution would be to run at 190 or 180 MSPS. This is a very
inconvenient number, and I would prefer to see what happens at 200 MSPS
first. To improve our chances of success, I propose to run this onset
test on each channel and to replace those ASICs that fail some
criterion.

------------------------------------------------------------------------------

No new information was presented on the readout issues discussed in the
recent MACRO memo by ETK. To remind you, for 4:1 fan-in, we can expect 6
kB per channel per ms, .2 Mb for a full supermodule. The readout speed
is .5 Mb/s using the current configuration.

Francesco Ronga has reminded us that reading out a large amount of data
in one event is a problem for the online system. I do not fully
understand this problem as yet. This is something we need to look into
ASAP. For now, I shall simply quote F.R.:

"The maximum size for an event allowed from the current acquisition is
64 kbytes for each microVAX. (I have told you many times!). It's
difficult to increase by a large amount (may be a factor of 2 could be
possible perhaps) due to a lack of memory available in the KAV-30 and in
the host computer. Moreover, the dead time is dependent on the maximum
buffer size. Generally all the acquisition performances are dependent
(because the time necessary to allocate large buffers is bigger than the
time for smaller buffers). In any case with the current acquisition for
large buffer transfer bigger than 64kbytes you should have errors due to
the memory allocation."

"You could understand the reason looking on our work on the DAQ.
(Perhaps the work presented at the 1989 IEEE is a bit more clear). The
acquisition is based on a message buffer scheme. Due to this fact for
each event you need to allocate a new buffer. In order to reduce the
dead time for each job a queue of many buffers is allowed. In the
current version we have around 60 possibles event buffers in different
queues. This means 60*64 kbytes = 3.8 Mbytes for a maximum event
allocation. N.B. this scheme is used in all the DAQ that I know, and is
due to the fact that drivers are organized as separate jobs receiving
data on message passing scheme. The same limitation apply to the part
of the acquisition running on VMS. We use a CERN package (MBM) for
buffer allocation. This package becomes very inefficient for small
events if you requires large buffers as maximum size."
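
To put the quoted numbers side by side with the wfd event size, here is
a small arithmetic sketch. It assumes that ".2 Mb" for a full
supermodule means 0.2 megabytes and that kbytes are decimal; both are my
assumptions, and the function names are made up for the illustration.

```python
KB = 1000  # bytes per kbyte (decimal units assumed)


def pool_bytes(n_buffers=60, max_event_kb=64):
    """Memory tied up if every event buffer is allocated at maximum size."""
    return n_buffers * max_event_kb * KB


def oversize_factor(event_bytes, max_event_kb=64):
    """How many times larger an event is than the maximum event size."""
    return event_bytes / (max_event_kb * KB)


SM_EVENT_BYTES = 200_000  # ".2 Mb" per supermodule, read as 0.2 megabytes

print(pool_bytes() / 1e6)                    # 3.84 (quoted as 3.8 Mbytes)
print(oversize_factor(SM_EVENT_BYTES))       # 3.125 with the 64 kbyte limit
print(oversize_factor(SM_EVENT_BYTES, 128))  # 1.5625 if the limit is doubled
```

Under these assumptions a full supermodule event is about three times
the present limit, and still larger than the limit after the factor of 2
that F.R. considers possible.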

Francesco is advocating that we solve this problem, and the speed
problem, by using a computer local to the wfd VME crate to write *into*
the KAV memory. I would like to try some tests, but it is not obvious
how this will help. It is not apparent to me that this addresses the
event buffer issue at all. It is also not necessarily going to provide a
significant improvement in speed. Even with my 68040 based MVME-167 on
the wfd VME backplane, the time readout algorithm only improves to .8
Mb/s. And after that we still have to do a subsequent transfer across
the VICbus to the KAV30 (this could be a BLT perhaps, but it still adds
time). Also, we end up spending money on the VIC interface as well as
the in-crate computer and we need to spend an uncertain amount of
development time on a more complicated system integration.
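
For scale, the two readout rates can be turned into a time per full
supermodule event. This is a rough sketch only: it assumes "Mb" means
megabytes, takes the .2 Mb event size from the readout memo numbers
above, and ignores the extra VICbus transfer, whose cost is not known
here.

```python
def readout_time_s(event_mb, rate_mb_per_s):
    """Time to read out one event at a sustained rate."""
    return event_mb / rate_mb_per_s


EVENT_MB = 0.2  # full supermodule, 4:1 fan-in (".2 Mb" read as megabytes)

for label, rate in [("current configuration", 0.5),
                    ("MVME-167 in wfd crate", 0.8)]:
    print(f"{label}: {readout_time_s(EVENT_MB, rate):.2f} s")
# current configuration: 0.40 s
# MVME-167 in wfd crate: 0.25 s
```

So the in-crate computer would save roughly 0.15 s per such event before
the second transfer is added back in.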

I presented a list of possible choices that we should consider. This list
may not be comprehensive!

1) Instead of VIC-8250, buy VIC-8251 (extra $1k per module). This does
nothing for us now, because the KAV30 uVaX is a bigger bottleneck. But
it seems to me that the VIC-8250 is a dead-end, whereas the VIC-8251
handles the full VIC protocol and is a better match if the online uVaX
ever gets upgraded. It may provide better performance in the case
where we have a computer in the WFD crate; I'm not sure yet. It may
also depend on whether a VIC-8251 receives the data in the system crate
(currently it is a VIC-8250).

2) Add a VME computer to the WFD crate (as discussed above) and use
the VICbus for transfer.

3) Add a VME computer to the WFD crate (as discussed above) and use
ethernet for transfer. Ethernet is much cheaper than the VICbus, and
we would trade the cost of VIC-8250 modules against the VME computer.
It is not apparent how to integrate an ethernet transfer into the
current online software (sockets is the standard approach; J.Hong has
made some preliminary investigation). Even studying this may be
difficult because the uVaX do not have IP software installed (no ftp
for example).

4) Add a VME computer and hard disk to the WFD crates. Take data in
parallel and merge offline.

5) It may be possible to buy a modern-day uVaX as the VME computer and
rely on DECnet instead of ethernet for 3) or 4). Alpha based single
board computers are quite reasonably priced. Unfortunately, the
software may not be. Again, this is a big development project.
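
As a very rough picture of what the sockets approach in 3) involves,
here is a sketch of sending one event over a TCP-style stream with a
length prefix. Nothing here is taken from the existing online software:
the function names and the 4-byte length header are assumptions made for
the illustration, and it is written in Python purely for brevity.

```python
import socket
import struct


def send_event(sock, payload):
    """Send one event as a 4-byte big-endian length followed by the data."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-event")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_event(sock):
    """Receive one length-prefixed event and return its data."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# Local loopback demonstration with a small dummy event.
crate_side, host_side = socket.socketpair()
send_event(crate_side, bytes(range(256)))
event = recv_event(host_side)
print(len(event))
crate_side.close()
host_side.close()
```

The length prefix is needed because a stream socket does not preserve
event boundaries; the receiver has to know how many bytes belong to each
event.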

ETK will continue to forge ahead on these issues, but would very much
appreciate additional people, suggestions or other assistance.

------------------------------------------------------------------------------

In the mini-DAQ tests of 3/94, Chris has noticed some glitches on the
leading edge of the waveform for muon pulses. These appear at the point
where the amplifier is changing from high gain to low gain as it passes
through the breakpoint. This problem had been observed before, and I had
thought that it was fixed. Three of the daughtercards used for this run
were of an older design, but one was of the new design, and I did not
expect the glitch to show up in that channel. This is a difficult
problem for a signal whose slew rate is asked to quickly change.
Benchtop measurements at BU indicate that the glitch is negligible for
waveforms with a risetime of 10 ns. But that is contradicted by Chris's
analysis. There is a likely solution where we slow down the input
signal slightly, but this may also increase the signal stretching at the
trailing edge and if we need to slow it down a lot, it may affect the
relative timing of the leading edge with respect to the discriminator
bit and we may lose a pre-sample. This issue is under study.

------------------------------------------------------------------------------

Timetable:

WFD motherboard                       frontend daughtercard
----------------------------------    ----------------------------------
80 motherboards have been ordered.    Prototype daughtercard looks good.

Assembly in early June. QC at TAMU.   Assembly in late June?

I would like to get 1 SM worth (12)   Calibration not designed yet.
to GS ASAP for full crate tests.      Try to get 12x4 by mid-July.

>>> Plan is to solve system issues with 1 crate of wfds.
>>> Commissioning of the rest to be discussed later.

------------------------------------------------------------------------------
Attention WFD workers: please keep me apprised of your travel plans to G.S.