wfd testing update

Ed Kearns (kearns@budoe.bu.edu)
Wed, 6 Oct 93 13:21:49 -0400

10/6/93 Ed Kearns, Bill Earle
-----------------------------------------------------------------------------

INTRODUCTION:

This note is to report on progress on wfd testing. Although we can go
over it more at the upcoming collaboration meeting, I do not want to
wait to solicit comments. We are finalizing the design of the wfd
motherboard. There are still some technical issues and tests we wish to
address for the daughtercards; but the details can be reasonably
factored out and need not delay the motherboard.

Finalizing the main wfd pcb design consists of the following steps:

(1) Reporting all changes in the pcb layout to the layout designer
for revision.

(2) Ordering the finalized pc boards. This will again consist of
a production prototype which we will stuff and test before
requesting the full production run. This is to catch last minute
errors in the artwork.

(3) Ordering parts in quantity. The lead times for our parts are
likely to be 4 to 8 weeks; but it is possible that one part
or another may take longer and determine the time until we
can begin fabrication.

*** Proceeding with these steps is a decision to commit much of the
remaining financial resources of this project. *** In this note I will
review the studies and tests we have done with the wfd. In particular, I
will go into some detail regarding a problem in the ASIC, including a
discussion of possible workarounds. I am soliciting comments and
feedback of all types, including tests we may have overlooked. But the
central issue is to decide *when* enough testing has been done; or in
other words, how to trade off more confidence in the design versus
getting the system installed and doing physics.

-----------------------------------------------------------------------------

GENERAL REPORT:

The current board (WFD-C) was a major design revision from the first VME
prototype (WFD-B). On the first prototype we got a single channel
working and discovered and solved a variety of design and layout errors,
ranging in severity from trivial "typos" to those requiring significant
redesign. We patched each error and in the end, we had a working wfd
channel that digitized monopole-like signals at 200 MSPS.

To continue with the current board design (WFD-C), we took the
modifications, as well as a handful of improved features, and generated a
new design (schematics) and layout (pcb). The changes were significant
enough that an entirely new pcb layout was called for. Not surprisingly
then, this pcb again had a number of new errors (~8), now all of the
relatively trivial kind: slightly incorrect land patterns (hole or pad
geometries), reversed connections on transistors or caps, and one
connection that shouldn't have been made and had to be drilled out. All
in all, pretty good for an entirely new layout of this complexity.

We discovered the new problems with our first home-assembled prototype,
and made a list of special instructions to compensate (for example:
"insert this capacitor opposite to the polarity indicated"). We were
able to have four more prototypes assembled professionally (Texas
Instruments) by giving the special instructions to the assembly
contractor. The boards came back either in working order or with 1 or 2
errors on a board due to assembler errors. ** This is evidence that we
have reached the stage where we can produce boards in quantity without
the excessive individual attention a prototype invariably attracts. **

The wfd motherboard can be broken down into three sections: (1) VME
Interface (2) RAM and (3) ASIC/FADC. Here is some more detail about
how we tested these sections.

(1) VME Interface. This has been tested with my MVME-167 computer
(Motorola 68040 running OS9). The possible VME functions are documented
in a MACRO memo by our engineer, Bill Earle, _VMEbus Specification for
the MACRO Waveform Digitizer_. I have specifically looked for each of
the following features to work correctly.

    32 bit addressing + 32 bit readout and 16 bit readout   [TESTED]
    24 bit addressing + 32 bit readout and 16 bit readout   [TESTED]
    8-bit readout *                                         [UNTESTED]
    unaligned readout *                                     [UNTESTED]
    block transfers **                                      [UNTESTED]
    dip switch selection of the base address                [TESTED]
    control register bits:
        0  zero suppression on/off                          [TESTED]
        1  interrupt enable **                              [UNTESTED]
        2  read+write mode enable                           [TESTED]
        3  current address pointer readback                 [TESTED]
        4  rollover word on/off                             [TESTED]
        5  vme reset                                        [TESTED]
        6  stop time enable                                 [TESTED]
        7  threshold dac clock                              [TESTED]

* not obvious how to do this given h/w shortcuts used by MVME167,
also, not particularly essential to run the wfd.
** requires considerable effort; we are not likely to use this

As far as we know, the wfd complies with IEEE 1014-1987 and has no
special needs for readout with a conforming VME master.

(2) RAM. Besides the generally correct performance of the readout,
we measured with a scope the timing of the control pulses to the
latches and RAM. I regularly run a program which loads the memory
with various hex patterns and then reads them back, requiring that the
values read out are identical to those written.
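
For readers who want to repeat this kind of check, here is a minimal
sketch of a write/read-back pattern test. It is not the program I run:
FakeRam is a hypothetical stand-in for the wfd memory as seen over VME,
the 32-bit word width is assumed from the readout modes listed above,
and the patterns are only illustrative.

```python
class FakeRam:
    """Hypothetical word-addressable memory of 32-bit words (stand-in only)."""

    def __init__(self, n_words):
        self.words = [0] * n_words

    def write(self, addr, value):
        self.words[addr] = value & 0xFFFFFFFF

    def read(self, addr):
        return self.words[addr]


def pattern_test(ram, n_words, patterns):
    """Fill memory with each pattern, read it back, and collect mismatches.

    Returns a list of (address, expected, got) tuples; empty means pass.
    """
    errors = []
    for pattern in patterns:
        for addr in range(n_words):
            ram.write(addr, pattern)
        for addr in range(n_words):
            got = ram.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors


# Illustrative patterns only: all zeros, all ones, alternating bits.
PATTERNS = [0x00000000, 0xFFFFFFFF, 0xAAAAAAAA, 0x55555555]

if __name__ == "__main__":
    print(pattern_test(FakeRam(1024), 1024, PATTERNS))
```

An empty list means every word read back as written; a stuck or
shorted data line would show up as a list of mismatching addresses.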

(3) ASIC/FADC. Many of the ASIC functions are also tested when the
operations with the control register are executed. We have verified that
data is written in the correct address order by resetting the wfd
(pointing the address pointer at RAM bottom) and loading it with ASIC
data slowly, watching the RAM fill up correctly. I also run a program
which goes through the wfd memory banks and looks for certain features
such as rollover words and ascending time words.
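
The feature scan can be sketched roughly as follows. This is an
illustration rather than the actual program: it assumes the time words
have already been extracted from memory into a list in recording order
(oldest first), and the function name is made up for this note.

```python
ROLLOVER = 65535  # full-scale value of the time word


def scan_timewords(timewords):
    """Count rollover words and flag places where time fails to ascend.

    Within one rollover interval each time word should be larger than
    the one before it. The word that follows a rollover word is exempt,
    since the counter has just restarted.
    Returns (number_of_rollover_words, list_of_bad_indices).
    """
    rollovers = 0
    bad = []
    prev = None
    for i, t in enumerate(timewords):
        if t == ROLLOVER:
            rollovers += 1
        if prev is not None and prev != ROLLOVER and t <= prev:
            bad.append(i)
        prev = t
    return rollovers, bad


if __name__ == "__main__":
    print(scan_timewords([65535, 14569, 14573, 65535, 100]))  # (2, [])
```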

(1+2+3) I have made many plots of real waveforms and compared them to a
fast oscilloscope, which tests the whole sequence. With zero suppression
off, we have run at 300 MHz and reconstructed a waveform correctly. We
have checked that the delta-t measurement for pulses sent to two
different channels is correct. There should still be more complete
tests of the system, and that work is ongoing.

-----------------------------------------------------------------------------

ASIC ROLLOVER PROBLEM:

During our tests we sporadically observed invalid data in the RAM. The
problem manifested itself as duplicated data apparently associated with
the rollover digitization. The problem seemed to be prevalent when we
ran at clock speeds greater than 200 MHz or when for some reason the
clock we provided was of low quality. (In fact, we learned a lot about
tuning up the clock signal in this process). We learned that we could
repeatably induce and study this in the following way: use a signal from
a pulse generator with a short duration (60 ns) but a long period (2 ms).
The 16-bit time-word, clocked at 200 MHz, rolls over every 327.68 us.
Therefore, in the 2 ms interval between pulses, we should see 6 rollover
digitizations. I include below two sets of data from the digitizer: one
with incorrect data taken at 215 MHz, and the same data corrected by
hand.

Digitizer records data from bottom to top

CORRECT DATA PATTERN                INCORRECT DATA PATTERN
===============================     ===============================
time    discr     adc samples       time    discr     adc samples
-----  -------  ---------------     -----  -------  ---------------
65535  0 0 0 0   20  21  20  20     55336  0 0 0 0   46  54  79 100
55336  0 0 0 0   46  54  79 100     55336  0 0 0 0   46  54  79 100
55332  1 1 1 1  127 142 144 143     55332  1 1 1 1  127 142 144 143
55328  1 1 1 1  143 144 143 143     55328  1 1 1 1  143 144 143 143
55324  1 1 1 1  144 144 143 142     55324  1 1 1 1  144 144 143 142
55320  1 1 0 0  151 100  23  22     55320  1 1 0 0  151 100  23  22
65535  0 0 0 0   21  20  21  20     14585  0 0 0 1   48  63  89 114
65535  0 0 0 0   19  20  21  20     14585  0 0 0 1   48  63  89 114
65535  0 0 0 0   20  20  20  20     14585  0 0 0 1   48  63  89 114
65535  0 0 0 0   21  20  20  19     14585  0 0 0 1   48  63  89 114
65535  0 0 0 0   20  21  21  20     14585  0 0 0 1   48  63  89 114
65535  0 0 0 0   20  20  19  20     14585  0 0 0 1   48  63  89 114
14585  0 0 0 1   48  63  89 114     14585  0 0 0 1   48  63  89 114
14581  1 1 1 1  135 143 143 143     14581  1 1 1 1  135 143 143 143
14577  1 1 1 1  143 143 143 144     14577  1 1 1 1  143 143 143 144
14573  1 1 1 1  144 143 143 143     14573  1 1 1 1  144 143 143 143
14569  1 1 0 0  149  40  20  20     14569  1 1 0 0  149  40  20  20
65535  0 0 0 0   20  21  20  21     39367  0 0 0 0   51  59  80 104
65535  0 0 0 0   20  19  20  20     39367  0 0 0 0   51  59  80 104

The correct data has several features. Note the two sections with
non-zero discriminator data: this is the pulser connected to input 1.
The leading edges are separated by 55320 + 5*65535 + (65535-14569) time
units (433961), which is approximately 2 ms at 215 MSPS. The rollover
time-words have the counter full-scale value of 65535. During rollover,
the discriminators are off and the adc is at pedestal. The most
essential feature is the rollover time word. We need to identify
rollovers to count intervals in excess of 327 us. We do not really
need the associated discriminator or adc data (although it is a ready
supply of pedestals).
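
The arithmetic behind these numbers can be checked with a few lines.
This is only a sketch; the function names are made up for this note.
Strictly, one full counter period is 65536 counts, whereas the sum in
the text uses 65535 per interval; the difference of a few counts does
not change the result at this precision.

```python
COUNTS_PER_ROLLOVER = 2 ** 16  # time word runs from 0 to 65535


def rollover_period_us(clock_mhz):
    """Time between rollovers in microseconds for a given sample clock."""
    return COUNTS_PER_ROLLOVER / clock_mhz


def counts_to_ms(counts, clock_mhz):
    """Convert a number of clock counts to milliseconds."""
    return counts / (clock_mhz * 1e3)


# Leading-edge separation, written exactly as in the text above.
separation = 55320 + 5 * 65535 + (65535 - 14569)

if __name__ == "__main__":
    print(rollover_period_us(200.0))                  # 327.68
    print(int(2000.0 // rollover_period_us(200.0)))   # 6 rollovers in 2 ms
    print(separation)                                 # 433961
    print(round(counts_to_ms(separation, 215.0), 3))  # 2.018
```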

The incorrect data has the notable feature that between digitized
input pulses (recorded with no error), the apparent location of the
rollover word is filled with a copy of the last data from the trailing
edge of the pulse.
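
One way to flag this signature in software is to look for a time word
that repeats without being the full-scale rollover value, since time
words in real data always advance. The sketch below assumes each record
is a (time, discr, adc) tuple in recording order (oldest first); the
record layout and function name are illustrative, not those of my
analysis code. It would not catch the milder case where only the
discriminator bits are stale and the time word still reads 65535.

```python
ROLLOVER = 65535  # full-scale value of the time word


def find_stale_copies(records):
    """Return indices of records whose time word repeats the previous one.

    Genuine rollover words (time == ROLLOVER) are allowed to repeat,
    because several rollovers can occur between input pulses.
    """
    return [
        i
        for i in range(1, len(records))
        if records[i][0] == records[i - 1][0] and records[i][0] != ROLLOVER
    ]


if __name__ == "__main__":
    sample = [
        (14569, (1, 1, 0, 0), (149, 40, 20, 20)),
        (14585, (0, 0, 0, 1), (48, 63, 89, 114)),
        (14585, (0, 0, 0, 1), (48, 63, 89, 114)),  # stale copy
        (55320, (1, 1, 0, 0), (151, 100, 23, 22)),
    ]
    print(find_stale_copies(sample))  # [2]
```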

We determined that the duplicate data was coming directly out of the
ASIC, and was not a timing problem in writing to the external latches or
RAM. Of course, a problem internal to the ASIC is much harder to
correct. As mentioned above, we learned how to generate an excellent
clock signal; once the uncertainty over clock quality was eliminated, we
were able to determine that the effect had an onset as a function of
frequency (and not, for example, as a function of duty cycle).

As we raised the frequency, various parts of the data would begin to be
duplicated. The first effects are seen at 200-203 MHz, where the
discriminator bytes are affected first. At these speeds, the
time-word usually retains its correct value of 65535. The time-word
typically shows problems at 206-210 MHz. We have eight channels at
BU currently (8 ASICs), and I tested them all for the onset of the
problem. Tabulated below, one entry per ASIC, are the highest clock
frequencies at which I saw _no_ defective timewords in 10 repetitions of
my test procedure:

199 MHz, 201 MHz, 206 MHz, 207 MHz, 208 MHz, 209 MHz, 210 MHz, 210 MHz

After getting a lead on the culprit, we reviewed the relevant aspects of
the ASIC design and identified the likely cause. With zero suppression
on, there are two ways to start a digitization. The first is with the OR
of the four discriminator bits that come from the front-end amplifier
card. The second is when the internal timer rolls over. This is
implemented by recognizing the full-scale counter value and feeding that
around to the same OR-gate that monitors the discriminator bits. (All
of this is internal to the ASIC). *** Unfortunately, the timing for this
signal is borderline by the time it gets back around to the critical
internal shift register for the digitizer data. The timing for
discriminator initiated digitization is fine and this data seems
fine up to 300 MHz. ***

-----------------------------------------------------------------------------

ASIC ROLLOVER SOLUTION(S):

The main impact of the problem is in choosing an algorithm for
intelligent readout of the wfd. My working plan has been to read through
the data, checking if the timeword is a rollover and counting rollovers
to read out the prescribed amount of data. This can be quite fast; with
my computer the transfer rate is in excess of 4 MBytes/s.

In MACRO data, we should always have high enough occupancy that every
327 us rollover interval has some data in it. In this case, one can
also tell that the timer has rolled over by taking the difference
between sequential timewords: if it is negative, there was a rollover in
between. I am currently working on readout speed tests using this
method. Note that we may have wanted to read out this way anyway, since
327 us is rather large for a time-windowed readout of 1 ms maximum.
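
A minimal sketch of the difference method follows. It is not the
readout code under test: it assumes the time words are already in a
list in recording order (oldest first) and that every rollover interval
contains at least one digitization, as argued above. The function name
is made up for this note.

```python
COUNTS_PER_ROLLOVER = 2 ** 16  # time word runs from 0 to 65535


def unwrap_timewords(timewords):
    """Turn 16-bit time words into a running count of clock ticks.

    A negative step between sequential time words is taken to mean the
    counter rolled over exactly once in between.
    """
    unwrapped = []
    wraps = 0
    prev = None
    for t in timewords:
        if prev is not None and t < prev:
            wraps += 1
        unwrapped.append(wraps * COUNTS_PER_ROLLOVER + t)
        prev = t
    return unwrapped


if __name__ == "__main__":
    print(unwrap_timewords([14569, 14585, 60000, 2000, 2004]))
    # [14569, 14585, 60000, 67536, 67540]
```

Note that a duplicated time word gives a difference of zero rather than
a negative one, so the stale copies produced by the ASIC flaw are not
counted as rollovers by this method.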

We are also considering a scheme where we ensure a digitization within
every rollover period, using an external signal. The ASIC has a control
input that can be used to generate data cycles (I have also tested
this), and we are considering scaling the clock and using that to force
at least one digitization every rollover period. But the value of the
timeword will be generally indeterminate. This data-cycle control is
available as an external input to the board; the scaled clock will come
from the clock-fanout module if we choose to implement this.

Furthermore, since most of our sample of eight ASICs function properly,
it is conceivable that we can build boards that will actually operate as
desired at 200 MHz. This will depend on two things: 1) whether the
majority of the ASICs continue to hold up at 200 MHz under further tests
and 2) whether the fraction of ASICs that exhibit this problem at 200
MHz (currently 1/8) is small enough to consider a test-and-replace
scheme.

Finally, if this effect becomes too much of a problem, we can run
at a slower clock speed than 200 MHz.

While reviewing the ASIC problem, we also came up with several simple
design fixes. These could either be implemented in a new mask (new
chips) or by chip brain-surgery. The simplest fix requires one severed
connection and one jumper. The cost of all new ASICs is much too high:
$12,000 NRE and $95/part. We are investigating a chip surgery outfit,
ACCUREL, which claims they can do this fix. They will do a couple of
chips for free, which is a good opportunity for us to completely verify
that we understand the problem. They estimate $50 per part. This is also
too expensive, but it may come down. Q: does anyone have any experience
with ACCUREL, or know of other outfits such as this that we can contact?

*** In summary: there is an annoying but non-fatal flaw in the ASIC
involving the rollover word. This flaw usually appears above 200 MHz,
but in some chips appears at or below 200 MHz. The non-rollover
digitization is unaffected and works at 300 MHz. Although this flaw is
unsettling, we have a spectrum of workarounds that require no
modification to the wfd motherboard. Therefore, we propose to treat the
motherboard digital design as tested and complete and to proceed
from here, first by ordering some of the long lead-time chips. In
parallel, we will continue to work with and test the boards, staying
alert for new problems. There is still much work to do with the
daughtercards and system integration. But we are confident that those
issues we are still struggling with will be solved without modifying the
motherboard design. ***