waveform data distribution issues -- monopole analyses meeting

Erik Katsavounidis (Erik.Katsavounidis@lngs.infn.it)
Wed, 4 Feb 1998 20:09:01 +0100 (CET)

Dear MACRO colleagues,

By now, there's at least one person in each MACRO-US institution who is
interested in receiving what I call RARE DSTs and starting a monopole analysis.
Several questions have been raised over the last couple of months regarding
wider distribution of such data, the coverage of WFD analyses and the
platforms on which they can be performed. I'm posing a number of issues
to the entire "macro-general" list rather than "macro-rare", as they might be
of interest to the entire collaboration.

Overview
========

This note describes the status of WAVEFORM DIGITIZER (WFD) DATA distribution
and asks the US Institutional representatives whether they want to spend the
money to provide approx. ONE 10GB DLT PER MONTH of data taking, in order to
receive a reduced version of the WFD data at their institution. If the answer
is yes, DLT tapes should be sent to Gran Sasso WITHIN THE NEXT MONTH if you
want to have the tapes produced AUTOMATICALLY FOR YOU NOW (Texas A&M has
already provided DLTs for RARE DSTs for all of MACRO's running lifetime).

At the same time, I want to invite people to consider a MONOPOLE WORKING GROUP
MEETING for the weekend of 7/8 March 1998 AT GRAN SASSO in order to introduce
the DSTs to people interested in carrying out analyses, explain the present
state of the art regarding the platforms on which analysis software is already
available and define analysis goals.

People interested in receiving WFD data and/or participating in a monopole
analysis effort, PLEASE go through the following material and let me know
if you have any questions, comments or suggestions.

Status of the DATA collection and DLT production at Gran Sasso
==============================================================

In the last two and a half years of MACRO running (since Aug.95) the WFD data
have represented the vast majority of our total data volume: a typical 5.5hrs
MACRO RUN comes with 362 MB of *raw* data of which 285 MB are WFDs *alone*,
i.e., more than 78%. The WFD data are the heart of any monopole and LIP
scintillator
analysis. The RAW MACRO data are automatically processed OFFLINE in order to
produce a ZEBRA version of each file. The ZEBRA VERSION of MACRO DATAFILES IS
THE ONLY minimal-effort, 99.9%-automated, all-analyses-covering way to
distribute the FULL MACRO datastream right now. The ZEBRA version guarantees
compatibility of the data format with ANY computing platform (unix, vms,
osf etc). The only drawback when it comes to WAVEFORM ANALYSES (only) is that
ZEBRA FILES AS PRODUCED UNTIL NOW HAVE THE ID=1 and ID=2 WFD BUFFERS
SPLIT. The only method at this moment that can analyze the WFD data in this
format is C. Walter's Farfalla based on C++. As of now, I am not aware of any
FORTRAN-based method that performs WFD buffer recombination on MACRO ZEBRA
data.

The standard MACRO full datastream fills one 10GB DLT in 5 days right now.
More than 150 DLTs have been produced since Aug 95. Caltech is the only
US institution that has received such DLTs, and only until Dec 1996.
No data distribution scheme exists for US institutions right now.
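
As a rough cross-check of the volumes quoted above, here is a minimal Python
sketch. The function names, the 1 GB = 1000 MB convention and the assumption
of back-to-back 5.5hrs runs are illustrative only; none of this is part of
any MACRO software.

```python
# Figures taken from the note; everything else is an illustrative assumption.
RUN_HOURS = 5.5            # length of a typical MACRO run
RAW_MB_PER_RUN = 362.0     # raw data per run
WFD_MB_PER_RUN = 285.0     # WFD-only part of the raw data
DLT_CAPACITY_MB = 10_000.0 # nominal 10GB DLT, assuming 1 GB = 1000 MB


def wfd_fraction(wfd_mb=WFD_MB_PER_RUN, raw_mb=RAW_MB_PER_RUN):
    """Fraction of the raw data volume taken up by WFD data."""
    return wfd_mb / raw_mb


def days_to_fill_tape(mb_per_run, run_hours=RUN_HOURS,
                      capacity_mb=DLT_CAPACITY_MB):
    """Days needed to fill one tape, assuming runs follow each other
    without gaps and no format or tape overhead (both assumptions)."""
    runs_per_day = 24.0 / run_hours
    return capacity_mb / (mb_per_run * runs_per_day)


if __name__ == "__main__":
    print(f"WFD fraction of raw data: {wfd_fraction():.1%}")
    print(f"Days per 10GB DLT: {days_to_fill_tape(RAW_MB_PER_RUN):.1f}")
```

Under these assumptions the WFD fraction comes out just under 79% and the
full datastream fills a tape in roughly 6 days, the same ballpark as the
5 days quoted above. The sketch does not model format overhead or usable
tape capacity, which may account for the difference.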

A Reduced Dataset for WFD (Monopole) Analyses
=============================================

In September 1996 I initiated, and have maintained ever since, a data
reduction mechanism aimed primarily at making WFD-based monopole analyses
easier. This defined the so-called RARE DSTs. What RARE DSTs offer:
-- coverage of all monopole beta regimes for the scintillator system,
   availability of full streamer tube info for all accepted events,
   limited acceptance/beta regime streamer tube based analysis capabilities.
-- calculations for detector acceptance, efficiencies and monitoring
   performed on the FULL datastream.
-- 75-80% reduction in data volume after event selection based on (mostly)
   scintillator face multiplicity (at least 2).
-- events written to disk in RAW format, (fortunately) NOT in ZEBRA, thus
   allowing WFD buffer recombination in an easier way.

Monopole analyses starting from the RARE DSTs are subject only to the
acceptance loss due to selecting events that involve more than one
scintillator face. They do not suffer from any loss of efficiency or
system redundancy. In the low velocity regime, events are selected based
on the slow monopole trigger (SMT). At least two faces must be involved
for a slow monopole event to be accepted. Roughly 23% of the total SMTs
satisfy this criterion. For monopole velocities above a few times 10^-3
the FMT and HIPT systems are used to provide monopole coverage. All
such triggers are accepted. On top of that, the ultra-high-energy trigger
events (LAMOSSKA, created to provide WFD insurance against heavy afterpulsers)
are saved. Events from the streamer tube 8-plane coincidence monopole trigger
that reads the scintillator WFDs are also accepted.

The RARE DSTs are produced automatically at the end of each RUN at Gran Sasso.
Their management is reasonably automated but there is room for improvement.
A typical 5.5hrs MACRO RUN results in a RARE DST of approx. 85MB. Four
to five such DSTs are produced daily.
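
The reduction factor and the monthly RARE DST volume follow from these
numbers. A minimal Python sketch, where the function names, the 30-day month
and the 1 GB = 1000 MB convention are assumptions made for illustration:

```python
# Figures taken from the note; the helper names are hypothetical.
RAW_MB_PER_RUN = 362.0   # full raw data per 5.5hrs run
RARE_MB_PER_RUN = 85.0   # RARE DST size per 5.5hrs run


def volume_reduction(rare_mb=RARE_MB_PER_RUN, raw_mb=RAW_MB_PER_RUN):
    """Fractional reduction in data volume from raw run to RARE DST."""
    return 1.0 - rare_mb / raw_mb


def monthly_volume_gb(dsts_per_day, rare_mb=RARE_MB_PER_RUN, days=30):
    """RARE DST volume per month in GB (assuming 1 GB = 1000 MB)."""
    return dsts_per_day * rare_mb * days / 1000.0


if __name__ == "__main__":
    print(f"Volume reduction: {volume_reduction():.1%}")
    for n in (4, 5):
        print(f"{n} DSTs/day -> {monthly_volume_gb(n):.2f} GB/month")
```

With four to five DSTs per day this gives roughly 10 to 13 GB per month,
in line with the request for about one 10GB DLT per month of data taking,
and a reduction of about 77%, inside the 75-80% range quoted earlier.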

How do we want the RARE DSTs to be distributed?
===============================================

As mentioned before, the RARE DSTs are produced in the RAW data format.
While these files are on disk they CAN be read by ANY OTHER machine, provided
you FTP the file promptly. This is clearly not a solution for everybody, nor
for all the data we've produced so far or will produce in the future.
The 16 RARE DST tapes I've produced so far are all in the RAW format,
i.e. they cannot be read TRIVIALLY on a non-VAX platform.
In order to get the existing and future DSTs into a MACHINE INDEPENDENT
format we need to convert them to ZEBRA. At this stage it is desirable,
though, that the WFD buffers belonging to the same event are all put together
rather than remaining split as with the standard RAW-->ZEBRA full MACRO RUN
copy jobs. Back in March 1997 Alec Habig and Ed Kearns provided a piece of
code that takes care of exactly this: it reorganizes the raw data into
ZEBRA structures and matches all WFD split IDs. This code was meant to be
included in the official MACRO data copying procedure; for one reason or
another, this never became official. We plan to use this facility to
transform the RAW data to ZEBRA and write them to tape in a machine
independent way.
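
To illustrate what "matching WFD split IDs" means in principle, here is a
toy Python sketch. It is NOT the Habig/Kearns code and does not reflect the
actual MACRO or ZEBRA data layout. The record type, its field names and the
rule of joining the parts in ID order are all hypothetical assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class WfdBuffer(NamedTuple):
    """Hypothetical record: one piece of a split WFD buffer."""
    event: int       # event number the buffer belongs to (assumed field)
    buffer_id: int   # split ID, e.g. 1 or 2 (assumed field)
    payload: bytes   # waveform bytes carried by this piece


def recombine(buffers: Iterable[WfdBuffer]) -> Dict[int, bytes]:
    """Group split buffers by event and join them in ID order.

    Joining in ascending ID order is an assumption of this sketch.
    If an event carries the same ID twice, the later piece wins.
    """
    by_event: Dict[int, Dict[int, bytes]] = defaultdict(dict)
    for buf in buffers:
        by_event[buf.event][buf.buffer_id] = buf.payload
    return {
        event: b"".join(parts[i] for i in sorted(parts))
        for event, parts in by_event.items()
    }


if __name__ == "__main__":
    stream = [
        WfdBuffer(7, 2, b"cd"),   # pieces may arrive out of order
        WfdBuffer(7, 1, b"ab"),
        WfdBuffer(8, 1, b"xy"),   # an event with a single piece
    ]
    print(recombine(stream))
```

The point of doing this once, at copy time, is that every downstream
analysis then sees one complete WFD buffer per event regardless of platform.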

I'm planning to send a sample RARE DST tape prepared in a machine-independent
way to every US institution very soon (next week). The idea is to allow
people interested in monopole analyses to have a first look at what RARE DSTs
can offer them, verify that they can access the data tape on their preferred
computer platform, etc.

Immediately following this, we expect people to send DLT tapes to Gran Sasso
for automatic production of RARE DST copies in ZEBRA format of the past and
present data (Texas A&M doesn't need to send tapes).

What tapes shall I send to Gran Sasso and how?
==============================================

DLT tapes, generation III (10GB), are what I recommend. There are 4 DLT
drives on the Gran Sasso main cluster, ALL of which can handle ONLY 10GB tapes.
Generation IV (20GB) will be accepted, but IT WILL ADD a relative HEADACHE
for us now and for you later if you ever find yourself alone at Gran Sasso
trying to make a DLT copy. The reason is that the ONLY 20GB DLT drive
available at Gran Sasso is attached to the official MACRO data-copying CPU
(AXMAC1), to which NOT all of us have access and which is busy 40% of the time
with the standard MACRO DLT production. From my DLT cost research in Italy,
the 10GB and 20GB tapes seem to cost the same per GB.
Notice that currently there is NO DLT drive on any of the unix cpus at Gran
Sasso (risc, osf).

Sixteen 10GB RARE DST tapes have been produced so far. The last 8 of them
cover the data taking period AFTER the WFD hardware fix (i.e. no WFD loss).
RARE DST tapes are currently produced at a rate of 1 per month. You should
allow 20% more DLT tapes so that a Gran Sasso copy is always
available. You should also allow 1 DLT for the whole of MACRO's lifetime if
you want calibration data too.
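
The tape count can be estimated as follows. This minimal Python sketch rests
on one reading of the guidance above: the 20% margin is applied to the total
of existing plus future tapes and rounded up, and one calibration tape is
added on top. The function and parameter names are hypothetical.

```python
EXISTING_TAPES = 16    # RARE DST tapes produced so far
TAPES_PER_MONTH = 1    # current production rate
SPARE_PERCENT = 20     # extra tapes to allow, in percent


def tapes_to_send(future_months, existing=EXISTING_TAPES,
                  per_month=TAPES_PER_MONTH, spare_percent=SPARE_PERCENT,
                  calibration=True):
    """Number of DLTs to provide for past data plus `future_months` of
    running, with the spare margin rounded up to a whole tape."""
    data_tapes = existing + per_month * future_months
    # Integer ceiling division avoids floating-point rounding surprises.
    with_spares = -(-data_tapes * (100 + spare_percent) // 100)
    return with_spares + (1 if calibration else 0)


if __name__ == "__main__":
    print("Past data only:", tapes_to_send(0))
    print("Past data plus one year:", tapes_to_send(12))
```

Under this reading, an institution wanting all past data plus calibration
would provide 21 tapes, and 35 tapes would also cover a further year of
running.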

Try to contact our February/March MACRO shiftworkers coming to Gran Sasso.
Don't simply put DLT tapes in the mail; that causes us trouble receiving
them here.

Need for a Monopole Meeting?
============================

I believe there's much to share among old and new people in the hunt for
monopoles. Fundamental and practical questions on issues like computing
platforms for performing analyses, existing code, overlaps and redundancy of
analyses, avoiding duplication of work etc. could easily be resolved in a
1-2 day meeting at Gran Sasso. A video conference might be an
alternative, but since we have not interacted for quite some time now, getting
all interested people in the same place without a time limit seems more
useful. The weekend of 7/8 March 1998 provides overlap in the shifts of
two of the interested parties: Chris Orth of BU and Brajesh Choudhary
of Caltech. This will minimize travel expenses for them. The meeting can
include Saturday in order to reduce the air fare. Ideally, we would have
at least one person from each institution interested in monopoles at Gran
Sasso for such a meeting. Sophia K. and Lori Gray, who
have been working on monopole-related calibrations, will also be here. Apart
from the introductory issues I mentioned above, a number of action items
should also be addressed: finalizing calibration procedures and results
(low light and WFD saturation), finalizing the RARE DST event selection
(single face monopole inclusion? -- important before spinning and producing
tens of tapes!), analysis of data from the WFD-problem period, etc.

If you have any other ideas or suggestions regarding the possibility of such
a meeting (alternative days, places etc), please let me know.

--Erik