EDOC061
MIDAS Spectrum Access Server - Spectrum Format
Edition 2.3
June 1998
Nuclear Physics Support Group
Central Laboratory of the Research Councils
Daresbury Laboratory
Introduction
Previous discussion documents have proposed structures for a spectrum database and for data storage. It has also previously been suggested that the same format be used for on-line spectra, disc based spectra and for spectra archived on tape. The requirements for a spectrum format for both on-line and off-line use have been specified in the document EDOC060. This document describes a format which meets the requirements of EDOC060 and can be used for on-line spectra in the histogrammer or sorter and for off-line spectra held on disc or tape. Requirements for implementation are defined.
The document EDOC060 should be read for formal specification of the requirements. The document EDOC062 should be read for details of program access to the spectrum data.
Document History
Version 1.0 Dec 1990 - Initial version
Version 2.0 May 1991 - re-order fields in the header and reduce data arrays to just spectrum and an optional error spectrum. Unused pointers now contain -1 so that pointers may have the value 0.
Version 2.1 Sept 1991 - Removal of graph type arrays, change to data array descriptor, n-dimensional arrays stored following C rather than Fortran ordering convention.
Version 2.2 Jan 1992 - Number of Information strings increased to 32.
Version 2.3 June 1998 - This version - document converted to HTML format
Terminology
Several terms in common usage are defined here to avoid
misinterpretation.
Spectrum - any collection of data items which includes
the conventional 1D histogram and 2D matrices in common use.
Channel - an individual data item in a spectrum.
On-line - spectra held in memory (normally VME global memory
or a processor memory) which are potentially being incremented
by either hardware or software. This includes spectra
being generated by a post-experiment data replay.
Off-line - spectra held on secondary storage (normally disc)
which may be read or mapped into processor memory. The data
contents are not normally being changed.
General Format
Each spectrum consists of three parts: a header, a string space and
a counts space.
For on-line spectra the components may not be contiguous but
would be allocated within a large memory space. For off-line
spectra the components would normally be contiguous within a
space defined by the disc file holding the spectrum with the
header as the first component in the file.
The spectrum header is a fixed length 512 byte data structure which
is used to access all items in the string and counts spaces. The
associated string space and counts space are allocated in units
of 256 bytes.
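As an illustration of these allocation rules, the following C sketch rounds a byte count up to whole 256-byte units and totals the size of an off-line spectrum file. It assumes, for illustration only, that the header, string space and counts space are stored contiguously with no further padding; the helper names are hypothetical.

```c
#include <stddef.h>

/* Sizes fixed by the spectrum format. */
#define SPEC_HEADER_BYTES 512u
#define SPEC_UNIT_BYTES   256u

/* Number of 256-byte units needed to hold nbytes, rounded up. */
static size_t units_for_bytes(size_t nbytes)
{
    return (nbytes + SPEC_UNIT_BYTES - 1) / SPEC_UNIT_BYTES;
}

/* Hypothetical helper: size of an off-line spectrum file, assuming the
 * header, string space and counts space are contiguous with no gaps. */
static size_t offline_file_size(size_t string_bytes, size_t counts_bytes)
{
    return SPEC_HEADER_BYTES
         + units_for_bytes(string_bytes) * SPEC_UNIT_BYTES
         + units_for_bytes(counts_bytes) * SPEC_UNIT_BYTES;
}
```

For example, a spectrum with three allocated strings (768 bytes) and 4096 channels of 32-bit counts (16384 bytes) would occupy 512 + 768 + 16384 = 17664 bytes under these assumptions.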
The string space is used to hold character strings which contain
name, title, annotation and calibration information etc. Each
character string is allocated one (or more) units of
256 bytes from the string space. The
character string is represented in the standard XDR string format:
a 32-bit integer field containing the number of significant
characters, followed by the characters themselves. The characters
are padded with trailing null characters to fill the allocated space.
Note: The minimum space allocation allows for strings of up to
252 characters in length (256 bytes less the 4-byte count), but any
required length is possible by allocating the necessary number of
contiguous 256-byte fragments.
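The encoding of a single string can be sketched in C as follows. XDR encodes integers in big-endian byte order; standard XDR pads a string only to a multiple of four bytes, whereas this format pads to the end of the allocated 256-byte units, as described above. The function name `encode_spec_string` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SPEC_UNIT_BYTES 256u

/* Encode text as a string-space entry: a 32-bit big-endian (XDR) count of
 * significant characters, the characters, then null padding to the end of
 * the last 256-byte unit. Returns a malloc'd buffer (caller frees) and its
 * size in *out_size, or NULL on failure. */
static unsigned char *encode_spec_string(const char *text, size_t *out_size)
{
    size_t len = strlen(text);
    if (len > (size_t)UINT32_MAX - 259)      /* guard count and size overflow */
        return NULL;

    /* 4 bytes of count plus the characters, rounded up to whole units. */
    size_t units = (4 + len + SPEC_UNIT_BYTES - 1) / SPEC_UNIT_BYTES;
    size_t size = units * SPEC_UNIT_BYTES;

    unsigned char *buf = calloc(size, 1);    /* zero-filled: the null padding */
    if (buf == NULL)
        return NULL;

    uint32_t n = (uint32_t)len;              /* big-endian character count */
    buf[0] = (unsigned char)(n >> 24);
    buf[1] = (unsigned char)(n >> 16);
    buf[2] = (unsigned char)(n >> 8);
    buf[3] = (unsigned char)n;
    memcpy(buf + 4, text, len);

    *out_size = size;
    return buf;
}
```

A 252-character string fills exactly one unit; a 253-character string needs two.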
The address of each character string expressed as an offset into
the string space is held as a pointer in the spectrum header. Unused
pointers which do not have string space allocated are set to -1.
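Resolving a string pointer can be sketched as below. This sketch assumes that pointers are signed 32-bit integers giving byte offsets from the start of the string space, and that the character count is big-endian as in XDR; `read_spec_string` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode the string at offset ptr in the string space. Copies at most
 * cap-1 characters into out and null-terminates it. Returns the number of
 * characters copied, or -1 if the pointer is unused (-1), cap is zero, or
 * the string would run past the end of the string space. */
static long read_spec_string(const unsigned char *space, size_t space_len,
                             int32_t ptr, char *out, size_t cap)
{
    if (ptr < 0 || cap == 0)                 /* -1 marks an unused pointer */
        return -1;
    size_t off = (size_t)ptr;
    if (off > space_len || space_len - off < 4)
        return -1;

    const unsigned char *p = space + off;
    uint32_t n = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
               | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    if (n > space_len - off - 4)
        return -1;

    size_t copy = n < cap - 1 ? n : cap - 1;  /* truncate to fit out */
    memcpy(out, p + 4, copy);
    out[copy] = '\0';
    return (long)copy;
}
```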
The counts space is used to hold data arrays which contain
the spectrum counts and is used and referenced in the same
way as the string space.
Header Format
Offset 0 - magic number. A 32 bit integer field set to the
value 412900921 (decimal). The magic number
can be used as a test that a disc file probably does contain
a spectrum.
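A minimal sketch of the magic-number test follows. This section does not state the byte order of header integers, so the sketch tries both big- and little-endian interpretations and reports which one matched; a match only suggests, and does not prove, that the file holds a spectrum. The names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define SPEC_MAGIC 412900921u   /* magic number, decimal */

/* Outcome of probing the first four bytes of a file. */
enum spec_order { SPEC_NOT_SPECTRUM, SPEC_BIG_ENDIAN, SPEC_LITTLE_ENDIAN };

/* Compare the first 4 bytes with the magic number in both byte orders. */
static enum spec_order probe_magic(const unsigned char b[4])
{
    uint32_t be = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
                | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    uint32_t le = ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16)
                | ((uint32_t)b[1] << 8) | (uint32_t)b[0];
    if (be == SPEC_MAGIC) return SPEC_BIG_ENDIAN;
    if (le == SPEC_MAGIC) return SPEC_LITTLE_ENDIAN;
    return SPEC_NOT_SPECTRUM;
}

/* Read the first four bytes of the file at path and probe them. */
static enum spec_order probe_file(const char *path)
{
    unsigned char b[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return SPEC_NOT_SPECTRUM;
    size_t got = fread(b, 1, sizeof b, f);
    fclose(f);
    return got == sizeof b ? probe_magic(b) : SPEC_NOT_SPECTRUM;
}
```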
Data Array Descriptor
Offset 0 - array layout. A 32 bit integer field which is used
to define the format of channel items within a data array (see below).
This field will be -1 for an unused array descriptor.
The array layout defines the format of each channel item and the ordering of channel items within the data array.
Array layouts, for a spectrum of dimension n:
histogram/matrix - each channel item consists only of a
data item, ordered following the C language convention for an
n-dimensional array.
histogram/half matrix - only the upper diagonal of the array is stored.
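The channel ordering can be made concrete with two index functions. `c_order_index` is the standard C (row-major) rule, in which the last index varies fastest. `half_matrix_index` is an assumption, not confirmed by this document: it takes a square matrix and stores the upper triangle including the diagonal (i <= j) row by row. Both names are hypothetical.

```c
#include <stddef.h>

/* Linear index of channel (idx[0], ..., idx[n-1]) in an n-dimensional
 * array with extents dim[0..n-1], stored in C (row-major) order so that
 * the last index varies fastest. */
static size_t c_order_index(size_t n, const size_t dim[], const size_t idx[])
{
    size_t lin = 0;
    for (size_t k = 0; k < n; k++)
        lin = lin * dim[k] + idx[k];
    return lin;
}

/* Assumed half-matrix packing for a size x size matrix, upper triangle
 * including the diagonal, rows in C order. Requires i <= j < size.
 * Row r holds size - r channels, so rows 0..i-1 hold
 * i*(2*size - i + 1)/2 channels in total (always an integer). */
static size_t half_matrix_index(size_t size, size_t i, size_t j)
{
    return i * (2 * size - i + 1) / 2 + (j - i);
}
```

For a 3 x 3 half matrix this places (0,0), (0,1), (0,2), (1,1), (1,2), (2,2) at indices 0 to 5, so the array holds size(size+1)/2 channels.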
The array type defines the format of the data item component of each channel item within the data array.
Signed integers are held in two's complement format and floating point numbers in IEEE 754 format.
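Decoding individual channel values can be sketched as follows. The array type codes are not reproduced in this section, so the sketch covers only 32-bit signed integers and 32-bit IEEE 754 floats, and it assumes big-endian (XDR) byte order in the counts space; both choices, and the function names, are assumptions.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

/* Assemble four bytes in big-endian order (assumed byte order). */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* 32-bit two's complement channel; int32_t is two's complement in C. */
static int32_t decode_int32(const unsigned char *p)
{
    uint32_t u = load_be32(p);
    int32_t v;
    memcpy(&v, &u, sizeof v);   /* reinterpret the bit pattern */
    return v;
}

/* 32-bit IEEE 754 channel; assumes the host float is IEEE 754 binary32. */
static float decode_float32(const unsigned char *p)
{
    uint32_t u = load_be32(p);
    float f;
    memcpy(&f, &u, sizeof f);   /* reinterpret the bit pattern */
    return f;
}
```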
Information Strings
Up to 32 information strings are available per spectrum.
String 1 is allocated for spectrum title information.
String 2 is allocated for information about the
experiment which created the spectrum - for example
experiment name, beam energy, beam ion species and target species etc.
String 3 is allocated for information about the
run within the experiment - for example
run name, run number etc.
Strings 4 and 5 are available for a general comment indicating
the content of data arrays 1 and 2.
The remaining strings are available for general user comments
or additional information.
All information strings are free-format text and may contain
any printable characters plus LF (the UNIX end-of-line character).
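A check that a string follows this rule might look like the following sketch. It uses `isprint` in the default "C" locale, which accepts ASCII 0x20 to 0x7E; rejecting other control characters such as tab or CR is an assumption, and `info_string_ok` is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the n significant characters of an information string
 * are all printable or LF. */
static bool info_string_ok(const char *s, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        unsigned char c = (unsigned char)s[k];   /* isprint needs unsigned */
        if (c != '\n' && !isprint(c))
            return false;
    }
    return true;
}
```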
Annotation Strings
One string per dimension is available to describe the "significance of the dimension". For example, it could contain the "units" (MeV, cm, etc.) of the calibration.
Calibration Strings
One string per dimension is available to hold calibration information for that dimension. The information is held free format in character representation. The first field in the string is the name of the calibration method and the remaining fields are parameters to that calibration routine.
Efficiency Strings
One string per dimension is available to hold efficiency information for that dimension. The information is held free format in character representation. The first field in the string is the name of the efficiency calculation algorithm and the remaining fields are parameters to that algorithm.
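Calibration and efficiency strings share the same structure (a method or algorithm name followed by its parameters), so one parser can serve both. This sketch assumes the fields are separated by whitespace and that every parameter is numeric; neither is fixed by the format. The function name and the method name `poly` in the example are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Split a calibration or efficiency string into its method name and
 * numeric parameters. Returns the number of parameters stored in params,
 * or -1 if the name is missing or too long, a field is not a number, or
 * there are more than max_params parameters. */
static int parse_method_string(const char *text, char *name, size_t name_cap,
                               double params[], int max_params)
{
    const char *p = text + strspn(text, " \t\n");   /* skip leading space */
    size_t len = strcspn(p, " \t\n");               /* first field: name */
    if (len == 0 || len >= name_cap)
        return -1;
    memcpy(name, p, len);
    name[len] = '\0';
    p += len;

    int count = 0;
    for (;;) {
        p += strspn(p, " \t\n");
        if (*p == '\0')
            break;
        if (count == max_params)
            return -1;
        char *end;
        double v = strtod(p, &end);
        if (end == p)                  /* field is not a number */
            return -1;
        params[count++] = v;
        p = end;
    }
    return count;
}
```

Parsing `"poly 0.5 0.25"` gives the name `poly` and the parameters 0.5 and 0.25; a program would then dispatch on the name to the matching calibration routine or efficiency algorithm.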
Data Interchange
This document describes a common format for spectra held in memory or on disc. At the end of a run, on-line spectra will be "saved" and become off-line spectra held on disc. The disc files can be copied in the same format to any available magnetic tape medium for backup and archive purposes using standard system utilities (for UNIX one would use tar or dump). Spectra can be moved in this format between the UNIX workstations in the Eurogam collaboration using the utility tar or over a network connection.
For interchange of data with laboratories outside the Eurogam collaboration, and with those running other data analysis systems which may wish to import Eurogam data, a more system-independent format for writing spectra onto tape is desired. This format can trade efficiency of access and compactness of the data for maximum ease of interchange. A suitable format for this purpose is described in the document EDOC080.
Implementation
Suggestions as to methods of implementation to be provided.