\magnification=\magstep1
{

\hoffset=1truein
\hsize=5.25truein
\vsize=10.25truein
\font\small=cmssbx10 at 14.4truept
\font\medium=cmssbx10 at 17.28truept
\font\large=cmssbx10 at 20.74truept
\nopagenumbers
\hrule height 0pt
\parindent=0pt
%\parskip=0pt
\hskip 3.9truein
\large
EDOC014\par
\vskip .5truein
\large
EUROGAM PROJECT\par
\vskip 1.5truein
\hrule height 2pt
\vskip 20pt
\large
NSF DATA ACQUISITION SYSTEM\par
\vskip .5truein
Event Builder and Sorter Control\par
\vskip 20pt
\hrule height 2pt
\vskip 1truein
\medium
Edition 1.0\par
\vskip 5pt
June 1990\par
\vfill
\medium
Nuclear Structure Software Systems Group\par
\vskip 5pt
Department of Physics\par
\vskip 5pt
University of Liverpool\par
\vskip .5truein

\eject
}
\pageno=1
\parskip 10pt plus 1pt
\parindent 0pt
\font\bigbf = cmbx10  scaled \magstep2
\font\bignf = cmr7  scaled \magstep3
\baselineskip=1.25\baselineskip
 
The aim of this document is to describe the software design of the
Event Builder and Sorter subsystems.
The two subsystems are discussed together as they have
similar requirements on an abstract level, 
and also some common components.
A common design strategy should result in fewer distinct software
components, more uniformity across the system as a whole
and hence easier maintenance.
The common components, at least initially,
will be the crate controllers (MVME147),
the processing cpus (MVME165)
and the interface controllers connecting the two subsystems.
 
The primary design goal for each subsystem is to provide 
a structure that is efficient in terms of data flow,
modular (allowing component change with the minimum of fuss)
and capable of failing gracefully.
The last criterion allows, for example, one particular processor
to crash and be restarted by the crate controller
without affecting the others.
 
The requirement for submodule change is taken care of
in several ways.
Symmetry between the input and output interfaces
enables the same code to function in the output interface of
the Event Builder and the input interface of the Sorter.
The form of the I/O interfaces needs to be defined in terms
of the minimum requirements to be supported
(e.g. each interface has its own controller cpu, connected
so that each data block passes only once
over the VME or VSB bus).
Code specific to a particular interface should reside only
in the controller cpu associated with that interface.
 
{\bf Crate-wide Controls}
 
The overall control of crate activities will be the responsibility
of the crate controller cpu.
This cpu will run (amongst other tasks) a crate controller
application control process (CCACP) 
to coordinate the activities of the interfaces and processors.
This process has several responsibilities.
It has to ensure that all components
are activated in a safe manner on reset.
It will provide a watchdog function to make sure all 
components are active.
It will pass commands to the components as necessary.
Thus the CCACP will probably have several subtasks
to ease the burden of coping with specific components.
 
The CCACP will auto-configure for the set of boards in the crate.
The current offline sort engine already uses this scheme
for both processor and memory boards.
During initialisation the CCACP will find all the components
present at known VME addresses, and take that as the correct
working set. 
This information will be visible for users to inspect.
If a board failed and no spare was immediately
available, then the system would continue to function
without software change, with reduced capabilities.
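
The auto-configuration step can be sketched as a scan over a table of
known addresses.
This is only an illustration under stated assumptions: the names
scan\_crate, crate\_config and fake\_probe, the address values and the
limit of 16 boards are all hypothetical, and the real probe would have
to attempt a bus access and trap the resulting bus error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe routine: returns non-zero if a board responds at addr.
   On real hardware this would attempt a bus access and trap the bus error. */
typedef int (*probe_fn)(uint32_t addr);

#define MAX_BOARDS 16   /* illustrative limit, not taken from the design */

struct crate_config {
    uint32_t present[MAX_BOARDS];   /* addresses of boards that responded */
    size_t   count;                 /* number of boards in the working set */
};

/* Probe each known address once and record the boards that answer. */
size_t scan_crate(const uint32_t *known, size_t n_known,
                  probe_fn probe, struct crate_config *cfg)
{
    assert(known != NULL && probe != NULL && cfg != NULL);
    cfg->count = 0;
    for (size_t i = 0; i < n_known && cfg->count < MAX_BOARDS; i++) {
        if (probe(known[i]))
            cfg->present[cfg->count++] = known[i];
    }
    return cfg->count;
}

/* Stand-in probe for host testing: pretends that only addresses
   with bit 24 clear are populated. */
int fake_probe(uint32_t addr)
{
    return ((addr >> 24) & 1u) == 0;
}
```

A missing board simply does not appear in the working set, so the rest
of the software needs no change to run with reduced capabilities.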
 
All components will be regularly checked by a watchdog system
in the CCACP
to try to diagnose failures.
This will be mainly for the processors.
The crate controller will keep a local copy of the downloaded code and data
to enable failed cpus to be restarted.
This will also allow power-up reset to result in a working crate.
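
One possible form of the watchdog check is sketched here.
The design does not fix how liveness is detected; this sketch assumes
that each cpu increments a heartbeat counter which the crate controller
can read, and that a restart is requested after three checks without
progress.
The names watched\_cpu and watchdog\_check and the threshold are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed scheme: each cpu increments a heartbeat counter that the
   crate controller can read; none of this is fixed by the design. */
struct watched_cpu {
    volatile uint32_t *heartbeat;  /* counter in the cpu's dual-ported memory */
    uint32_t last_seen;            /* value read on the previous check */
    int      missed;               /* consecutive checks without progress */
};

#define MISSED_LIMIT 3  /* illustrative threshold before a restart */

/* Returns 1 if the cpu should be restarted, 0 otherwise. */
int watchdog_check(struct watched_cpu *cpu)
{
    assert(cpu != NULL && cpu->heartbeat != NULL);
    uint32_t now = *cpu->heartbeat;
    if (now != cpu->last_seen) {
        cpu->last_seen = now;   /* progress was made since the last check */
        cpu->missed = 0;
        return 0;
    }
    cpu->missed++;
    return cpu->missed >= MISSED_LIMIT;
}
```

When the function returns 1, the crate controller would reload the cpu
from its local copy of the code and data.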
 
During reset initialisation
the CCACP will check that all components are active.
Then relevant information will be passed to defined data regions
in each processor.
This information will enable the various processors in the crate
to communicate as necessary.
Once all the components in a crate have been initialised
then messages will be sent to activate each component.
 
{\bf Intra-crate communications}
 
We require efficient message passing communication
between all cpus within a crate.
Ideally the same method would apply to all cpus.
However, this is not very easy as there is no VME standard
mechanism for message passing.
We have to invent one (or two!).
 
The message passing requirements of the whole system have to be studied.
For example, can we assume that there will be only one message
outstanding between any two processors?
In the case of the input interface controller and a processing
cpu, will there only be one data block message and no control
messages simultaneously?
Do we need a priority system for the CCACP to interrupt
data transfer activity?
 
A further investigation revolves around whether interrupts
are necessary to inform the recipient of a message arrival.
The usual form this would take would be a mailbox interrupt.
However, the cpu boards under consideration do not support
a ``true'' mailbox interrupt (i.e. one raised by writing data to a location
in dual-ported memory).
A more generally available capability involves writing to a
globally visible control register on the remote board in 
order to generate an interrupt.
In this case the data has to be written separately to agreed
locations in dual-ported memory.
It should be noted that boards that support the ``true'' mailbox
interrupt can also be used in this manner.
 
{\bf I/O interfaces}
 
Each interface subsystem will be under the control of one cpu only.
The input interface to the Event Builder will be the HSM8170 
controlled by a FIC8230.
The output interface of the Event Builder and the input
interface of the Sorter will be fibre optic in nature, the particular
boards as yet undecided.
The output interface of the Sorter is the connection to the
tape server.
At Daresbury this will be a
parallel interface to the GEC, the details of which are not yet
fixed.
 
All of the interface controller cpus will be coded in the same
style.
The code will divide into three sections.
There will be a cpu board specific initialisation section.
There will be an interface specific control section.
Finally, there will be a section that communicates with the
CCACP for control and with the processing cpus for data 
transfers.
 
The design of the Event Builder and the VME Sorter
necessitates that all interfaces except the output interface of the
Event Builder be connected to VSB.
The input interface to the Event Builder requires data transfer
over VSB to optimise transfer rates.
Both interfaces in the Sorter require transfers over VSB
to free VMEbus for spectrum access.
 
Whilst VSB will be necessary for data transfer reasons,
it is expected that all control communication will be via VMEbus.
 
{\bf Input Interface - Processor transfers}
 
The FIC8230 (a contender for both input interfaces)
will transfer data blocks to the processors by DMA.
Making the interface controller, rather than the processors,
responsible for the data transfers
will be the most efficient arrangement.
This leaves the processors free to do only what they are supposed
to do - process data.
 
Therefore, to be efficient, the interface controller must be
allowed to transfer a data block to a processor
whilst it is already processing a previous block.
This implies that each processor must allocate at least
two input data buffers.
In practice, two data buffers will be sufficient.
 
Upon activation by the CCACP, each processor will send a message
to the input interface controller informing it of
the address of a buffer in its own dual-ported memory space.
The interface controller will process these messages 
in rotation.
Upon completion of a transfer it will send a message back to
the processing cpu informing it of the availability
of the data block.
Thereafter, whenever a processor has completed processing of a
data block, a message will be sent to the input interface controller
with the address of the free buffer.
 
{\bf Processor - Output Interface transfers}
 
In the same way as for the input,
the output interface controllers will be responsible
for transferring data blocks from the processors.
This function has to be performed as efficiently as reasonably
possible due to the high data rates involved, and so some form
of DMA transfer will be useful.
 
Whenever one of a processor's output buffers becomes full
it will send a message to the output interface controller
asking for the block to be transferred.
Therefore a two buffer system will be needed as for input.
The processor will switch to using its other buffer
immediately (as long as it is free) and continue processing.
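
The buffer switch on the processor side might look like the following
sketch.
The state names and the functions switch\_output\_buffer and
output\_transfer\_done are hypothetical; the message to the output
interface controller is reduced here to a change of buffer state.

```c
#include <assert.h>
#include <stddef.h>

enum buf_state { BUF_FREE = 0, BUF_FILLING = 1, BUF_QUEUED = 2 };

struct out_buffers {
    enum buf_state state[2];  /* state of each output buffer */
    int current;              /* index of the buffer being filled */
};

/* Called when the current buffer is full. Marks it as queued for the
   output interface controller and switches to the other buffer.
   Returns the index of the new buffer, or -1 if it is not yet free
   (the processor must then wait and try again). */
int switch_output_buffer(struct out_buffers *ob)
{
    assert(ob != NULL);
    int other = 1 - ob->current;
    ob->state[ob->current] = BUF_QUEUED;
    if (ob->state[other] != BUF_FREE)
        return -1;
    ob->state[other] = BUF_FILLING;
    ob->current = other;
    return other;
}

/* Called when the controller reports that buffer idx has been sent. */
void output_transfer_done(struct out_buffers *ob, int idx)
{
    assert(ob != NULL && (idx == 0 || idx == 1));
    ob->state[idx] = BUF_FREE;
}
```

With only two buffers the processor stalls whenever both are queued,
which is the case the return value of -1 represents.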
 
{\bf Message passing scheme}
 
The aim is to provide a reasonably universal and simple scheme.
Each cpu will have a block of memory, known as a message buffer,
at a known fixed address
for each message type it is involved with.
At initialisation, the CCACP will allocate these blocks as required.
The message types will be control messages and data transfer messages.
This differentiation is made purely because data transfers occur
between interfaces and processors, whereas control messages
pass between these and the CCACP.
 
Passing a message will consist of the following sequence of events.
The message sender will write to the corresponding message buffer
in the recipient cpu's dual-ported memory space.
The write will consist of a flag word (16 bits) and a data word
(32 bits).
This will then be followed by a write to an address that will
cause a local interrupt to the recipient cpu.
The local interrupt will be either a mailbox, signal,
fifo or abort interrupt depending on cpu board type.
The interrupted cpu will then execute the message request
and reply with an acknowledgement message to the sender.
The reply will consist of flag and data words in the same way,
followed by writing to a suitable interrupt address.
The receiver of the message will have a status word (16 bits)
which will be used to provide an indication of the state of
execution of the request if necessary.
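
The sequence above can be sketched in C as follows.
The word sizes come from the text; the field order, the flag and status
values, the choice to write the data word before the flag word, and the
names msg\_buffer, send\_message, receive\_message and count\_irq are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout follows the text: 16-bit flag, 32-bit data, 16-bit status.
   Field order and flag values are assumptions. */
struct msg_buffer {
    volatile uint16_t flag;    /* non-zero when a message is present */
    volatile uint16_t status;  /* receiver's progress on the request */
    volatile uint32_t data;    /* operation code or buffer address */
};

enum { MSG_EMPTY = 0, MSG_PRESENT = 1 };
enum { ST_IDLE = 0, ST_BUSY = 1, ST_DONE = 2 };

/* Hypothetical hook: writes whatever address raises the local
   interrupt on the recipient board (mailbox, signal, fifo or abort). */
typedef void (*raise_irq_fn)(void);

/* Returns 0 on success, -1 if the previous message is still pending. */
int send_message(struct msg_buffer *mb, uint32_t data, raise_irq_fn raise_irq)
{
    assert(mb != NULL);
    if (mb->flag != MSG_EMPTY)
        return -1;
    mb->data = data;          /* write the data word first ... */
    mb->flag = MSG_PRESENT;   /* ... then mark the message as present */
    if (raise_irq)
        raise_irq();          /* finally interrupt the recipient */
    return 0;
}

/* Receiver side: take the message and clear the flag for the next one. */
int receive_message(struct msg_buffer *mb, uint32_t *data)
{
    assert(mb != NULL && data != NULL);
    if (mb->flag != MSG_PRESENT)
        return -1;
    *data = mb->data;
    mb->status = ST_BUSY;     /* request accepted, execution under way */
    mb->flag = MSG_EMPTY;
    return 0;
}

/* Host stand-in for the interrupt hook: just counts the calls. */
int irq_count;
void count_irq(void) { irq_count++; }
```

The acknowledgement would travel the same way in the opposite
direction, using the message buffer that the original sender owns.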
 
The control messages will only pass between the CCACP and
the other cpus in the crate.
There will not be any control messages passing between the other 
cpus.
Examples of control messages would be reset, enquire status,
halt processing, start processing ...
In the case of control messages the flag word will just be set to 
indicate the presence of a message.
The data word will be an operation code.
The recipient cpu will respond to the request and return
to its corresponding message buffer in the CCACP
an acknowledgement
in the data word.
 
Data transfer messages will be organised in the following way.
Each processor will have two data transfer buffers for input
and (at least) two data transfer buffers for output.
For input, a particular processor will know the address of its
two message buffers in the input interface controller's
dual-ported memory.
Similarly, the controller will know the address of the corresponding
message buffers in the processor in question.
The processor will write the address of one of its data buffers
into the data word of the message buffer.
The flag word will indicate message presence.
Upon starting to execute the queued transfer, the controller
will set its status word appropriately.
Upon finishing the transfer, a message will be sent to the
processor indicating transfer complete.
 
Interrupts to signal message arrival are not strictly necessary
for data transfers between the processors and interface controllers.
The above design is an unordered but controlled queue.
Whenever an interface controller is ready for a transfer
it will search its (small) message buffer list for an available
buffer.
Whenever a processor is ready for the next data block
it will inspect its message buffer.
This would work because the cpus have no other tasks to
accomplish and can spend time polling their local message buffer(s).
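
The controller's search of its message buffer list could be as simple
as the following sketch.
It assumes one slot per posted data buffer and serves the slots in
rotation; the names xfer\_slot and next\_ready\_slot are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One slot per (processor, buffer) pair; a non-zero flag means the
   processor has posted the address of a free data buffer. */
struct xfer_slot {
    volatile uint16_t flag;
    volatile uint32_t buf_addr;
};

/* Search the slot list for a posted buffer, starting just after the
   slot served last time so that processors are handled in rotation.
   Returns the slot index, or -1 if nothing is waiting. */
int next_ready_slot(const struct xfer_slot *slots, size_t n, size_t *cursor)
{
    assert(slots != NULL && cursor != NULL && n > 0);
    for (size_t step = 1; step <= n; step++) {
        size_t i = (*cursor + step) % n;
        if (slots[i].flag != 0) {
            *cursor = i;        /* remember where to resume next time */
            return (int)i;
        }
    }
    return -1;
}
```

Because the list is small, a full scan costs little compared with the
transfer of a data block.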
 
The CCACP is a more complicated communication problem.
It will form part of a multi-tasking system with several tasks to 
accomplish.
In this case, the control messages will require interrupt usage.
The CCACP will require immediate response to some control 
message types sent to the processors, and may not be prepared
to wait until the next end of block for the processor to
check its control message buffer.
 
Whether it is sensible to use interrupts in one case and not in the
other is open to question.
 
{\bf Processor code organisation}
 
The Event Builder and Sorter processors
are very similar in terms of code organisation.
Both require data block input and output with a single
task processing the data.
External control is required for both and will be supplied
by the CCACP using the same commands.
There will be extra application specific commands which will
not apply to both systems, but there is a basic common set
to allow code downloading etc...
 
Since the same cpu board type (MVME165) will be used in both
systems (at least initially)
it seems sensible to provide a single solution
for both systems.
The code existing on each of the processors will consist
of two parts.
There will be a small bootstrap section in PROM.
This will contain cpu board specific initialisation code,
and a section to process commands from the CCACP.
It will provide a board independent platform to support
a single code module.
 
Each system will have code downloaded into the processors
under the control of the CCACP.
The code will be produced by different means, but
there is no reason to make the underlying ``system'' different.
The current offline sort engine has a minimal bootstrap code
in PROM.
The same idea will be used and enhanced for our purposes.
The code in PROM provides cpu board specific initialisations
and supports a small number of simple commands
(reset, go, halt, etc ... ).
 
A typical sequence of commands involves the CCACP sending
a halt command and then writing the new downloaded code
and data into position, followed by a go command.
The local cpu needs to know the start address of the downloaded code.
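
Seen from the CCACP, the sequence could be sketched as below.
The remote\_cpu structure, the command values and the helper
always\_ack are hypothetical; in particular the way the start address
reaches the local cpu is not defined by the design and is represented
here by a plain field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CMD_HALT = 3, CMD_GO = 4 };   /* illustrative values */

/* Hypothetical view of one processor as seen from the crate controller. */
struct remote_cpu {
    uint8_t  *dp_mem;        /* base of the cpu's dual-ported memory */
    size_t    dp_size;       /* size of that memory in bytes */
    uint32_t  start_offset;  /* where the downloaded module begins */
    int     (*command)(struct remote_cpu *cpu, int cmd); /* send, await ack */
};

/* Halt the cpu, copy the module into place, then start it.
   Returns 0 on success, -1 on any failure. */
int download_and_go(struct remote_cpu *cpu, const void *image,
                    size_t len, uint32_t offset)
{
    assert(cpu != NULL && image != NULL);
    if (offset > cpu->dp_size || len > cpu->dp_size - offset)
        return -1;                       /* image would not fit */
    if (cpu->command(cpu, CMD_HALT) != 0)
        return -1;
    memcpy(cpu->dp_mem + offset, image, len);
    cpu->start_offset = offset;          /* the cpu needs the start address */
    return cpu->command(cpu, CMD_GO);
}

/* Host stand-in that acknowledges every command. */
int always_ack(struct remote_cpu *cpu, int cmd)
{
    (void)cpu;
    (void)cmd;
    return 0;
}
```

The same routine would serve for a restart after a failure, using the
copy of the code and data held by the crate controller.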
 
There will be one code and variable data module
downloaded to a defined address in 
dual-ported memory.
Other data transfers will be necessary in order to change
data used by the applications.
 
For the Sorter the basic structure already exists.
Some changes will be required to allow for extra online requirements.
These mainly relate to data layout to allow for
parameter changes ``on the fly''.
 
For the Event Builder the code will be generated on the UNIX
workstations using the GNU C compiler which has been
configured to function as a cross-compiler
running on the SPARCstation and producing 68k series code.
This has the advantage that the compiler can be ported
to other workstations easily in the future since the source
is available.
More importantly it enables Event Builder code to be tested
and debugged entirely on the UNIX workstation
using standard debugging tools.
Once the code is adequately tested it will be recompiled
and linked ready for downloading.
This technique of using a test harness separates the processing
code from the control and data transfer code.
The latter is stable and relatively simple,
consisting of only three or four C functions
(for example ``get\_data\_block'', ``send\_data\_block''
and ``process\_command''),
and requires debugging only once.
The part of the system that requires more regular debugging
is separated out into a good test environment.
 
{\bf Event Builder code production}
 
The main requirement will be to process blocks of input data
as efficiently as possible, and to output re-constructed data
blocks. Since the code may be relatively complicated, it is
necessary to use a high level language.
The most appropriate language to use is C.
This is because it is the most widely supported language
on the type of computing platforms we will be using.
 
The requirements of code running on an Event Builder cpu are
quite limited.
There is no point in using general I/O in a 
multi-processor environment.
Many of the library functions commonly supplied with computing systems
would not be used.
The only appropriate I/O would be data block transfers and a few
control commands. If functions were written to cope with these
requirements, then the code could be debugged externally.
The functions would look something like ...
 
\line{}
\line{        input(bufptr); \hfill}
\line{        output(bufptr); \hfill}
\line{        send\_command(id); \hfill}
 
The current sort engine has a similar communication method.
A foundation layer of software exists in each cpu, and contains
the code to execute these functions.
Equivalent functions could be written on the SUN to allow testing
of the basic processing code. These functions would read data from
a file, write to a file and communicate commands with the screen.
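
A minimal sketch of such equivalent functions is given here.
The three function names are those listed above; the return values,
the block size of 1024 bytes and the helper harness\_open are
assumptions added for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define BLOCK_BYTES 1024   /* illustrative block size, not from the design */

static FILE *in_file;    /* source of raw data blocks */
static FILE *out_file;   /* destination for re-constructed blocks */

/* Attach the harness to two already opened streams. */
void harness_open(FILE *in, FILE *out)
{
    in_file = in;
    out_file = out;
}

/* Host stand-in for the block input call: read one block from file.
   Returns 1 if a full block was read, 0 at end of data. */
int input(void *bufptr)
{
    assert(in_file != NULL && bufptr != NULL);
    return fread(bufptr, BLOCK_BYTES, 1, in_file) == 1;
}

/* Host stand-in for the block output call: append one block to file.
   Returns 1 on success, 0 on a write error. */
int output(const void *bufptr)
{
    assert(out_file != NULL && bufptr != NULL);
    return fwrite(bufptr, BLOCK_BYTES, 1, out_file) == 1;
}

/* Host stand-in for command traffic: report the command on the screen. */
void send_command(int id)
{
    printf("command %d\n", id);
}
```

The processing code calls the same three functions in both
environments, so only this layer differs between the SUN and the
target cpu.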
 
To this end it is proposed to use the GNU C compiler to generate
the processing code. When optimised, the resultant code is
acknowledged to be very efficient.
The GNU C compiler is readily available to use on most UNIX machines.
It is a sophisticated product with several attractive features.
It has the built-in ability to use inline macro expansions.
We can define our own set of macros where necessary.
Assembler statements can be included easily and use C variables.
We have the source, so porting to other machines should be possible.
 
Work has already taken place using the SUN SPARCstation to
re-configure the GNU C compiler and linker to function as a
68k cross-compiler to use on a ``raw'' cpu board.
This consists of the basic C language together with
a limited set of functions that would be required in such an
application. The functions include the maths library, string
manipulations, ctype, malloc ...
 
The three interface functions described above could be provided
as inline expandable macros producing, for example ...
 
\line{} 
\line{         input: \hfill}
\line{ \qquad        jsr  [\$400] \hfill}
\line{ \qquad        move.l +(sp),bufptr   \hfill}
 
The foundation layer of code supporting this function would have
its entry address placed at a fixed address (e.g. \$400).
 
Alternatively the functions could be provided in GNU C in the form
of a library to be linked together with the user's program.
This approach reduces the size of the foundation code layer.
 
The equivalent SUN code would be a C function.
Instead of handling VME/VSB bus transfers, normal file I/O
would be used on the SUN for testing purposes. This allows
the standard SUN debugging tools to be used for program development.
 
This technique requires no operating system or kernel on the
processing cpus, only on the crate controller.
\end