Release notes for DAQ/HLT-I S/W Release tdaq-02-00-01


General notes from s/w librarian

General info

This is a general DAQ/HLT-I release intended for ATLAS data taking starting in April 2009. It is compatible with LCG56 s/w and the offline-15 release branch. With respect to tdaq-02-00-00 it contains no incompatible API changes, but all user code must be rebuilt.

Supported platforms, compilers and compatibility

The production tags for this release are i686-slc4-gcc34-[opt,dbg]; however, for testing purposes new tags are available: i686-slc4-gcc43-opt and x86_64-slc4-gcc43-opt. Please note that the s/w is not validated for these configurations.

i686 s/w runs on the x86_64 architecture (provided that the h/w is described in the configuration DB with i686-slc4 tags). You can also run on SLC5 nodes, provided the compatibility gcc and stdc++ libraries are installed.

System and compiler                     CMTCONFIG              Compatibility list
i686 Linux 2.6.9 (SLC4), gcc-3.4.x      i686-slc4-gcc34-opt    SLC4.x, RHEL4, SLC5 32/64bit
i686 Linux 2.6.9 (SLC4), gcc-3.4.x      i686-slc4-gcc34-dbg    - ~ -
i686 Linux 2.6.9 (SLC4), gcc-4.3.2      i686-slc4-gcc43-opt
x86_64 Linux 2.6.9 (SLC4), gcc-4.3.2    x86_64-slc4-gcc43-opt  SLC4.x, RHEL4, SLC5 64bit

External s/w and run-time environment tuning.

 tdaq-common-01-12-00
 dqm-common-00-08-02
 LCG 56
 Java Runtime Environment 1.6.0

Release distribution

This release is distributed in RPM format, along with all required dependencies.

Development environment

The versions and paths of the external s/w used are defined in the TDAQExternal package.

Tools needed for development:

The default compiler on SLC4 is gcc-3.4.x; it does not need to be installed separately.
To set up the gcc 4.3 compiler from afs:
> source /afs/cern.ch/sw/lcg/contrib/gcc/4.3.2/slc4_ia32_gcc43/setup.sh

CMT v1r20p20080222 (installed with RPM in <inst_root>/CMT/v1r20p20080222)
JDK 1.6.0 (installed with RPM in <inst_root>/sw/lcg/external/Java/JDK/1.6.0)

Important changes requiring user actions:

1) The OKS databases at P1 have to be imported to /oks/tdaq-02-00-01. This can be done by detector OKS DB experts using the oks-import.sh script (the recommended way) or by TDAQ experts on explicit request from a detector (CVS bulk import from /oks/tdaq-02-00-00).

Packages and tags used in the release

ac v4r4p6
AccessManager AccessManager-00-06-06
clips clips-06-24-00
cmdl cmdl-01-05-00
cmem_rcc v2r0p23
coca coca-01-09-06
config config-02-00-01
coral_auth coral_auth-01-10-01
dal dal-01-15-08
DAQPanel DAQPanel-07-00-06
DataflowPolicy v1r6p0
dbe dbe-01-01-00
dccommon v1r1p31
dcmessages v2r4p8
ddc ddc-05-05-05
DFConfiguration v9r4p14
DFDebug v2r0p19
DFExceptions v3r2p0
DFM v2r22p5
DFRelease DFnightly-00-01-01
DFSubSystemItem v6r5p2
DFTests v2r1p6
DFThreads v2r4p1
dnc dnc-01-00-12
dqm_config dqm_config-00-06-03
dqm_display dqm_display-00-01-09
dqmf dqmf-00-06-06
dvs dvs-00-34-03
dvs_gui dvs_gui-00-01-06
dvs_tests dvs_tests-00-01-01
dynlibs v1r3p2
ed ed-00-02-21
efd efd-01-16-08
efio v2r10p3
emon emon-00-04-05
errorRecovery v3r2p25
ErrorReporting v3r0p1
FarmTools FarmTools-02-01-05
file_sampler tdaq-02-00-01
gatherer v9r3p2
genconfig genconfig-03-00-01
gnam gnam-04-02-02
gnamDummyLib gnamDummyLib-01-03-01
histmon histmon-00-03-02
hltinterface v0r0p25
igui igui-01-01-56
Igui Igui-00-00-14
instrumentation v1r3p8
io_rcc v2r0p42
ipc ipc-04-12-00
is is-05-07-00
ispy ispy-00-00-29
Jers Jers-01-00-05
ktidbexplorer ktidbexplorer_tdaq-02-00-01_00
l2dummy v1r8p22
l2pu v1r19p46
l2rh v2r1p8
L2streamTest v1r2p1
l2sv v1r15p3
ls ls-01-02-12
mda mda-05-00-03
mda_browser mda_browser-00-00-07
MonaIsa tdaq-02-00-01-p1
mrs mrs-01-09-10
msg v1r3p2
msgconf v1r4p17
msginput v1r4p1
msgsctp v1r2p5
msgtcp v1r3p2
msgudp v1r2p3
NetPanel NetPanel-01-01-03
node2 node2-00-00-01
NSG NSG-02-00-02
oh oh-00-00-92
ohp ohp-03-04-02
ohpplugins ohpplugins-01-02-03
oks oks-05-00-06
oks2cool oks2cool_tdaq-02-00-01_00
oks2coral oks2coral-02-02-01
oksconfig oksconfig-02-03-04
OMD OMD-00-00-47
omni omni-04-13-01
omniPy omniPy-00-00-03
onasic onasic_tdaq-02-00-01_00
OnlinePolicy online-00-23-00
OnlineRecovery OnlineRecovery-02-03-04
OnlineRelease OnlineRelease-00-00-68
opmon opmon-00-00-10
owl owl-00-01-00
PackageID v1r0p21
PartitionMaker PartitionMaker-06-08-01
PmgGui PmgGui-00-00-14
ProcessManager ProcessManager-01-02-33
pt v4r6p3
ptdummy v3r13p1
PTIO PTIO-03-11-02
QTUtils QTUtils-00-00-02
queues v1r1p2
rcc_corbo v2r0p5
rcc_error v2r0p5
rcc_rodbusy v2r0p8
rcc_time_stamp v2r0p10
rcdal rcdal-00-03-00
RCDBitString v1r5p0
RCDExampleModules v2r3p5
RCDExampleTriggers v0r3p2
RCDJtagChain v1r3p0
RCDLtp v2r1p2
RCDLtpi v2r1p3
RCDLtpiModule v2r1p0
RCDLTPModule v2r1p0
RCDMenu v1r5p1
RCDModuleDesign v4r2p0
RCDTtc v2r1p0
RCDUtilities v1r5p0
RCDVme v2r1p2
RCInfo RCInfo-00-02-03
RCUtils RCUtils-01-04-10
rdb rdb-06-00-06
rdbconfig rdbconfig-01-08-01
rm rm-02-03-09
Rm-Gui Rm-Gui-01-00-04
rn rn-02-00-01
robin_ppc v0r0p98
RobinTestSuite v2r1p27
RODBusy v2r1p0
RODBusyModule v2r1p0
roib v2r8p4
ROSApplication v6r7p1
ROSBufferManagement v2r4p1
ROSCore v9r1p1
rose v1r19p14
ROSEventFragment v2r2p5
ROSEventInputManager v2r2p1
ROSfilar v2r0p35
ROSGetInput v2r0p4
ROSInterruptScheduler v1r1p0
ROSIO v7r4p3
ROSMemoryPool v2r2p2
ROSModules v2r8p9
ROSMonitor v1r1p8
ROSObjectAllocation v2r1p0
ROSRCDdrivers ROSRCDdrivers-00-00-45
ROSRobin v0r1p77
ROSslink v2r0p11
ROSsolar v2r0p50
ROSUtilities v2r5p0
RunController RunController-02-01-03
SFI v4r13p14
SFIOEmulators SFIOEmulators-00-08-05
SFO v2r19p7
siom v2r1p6
sysmon v2r3p2
sysmonapps v2r4p5
system system-00-00-13
TDAQExternal TDAQExternal-00-14-02
TDAQPolicy TDAQPolicy-00-07-15
thread_allocator v1r0p7
threads v1r1p13
tidb2 tidb2_tdaq-02-00-01_00
tmgr tmgr-01-07-02
training training-00-05-04
transport v1r1p13
TriggerDB TriggerDB-00-00-07
TriP TriP-00-00-44
ttcpr v2r2p0
TTCviModule v2r0p0
vme_rcc v2r0p50
wmi wmi-00-03-14
xmext xmext-01-02-09

Changes in packages (in alphabetical order)

dal | DAQPanel | dccommon | dcmessages | ddc | DFM | dnc | dvs | dvs_gui | errorRecovery | gatherer | histmon | igui | Igui | l2pu | l2rh | ls | ohp | ohpplugins | oks | OnlineRecovery | owl | PmgGui | ProcessManager | RCDLTPModule | RCUtils | robin_ppc | RunController | SFI | SFO | system | training | wmi

dal

Message Passing Node ID

Disabled Algorithm

1. Remove obsolete check_parents parameter
C++ and Java code must call the disabled() algorithm with a single parameter pointing to the partition object. The second parameter, check_parents, has to be removed (note that it was optional in C++). The disabled status of parents is always taken into account.
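
The rule that parents are always taken into account can be pictured with a small, self-contained C++ sketch. The Node type and the is_disabled() function below are hypothetical and are not part of the DAL API; they only model a component as disabled when it, or any of its ancestors, is in a set of disabled components.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a DAL component: a name plus an optional parent.
struct Node {
    std::string name;
    const Node* parent;  // nullptr for a top-level component
};

// A component counts as disabled when it, or any ancestor, is listed in
// 'disabled'. There is no flag to skip the walk up the parent chain.
bool is_disabled(const Node& n, const std::set<std::string>& disabled) {
    for (const Node* p = &n; p != nullptr; p = p->parent) {
        if (disabled.count(p->name) != 0) {
            return true;
        }
    }
    return false;
}
```

With a resource whose parent segment is in the disabled set, is_disabled() returns true for the resource even though the resource itself is not listed.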

2. Implement why_disabled() algorithm in Java
As requested by https://savannah.cern.ch/bugs/?34962, the Component class implements the why_disabled() algorithm, which provides a textual description of the reason why a given resource or segment is disabled.
See dal/jexamples/TestDisabled.java for an example of usage.

3. Set user disabled and enabled components in Java
Like in C++, DAL implements two dal.Partition algorithms to set temporarily disabled or enabled components:

    /**
     *  Temporarily marks the given components as 'disabled'
     *  without modifying the partition object in the database.
     *  The setting is overwritten by a subsequent set_disabled() call.
     *  It is cleared automatically when the database is updated or reloaded.
     *
     *  @param objs  vector of components to be temporarily disabled.
     */

  public void set_disabled(dal.Component objs[]) ...

    /**
     *  Temporarily marks the given components as 'enabled'
     *  without modifying the partition object in the database.
     *  The setting is overwritten by a subsequent set_enabled() call.
     *  It is cleared automatically when the database is updated or reloaded.
     *
     *  @param objs  vector of components to be temporarily enabled.
     */

  public void set_enabled(dal.Component objs[]) ...


See dal/jexamples/TestDisabled.java for an example of usage.

4. Multiple configuration objects and user enabling/disabling in Java code
For efficiency reasons the dal.Component.disabled() algorithm uses a singleton map to keep the disabled status of components, so user disabling/enabling changes the status of components application-wide. This is a problem when a Java application creates more than one configuration object and calls the disabled() algorithm on them (e.g. the disabled status should differ when IGUI works with the RDB and RDB_RW servers and changes disabling on RDB_RW), or when it uses different partition objects.

The problem can be solved using an object of the dal.Resources class, which is used to implement the above-mentioned singleton map. Each object of the dal.Resources class can be used separately, with its own configuration object, partition and user disabled/enabled components. In that case, instead of the DAL algorithms of the Component and Partition classes, use the appropriate methods of the dal.Resources class:

DAL algorithm                   Appropriate method of dal.Resources class
dal.Partition.set_disabled()    public void set_disabled(dal.Component objs[])
dal.Partition.set_enabled()     public void set_enabled(dal.Component objs[])
dal.Component.disabled()        public boolean get_disabled(dal.Component obj, dal.Partition p, config.Configuration db) throws config.SystemException, config.NotFoundException
dal.Component.why_disabled()    public String why_disabled(dal.Component c, config.Configuration db, String prefix, boolean recursive) throws config.SystemException

For more info see file dal/jsrc/dal/Resources.java

Example of simultaneous usage of the default and a user-specific Resources object:
config.Configuration db = new config.Configuration("rdbconfig:RDB");
dal.Partition p = dal.Algorithms.get_partition(db, "be_test");

config.Configuration db2 = new config.Configuration("rdbconfig:RDB_RW");
dal.Partition p2 = dal.Algorithms.get_partition(db2, "be_test");

dal.Resources resources = new dal.Resources(); // will be used with db2 scope

// disable resources "r1" and "r2" in db2 scope
dal.Component[] disabled_objs = new dal.Component[2];
disabled_objs[0] = dal.Component_Helper.get(db2, "r1");
disabled_objs[1] = dal.Component_Helper.get(db2, "r2");
resources.set_disabled(disabled_objs);

// check status of "r1" in db and report why if disabled:
dal.Component r1 = dal.Component_Helper.get(db, "r1");
if (r1.disabled(p)) {
    System.out.println("object r1 in db is disabled because " + r1.why_disabled("  ", true));
}

// check status of "r1" in db2 and report why if disabled:
dal.Component r1_2 = dal.Component_Helper.get(db2, "r1");
if (resources.get_disabled(r1_2, p2, db2)) {
    System.out.println("object r1 in db2 is disabled because " + resources.why_disabled(r1_2, db2, "  ", true));
}

DAQPanel

The DAQPanel has been part of the TDAQ release since tdaq-02-00-00. A detailed description of the changes is available in the ChangeLog file in the package root directory.

To start the DAQPanel just execute the daqPanel binary; the following command line options are available:
To have default values for the setup script, database name and partition name filled in when the Get Default button is pushed, the following environment variables have to be defined: TDAQ_SETUP_SCRIPT (the setup script full path), TDAQ_DB_DATA_DEFAULT (the database file full path), TDAQ_PARTITION_DEFAULT (the partition name).

Some applications will not be started because they are only available (or use customized scripts) at P1:
Changes with respect to release tdaq-02-00-01:
         



dccommon

General changes

Package Description

This package provide the classes

dcmessages

Changes since tdaq-02-00-00

The receive method
size_t receive(MessagePassing::Buffer*& buf, void*& addr)
was updated to guarantee that, after the call, the parameter buf points to a DC buffer with a single page; i.e., buf can be updated to point to a new buffer.

ddc

Introduction

The complete documentation for the DAQ - DCS Communication package may be found at CERN/EDMS as https://edms.cern.ch/document/684955/5.5

General changes

DIM library version 18 release 4 is installed; it fixes a series of DIM problems.

Changes in the Command Transfer

NT Command Server: As of this release the IS server DDC is used for NT-commands. This server is included by default in the infrastructure of any TDAQ partition.
API changes:
The name of the controller that a command is addressed to is introduced as a parameter std::string ctrlName in all public methods of the DdcCommander interface. This is driven by the requirement that a controller must not accept NT-commands addressed to other controllers.

    // For a command defined in the configuration
    bool ddcSendDaqCommand(Configuration* confDb, std::string ctrlName, std::string daqCommandName);
   
    // For "direct" command defined by known PVSS DIM-RPC-service
    bool ddcSendCommand(std::string ctrlName, std::string commandName, std::string commParameters,
                        unsigned int timeout, bool BM = false);

    // For "direct" command with receiving response (high level interface)
    int ddcExecCommand(std::string ctrlName, std::string command, std::string params,
                       unsigned int timeout, bool queued = false);
   
    // To remove last command from the RunCtrl IS server
    bool ddcRemoveCommand(std::string ctrlName, std::string cmdName);
  
    // To build the IS entry name for the command response
    std::string makeResponseName(std::string ctrlName, std::string commandName);
    std::string makeResponseName(Configuration* confDb, std::string ctrlName, std::string commandName);

Internal Improvements:
1. Fixed a bug in sending parallel non-transition commands.
2. Cleaning of "hanging" NT-commands moved from the FINAL transition to the start-up of the DDC controller.

No functional changes in the Data and Message Transfer applications

Fixed an internal synchronization bug in dynamic subscription/unsubscription to PVSS data by a TDAQ application.



DFM

Introduction

The Data Flow Manager (DFM) is the manager of the Event Building part of the DAQ. It is responsible for assigning L2-accepted events to Event Builder applications (SFIs), and for sending clear messages to the Readout System for all events which have either failed L2, or which have been accepted by L2 and built.

Changes since tdaq-02-00-00



dnc


dvs

Changes since tdaq-02-00-00

Fixes:


dvs_gui

Package to implement display for dvs core



errorRecovery

General changes




gatherer

Merged the gatherer and MonGatherer packages.

Removed IS_OH from the schema. Named the binary Gatherer instead of GathererApplication.

Parallelized the gatherer.

Introduced message forwarding.



histmon

General changes

Histograms are published with history by default. The history tag is derived from the current time and the update interval. If the original histograms were designed to be LBN aware, the LBN is used as the tag.


igui

Introduction

The IGUI (Integrated Graphical User Interface) web page is at:
   http://atlas-onlsw.web.cern.ch/Atlas-onlsw/components/igui/welcome.html

General changes  


Bug fixes

Igui

This is the new IGUI implementation. The current version does not yet implement the full igui functionality, but it can be used to fully control a partition.

It includes:
Database committing and reloading is handled centrally (via the Commit & Reload button in the tool-bar at the top of the main frame), so panels no longer need to implement such an action with custom buttons.

The IguiPanel interface has been modified to meet the changes in the Igui core design and user panel developers can find any needed information in the javadoc documentation:

$TDAQ_INST_PATH/share/doc/Igui/javadoc/Igui/index.html.

To start the new Igui the following script can be used:
Igui_start -p <partition> -d <database> <vm properties>

The IGUI can now be used for the partition start/stop procedure as well: just start the setup_daq script with the -newgui switch.



l2pu

General changes

Internal improvements and bug fixes only.
No changes from user point of view.

l2rh

General changes



ls

Introduction

This package replaces the obsolete logService. Around spring 2007 a new requirement came in: database technologies other than ORACLE had to be dropped. This meant rewriting the logService package, which was highly dependent on MySQL. The opportunity was taken to refactor the code, especially the log manager, which was never very user friendly. Database access in C++ is done through the CORAL interface, which hides the underlying technology. Java was chosen for the log manager, since it brings the flexibility required to make the tool more intuitive. The resulting Java application can be run from the console, or remotely using Java Web Start technology.

Known issues/bugs

In the Log Manager, right-clicking on the Messages Table (right-hand panel) or the Partition Table (left-hand panel) pops up a Refresh button, which updates the current view of messages or partitions, respectively. The refresh only works if one right-clicks without moving the mouse; that is, pressing and releasing without changing the X and Y position of the pointer. This may be a problem with Java. In the meantime, one can also refresh via the options in the top menu File: 'Refresh tree' and 'Refresh table'.

To be implemented

Add an option to display statistics, internal and from IS.

Changes from previous release

None

Example applications

None exist at the moment.

Applications

Log Manager

Usage: log_manager 

Instructions on how to use this tool are in the Help menu of the application itself. The Log Manager can also be run using Java Web Start technology from the following link, which downloads and starts a Java application: https://pcatdwww.cern.ch/jnlp/logmanager/logmanager.jnlp You MUST have at least Java 1.6 installed on your local machine.

lsReceiver

Description: This application subscribes to the MRS service to receive messages produced by TDAQ applications and log them in a database.
Usage: lsReceiver [-p partition-name] [-u user-name] [-n IS-server-name] [-s threshold-size] -c connect-string [-S subscribe-expression]
Options/Arguments:
        -p partitionName     Partition name
        -u userName          User name
        -n ISserverName      Name of the Information Service to publish the message rate into.
        -c connectionString  Database connection string. 

Test units

logTest

Description: Test binary for the Log Receiver application.
Usage: logTest -c connect-string [-p partition-name] [-l complexity-level]
Options/Arguments:
        -p partitionName     Partition name
        -l level             Level of complexity of the test [1: open/close - 2: tests the Log Service Infrastructure].
        -c connectionString  Database connection string. 

Utilities

logSelect

Description: Application to retrieve log messages for a given partition according to the search criteria specified. By default, messages are dumped on std::cout.
Usage: logSelect -c connect-string -p partition-name [-i message-name] [-m machine-name] 
                 [-a application-name] [-L time-low] [-U time-up] [-s severity]  
                 [-x text] [-r parameters] [-d order-list] [-e max-rows] [-f offset-row]
Options/Arguments:
        -c connectionString   Database connection string. 
        -p partitionName      Partition name.
        -u userName           User name.
        -n run-number         Run Number.
        -i message-name       Message name or ID.
        -m machine-name       Machine name where the message was issued.
        -a application-name   Application name where the message was issued.
        -L time-low           Lower time threshold (in UTC time).
        -U time-up            Upper time threshold (in UTC time).
        -s severity           Message severity:
                                 0 - SUCCESS
                                 1 - INFORMATION
                                 2 - DEBUG
                                 3 - WARNING
                                 4 - ERROR
                                 5 - FATAL
        -x text               Text in the message body.
        -r parameters         Message parameters.
        -d order-list         Parameter to sort the messages by (MSG_ID, MACHINE_NAME, APPLICATION_NAME, ISSUED_WHEN, SEVERITY, MSG_TEXT, PARAMETERS, RUN_NUMBER).
        -e max-rows           Maximum number of rows to retrieve from the database; 100 by default. If 0, all entries are retrieved.
        -f offset-row         Offset in the table to retrieve the messages from.
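
The numeric severity codes listed for the -s option can be captured in a small lookup table, for example when a script or wrapper needs to translate between names and codes before calling logSelect. The helper names below (severity_code, severity_name) are hypothetical and are not part of the ls package; only the code/name pairs come from the option list above.

```cpp
#include <cstddef>
#include <string>

// Severity names in the order of their numeric codes (0..5), as listed for -s.
static const char* const kSeverityNames[] = {
    "SUCCESS", "INFORMATION", "DEBUG", "WARNING", "ERROR", "FATAL"
};
static const std::size_t kSeverityCount =
    sizeof(kSeverityNames) / sizeof(kSeverityNames[0]);

// Returns the numeric code for a severity name, or -1 if the name is unknown.
int severity_code(const std::string& name) {
    for (std::size_t i = 0; i < kSeverityCount; ++i) {
        if (name == kSeverityNames[i]) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Returns the severity name for a numeric code, or "UNKNOWN" if out of range.
std::string severity_name(int code) {
    if (code < 0 || static_cast<std::size_t>(code) >= kSeverityCount) {
        return "UNKNOWN";
    }
    return kSeverityNames[code];
}
```

For instance, severity_code("ERROR") yields 4, which is the value to pass after -s.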

logDelete

Description: Application to remove log messages for a given partition according to the search criteria specified.
Usage: logDelete -c connect-string -p partition-name [-i message-name] [-m machine-name] 
                 [-a application-name] [-L time-low] [-U time-up] [-s severity]  
                 [-x text] [-r parameters]
Options/Arguments:
        -c connectionString   Database connection string. 
        -p partitionName      Partition name
        -u userName           User name.
        -n run-number         Run Number.
        -i message-name       Message name or ID.
        -m machine-name       Machine name where the message was issued.
        -a application-name   Application name where the message was issued.
        -L time-low           Lower time threshold (in UTC time).
        -U time-up            Upper time threshold (in UTC time).
        -s severity           Message severity:
                                 0 - SUCCESS
                                 1 - INFORMATION
                                 2 - DEBUG
                                 3 - WARNING
                                 4 - ERROR
                                 5 - FATAL
        -x text               Text in the message body.
        -r parameters         Message parameters.

logGetPartitionNames

Description: Application to retrieve the list of partition names.
Usage: logGetPartitionNames -c connectionString 
Options/Arguments:
        -c connectionString  Database connection string. 

ohp

General changes

For more details, please refer to the "README" file (in the ohp installation directory). New in this version of OHP is the possibility to add a documentation string to plugins through the configuration file. For example:
<plugin ..... >
  <doc>"Your text explaining what the plugin does"</doc>
</plugin>
Since tdaq-01-09-01 ohp supports retrieving histograms from multiple servers. To use this feature the configuration files need to be slightly modified. The general block should change from:
<general>
  <partition name="be_test" />
  <server name="Histogramming" />
  <subscription name="Provider/.*" />
</general>

to:
<general>
  <partition name="be_test" />
  <subscription server="Histogramming" provider="Provider" histogram=".*" />
</general>

In the configuration file every instance (in tabs or options) of a histogram name has to be changed to include the OH server name, as in the following example:
<display>
  <tab name="Tile">
    <histogram name="TileProv/Tile/Drawer1/h1"/>
  </tab>
</display>
<globaloptions>
  <histogram name="TileProv/Tile/Drawer1/h1">
    <DRAWOPT=LEGO/>
  </histogram>
</globaloptions>
has to be changed to:
<display>
  <tab name="Tile">
    <histogram name="Histogramming/TileProv/Tile/Drawer1/h1"/>
  </tab>
</display>
<globaloptions>
  <histogram name="Histogramming/TileProv/Tile/Drawer1/h1">
    <DRAWOPT=LEGO/>
  </histogram>
</globaloptions>
A commented example (example.conf.xml) is provided in the share directory, with instructions on how to write a configuration file for OHP.
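
When many configuration files have to be migrated, the two renaming rules above are mechanical enough to script. The following C++ sketch shows the string manipulation only; both functions are hypothetical helpers and are not part of OHP, and the handling of an old subscription name that contains no '/' is an assumption.

```cpp
#include <string>
#include <utility>

// Prepends the OH server name to an old-style "Provider/path" histogram name.
// Hypothetical migration helper; not part of OHP.
std::string qualify_histogram_name(const std::string& server,
                                   const std::string& old_name) {
    return server + "/" + old_name;
}

// Splits an old-style subscription name such as "Provider/.*" at the first
// '/' into the new 'provider' and 'histogram' attribute values.
// ASSUMPTION: a name without '/' is treated as a provider with an empty
// histogram expression.
std::pair<std::string, std::string> split_subscription(const std::string& old_name) {
    const std::string::size_type slash = old_name.find('/');
    if (slash == std::string::npos) {
        return std::make_pair(old_name, std::string());
    }
    return std::make_pair(old_name.substr(0, slash), old_name.substr(slash + 1));
}
```

Applied to the example above, qualify_histogram_name("Histogramming", "TileProv/Tile/Drawer1/h1") gives "Histogramming/TileProv/Tile/Drawer1/h1", and split_subscription("Provider/.*") gives the pair ("Provider", ".*").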
Configuration of OHP through the old ASCII format is deprecated: ohp still works with ASCII, but new features are available only with XML.
OHP has a plug-in system. Users can develop their own GUIs to extend or modify OHP functionality. The standard GUI has been migrated to a plug-in, although its appearance and usage are unchanged. You can refer to the ohpplugins package as an example of plug-in development. More information on the plug-in system can be found on the OHP twiki.
Links:
OhpMonitoring TWiki
OhpUserGuide TWiki

To be implemented/known issues:



ohpplugins

General changes

This package contains common plug-ins for OHP that are general enough to be used by all ATLAS sub-systems. Specific examples from detectors are also included.
The package contains the following plug-ins:

Plug-ins list

Under Development


oks

OKS Server

Read more details on the TWiki page.

OKS Performance Improvements


OnlineRecovery

Introduction

The OnlineRecovery package is responsible for all recovery mechanisms from the RunControl point of view. It consists of two main parts. The first is a plug-in to the new RunController and will reproduce all recovery-related behavior seen in the old RunControl (such as restart, ignore, etc.). In addition it will include some more advanced recovery mechanisms and better statistics. The second part is a stand-alone server which will handle errors with a system-wide impact and will also receive information from the RunController plug-ins.

RunController expert system

This is integrated as a plug-in to the new RunController. It receives updates directly from the controller and decides what to do in error-cases (such as ignoring, restarting, etc). It implements the ExpertSystemInterface defined in the RunController package.

Server

The OnlineRecovery server is responsible for handling all system-wide errors. Currently the automatic disabling of RODs and the notification to the corresponding ROS have been enabled.

Core functionality

The OnlineRecovery decides what to do when applications die, go into the error state, fail a test, etc. Normally the action taken will be according to the configuration settings for the specific application (IF-FAILS, IF-DIES, IF-ERROR) with the following exceptions:
A detailed description of core and specialized OnlineRecovery behavior can be found on the cc webpage

Expert system server

Currently the main responsibility of the expert system server is dealing with stopless recovery. This is done whenever a ROD is reported as faulty by a RODBusyModule. The corresponding InputChannels and their ROS(s) are found. The ROD and the channels are then automatically disabled. The information about disabled channels is stored in COOL. The behavior of the recovery can be dynamically configured with a combination of three different settings. These settings can be modified either using the error_viewer application described below, or using the dedicated buttons in the DAQPanel (P1 only).

Changes since tdaq-02-00-00

Known bugs

Utilities

A graphical utility that allows a complete view of all errors in the system is available. It is possible to select partition and expert system server to retrieve the list of errors from.
error_viewer 

owl

New features

Implementation of thread pool pattern is added. See for complete information. Use:
#include 
There is no library to link.
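
For readers unfamiliar with the pattern, the sketch below shows what a thread pool does: a fixed set of worker threads takes jobs from a shared queue. This is NOT the owl implementation and does not reproduce its API; the class name SimpleThreadPool and its submit() method are hypothetical, and the code relies on the C++11 standard library, which is newer than the compilers listed for this release.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size thread pool. Illustrative only; this is not the OWL class.
class SimpleThreadPool {
public:
    explicit SimpleThreadPool(std::size_t n_threads) : stopping_(false) {
        for (std::size_t i = 0; i < n_threads; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // Lets the workers drain the queue, then joins all of them.
    ~SimpleThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_all();
        for (std::thread& t : workers_) {
            t.join();
        }
    }

    SimpleThreadPool(const SimpleThreadPool&) = delete;
    SimpleThreadPool& operator=(const SimpleThreadPool&) = delete;

    // Queues a job; one of the idle workers will pick it up.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wakeup_.notify_one();
    }

private:
    // Worker loop: wait for a job or for shutdown, then run the job.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeup_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) {
                    return;  // stopping_ is set and nothing is left to do
                }
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_;
};
```

In this sketch the destructor waits for all queued jobs to finish, so results written by the jobs can be read safely once the pool object has gone out of scope.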

PmgGui

Introduction
The PmgGui package includes GUI interfaces to the ProcessManager system.

Changes wrt tdaq-02-00-00

PmgISPanel

ProcessManager

Changes since tdaq-02-00-00

Server
        



RCDLTPModule

Introduction

This package contains RCD Software for the Local Trigger Processor. Please see ATLAS Timing Signal Distribution and https://edms.cern.ch/document/588024/1 for further details.

General changes


RCUtils


robin_ppc


RunController

Introduction

This package contains the Run Control for the TDAQ. It provides two C++ interfaces which allow the user to introduce case-specific actions carried out when a command is received: UserRoutines.h and Controllable.h. The two interfaces have completely different purposes!
A developer should extend UserRoutines.h only when customizing a controller at an intermediate level of the run control tree (i.e. one with child applications). UserRoutines objects can then be created to be called at the corresponding state transitions and commands before these are processed by the Controller itself (i.e. transmitted to the children).

A developer of leaf applications (run control applications without children) should extend only the Controllable.h interface, creating a Controllable object to be called at the corresponding state transitions and commands.

API additions since Previous Release

The UserRoutines class has been extended with two methods: virtual bool actionEnable() and virtual bool actionDisable(). These methods allow users to specify actions to take when an ENABLE/DISABLE command is received. The use case is a controller which is also a leaf application dealing with stopless recovery (a pattern which has been discouraged for several years but is still in use...).

Visible Changes since Previous Release

Internal Changes since Previous Release

APIs

States Class

This class exists in the daq::rc namespace. Detailed documentation can be found in the Doxygen documentation. Here we briefly list the public API:
daq::rc::States::T_State toState(string stateName); // translates the state name into its numerical value
std::string toString(daq::rc::States::T_State stateValue); // translates a state into its textual representation

Barriers Class

This class allows applications to wait on each other, for the purpose of synchronizing within a state transition. The mechanism used underneath is IS. Details of the API can be found in the Doxygen documentation.
An application that wishes to use it can follow this example:
#include "RunController/Barrier.h"

void myMethod() {
    daq::rc::Barrier a(partitionName, appName, subStateA_Name);
    daq::rc::Barrier b(partitionName, appName, subStateB_Name);
    try {
        a.up();
        b.up();
    }
    catch (ers::Issue & e) {
        ers::fatal(e);
        // do something or return;
    }
    // do whatever you need to do; if you encounter a fatal error, and you
    // don't want all other applications to wait forever on you, lower the barrier with a.down()

    // Now wait until the other applications have reached the synchronization point subStateA
    try {
        a.wait(timeout);
    }
    catch (ers::Issue & e) {
        ers::fatal(e);
        // do something or return;
    }
    // continue with what you need to do

    // Now wait until the other applications have reached the synchronization point subStateB
    try {
        b.wait(timeout);
    }
    catch (ers::Issue & e) {
        ers::fatal(e);
        // do something or return;
    }
    // continue with what you need to do
    return;
}

Errors

The RunController takes hardly any decisions itself when errors occur. Instead, the corresponding errors are forwarded to the Expert System (Online Recovery), which decides what to do. This decision currently depends on the settings defined in the configuration database. Three fields are available to the user to define the behaviour: IfFails, IfDies, IfError. Note on restart: if the restart option is selected for any of these fields and the restart fails repeatedly, an error is raised instead.
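
The note on restart can be read as a simple escalation rule. The C++ sketch below illustrates it under stated assumptions: the Action enum, the decide() function and in particular the max_failed_restarts threshold are hypothetical, since these notes do not say how many failed restarts count as "repeatedly", and the real decision is taken inside the Online Recovery expert system.

```cpp
// Possible reactions; the names are illustrative, not the database values.
enum class Action { Ignore, Restart, Error };

// 'configured' stands for the value of IfFails, IfDies or IfError for the
// application. 'failed_restarts' counts restart attempts that already failed.
// ASSUMPTION: 'max_failed_restarts' is an invented threshold; the actual
// limit used by the expert system is not stated in these notes.
Action decide(Action configured, int failed_restarts, int max_failed_restarts) {
    if (configured == Action::Restart && failed_restarts >= max_failed_restarts) {
        return Action::Error;  // repeated restart failures escalate to an error
    }
    return configured;
}
```

With an assumed threshold of 3, decide(Action::Restart, 3, 3) returns Action::Error, while any other configured action is returned unchanged.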

Timeouts

There are currently three different timeouts:
Action timeout - Used for transitions and standard commands (RESTART, STOP, etc.). For a controller, the transition timeout actually used is its own action timeout plus the highest action timeout among its children.
Short timeout - Used for killing applications and for testing.
Init timeout - After an application is started by a controller, the application must send a pmg_sync to notify that it has started correctly. If this is not done within the init timeout, an error is raised on the controller side. This implies that the init timeout should always be LESS than the ACTION timeout. This synchronization is handled internally for all RunControl applications and should not be done in user code.
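
The arithmetic behind the controller transition timeout, and the constraint between the init and action timeouts, can be written down in a few lines. The function names below are hypothetical and the values are assumed to be in one common unit (for example seconds); this is a sketch of the rules stated above, not RunController code.

```cpp
#include <algorithm>
#include <vector>

// Transition timeout of a controller: its own action timeout plus the
// highest action timeout among its children (0 if it has no children).
int transition_timeout(int own_action_timeout,
                       const std::vector<int>& child_action_timeouts) {
    int highest_child = 0;
    if (!child_action_timeouts.empty()) {
        highest_child = *std::max_element(child_action_timeouts.begin(),
                                          child_action_timeouts.end());
    }
    return own_action_timeout + highest_child;
}

// The init timeout should always be less than the action timeout.
bool timeouts_consistent(int init_timeout, int action_timeout) {
    return init_timeout < action_timeout;
}
```

For a controller with an action timeout of 60 and children with action timeouts of 30, 60 and 45, transition_timeout() returns 60 + 60 = 120.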

Known issues/bugs

... to be filled after testing...

Change of API planned for next major release!

For the next major release the API of the Controllable and UserRoutines will be changed, following the request from a number of users:

Applications

rc_setup

Description: Binary to start the basic infrastructure needed by the Root Controller.
Options/Arguments:
        -p partition      Name of the IPC partition (default $TDAQ_PARTITION)
        -d database       Name of the database (TDAQ_DB)
        -s segmentname    Name of the segment
        -n controller     Name of the controller
        -R expertSystem   Expert System library name (without the 'lib' prefix or the '.so' extension), followed by the arguments, if any.

run_controller

Description: The RunController is a general purpose control entity for the ATLAS Online infrastructure.
Options/Arguments:
        -p partition      Name of the IPC partition (default $TDAQ_PARTITION)
        -P parentname     Name of the parent
        -s segmentname    Name of the segment
        -n controller     Name of the controller
        -u substates      Library with the substates definition. The library name has to be given without the 'lib' prefix or the '.so' extension.
        -R expertSystem   Expert System library name (without the 'lib' prefix or the '.so' extension), followed by the arguments, if any.

rc_test_controller

Description: Test unit for rc_controller binaries.
Options/Arguments:
        -p partition    IPC partition name (default $TDAQ_PARTITION)
        -n controller   Controller names

SFI

Changes since tdaq-02-00-00


   



SFO

Introduction

Sub Farm Output (SFO) is an application of the Event-Building / Data-Collection system; it receives and writes to the appropriate files the events which have been accepted by the Event Filter.

Changes since tdaq-02-00-01


   

system

Changes since tdaq-02-00-00


training

Introduction

A new version of the Training Manual (Version 5.0) is available at:
    $TDAQ_INST_PATH/share/doc/training/training_doc.pdf

General changes





wmi

Introduction

The Web Monitoring Interface, one of the software components of the TDAQ Software sub-system of the ATLAS TDAQ, is intended to give remote users a view of the status of the data acquisition system and its sub-systems.

New features


Generated: Fri Apr 3 16:57:31 CEST 2009 by /afs/cern.ch/atlas/project/tdaq/cmt/adm/bin/do_release_notes (c)