Release notes for DAQ/HLT-I S/W Release tdaq-02-00-01
General notes from s/w librarian
General info
This is a general DAQ/HLT-I release intended for ATLAS data taking starting from April 2009. It is compatible with the LCG56 s/w and the offline-15 release branch. With respect to tdaq-02-00-00 it contains no incompatible API changes, but all user code must be rebuilt.
Supported platforms, compilers and compatibility
The production tags for this release are i686-slc4-gcc34-[opt,dbg]; however, new tags are available for testing purposes: i686-slc4-gcc43-opt and x86_64-slc4-gcc43-opt. Please note that the s/w is not validated for these configurations.
i686 s/w runs on the x86_64 architecture (provided that the h/w is described in the configuration DB with i686-slc4 tags). You can also run on SLC5 nodes, provided the compatibility gcc and stdc++ libraries are installed.
| System and compiler | CMTCONFIG | Compatibility list |
| i686 Linux 2.6.9 (SLC4), gcc-3.4.x | i686-slc4-gcc34-opt | SLC4.x, RHEL4, SLC5 32/64bit |
| i686 Linux 2.6.9 (SLC4), gcc-3.4.x | i686-slc4-gcc34-dbg | -~- |
| i686 Linux 2.6.9 (SLC4), gcc-4.3.2 | i686-slc4-gcc43-opt | |
| x86_64 Linux 2.6.9 (SLC4), gcc-4.3.2 | x86_64-slc4-gcc43-opt | SLC4.x, RHEL4, SLC5 64bit |
External s/w and run-time environment tuning
| tdaq-common-01-12-00 |
| dqm-common-00-08-02 |
| LCG 56 |
| Java Runtime Environment 1.6.0 |
Release distribution
This release is distributed in RPM format, along with all required dependencies.
Development environment
Versions and paths of the external s/w used are defined in the TDAQExternal package.
Tools needed for development:
- The default compiler is gcc-3.4.x on SLC4; there is no need to install it separately.
- To set up the gcc 4.3 compiler from AFS:
  > source /afs/cern.ch/sw/lcg/contrib/gcc/4.3.2/slc4_ia32_gcc43/setup.sh
- CMT v1r20p20080222 (installed with RPM in <inst_root>/CMT/v1r20p20080222)
- JDK 1.6.0 (installed with RPM in <inst_root>/sw/lcg/external/Java/JDK/1.6.0)
Important changes requiring user actions:
1) The OKS databases at P1 have to be imported to /oks/tdaq-02-00-01.
This can be done by detector OKS DB experts using oks-import.sh script
(recommended way) or by TDAQ experts on explicit detector request (CVS
bulk import from /oks/tdaq-02-00-00).
Packages and tags used in the release
| ac | v4r4p6 |
| AccessManager | AccessManager-00-06-06 |
| clips | clips-06-24-00 |
| cmdl | cmdl-01-05-00 |
| cmem_rcc | v2r0p23 |
| coca | coca-01-09-06 |
| config | config-02-00-01 |
| coral_auth | coral_auth-01-10-01 |
| dal | dal-01-15-08 |
| DAQPanel | DAQPanel-07-00-06 |
| DataflowPolicy | v1r6p0 |
| dbe | dbe-01-01-00 |
| dccommon | v1r1p31 |
| dcmessages | v2r4p8 |
| ddc | ddc-05-05-05 |
| DFConfiguration | v9r4p14 |
| DFDebug | v2r0p19 |
| DFExceptions | v3r2p0 |
| DFM | v2r22p5 |
| DFRelease | DFnightly-00-01-01 |
| DFSubSystemItem | v6r5p2 |
| DFTests | v2r1p6 |
| DFThreads | v2r4p1 |
| dnc | dnc-01-00-12 |
| dqm_config | dqm_config-00-06-03 |
| dqm_display | dqm_display-00-01-09 |
| dqmf | dqmf-00-06-06 |
| dvs | dvs-00-34-03 |
| dvs_gui | dvs_gui-00-01-06 |
| dvs_tests | dvs_tests-00-01-01 |
| dynlibs | v1r3p2 |
| ed | ed-00-02-21 |
| efd | efd-01-16-08 |
| efio | v2r10p3 |
| emon | emon-00-04-05 |
| errorRecovery | v3r2p25 |
| ErrorReporting | v3r0p1 |
| FarmTools | FarmTools-02-01-05 |
| file_sampler | tdaq-02-00-01 |
| gatherer | v9r3p2 |
| genconfig | genconfig-03-00-01 |
| gnam | gnam-04-02-02 |
| gnamDummyLib | gnamDummyLib-01-03-01 |
| histmon | histmon-00-03-02 |
| hltinterface | v0r0p25 |
| igui | igui-01-01-56 |
| Igui | Igui-00-00-14 |
| instrumentation | v1r3p8 |
| io_rcc | v2r0p42 |
| ipc | ipc-04-12-00 |
| is | is-05-07-00 |
| ispy | ispy-00-00-29 |
| Jers | Jers-01-00-05 |
| ktidbexplorer | ktidbexplorer_tdaq-02-00-01_00 |
| l2dummy | v1r8p22 |
| l2pu | v1r19p46 |
| l2rh | v2r1p8 |
| L2streamTest | v1r2p1 |
| l2sv | v1r15p3 |
| ls | ls-01-02-12 |
| mda | mda-05-00-03 |
| mda_browser | mda_browser-00-00-07 |
| MonaIsa | tdaq-02-00-01-p1 |
| mrs | mrs-01-09-10 |
| msg | v1r3p2 |
| msgconf | v1r4p17 |
| msginput | v1r4p1 |
| msgsctp | v1r2p5 |
| msgtcp | v1r3p2 |
| msgudp | v1r2p3 |
| NetPanel | NetPanel-01-01-03 |
| node2 | node2-00-00-01 |
| NSG | NSG-02-00-02 |
| oh | oh-00-00-92 |
| ohp | ohp-03-04-02 |
| ohpplugins | ohpplugins-01-02-03 |
| oks | oks-05-00-06 |
| oks2cool | oks2cool_tdaq-02-00-01_00 |
| oks2coral | oks2coral-02-02-01 |
| oksconfig | oksconfig-02-03-04 |
| OMD | OMD-00-00-47 |
| omni | omni-04-13-01 |
| omniPy | omniPy-00-00-03 |
| onasic | onasic_tdaq-02-00-01_00 |
| OnlinePolicy | online-00-23-00 |
| OnlineRecovery | OnlineRecovery-02-03-04 |
| OnlineRelease | OnlineRelease-00-00-68 |
| opmon | opmon-00-00-10 |
| owl | owl-00-01-00 |
| PackageID | v1r0p21 |
| PartitionMaker | PartitionMaker-06-08-01 |
| PmgGui | PmgGui-00-00-14 |
| ProcessManager | ProcessManager-01-02-33 |
| pt | v4r6p3 |
| ptdummy | v3r13p1 |
| PTIO | PTIO-03-11-02 |
| QTUtils | QTUtils-00-00-02 |
| queues | v1r1p2 |
| rcc_corbo | v2r0p5 |
| rcc_error | v2r0p5 |
| rcc_rodbusy | v2r0p8 |
| rcc_time_stamp | v2r0p10 |
| rcdal | rcdal-00-03-00 |
| RCDBitString | v1r5p0 |
| RCDExampleModules | v2r3p5 |
| RCDExampleTriggers | v0r3p2 |
| RCDJtagChain | v1r3p0 |
| RCDLtp | v2r1p2 |
| RCDLtpi | v2r1p3 |
| RCDLtpiModule | v2r1p0 |
| RCDLTPModule | v2r1p0 |
| RCDMenu | v1r5p1 |
| RCDModuleDesign | v4r2p0 |
| RCDTtc | v2r1p0 |
| RCDUtilities | v1r5p0 |
| RCDVme | v2r1p2 |
| RCInfo | RCInfo-00-02-03 |
| RCUtils | RCUtils-01-04-10 |
| rdb | rdb-06-00-06 |
| rdbconfig | rdbconfig-01-08-01 |
| rm | rm-02-03-09 |
| Rm-Gui | Rm-Gui-01-00-04 |
| rn | rn-02-00-01 |
| robin_ppc | v0r0p98 |
| RobinTestSuite | v2r1p27 |
| RODBusy | v2r1p0 |
| RODBusyModule | v2r1p0 |
| roib | v2r8p4 |
| ROSApplication | v6r7p1 |
| ROSBufferManagement | v2r4p1 |
| ROSCore | v9r1p1 |
| rose | v1r19p14 |
| ROSEventFragment | v2r2p5 |
| ROSEventInputManager | v2r2p1 |
| ROSfilar | v2r0p35 |
| ROSGetInput | v2r0p4 |
| ROSInterruptScheduler | v1r1p0 |
| ROSIO | v7r4p3 |
| ROSMemoryPool | v2r2p2 |
| ROSModules | v2r8p9 |
| ROSMonitor | v1r1p8 |
| ROSObjectAllocation | v2r1p0 |
| ROSRCDdrivers | ROSRCDdrivers-00-00-45 |
| ROSRobin | v0r1p77 |
| ROSslink | v2r0p11 |
| ROSsolar | v2r0p50 |
| ROSUtilities | v2r5p0 |
| RunController | RunController-02-01-03 |
| SFI | v4r13p14 |
| SFIOEmulators | SFIOEmulators-00-08-05 |
| SFO | v2r19p7 |
| siom | v2r1p6 |
| sysmon | v2r3p2 |
| sysmonapps | v2r4p5 |
| system | system-00-00-13 |
| TDAQExternal | TDAQExternal-00-14-02 |
| TDAQPolicy | TDAQPolicy-00-07-15 |
| thread_allocator | v1r0p7 |
| threads | v1r1p13 |
| tidb2 | tidb2_tdaq-02-00-01_00 |
| tmgr | tmgr-01-07-02 |
| training | training-00-05-04 |
| transport | v1r1p13 |
| TriggerDB | TriggerDB-00-00-07 |
| TriP | TriP-00-00-44 |
| ttcpr | v2r2p0 |
| TTCviModule | v2r0p0 |
| vme_rcc | v2r0p50 |
| wmi | wmi-00-03-14 |
| xmext | xmext-01-02-09 |
Changes in packages (in ABC order)
dal | DAQPanel | dccommon | dcmessages | ddc | DFM | dnc | dvs | dvs_gui | errorRecovery | gatherer | histmon | igui | Igui | l2pu | l2rh | ls | ohp | ohpplugins | oks | OnlineRecovery | owl | PmgGui | ProcessManager | RCDLTPModule | RCUtils | robin_ppc | RunController | SFI | SFO | system | training | wmi
dal
Message Passing Node ID
- Add calculation of the message passing Node ID for message passing resources which are not applications, i.e.
  - the object has to be linked with a segment's or resource-set's resources
  - one of its base classes has to be the "DFMessagePassingNode" class
  - it cannot be cast to the "BaseApplication" class
  - example: objects of the "RobinReadoutModule" class from the data-flow schema linked as resources
- Rename the unused daq::Application::ROB type to daq::Application::ROBIN to support the above application types
Disabled Algorithm
1. Remove obsolete check_parents parameter
C++ and Java code has to call the disabled() algorithm with a single parameter pointing to the partition object. The second parameter, check_parents, has to be removed (note that it was optional in C++). The disabled status of parents is always taken into account.
2. Implement why_disabled() algorithm in Java
As requested by https://savannah.cern.ch/bugs/?34962, the Component class implements the why_disabled() algorithm, which provides a textual description of the reason why a given resource or segment is disabled, e.g.:
- explicitly disabled resource:
    object 'BCM_ROD_Spare@BCM_ROD_Module' is disabled because
      it is explicitly disabled
- disabled because of direct parent:
    object 'TRTBarrelC_LTP_BusyChannel_BC0@BusyChannel' is disabled because
      it's parent TRTBarrelC_LTP@LTPModule is disabled because:
        it is explicitly disabled
- disabled because of chain of parents:
    object 'PixelL12_TIM4_busy@BusyChannel' is disabled because
      it's parent PixelL12_RODBusy@RODBusyModule is disabled because:
        it's parent PixelL12_LTP@LTPModule is disabled because:
          it's parent PixelDisks_LTP@LTPModule is disabled because:
            it's parent PixelL0_LTP@LTPModule is disabled because:
              it's parent PixelLTPi_Global@LTPiModule is disabled because:
                it's parent RCDPixelTtcCratePitL0@RCD is disabled because:
                  it's parent PixelTtcCrateBLayer@Segment is disabled because:
                    it's parent Pixel@Segment is disabled because:
                      it is explicitly disabled
- resource-set OR disabling:
    object 'ROS-TIL-LBC-ROL-16@ResourceSetOR' is disabled because
      it is "resource-set-OR" and at least one child TileLaserModule@TileLaserModule is disabled because:
        it's parent TileLaserRCD@RCD is disabled because:
          it is explicitly disabled
- resource-set AND disabling (dummy example):
    object 'ROBIN-TRT-ECC-03-3@RobinReadoutModule' is disabled because
      it is "resource-set-AND" and all children (2) are disabled:
        [1] component ROL-TRT-ECC-03-341402@RobinDataChannel is disabled because:
          it's parent rod341402_ResourceSet@ResourceSetOR is disabled because:
            it is "resource-set-OR" and at least one child rod341402@TRTROD05Module is disabled because:
              it is explicitly disabled
        [2] component dummy@RobinDataChannel is disabled because:
          it is explicitly disabled
See dal/jexamples/TestDisabled.java for an example of usage.
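The parent-chain explanations shown for why_disabled() can be thought of as a simple recursion: either the component is explicitly disabled, or the explanation descends into the parent that caused the disabling. The following is only a minimal sketch of that idea, not the dal implementation; the class WhyDisabledSketch, its Node type and the why() method are hypothetical names introduced for illustration, and resource-set OR/AND handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types belong to the dal package. */
final class WhyDisabledSketch {

    /** Hypothetical stand-in for a component with at most one parent. */
    static final class Node {
        final String id;                  // e.g. "top@Segment"
        final Node parent;                // null for a top-level component
        final boolean explicitlyDisabled;

        Node(String id, Node parent, boolean explicitlyDisabled) {
            this.id = id;
            this.parent = parent;
            this.explicitlyDisabled = explicitlyDisabled;
        }

        /** A node is disabled if it, or any ancestor, is explicitly disabled. */
        boolean disabled() {
            return explicitlyDisabled || (parent != null && parent.disabled());
        }
    }

    /** Builds the explanation lines, indenting one more level per parent visited. */
    static List<String> why(Node n) {
        List<String> lines = new ArrayList<>();
        String indent = "  ";
        Node cur = n;
        while (cur != null && cur.disabled()) {
            if (cur.explicitlyDisabled) {
                lines.add(indent + "it is explicitly disabled");
                return lines;
            }
            // Not explicit but disabled, so the parent exists and is disabled.
            lines.add(indent + "it's parent " + cur.parent.id + " is disabled because:");
            cur = cur.parent;
            indent += "  ";
        }
        return lines; // empty when the node is not disabled
    }

    public static void main(String[] args) {
        Node top = new Node("top@Segment", null, true);
        Node mid = new Node("mid@Segment", top, false);
        Node leaf = new Node("leaf@Resource", mid, false);
        System.out.println("object '" + leaf.id + "' is disabled because");
        for (String line : why(leaf)) {
            System.out.println(line);
        }
    }
}
```

In real code the dal.Component.why_disabled() or dal.Resources.why_disabled() methods should be used instead.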
3. Set user disabled and enabled components in Java
Like in C++, DAL implements two dal.Partition algorithms to set temporarily disabled or enabled components:
/**
 * The method temporarily marks given components as 'disabled'
 * without modifying the partition object in the database.
 * The setting is overwritten by a subsequent set_disabled() call.
 * It is automatically cleared when the database is updated or reloaded.
 *
 * @param objs vector of components to be temporarily disabled.
 */
public void set_disabled(dal.Component objs[]) ...
/**
 * The method temporarily marks given components as 'enabled'
 * without modifying the partition object in the database.
 * The setting is overwritten by a subsequent set_enabled() call.
 * It is automatically cleared when the database is updated or reloaded.
 *
 * @param objs vector of components to be temporarily enabled.
 */
public void set_enabled(dal.Component objs[]) ...
See dal/jexamples/TestDisabled.java for an example of usage.
4. Multiple configuration objects and user enabling/disabling in Java code
For efficiency reasons the dal.Component.disabled() algorithm uses a singleton map to keep information about the disabled status of components, so a user disabling/enabling changes the status of components application wide. This is a problem when a Java application creates more than one configuration object and calls the disabled() algorithm on them (e.g. the disabled status should differ when IGUI works with the RDB and RDB_RW servers and changes disabling on RDB_RW), or when it uses different partition objects.
The problem can be solved using an object of the dal.Resources class, which is used to implement the above-mentioned singleton map. Each object of the dal.Resources class can be used separately with its own configuration object, partition and user disabled/enabled components. In such a case, instead of using the DAL algorithms of the Component and Partition classes, use the appropriate methods of the dal.Resources class:
| DAL algorithm | Appropriate method of dal.Resources class |
| dal.Partition.set_disabled() | public void set_disabled(dal.Component objs[]) |
| dal.Partition.set_enabled() | public void set_enabled(dal.Component objs[]) |
| dal.Component.disabled() | public boolean get_disabled(dal.Component obj, dal.Partition p, config.Configuration db) throws config.SystemException, config.NotFoundException |
| dal.Component.why_disabled() | public String why_disabled(dal.Component c, config.Configuration db, String prefix, boolean recursive) throws config.SystemException |
For more info see the file dal/jsrc/dal/Resources.java
Example of simultaneous usage of the default and a user-specific Resources object:
  config.Configuration db = new config.Configuration("rdbconfig:RDB");
  dal.Partition p = dal.Algorithms.get_partition(db, "be_test");
  config.Configuration db2 = new config.Configuration("rdbconfig:RDB_RW");
  dal.Partition p2 = dal.Algorithms.get_partition(db2, "be_test");
  dal.Resources resources = new dal.Resources(); // will be used with db2 scope

  // disable resources "r1" and "r2" in db2 scope
  dal.Component[] disabled_objs = new dal.Component[2];
  disabled_objs[0] = dal.Component_Helper.get(db2, "r1");
  disabled_objs[1] = dal.Component_Helper.get(db2, "r2");
  resources.set_disabled(disabled_objs);

  // check status of "r1" in db and report why if disabled:
  dal.Component r1 = dal.Component_Helper.get(db, "r1");
  if (r1.disabled(p)) {
    System.out.println("object r1 in db is disabled because "
        + r1.why_disabled(" ", true));
  }

  // check status of "r1" in db2 and report why if disabled:
  dal.Component r1_2 = dal.Component_Helper.get(db2, "r1");
  if (resources.get_disabled(r1_2, p2, db2)) {
    System.out.println("object r1 in db2 is disabled because "
        + resources.why_disabled(r1_2, db2, " ", true));
  }
DAQPanel
The DAQPanel has been part of the TDAQ release since tdaq-02-00-00.
A detailed description of changes is available in the ChangeLog file in the package root directory.
To start the DAQPanel just execute the daqPanel binary; the following command line options are available:
- --setup_dir <setup_dir> Directory containing setup scripts [default is $HOME]
- --db_dir <db_dir> Directory containing database files [default is $HOME]
- --general_dir <general_dir> Directory used to save/get files [default is $HOME]
To have default values for the setup script, database name and partition name when the Get Default button is pushed, the following environment variables have to be defined: TDAQ_SETUP_SCRIPT (the setup script full path), TDAQ_DB_DATA_DEFAULT (the database file full path), TDAQ_PARTITION_DEFAULT (the partition name).
Some applications will not be started because they are available only at P1 (or use customized scripts there):
- General tab: Busy Presenter;
- Mon Advanced: MDA Browser, SFO Display, Det Mask;
- Ctrl Advanced: (De)Activate Automatic Recovery.
Changes with respect to release tdaq-02-00-00:
- Use new DVS script;
- Use new RM gui script.
dccommon
General changes
- ROSConf
  - Add support in ROSConf for networked (ROBIN) readout of ROS data.
  - Add the possibility to use multiple Level2 result handlers (pROSs) in a partition.
    - L2PUs automatically configure themselves to load the available L2RHs equally.
    - If an L2RH fails, the L2PUs using it will automatically re-configure to use the remaining L2RHs in a balanced way.
- Internal updates and bugfixes.
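The balancing behaviour described for multiple L2RHs can be pictured with a small sketch. These notes do not describe the actual policy implemented in ROSConf, so what follows is an assumption: a round-robin initial assignment, and on failure only the L2PUs that used the failed handler are moved, each to the currently least loaded remaining handler. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not the dccommon ROSConf code. */
final class L2rhBalanceSketch {

    /** Assigns each L2PU to one L2RH round-robin, so loads differ by at most one. */
    static Map<String, String> assign(List<String> l2pus, List<String> l2rhs) {
        if (l2rhs.isEmpty()) {
            throw new IllegalArgumentException("no L2RH available");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < l2pus.size(); i++) {
            result.put(l2pus.get(i), l2rhs.get(i % l2rhs.size()));
        }
        return result;
    }

    /** Moves only the L2PUs of the failed L2RH, each to the least loaded remaining one. */
    static Map<String, String> reassign(Map<String, String> current,
                                        List<String> l2rhs, String failed) {
        List<String> remaining = new ArrayList<>(l2rhs);
        remaining.remove(failed);
        if (remaining.isEmpty()) {
            throw new IllegalArgumentException("no L2RH left after failure");
        }
        // Count how many L2PUs each surviving handler already serves.
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String rh : remaining) {
            load.put(rh, 0);
        }
        for (String rh : current.values()) {
            if (load.containsKey(rh)) {
                load.merge(rh, 1, Integer::sum);
            }
        }
        Map<String, String> result = new LinkedHashMap<>(current);
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (!e.getValue().equals(failed)) {
                continue; // L2PUs on healthy handlers keep their assignment
            }
            String target = remaining.get(0);
            for (String rh : remaining) {
                if (load.get(rh) < load.get(target)) {
                    target = rh;
                }
            }
            result.put(e.getKey(), target);
            load.merge(target, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> l2pus = Arrays.asList("P1", "P2", "P3", "P4", "P5", "P6");
        List<String> l2rhs = Arrays.asList("A", "B", "C");
        Map<String, String> initial = assign(l2pus, l2rhs);
        System.out.println(initial);
        System.out.println(reassign(initial, l2rhs, "A"));
    }
}
```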
Package Description
This package provides the classes
- ROSConf, a class that provides configuration information about the ROSs, ROSEs and Level2 result handlers in the partition. Intended for use by L2PU, PSC and SFI.
- TrafficShaping, a class that implements a simple traffic shaping mechanism for use with the UDP protocol. The regulation is based on time-outs for ROS requests. Intended for use by L2PU and SFI.
dcmessages
Changes since tdaq-02-00-00
The receive method
size_t receive(MessagePassing::Buffer*& buf, void*& addr)
was updated to guarantee that, after the call, the parameter buf points to a DC buffer with a single page, i.e., buf may be updated to point to a new buffer.
ddc
Introduction
The complete documentation for the DAQ - DCS Communication package may be found at CERN/EDMS as https://edms.cern.ch/document/684955/5.5
General changes
DIM library version 18 release 4 is installed; it fixes a series of DIM problems.
Changes in the Command Transfer
NT Command Server:
Since this release the IS server DDC is used for the NT-commands. This server is included by default in the infrastructure of any TDAQ partition.
API changes: The name of the controller to which a command is addressed is introduced as a parameter std::string ctrlName in all public methods of the DdcCommander interface. This is required because a controller must not accept NT-commands addressed to other controllers.
  // For a command defined in the configuration
  bool ddcSendDaqCommand(Configuration* confDb, std::string ctrlName, std::string daqCommandName);
  // For a "direct" command defined by a known PVSS DIM-RPC-service
  bool ddcSendCommand(std::string ctrlName, std::string commandName, std::string commParameters,
                      unsigned int timeout, bool BM = false);
  // For a "direct" command with receiving response (high level interface)
  int ddcExecCommand(std::string ctrlName, std::string command, std::string params,
                     unsigned int timeout, bool queued = false);
  // To remove the last command from the RunCtrl IS server
  bool ddcRemoveCommand(std::string ctrlName, std::string cmdName);
  // To build the IS entry name for the command response
  std::string makeResponseName(std::string ctrlName, std::string commandName);
  std::string makeResponseName(Configuration* confDb, std::string ctrlName, std::string commandName);
Internal Improvements:
1. Fixed a bug in sending parallel non-transition commands.
2. Cleaning of "hanging" NT-commands moved from the FINAL transition to the start-up of the DDC controller.
No functional changes in the Data and Message Transfer applications.
Fixed an internal synchronisation bug in dynamic subscribing/unsubscribing to PVSS data by a TDAQ application.
DFM
Introduction
The Data Flow Manager (DFM) is the manager of the Event Building part of the DAQ; it is responsible for assigning L2-accepted events to Event Builder applications (SFIs), and for sending clear messages to the Readout System for all events which have either failed L2, or which have been accepted by L2 and been built.
Changes since tdaq-02-00-00
- No major changes since tdaq-02-00-00
dnc
dvs
Changes since tdaq-02-00-00
- The tests have been moved to a new package called dvs_tests.
- The Java version of the dvs_gui display is no longer built. A new implementation of the dvs display is available in the dvs_gui package. See the release notes for the dvs_gui package.
Fixes:
- The CLIPS engine could be stopped before it had finished evaluating all rules. At the next click on the 'test start' button, the CLIPS engine was restarted and finished executing the rules left in memory. The problem is fixed.
- Signature changed for the method: void getTestRuntimeOutput (const char* comp_name, std::string test_host, std::string test_output_name);
- New method to unload hardware objects with state marked 'OFF' in the database.
dvs_gui
Package to implement the display for the dvs core
- This application (Qt4 implementation) is intended to replace the Java implementation of the dvs display. It should provide the same functionality as the old dvs_gui.
- The new gui is started with:
  - dvs_start_gui [database_file] [partition]
    database_file: configuration database (must be specified unless TDAQ_DB_DATA is defined; can be given relative to cwd or to TDAQ_DB_PATH)
    partition: selected partition (must be specified unless TDAQ_PARTITION is defined)
- igui starts the new dvs gui.
- It uses the QTUtils package.
- Options not implemented yet:
- New options:
  - DVS Output: a separate window prints the dvs output during a test session
  - Remove Hardware-OFF: hardware components whose state is set to OFF are not loaded from the database. The corresponding applications running on these machines are not loaded either.
- Online Help is implemented.
- TO DO:
  - add a progress bar to the main window to show the progress of the testing session
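The rule given above for the dvs_start_gui arguments (a value must be given on the command line unless the corresponding environment variable, TDAQ_DB_DATA or TDAQ_PARTITION, is defined) amounts to a simple fallback. The sketch below only illustrates that rule; it is not the dvs_start_gui script, and the class and method names are hypothetical.

```java
import java.util.Map;

/** Illustrative sketch only; not part of the dvs_gui package. */
final class DvsStartArgsSketch {

    /** Returns the command-line value if given, else the environment value, else fails. */
    static String resolve(String cliValue, String envName, Map<String, String> env) {
        if (cliValue != null && !cliValue.isEmpty()) {
            return cliValue;
        }
        String fromEnv = env.get(envName);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;
        }
        throw new IllegalArgumentException("specify a value or define " + envName);
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        try {
            String db = resolve(args.length > 0 ? args[0] : null, "TDAQ_DB_DATA", env);
            String partition = resolve(args.length > 1 ? args[1] : null, "TDAQ_PARTITION", env);
            System.out.println("database: " + db + ", partition: " + partition);
        } catch (IllegalArgumentException e) {
            System.out.println("usage error: " + e.getMessage());
        }
    }
}
```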
errorRecovery
General changes
- Make it possible for a requester application to handle multiple responder types, e.g., the L2PU can handle nodes of both ROS and LVL2ResultHandler type.
- Internal improvements and bug fixes only.
gatherer
Merged the gatherer and MonGatherer packages.
Removed IS_OH from the schema. Named the binary Gatherer instead of GathererApplication.
Parallelized the gatherer.
Introduced message forwarding.
histmon
General changes
Histograms are published with history by default. The history tag is derived from the current time and the update interval.
If the original histograms were designed to be LBN aware, the LBN is used as the tag.
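These notes do not give the formula used to derive the history tag. Purely as an illustration of what "derived from the current time and the update interval" could mean, the sketch below assumes the tag is the index of the update interval containing the current time, and that a non-negative LBN marks an LBN-aware histogram. Both are assumptions, and the class and method names are hypothetical; the actual histmon derivation may differ.

```java
/** Illustrative sketch only; the real histmon tag derivation is not documented here. */
final class HistoryTagSketch {

    /** Assumed derivation: index of the update interval that contains the given time. */
    static long timeTag(long epochSeconds, long updateIntervalSeconds) {
        if (updateIntervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return epochSeconds / updateIntervalSeconds;
    }

    /** Uses the LBN when the histogram is LBN aware (assumed: lbn >= 0), else the time tag. */
    static long tag(long lbn, long epochSeconds, long updateIntervalSeconds) {
        return lbn >= 0 ? lbn : timeTag(epochSeconds, updateIntervalSeconds);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        System.out.println("time-based tag: " + tag(-1, now, 60));
        System.out.println("LBN-based tag:  " + tag(42, now, 60));
    }
}
```

With this assumed scheme, all updates published within the same interval share one tag, and consecutive intervals get consecutive tags.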
igui
Introduction
The IGUI (Integrated Graphical User Interface) web page is at:
http://atlas-onlsw.web.cern.ch/Atlas-onlsw/components/igui/welcome.html
General changes
- In the Segment & Resource panel, for indirectly disabled items, the reason why a segment or a resource is disabled can be found by pressing the middle mouse button with the mouse pointer on the node. An Information Dialog Frame showing the reason is displayed.
- If mrs_monitor is started from the IGUI button, the user name is appended to the default logfile name. The logfile can be found at /tmp/mrs_monitor_username.log
- The online help has been updated.
Bug fixes
- Fix the behaviour of the ELOG interface when the ELOG server is not accessible.
- Fix the multiple ELOG dialog window problem.
- Fix the behaviour of the Data Set Tags panel at run state change.
Igui
This is the new IGUI implementation. The current version does not yet implement the full igui functionality, but it can be used to fully control a partition.
It includes:
- A panel to send transition commands to the RootController
- A panel showing run information and settings
  - This panel is located just below the one used to send transition commands to the RootController. It contains some tabs:
    - Information: it shows general information about the run;
    - Counters: it shows counters (event, rates, etc) for the current run;
    - Settings: here the user can set some run parameters (this tab replaces the RunParams panel in igui).
- A panel containing everything needed to operate the Run Control
  - Recovery commands are implemented using a pop-up menu; just right click on a controller/application and a contextual menu will appear showing all the commands the user can send.
  - In status display mode the user cannot send any command and the contextual menu will not appear.
  - Infrastructure panel:
    - No changes with respect to igui.
  - Advanced panel:
    - This sub-panel contains three task panes giving access to some extra functionality
      - FSM transition commands
        - This panel gives the possibility to send transition commands to the selected application; only valid commands are shown.
      - Advanced commands
        - Here the user can send some extra commands to the selected application: set debug level, and publish state/statistics.
      - Application information
        - Simple form showing detailed information about the selected application.
- A panel to enable or disable segments and resources
  - Extra information about the reason why a component is disabled is available via a tooltip or a dialog (shown with a right mouse button click).
- A panel showing MRS log messages
  - As in igui this panel is located at the bottom of the main frame, but now it is surrounded by two toolbars giving access to some functionality previously available in a separate panel:
    - Using the top toolbar the user can select the MRS subscription criteria (to apply the changes the subscribe button at the right side should be pushed);
    - Using the bottom toolbar the user can clean the message window, select the message format (short or long), choose the number of visible messages and see the current subscription criteria.
  - When this panel owns the focus (usually after selecting a message) the CTRL-F keyboard combination will show a find dialog useful to quickly browse the shown messages.
Database committing and reloading is handled centrally (via the Commit & Reload button placed in the tool-bar at the top of the main frame), so the panels no longer need to implement such an action using custom buttons.
The IguiPanel interface has been modified to meet the changes in the Igui core design, and user panel developers can find any needed information in the javadoc documentation:
$TDAQ_INST_PATH/share/doc/Igui/javadoc/Igui/index.html.
To start the new Igui the following script can be used:
Igui_start -p <partition> -d <database> <vm properties>
The IGUI can now also be used for the partition starting/stopping procedure: just start the setup_daq script with the -newgui switch.
l2pu
General changes
Internal improvements and bug fixes only.
No changes from user point of view.
l2rh
General changes
- Use time-stamps to garbage collect 'old' entries in the LVL2Result store, especially when the store is full.
- Log the configured state.
- Internal improvements and bug fixes.
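The time-stamp based garbage collection mentioned above can be sketched as a bounded store that, when full, first drops every entry older than a maximum age and, if that frees nothing, drops the oldest entry. This is a generic illustration under those assumptions, not the l2rh code (which is C++); the class name, the capacity and age parameters, and the fallback of evicting the oldest entry are all hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; not the l2rh LVL2Result store. */
final class ResultStoreSketch {
    private final int capacity;
    private final long maxAgeMillis;
    // Insertion order equals time-stamp order as long as callers pass non-decreasing times.
    private final Map<Long, Long> stampByEventId = new LinkedHashMap<>();

    ResultStoreSketch(int capacity, long maxAgeMillis) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Stores an event id with its time-stamp, collecting old entries first if the store is full. */
    void put(long eventId, long nowMillis) {
        stampByEventId.remove(eventId); // re-inserting keeps the map ordered by time-stamp
        if (stampByEventId.size() >= capacity) {
            collect(nowMillis);
        }
        if (stampByEventId.size() >= capacity) {
            // Nothing was old enough: drop the oldest entry to make room.
            Iterator<Long> it = stampByEventId.keySet().iterator();
            it.next();
            it.remove();
        }
        stampByEventId.put(eventId, nowMillis);
    }

    /** Removes every entry older than maxAgeMillis and returns the number removed. */
    int collect(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<Long, Long>> it = stampByEventId.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > maxAgeMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    boolean contains(long eventId) {
        return stampByEventId.containsKey(eventId);
    }

    int size() {
        return stampByEventId.size();
    }

    public static void main(String[] args) {
        ResultStoreSketch store = new ResultStoreSketch(2, 100);
        store.put(1, 0);
        store.put(2, 50);
        store.put(3, 200); // store is full, so entries 1 and 2 are collected as too old
        System.out.println("entries after collection: " + store.size());
    }
}
```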
ls
Introduction
This package replaces the obsolete logService. A new requirement, that database technologies other than ORACLE had to be dropped, came in around spring 2007. This meant rewriting the logService package, which was highly dependent on MySQL. The opportunity was taken to refactor the code, especially the log manager, which was never very user friendly. Database access in C++ is done through the CORAL interface, which hides the underlying technology. Java was chosen for the log manager, since it brings the flexibility required to make this tool more intuitive. The resulting Java application can be run from the console, or remotely using the Java Web Start technology.
Known issues/bugs
In the Log Manager, right-clicking on the Messages Table (right hand side panel) or the Partition Table (left hand side panel) pops up a Refresh button. It is used to update the current view of messages or partitions, respectively. Refreshing these objects only works if one right-clicks without moving the mouse; that is, press and release without changing the X and Y position of the pointer. This may be a problem with Java. In the meantime, one can also perform a Refresh via the options in the top menu File: 'Refresh tree' and 'Refresh table'.
To be implemented
Add an option to display statistics, internal and from IS.
Changes from previous release
None
Example applications
None exist at the moment.
Applications
Log Manager
Usage: log_manager
Instructions on how to use this tool are in the Help menu of the application itself.
The Log Manager can also be run using Java Web Start technology from the following link:
https://pcatdwww.cern.ch/jnlp/logmanager/logmanager.jnlp
A Java application is downloaded and started. You MUST have at least Java 1.6 installed on your local machine.
lsReceiver
Description: This application subscribes to the MRS service to receive messages produced by TDAQ applications and log them in a database.
Usage: lsReceiver [-p partition-name] [-u user-name] [-n IS-server-name] [-s threshold-size] -c connect-string [-S subscribe-expression]
Options/Arguments:
-p partitionName Partition name
-u userName User name
-n ISserverName Name of the Information Service to publish the message rate into.
-c connectionString Database connection string.
Test units
logTest
Description: Test binary for the Log Receiver application.
Usage: logTest -c connect-string [-p partition-name] [-l complexity-level]
Options/Arguments:
-p partitionName Partition name
-l level Level of Complexity of the test [1: open/close - 2: tests the Log Service Infrastructure].
-c connectionString Database connection string.
Utilities
logSelect
Description: Application to retrieve log messages for a given partition according to the search criteria specified. By default, messages are dumped on std::cout.
Usage: logSelect -c connect-string -p partition-name [-i message-name] [-m machine-name]
[-a application-name] [-L time-low] [-U time-up] [-s severity]
[-x text] [-r parameters] [-d order-list] [-e max-rows] [-f offset-row]
Options/Arguments:
-c connectionString Database connection string.
-p partitionName Partition name.
-u userName User name.
-n run-number Run Number.
-i message-name Message name or ID.
-m machine-name Machine name where the message was issued.
-a application-name Application name where the message was issued.
-L time-low Lower time threshold (in UTC time).
-U time-up Upper time threshold (in UTC time).
-s severity Message severity:
0 - SUCCESS
1 - INFORMATION
2 - DEBUG
3 - WARNING
4 - ERROR
5 - FATAL
-x text Text in the message body.
-r parameters Message parameters.
-d order-list Parameter to sort the messages by (MSG_ID, MACHINE_NAME, APPLICATION_NAME, ISSUED_WHEN, SEVERITY, MSG_TEXT, PARAMETERS, RUN_NUMBER).
-e max-rows Maximum number of rows to retrieve from the database; 100 by default. If 0, all entries are retrieved.
-f offset-row Offset in the table to retrieve the messages from.
logDelete
Description: Application to remove log messages for a given partition according to the search criteria specified.
Usage: logDelete -c connect-string -p partition-name [-i message-name] [-m machine-name]
[-a application-name] [-L time-low] [-U time-up] [-s severity]
[-x text] [-r parameters]
Options/Arguments:
-c connectionString Database connection string.
-p partitionName Partition name
-u userName User name.
-n run-number Run Number.
-i message-name Message name or ID.
-m machine-name Machine name where the message was issued.
-a application-name Application name where the message was issued.
-L time-low Lower time threshold (in UTC time).
-U time-up Upper time threshold (in UTC time).
-s severity Message severity:
0 - SUCCESS
1 - INFORMATION
2 - DEBUG
3 - WARNING
4 - ERROR
5 - FATAL
-x text Text in the message body.
-r parameters Message parameters.
logGetPartitionNames
Description: Application to retrieve the list of partition names.
Usage: logGetPartitionNames -c connectionString
Options/Arguments:
-c connectionString Database connection string.
ohp
General changes
For more details, please refer to the "README" file (in the ohp installation directory).
New in this version of OHP is the possibility to add a documentation string to plugins through the configuration file. For example:
<plugin ..... >
  <doc>"Your text explaining what the plugin does"</doc>
</plugin>
Since tdaq-01-09-01 ohp supports retrieving histograms from multiple servers. To use this feature the configuration files need to be slightly modified: the general block should change from:
<general>
  <partition name="be_test" />
  <server name="Histogramming" />
  <subscription name="Provider/.*" />
</general>
to:
<general>
  <partition name="be_test" />
  <subscription server="Histogramming" provider="Provider" histogram=".*" />
</general>
In the configuration file every instance (in tabs or options) of a histogram name has to be changed to include the OH server name, as in the following example:
<display>
  <tab name="Tile">
    <histogram name="TileProv/Tile/Drawer1/h1"/>
  </tab>
</display>
<globaloptions>
  <histogram name="TileProv/Tile/Drawer1/h1">
    <DRAWOPT=LEGO/>
  </histogram>
</globaloptions>
has to be changed to:
<display>
  <tab name="Tile">
    <histogram name="Histogramming/TileProv/Tile/Drawer1/h1"/>
  </tab>
</display>
<globaloptions>
  <histogram name="Histogramming/TileProv/Tile/Drawer1/h1">
    <DRAWOPT=LEGO/>
  </histogram>
</globaloptions>
A commented example (example.conf.xml) is provided in the share directory, with instructions on how to write a configuration file for OHP.
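The two configuration changes described above (splitting the old subscription name into provider and histogram attributes, and prefixing histogram names with the OH server name) are mechanical, so they can be illustrated with a small helper. This is only a sketch for migrating strings by hand or in a script; it is not a tool shipped with ohp, and the class and method names are hypothetical.

```java
/** Illustrative sketch only; not part of the ohp package. */
final class OhpConfigMigrationSketch {

    /** Prefixes an old-style histogram name with the OH server name. */
    static String qualify(String server, String oldName) {
        return server + "/" + oldName;
    }

    /** Builds the new-style subscription element from an old "provider/histogram" name. */
    static String subscription(String server, String oldSubscription) {
        int slash = oldSubscription.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("expected provider/histogram");
        }
        String provider = oldSubscription.substring(0, slash);
        String histogram = oldSubscription.substring(slash + 1);
        return "<subscription server=\"" + server + "\" provider=\"" + provider
                + "\" histogram=\"" + histogram + "\" />";
    }

    public static void main(String[] args) {
        System.out.println(qualify("Histogramming", "TileProv/Tile/Drawer1/h1"));
        System.out.println(subscription("Histogramming", "Provider/.*"));
    }
}
```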
Configuration of OHP through the old ASCII format is deprecated; ohp will still work with ASCII, but new features are available only with XML.
OHP has a plug-in system. Users can develop their own GUIs to extend or modify OHP functionality. The standard GUI has been migrated to a plug-in, although nothing changes in its appearance or usage. You can refer to the ohpplugins package as an example of developing plug-ins. More information on the plug-in system can be found on the OHP TWiki.
Links:
OhpMonitoring TWiki
OhpUserGuide TWiki
To be implemented/known issues:
ohpplugins
General changes
This package contains common plug-ins for OHP that are general enough to be used by all ATLAS sub-systems. Specific examples from detectors are also included.
The package contains the following plug-ins:
Plug-ins list
- Histo Window: Simple plug-in to display histograms in a tab
- Histo Window Tab: A set of Histo Window plug-ins organized in tabs
- Status Window: A panel with the status of OHP and information on the active session
- Browser: An extended version of classic OHP browser
- MDI Interface: A desktop to arrange and control the plug-ins
- Histo Window RegExp: A modified version of Histo Window with support for regular expressions
- Tile Cosmics: This is a TileCal specific plugin that selects some histograms out of a list to be displayed
- Error: This is a TileCal specific plugin that shows in a table information extracted from histograms
- Legacy Buttons: Reconnect/Stop buttons
- ROOT Interface: Plugin that runs a user defined ROOT macro to handle drawing of histograms
Under Development
- Options Editor: Helps with modifying histogram drawing options
- Help System: HTML-based help display
oks
OKS Server
- oks-commit.sh supports directories in addition to files
- added the oks-import.sh utility to simplify the import of new directories
and files
- repository locks left behind by abnormally terminated oks
commits can be removed at Point-1 by DAQ experts using the
/oks/admin/unlock-repository.sh script via sudo
- the "Replace" dialog of the OKS Data Editor offers the user to
check out the repository file containing the modified objects
- includes using file-related relative pathnames or absolute pathnames
are not allowed (in
particular to avoid the inclusion of files stored outside the current
repository and to simplify the consistency check by oks-commit.sh)
Read more details on the TWiki page.
OKS Performance Improvements
- the OKS library uses a pool of threads to load OKS data files, i.e.
the data files can be read in parallel
- by default the number of threads is equal to the number of the
computer's CPU cores
- it can be modified via the OKS_KERNEL_THREADS_POOL_SIZE environment
variable
- the OKS library does not stop reading files on the first error,
but continues loading files in parallel threads until each file has been
read to its end or has produced an error
- thus the final error report may contain several errors coming
from different files
- this error report may change between different runs of OKS
utilities even if the files were not updated
- for optimal performance it is recommended:
- to reduce the number of schema files (at some point after parsing a schema
file, OKS requires a single active thread to update the database
schema)
- to avoid huge data files (processing of a single data file is not
parallelized since XML files are not indexed)
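The pool-size rule described above can be illustrated with a short sketch. This is not the OKS implementation: the helper name pool_size_from_env and the fallback for invalid values are assumptions made for illustration. Only the variable name OKS_KERNEL_THREADS_POOL_SIZE and the "one thread per CPU core" default come from these notes.

```cpp
#include <cassert>
#include <cstdlib>
#include <thread>

// Hypothetical helper, not part of the OKS API: picks the number of
// worker threads the way the notes describe it.
unsigned pool_size_from_env(const char* var = "OKS_KERNEL_THREADS_POOL_SIZE") {
    assert(var != nullptr);

    // Default: one thread per CPU core. hardware_concurrency() may
    // return 0 when the value is unknown, so fall back to 1.
    unsigned cores = std::thread::hardware_concurrency();
    unsigned fallback = cores > 0 ? cores : 1;

    const char* value = std::getenv(var);
    if (value == nullptr || *value == '\0') return fallback;

    char* end = nullptr;
    long parsed = std::strtol(value, &end, 10);
    // Assumption: non-numeric or non-positive values are ignored.
    if (*end != '\0' || parsed <= 0) return fallback;
    return static_cast<unsigned>(parsed);
}
```

With the variable unset the helper returns the core count; after exporting OKS_KERNEL_THREADS_POOL_SIZE=4 in the shell it returns 4.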
OnlineRecovery
Introduction
The OnlineRecovery package is responsible for all recovery mechanisms from the RunControl point of view. It consists of two main parts. The first is a
plug-in to the new RunController and will reproduce all recovery-related behavior seen in the old RunControl (such as restart, ignore, etc). In addition it
will include some more advanced recovery mechanisms and also better statistics. The second part is a stand-alone server which will handle errors with a
system wide impact and will also receive information from the RunController plug-ins.
RunController expert system
This is integrated as a plug-in to the new RunController. It receives updates directly from the controller and decides what to do in error-cases (such
as ignoring, restarting, etc). It implements the ExpertSystemInterface defined in the RunController package.
Server
The OnlineRecovery server is responsible for handling all system wide errors. Currently the automatic disabling of RODs and the notification to the
corresponding ROS has been enabled.
Core functionality
The OnlineRecovery decides what to do when an application dies, goes into the error state, fails a test, etc.
Normally the action taken follows the configuration settings for the specific application (IF-FAILS, IF-DIES, IF-ERROR),
with the following exceptions:
- An application with decision set to RESTART will be restarted up to a maximum of 5 times. After that, no further restarts are attempted and the failure is treated as an error.
If the last restart happened more than 30 min ago, the counter is reset (it is also reset if the controller goes back to the NONE state).
- An application with a failed test will be re-tested up to a maximum of 3 times. After that, a failed test is treated as an error. If the last retest was more than
10 min ago, the counter is reset.
- If specific recovery is defined (through extra rules files), it may override the database settings.
- An application with decision IGNORE will not be ignored if an application that is running and has membership IN depends on it. In this case it is considered
an error, and an ers message listing all the applications depending on it will be sent.
A detailed description of core and specialized OnlineRecovery behavior can be found on the cc webpage.
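The restart and re-test limits listed above follow the same "bounded counter with a quiet-period reset" idea. A minimal sketch is given below. The class RestartCounter and its method names are hypothetical and are not taken from the OnlineRecovery sources. Only the numbers (5 restarts / 30 min, 3 re-tests / 10 min) come from these notes.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model of the rule; not the OnlineRecovery code.
class RestartCounter {
public:
    using Clock = std::chrono::steady_clock;

    RestartCounter(int max_attempts, std::chrono::minutes reset_after)
        : max_(max_attempts), reset_after_(reset_after) {
        assert(max_attempts >= 0);
    }

    // Returns true if another attempt is allowed at time `now`,
    // false if the failure has to be treated as an error.
    bool try_restart(Clock::time_point now) {
        // Quiet period elapsed since the last attempt: start counting again.
        if (count_ > 0 && now - last_ > reset_after_) count_ = 0;
        if (count_ >= max_) return false;
        ++count_;
        last_ = now;
        return true;
    }

    // To be called when the controller goes back to the NONE state.
    void reset() { count_ = 0; }

private:
    int max_;
    std::chrono::minutes reset_after_;
    int count_ = 0;
    Clock::time_point last_{};
};
```

Under these assumptions, RestartCounter(5, std::chrono::minutes(30)) models the RESTART rule and RestartCounter(3, std::chrono::minutes(10)) the re-test rule.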
Expert system server
Currently the main responsibility of the expert system server is dealing with stopless recovery.
This is done whenever a ROD is reported as faulty by a RODBusyModule.
The corresponding InputChannels and their ROS(s) are found. The ROD and the channels are then automatically disabled.
The information about disabled channels is stored in COOL.
The behavior of the recovery can be dynamically configured with a combination of three different settings:
- State - if false, no recovery is done. Otherwise recovery is performed according to the following settings.
- Ignore Xoff - if true, ROS XOff is not taken into account; otherwise recovery is not done if ROS XOff is the cause of the busy ROD.
- Automatic - if false, the user is asked before recovery is performed (through IGUI). If true, recovery is done without user input.
These settings can be modified either using the error_viewer application described below, or using the dedicated buttons in the DAQPanel (P1 only).
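How the three settings combine can be written down as a small decision function. The sketch below only mirrors the description above. The type and function names (RecoverySettings, RecoveryAction, decide) are invented for illustration and do not exist in the OnlineRecovery package.

```cpp
#include <cassert>

// Hypothetical names; only the three settings come from the notes.
enum class RecoveryAction { None, AskUser, Recover };

struct RecoverySettings {
    bool state;        // "State": master switch for stopless recovery
    bool ignore_xoff;  // "Ignore Xoff": recover even if ROS XOff caused the busy
    bool automatic;    // "Automatic": recover without asking the user
};

RecoveryAction decide(const RecoverySettings& s, bool busy_caused_by_ros_xoff) {
    if (!s.state) return RecoveryAction::None;
    // ROS XOff blocks the recovery unless it is explicitly ignored.
    if (!s.ignore_xoff && busy_caused_by_ros_xoff) return RecoveryAction::None;
    return s.automatic ? RecoveryAction::Recover : RecoveryAction::AskUser;
}
```

For example, with State and Automatic set to true and Ignore Xoff set to false, a busy ROD caused by ROS XOff yields no recovery, while any other busy ROD is recovered without user input.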
Changes since tdaq-02-00-00
- Lvl2 recovery is now also performed when an L2SV dies
- Recovery is now performed in the case of a failed HLT prescales update
- Disabled RODs are also published in IS using the new DisabledROD IS structure. Information is published in the RunCtrlStatistics IS server.
Known bugs
Utilities
A graphical utility that allows a complete view of all errors in the system is available.
The partition and the expert system server from which the list of errors is retrieved can be selected.
error_viewer
owl
new features
An implementation of the thread pool pattern has been added.
See for complete information.
Use:
#include
There is no library to link.
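For readers unfamiliar with the pattern, the following is a generic, self-contained illustration of a thread pool written with the C++ standard library. It is NOT the owl API. The class name ThreadPool and the submit() method are assumptions made for this sketch, and the actual owl header and interface should be taken from the owl documentation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Generic illustration of the thread pool pattern; not the owl class.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a job; one idle worker is woken up to run it.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached when stopping
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // executed outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

A fixed set of workers pulls jobs from a shared queue, so the cost of creating threads is paid once rather than per job. In this sketch the destructor waits until all queued jobs have run.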
PmgGui
Introduction
The PmgGui package includes GUI interfaces to the ProcessManager system.
Changes wrt tdaq-02-00-00
PmgISPanel
- Fixed bug #46477 (https://savannah.cern.ch/bugs/?46477):
"The PMG IS panel in IGUI may show wrong processes on some host".
ProcessManager
Changes since tdaq-02-00-00
Server
- Following changes in the system package, started processes no longer inherit the pmgserver
initial environment;
- Added some additional verbosity when interacting with the AccessManager and the ResourceManager;
- Fixed a problem with the publication of the process stop time in IS.
RCDLTPModule
Introduction
This package contains RCD Software for the Local Trigger Processor. Please see ATLAS Timing Signal Distribution and https://edms.cern.ch/document/588024/1 for further details.
General changes
- Introduced classes to load the pattern generator from an OKS object in addition to loading it from a file. New classes LTPPattern, LTPPatternFile and LTPDeadtimePattern were created. Backward compatibility with tdaq-02-00-00 is ensured.
RCUtils
robin_ppc
RunController
Introduction
This package contains the Run Control for the TDAQ. It provides two
C++ interfaces which allow the user to introduce case-specific actions
carried out when a command is received: UserRoutines.h and
Controllable.h. The two interfaces have completely different purposes!
A developer shall extend UserRoutines.h only when customizing
a controller at an intermediate level of the run control tree (i.e.
with child applications). The developer can create UserRoutines objects to be
called at the corresponding state transitions and commands before
they are processed by the Controller itself (i.e. transmitted to the
children).
A developer of leaf applications (run control applications without
children) shall only extend the Controllable.h interface. The developer creates a
Controllable object to be called
at the corresponding state transitions and commands.
API additions since Previous Release
The UserRoutines class has been extended with 2 methods: virtual bool actionEnable() and virtual bool actionDisable(). These methods allow users to specify actions to be taken when an ENABLE/DISABLE command is received. The use case is a controller which is also a leaf application dealing with the stopless recovery (a pattern that has been discouraged for several years but is still in use).
Visible Changes since Previous Release
- Substates are implemented via a new mechanism. Instead of
implementing special UserRoutines in a controller, the applications
which need extra synchronization points can use the "Barrier" class (see
details later on).
- The periodic publish and publish statistics commands are dealt
with differently. In tdaq-01-09-01 the controller was broadcasting
these commands to all children at regular intervals. In order to allow
different children to have different publishing frequencies, each leaf
application now starts 2 threads once the CONFIGURED state is reached:
these wake up at the intervals specified in the DB (ProbeInterval and
FullStatisticsInterval; a value of 0 means the thread is not launched).
- It is no longer possible to ask a controller to start or
restart an application when the controller is in a state in which that
application should not be running (e.g. a leaf application cannot be started
in BOOTED).
- The SHUTDOWN command is equivalent to EXIT for all controllers
except the RootController, which has one more state (NONE).
- It is now possible to test individual objects by sending the command
TEST_OBJECT <object name> to a controller.
- It is now possible to issue recovery commands also when the
Controller is BUSY (STARTAPP, STOPAPP, RESTARTAPP, TEST_OBJECT, TESTHW,
TESTPMG, IGNORE).
- A States class has been introduced which defines all FSM States
in an enumeration and provides their textual representation (see
details later on).
- If the RootController dies, it can now be safely restarted (by
launching again setup_daq) even if the system is in RUNNING state.
- The RootController can deal with the following extra commands:
- L1UPDATEPRESCALES: dummy placeholder for now.
- HLTUPDATEPRESCALES: the root controller receives as
arguments
the Trigger SuperMaster Key (SMK) and the L1 and HLT
Prescale keys; it blocks the main trigger, inserts the new keys and a
new LB into the trigger DB, and broadcasts to all nodes the user command
"HLT_PrescaleUpdate <lumi block> <SMK> <L1Prescale>
<HLT Prescale>"; when the command has been carried out completely
it releases the main trigger. If a child does not manage to complete
the command correctly it is terminated by the expert system.
- LUMIBLOCKINTERVAL: 2 argument sets are possible: "TIME
<interval in seconds>" and "EVENTS <number of events>". The
interval at which the LB changes spontaneously can be changed with this
command. The default is "TIME 300".
- LUMIBLOCKINCREASE: increase the LB by 1. Optionally a
parameter can be passed to ask for an increase by more than 1.
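The two argument sets accepted by LUMIBLOCKINTERVAL can be illustrated with a small parser. This is only a sketch of the documented syntax. The struct LumiBlockInterval and the function parse_lumiblock_interval are hypothetical names, and rejecting non-positive values is an assumption, not documented RootController behavior.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical representation of the command arguments.
struct LumiBlockInterval {
    enum class Mode { Time, Events };
    Mode mode = Mode::Time;
    unsigned long value = 300;  // documented default: "TIME 300"
};

// Parses "TIME <seconds>" or "EVENTS <count>". Returns false and leaves
// `out` untouched when the text matches neither form.
bool parse_lumiblock_interval(const std::string& text, LumiBlockInterval& out) {
    std::istringstream in(text);
    std::string keyword;
    std::string extra;
    long long value = 0;
    if (!(in >> keyword >> value)) return false;  // keyword or number missing
    if (value <= 0) return false;                 // assumption: must be positive
    if (in >> extra) return false;                // trailing garbage

    LumiBlockInterval parsed;
    if (keyword == "TIME") parsed.mode = LumiBlockInterval::Mode::Time;
    else if (keyword == "EVENTS") parsed.mode = LumiBlockInterval::Mode::Events;
    else return false;
    parsed.value = static_cast<unsigned long>(value);
    out = parsed;
    return true;
}
```

A default-constructed LumiBlockInterval corresponds to the documented default "TIME 300".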
Internal Changes since Previous Release
- In the Application class the startAt and stopAt parameters are
translated to the exact transition at which the process should be
launched/stopped
- Several helper methods have been added to the RunController class
in order to better factorize the code.
- the BOOT and INITIALIZE transitions are dealt with as normal
transitions.
- The TESTPMG and TESTHW are dealt with as normal active commands
and not as pseudo-transitions.
APIs
States Class
This class exists in the daq::rc namespace. Detailed documentation can
be found in the Doxygen documentation. Here we briefly list the public
API:
daq::rc::States::T_State toState(std::string stateName); // translates the state name into its numerical value
std::string toString(daq::rc::States::T_State stateValue); // translates a state into its textual representation
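The idea behind such a class, one enumeration plus conversions in both directions, can be sketched as follows. The code lives in its own namespace and is not the daq::rc::States class. It lists only the states named in these notes (NONE, BOOTED, CONFIGURED, RUNNING), whereas the real enumeration is defined by the RunController package and may contain more. Throwing on unknown input is also an assumption.

```cpp
#include <cassert>
#include <initializer_list>
#include <stdexcept>
#include <string>

namespace sketch {  // deliberately not daq::rc

// Only the states mentioned in these notes; the real list may differ.
enum class State { NONE, BOOTED, CONFIGURED, RUNNING };

// Translates a state into its textual representation.
inline std::string toString(State s) {
    switch (s) {
        case State::NONE:       return "NONE";
        case State::BOOTED:     return "BOOTED";
        case State::CONFIGURED: return "CONFIGURED";
        case State::RUNNING:    return "RUNNING";
    }
    throw std::invalid_argument("unknown state value");
}

// Translates a state name into the enumeration value.
inline State toState(const std::string& name) {
    for (State s : {State::NONE, State::BOOTED, State::CONFIGURED, State::RUNNING})
        if (toString(s) == name) return s;
    throw std::invalid_argument("unknown state name: " + name);
}

}  // namespace sketch
```

Keeping both conversions next to the enumeration means that a state added later only has to be registered in one place.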
Barriers Class
This class allows applications to wait on each other, for the purpose
of synchronizing within a state transition. The mechanism used
underneath is IS. Details of the API can be found in the Doxygen
documentation.
An application that wishes to use it can follow this example:
#include "RunController/Barrier.h"
void myMethod() {
    // partitionName, appName, the sub-state names and timeout are
    // assumed to be provided by the surrounding application code
    daq::rc::Barrier a(partitionName, appName, subStateA_Name);
    daq::rc::Barrier b(partitionName, appName, subStateB_Name);
    try {
        a.up();
        b.up();
    }
    catch (ers::Issue & e) {
        ers::fatal(e);
        // do something or return;
    }
    // do whatever you need to do; if you encounter a fatal error, and you
    // don't want all other applications to wait forever on you, lower the barrier with a.down()
    // Now wait until the other applications have reached the synchronization point subStateA
    try {
        a.wait(timeout);
    }
    catch (ers::Issue & e) {
        ers::fatal(e);
        // do something or return;
    }
    // continue with what you need to do
    // Now wait until the other applications have reached the synchronization point subStateB
    try {
        b.wait(timeout);
    }
    catch (ers::Issue & e) {
        ers::fatal(e);
        // do something or return;
    }
    // continue with what you need to do
    return;
}
Errors
The RunController hardly takes any decisions when errors occur.
Instead,
the corresponding errors are forwarded to the Expert System (Online
Recovery), which
takes a decision accordingly. This decision currently depends on the
settings defined in the configuration database. Three fields are
available to the user to define the behaviour: IfFails, IfDies,
IfError. Note on restart: if the restart option is selected
for any of these fields and the restart fails repeatedly, it will cause
an error instead.
Timeouts
There are currently three different timeouts:
- Action timeout - used for transitions and standard commands
(RESTART, STOP, etc).
For a controller the actual transition timeout used is its
action timeout plus the highest action timeout among its
children.
- Short timeout - used for killing applications and for testing.
- Init timeout - after an application is started by a controller,
the application must send a pmg_sync to notify that it has started
correctly. Failing to do so within the init timeout
will cause an error on the controller side. This implies that the init
timeout should always be LESS than the ACTION timeout. This
synchronization is handled internally for all
RunControl applications and should not be done in the user code.
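The rule for a controller's effective transition timeout (its own action timeout plus the highest action timeout among its children) is simple arithmetic. The function name transition_timeout below is hypothetical and the sketch is not taken from the RunController sources.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Effective transition timeout of a controller: own action timeout
// plus the largest action timeout of any child (0 if there are none).
std::chrono::seconds transition_timeout(
        std::chrono::seconds own_action_timeout,
        const std::vector<std::chrono::seconds>& children_action_timeouts) {
    std::chrono::seconds highest{0};
    for (std::chrono::seconds child : children_action_timeouts)
        highest = std::max(highest, child);
    return own_action_timeout + highest;
}
```

For example, a controller with an action timeout of 60 s and children with 30 s, 120 s and 45 s would wait up to 180 s for a transition.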
Known issues/bugs
... to be filled after testing...
Change of API planned for next major release!
For the next major release the API of Controllable and UserRoutines will be changed, following requests from a number of users:
- Controllable.h : The std::string arguments passed to all methods will be replaced by std::vector. In this way users do not need
to parse the parameters in their own code.
- UserRoutines.h : The equivalent replacement of std::string by std::vector will be made. The userAction(..) will be changed to
userAction(std::string command, std::vector arguments).
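The parsing that user code currently has to do on a single argument string might look like the sketch below. It assumes that the arguments are separated by whitespace and that the result is a vector of strings. Both are assumptions, since these notes specify neither the separator nor the element type of the future std::vector. The function name split_arguments is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a command argument string on whitespace (assumed separator).
std::vector<std::string> split_arguments(const std::string& args) {
    std::istringstream in(args);
    std::vector<std::string> out;
    for (std::string token; in >> token;) out.push_back(token);
    return out;
}
```

With the planned API change this step would no longer be needed in user code, because the arguments would arrive already separated.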
Applications
rc_setup
Description: Binary to start the basic infrastructure needed by the Root Controller.
Options/Arguments:
-p partition Name of the IPC partition (default $TDAQ_PARTITION)
-d database Name of the database (TDAQ_DB)
-s segmentname Name of the segment
-n controller Name of the controller
-R expertSystem Expert System library name (without the 'lib' prefix or the '.so' extension), followed by the arguments if any.
run_controller
Description: The RunController is a general purpose control entity for the ATLAS Online infrastructure.
Options/Arguments:
-p partition Name of the IPC partition (default $TDAQ_PARTITION)
-P parentname Name of the parent
-s segmentname Name of the segment
-n controller Name of the controller
-u substates Library with the substates definition. The library name has to be given without the 'lib' prefix or the '.so' extension.
-R expertSystem Expert System library name (without the 'lib' prefix or the '.so' extension), followed by the arguments if any.
rc_test_controller
Description: Test unit for rc_controller binaries.
Options/Arguments:
-p partition IPC partition name (default $TDAQ_PARTITION)
-n controller Controller names
SFI
Changes since tdaq-02-00-00
- Can now use ROBINs as data sources (a.k.a. switch-based ROS mode)
- The SFI is now able to deal with ROBINs as data sources.
When going through the ROSs, it asks each one of them whether
the data for the event builder are to be requested and taken
from the ROS PC or from the ROBINs directly.
It then uses the appropriate nodeIDs in the list of data sources.
This way, the list of data sources can be composed of a mixture of ROSs
and ROBINs.
- So, the PC- or ROBIN-based readout is no longer a parameter in the SFIConfiguration,
but an attribute given by each ROSApplication.
- Comply with the drop of the nameTag from the RawFileName constructor
SFO
Introduction
Sub Farm Output (SFO) is an application of the Event-Building / Data-Collection system;
it receives and writes to the appropriate files the events which have been
accepted by the Event Filter.
Changes since tdaq-02-00-00
- Comply with the drop of the nameTag from the RawFileName constructor
system
Changes since tdaq-02-00-00
- Changes in the behavior of Executable::exec(const
param_collection &params, const env_collection &envs):
- This method now uses the execve system call internally; this means that
the child process will not inherit the parent environment but will
be started
with the environment defined in envs;
- The following methods are affected by this change:
- Executable::start(const
param_collection &params, const env_collection &envs);
- Executable::pipe_out(const
param_collection &params, const env_collection &envs, const
File &input_file, const File &output_file, const File
&error_file, mode_t perm).
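The effect of execve on the environment can be demonstrated with plain POSIX calls. The sketch below does not use the Executable class from the system package. The helper run_with_env is a hypothetical name, and the code assumes a POSIX system with /bin/sh available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `/bin/sh -c script` with exactly the environment given in `envp`
// (a null-terminated array) and returns what the child wrote to stdout.
std::string run_with_env(const char* script, char* const envp[]) {
    int fds[2];
    if (pipe(fds) != 0) return "";

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        // Child: send stdout into the pipe, then replace the process image.
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        char* const argv[] = {const_cast<char*>("sh"), const_cast<char*>("-c"),
                              const_cast<char*>(script), nullptr};
        execve("/bin/sh", argv, envp);  // envp replaces the parent environment
        _exit(127);                     // only reached if execve failed
    }

    // Parent: collect the child's output and wait for it to finish.
    close(fds[1]);
    std::string output;
    char buffer[256];
    ssize_t n;
    while ((n = read(fds[0], buffer, sizeof buffer)) > 0)
        output.append(buffer, static_cast<std::size_t>(n));
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return output;
}
```

A variable that is set only in the parent process is not visible to the child, whereas a variable listed in envp is. Code that relied on inheriting the parent environment therefore has to pass the needed variables explicitly in envs.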
training
Introduction
A new version of the Training Manual (Version 5.0) is available at:
$TDAQ_INST_PATH/share/doc/training/training_doc.pdf
General changes
- The RodPanel exercise in the Gui panel chapter has been
updated according to the changes in the igui
package.
- The test_vme_interface
exercise in the Diagnostic Test chapter
has been updated according to the solution.
wmi
Introduction
Web Monitoring Interface, one of the software components of the TDAQ Software sub-system of the ATLAS TDAQ, is intended to give to remote users a view of the status of the data acquisition system and its sub-systems.
New features
- Removed the output messages printed when converting files in the DQ plug-in.
- The tree implementation now uses Java.
- Files copied to the web server are now compressed (using gzip).
Generated: Fri Apr 3 16:57:31 CEST 2009 by /afs/cern.ch/atlas/project/tdaq/cmt/adm/bin/do_release_notes (c)