fstagni edited this page Jun 22, 2017 · 12 revisions

ResourceStatusSystem

Main changes

Improvements to the Resource Status System (RSS) are the main enhancements coming together with DIRAC v6r18. From version v6r18, RSS can actively manage Sites and Computing Elements in addition to Storage Elements.

The RSS status for Sites and ComputingElements can now be used in the pilot submission logic for choosing destination resources. These statuses are used if the RSS status management is enabled with the flag:

/Operations/<setup>/ResourceStatus/Config/State=Active

In order to use RSS statuses for Sites and ComputingElements (and StorageElements), these statuses must first be synchronized with the CS information. This is done with the following commands:

> dirac-rss-sync --element=Site --defaultStatus=Banned --init
> dirac-rss-sync --element=Resource --defaultStatus=Active --init

This initializes the Sites in the RSS databases with the default status Banned, unless the site is allowed in the old SiteMask logic (which was stored in the WMS JobDB). The ComputingElements are all initialized with the Active status, because their status is not defined in the CS description.
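The initialization rule described above can be sketched in plain Python. This is only an illustration of the rule, not the code that dirac-rss-sync runs: the function name, its arguments, the second site name and the CE name are hypothetical, and it assumes that sites allowed in the old SiteMask start as Active.

```python
def initial_statuses(element_names, default_status, allowed=()):
    """Illustrative only: compute the status each element starts with.

    element_names:  names of the elements found in the CS
    default_status: the value passed as --defaultStatus
    allowed:        names allowed by the old SiteMask (assumed to start as Active)
    """
    allowed = set(allowed)
    return {
        name: "Active" if name in allowed else default_status
        for name in element_names
    }


# Hypothetical site list and old SiteMask content.
sites = ["LCG.CERN.cern", "LCG.Example.org"]
site_statuses = initial_statuses(sites, "Banned", allowed=["LCG.CERN.cern"])

# ComputingElements have no status in the CS, so every one gets the default.
ce_statuses = initial_statuses(["ce01.example.org"], "Active")

print(site_statuses)
print(ce_statuses)
```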

The Site status can be altered as before by manual commands:

> dirac-admin-allow-site LCG.CERN.cern "Comment"
> dirac-admin-ban-site LCG.CERN.cern "Comment"

Policies

The following policy needs to be added in the CS under Operations > Defaults > ResourceStatus > Policies:

PropagationPolicy
{
  matchParams
  {
    element = Site
  }
  policyType = PropagationPolicy
}

This is needed in order to propagate the Site status to all the dependent resources whenever the Site status changes.

Agents

A new agent should be installed, via dirac-admin-sysadmin-cli:

install agent ResourceStatus SiteInspectorAgent

APIs

A new "SiteStatus" API has been defined in https://github.com/DIRACGrid/DIRAC/blob/rel-v6r18/ResourceStatusSystem/Client/SiteStatus.py

The "ResourceStatus" API has been modified: https://github.com/DIRACGrid/DIRAC/blob/rel-v6r18/ResourceStatusSystem/Client/ResourceStatus.py

Code in vanilla DIRAC has been updated to reflect the changes; direct calls to this API from DIRAC extensions should be updated accordingly. In more detail: the getStorageElementStatus method has been replaced by a generic getElementStatus method, so previous calls to:

ResourceStatus().getStorageElementStatus( elementName, statusType, default )

should be replaced by:

ResourceStatus().getElementStatus( elementName, elementType, statusType, default )

where elementType, in the case of StorageElements, is StorageElement.
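For extensions with many old-style calls, a thin compatibility helper can ease the migration. The sketch below is a suggestion, not part of DIRAC: StubResourceStatus is a hypothetical stand-in used only so the example runs without a DIRAC installation, and the element name and status type are example values. With a real installation, an instance of the ResourceStatus client would be passed instead.

```python
class StubResourceStatus(object):
    """Hypothetical stand-in for the ResourceStatus client (illustration only)."""

    def getElementStatus(self, elementName, elementType, statusType=None, default=None):
        # Echo the arguments back instead of querying RSS.
        return {"OK": True, "Value": (elementName, elementType, statusType, default)}


def getStorageElementStatus(rss, elementName, statusType=None, default=None):
    """Forward an old-style call to the generic getElementStatus method."""
    return rss.getElementStatus(elementName, "StorageElement", statusType, default)


# Example values; any StorageElement name and status type would do.
result = getStorageElementStatus(StubResourceStatus(), "CERN-USER", "ReadAccess")
print(result["Value"])
```

The helper only fixes the elementType argument, so the remaining arguments keep the order given in the new signature above.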

DBs

The Python code that interacts with the RSS DBs (ResourceStatusDB and ResourceManagementDB) has been rewritten using SQLAlchemy. Any extensions of this code may also need to be updated.

FTS

For compatibility with MySQL 5.7, the FTSHistoryView needs to be updated:

alter view FTSHistoryView as
select
  `FTSJob`.`Status` AS `Status`,
  sum(`FTSJob`.`Files`) AS `Files`,
  `FTSJob`.`TargetSE` AS `TargetSE`,
  (sum(`FTSJob`.`Completeness`) / count(distinct `FTSJob`.`FTSJobID`)) AS `Completeness`,
  sum(`FTSJob`.`FailedSize`) AS `FailedSize`,
  sum(`FTSJob`.`Size`) AS `Size`,
  sum(`FTSJob`.`FailedFiles`) AS `FailedFiles`,
  count(distinct `FTSJob`.`FTSJobID`) AS `FTSJobs`,
  `FTSJob`.`SourceSE` AS `SourceSE`
from `FTSJob`
where (`FTSJob`.`LastUpdate` > (utc_timestamp() - interval 3600 second))
group by `FTSJob`.`SourceSE`, `FTSJob`.`TargetSE`, `FTSJob`.`Status`;

MessageQueueing resources

The machinery supporting MQ systems has been fully certified, and it is now out of "Technology preview".

Accounting

Networking accounting:

The new NetworkAgent (http://dirac.readthedocs.io/en/integration/CodeDocumentation/AccountingSystem/Agent/NetworkAgent.html) is an optional Accounting agent that consumes perfSONAR (http://www.perfsonar.net/) network metrics and displays them via a new "network" accounting plotting type. In the current implementation, network metrics are received via a message queue.

In order for this agent to work properly, a message queue has to be specified in the CS, as per the instructions in http://dirac.readthedocs.io/en/integration/AdministratorGuide/DIRACSites/MessageQueues/index.html. For this purpose, in the agent configuration in the /DIRAC/Systems section of the CS, the option

MessageQueueURI

should be specified, and its content should define the queue host together with the topics you are interested in:

netmon-mb.cern.ch::Topic::perfsonar.summary.packet-loss-rate
netmon-mb.cern.ch::Topic::perfsonar.summary.histogram-owdelay
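The two values above follow a host::Topic::name pattern. As a minimal sketch of how such a value breaks down, the helper below splits it on the "::" separator. The function name and the returned keys are hypothetical, and this is not the parser DIRAC itself uses; it only assumes the three-part layout seen in the examples.

```python
def parse_mq_uri(uri):
    """Split a '<host>::<type>::<name>' value into its three parts (illustrative)."""
    parts = uri.split("::")
    if len(parts) != 3:
        raise ValueError("expected <host>::<type>::<name>, got %r" % uri)
    host, dest_type, name = parts
    return {"host": host, "type": dest_type, "name": name}


uris = [
    "netmon-mb.cern.ch::Topic::perfsonar.summary.packet-loss-rate",
    "netmon-mb.cern.ch::Topic::perfsonar.summary.histogram-owdelay",
]
parsed = [parse_mq_uri(u) for u in uris]

for entry in parsed:
    print(entry["host"], entry["type"], entry["name"])
```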