DIRAC v6r18
Improvements to the Resource Status System (RSS) are the main enhancements in DIRAC v6r18. Starting with v6r18, RSS can actively manage Sites and Computing Elements in addition to Storage Elements.
The RSS status for Sites and ComputingElements can now be used in the pilot submission logic for choosing destination resources. These statuses are used if the RSS status management is enabled with the flag:
/Operations/<setup>/ResourceStatus/Config/State=Active
To use RSS statuses for Sites and ComputingElements (and StorageElements), the RSS information for these elements must first be synchronized with the CS information. This should be done with the following commands:
> dirac-rss-sync --element=Site --defaultStatus=Banned --init
> dirac-rss-sync --element=Resource --defaultStatus=Active --init
This initializes the Sites in the RSS databases with the default status Banned, unless the site is allowed in the old SiteMask logic (which is stored in the WMS JobDB). All ComputingElements are initialized with the Active status, because this status is not defined in the CS description.
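The initialization rule described above can be sketched in a few lines of Python. This is only an illustration, not the code of dirac-rss-sync: the function name and its inputs are hypothetical, the second site name is made up, and the sketch assumes that a site allowed by the old SiteMask starts with the Active status.

```python
def initial_site_statuses(cs_sites, allowed_in_site_mask, default_status="Banned"):
    """Return a {site: status} mapping following the rule described above.

    cs_sites: site names known to the CS (hypothetical input).
    allowed_in_site_mask: site names allowed by the old SiteMask (hypothetical input).
    Assumption: a site allowed in the old SiteMask starts as "Active".
    """
    allowed = set(allowed_in_site_mask)
    return {
        site: ("Active" if site in allowed else default_status)
        for site in cs_sites
    }


statuses = initial_site_statuses(
    ["LCG.CERN.cern", "LCG.Example.org"],  # the second name is made up
    ["LCG.CERN.cern"],
)
print(statuses)
```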
The Site status can be altered as before by manual commands:
> dirac-admin-allow-site LCG.CERN.cern "Comment"
> dirac-admin-ban-site LCG.CERN.cern "Comment"
The following policy needs to be added in the CS under Operations > [Defaults or Setup] > ResourceStatus > Policies:
PropagationPolicy
{
  matchParams
  {
    element = Site
  }
  policyType = PropagationPolicy
}
This is needed to propagate the Site status to all the dependent resources whenever the Site status changes. Other policies can also be added (like the "AlwaysActivePolicy" and the "DowntimePolicy").
If there are no policies enabled for sites, then the SiteInspectorAgent, when installed, will force the status of the sites to "Unknown".
A new agent should be installed, via dirac-admin-sysadmin-cli:
install agent ResourceStatus SiteInspectorAgent
A new "SiteStatus" API has been defined in https://github.com/DIRACGrid/DIRAC/blob/rel-v6r18/ResourceStatusSystem/Client/SiteStatus.py
The "ResourceStatus" API has been modified: https://github.com/DIRACGrid/DIRAC/blob/rel-v6r18/ResourceStatusSystem/Client/ResourceStatus.py
Code in vanilla DIRAC has been updated to reflect the changes; DIRAC extensions that call this API directly should be updated accordingly. In more detail: the getStorageElementStatus method has been replaced by a generic getElementStatus method, so previous calls to:
ResourceStatus().getStorageElementStatus( elementName, statusType, default )
should be replaced by:
ResourceStatus().getElementStatus( elementName, elementType, statusType, default )
where elementType, in the case of StorageElements, is StorageElement.
The Python code that interacts with the RSS DBs (ResourceStatusDB and ResourceManagementDB) has been rewritten using sqlalchemy. Any extensions of this code may also need to be updated.
For compatibility with MySQL 5.7, the FTSHistoryView needs to be updated:
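For extensions with many call sites, one way to ease the migration is a thin wrapper that keeps the old call shape and forwards to the generic method. The sketch below is hypothetical: the helper name and the recording stand-in client are not part of DIRAC, and the element name and status type are example values. The stand-in exists only so that the example runs without a DIRAC installation; in real code the client would be a ResourceStatus() instance.

```python
def get_storage_element_status(rss_client, element_name, status_type=None, default=None):
    """Old-style call shape, forwarded to the generic getElementStatus method."""
    return rss_client.getElementStatus(element_name, "StorageElement", status_type, default)


class RecordingClient:
    """Stand-in for ResourceStatus(); it only records the arguments it receives."""

    def __init__(self):
        self.calls = []

    def getElementStatus(self, elementName, elementType, statusType=None, default=None):
        self.calls.append((elementName, elementType, statusType, default))
        return None  # the real return value is not reproduced here


client = RecordingClient()
get_storage_element_status(client, "CERN-USER", "ReadAccess")
print(client.calls)
```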
alter view FTSHistoryView as
select `FTSJob`.`Status` AS `Status`,
       sum(`FTSJob`.`Files`) AS `Files`,
       `FTSJob`.`TargetSE` AS `TargetSE`,
       (sum(`FTSJob`.`Completeness`) / count(distinct `FTSJob`.`FTSJobID`)) AS `Completeness`,
       sum(`FTSJob`.`FailedSize`) AS `FailedSize`,
       sum(`FTSJob`.`Size`) AS `Size`,
       sum(`FTSJob`.`FailedFiles`) AS `FailedFiles`,
       count(distinct `FTSJob`.`FTSJobID`) AS `FTSJobs`,
       `FTSJob`.`SourceSE` AS `SourceSE`
from `FTSJob`
where (`FTSJob`.`LastUpdate` > (utc_timestamp() - interval 3600 second))
group by `FTSJob`.`SourceSE`, `FTSJob`.`TargetSE`, `FTSJob`.`Status`;
The machinery for MQ systems support has been fully certified and is now out of "Technology preview".
The new NetworkAgent agent (http://dirac.readthedocs.io/en/integration/CodeDocumentation/AccountingSystem/Agent/NetworkAgent.html) is an optional Accounting agent that is used to consume perfSONAR (http://www.perfsonar.net/) network metrics, and to display them via a new "network" accounting plotting type. In the current implementation network metrics are received via a message queue.
In order for this agent to work properly, a message queue has to be specified in the CS, as per the instructions (http://dirac.readthedocs.io/en/integration/AdministratorGuide/DIRACSites/MessageQueues/index.html). For this purpose, in the agent configuration in the /DIRAC/Systems section of the CS, the option
MessageQueueURI
should be specified; its value defines the queue host together with the topics you are interested in:
netmon-mb.cern.ch::Topic::perfsonar.summary.packet-loss-rate
netmon-mb.cern.ch::Topic::perfsonar.summary.histogram-owdelay
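Each value above has the form host::Topic::name. The following Python sketch splits such a value into its parts, which can help when checking a configuration by hand. It is not DIRAC's own parser: the function name is hypothetical, and it assumes exactly three non-empty fields separated by "::".

```python
def split_mq_uri(uri):
    """Split 'host::Type::name' into a (host, destination_type, name) tuple."""
    parts = uri.split("::")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected 'host::Type::name', got %r" % uri)
    return tuple(parts)


uris = [
    "netmon-mb.cern.ch::Topic::perfsonar.summary.packet-loss-rate",
    "netmon-mb.cern.ch::Topic::perfsonar.summary.histogram-owdelay",
]
for uri in uris:
    print(split_mq_uri(uri))
```

A malformed value, such as one with a missing topic name, raises ValueError instead of being silently accepted.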