RFC #5: Resources CS section structure
The /Resources section of the DIRAC configuration contains descriptions of all the computing resources available for a given DIRAC installation. By computing resources we understand the services of various providers giving access to computing and storage capacity, as well as other services needed to build distributed computing systems. The current /Resources section in the DIRAC configuration was originally dedicated to the description of the Computing Elements only. In the new proposal, it is meant to provide the configuration information about all kinds of computing resources and services offered by third-party providers.
The main idea is to structure the section by the logic of resource provisioning. Therefore, it is based on the notion of Site as the main body responsible for the services offered to the user community. Each resource (computing resource or a service, e.g. a file catalog) belongs to a Site. The Site managers are responsible for the availability of the resources as well as for the rules of accessing and sharing them. Organizing resources by Sites gives clear administrative information about whom to contact in case of problems.
(New proposal for the introduction; the document then continues with the rest of the existing text.)
The /Resources section of the DIRAC configuration contains a description of all IT Resources available to the users of a given DIRAC installation. By IT Resources we understand the services providing access to computing, storage and all other services (catalogs, databases, ...) necessary to build a functional distributed computing system, as well as the description of the resources themselves (capacity, limits, ...). This section does not include high level DIRAC services, but it does include DIRAC proprietary interfaces to some of the above resources.
The current /Resources section in the DIRAC configuration was originally dedicated to the description of the Computing Elements. It has naturally evolved as the functionality of DIRAC has increased. Now the schema shows some limitations that we would like to overcome:
- There is no clear dependence relation between some of the access points (resources) and the Sites responsible for their operation. This relation is essential for a proper monitoring of the status of the infrastructure.
- When several communities or virtual organizations share the same DIRAC installation, there is no well-defined way to express which resources are accessible to each of them.
- When several funding bodies or providers support the resources at different sites, the current schema makes it hard to avoid defining the sites and the access points to the resources twice.
- Different communities, or even different setups of the same community, might want to define different access points for transversal resources like catalogs, FTS servers, etc. This is currently not possible.
The main idea is to structure the section by the logic of resource provisioning. Therefore, it is based on the notion of Site as the main body responsible for the services offered to the user communities. Access to each resource is provided by a Site. The managers of the Site are responsible for the availability of the resources as well as for the rules of accessing and sharing them. Organizing resources by Sites gives clear administrative information about whom to contact when needed. At the same time it provides a natural proximity relation between different types of resources that is essential for DIRAC in order to optimize scheduling.
The key concepts in the new schema are:
- Virtual Organization (VO): represents a community of users that use the resources in a coordinated way. VOs are essential pieces of the DIRAC functionality since each VO may define its own policies for accessing and using the resources, within the limits allowed by the Sites.
- Site: is the central piece of the new schema both from a functional and an administrative point of view. On the one hand, it is the entity that collects access points to resources that are related by locality in a functional sense, i.e. the storage at a given Site is considered local to the CPU at the same Site and this relation will be used by DIRAC. On the other hand, a Site must provide a single entry point responsible for the availability of the resources that it encompasses. In the DIRAC sense, a Site can be anything from a fraction of a physical computer center to a whole regional grid. It is the responsibility of the DIRAC administrator of the installation to properly define the Sites. Not all Sites need to provide access to all VOs supported in the DIRAC installation.
- Domain: is a supra-site organization meaningful in the context of a given DIRAC installation. Domains might be related to the funding of the resources (GISELA, WLCG, ...), to administrative domains (NGIs) or to any other grouping relevant for the installation. They have no functional meaning in DIRAC and are only used for reporting purposes, to group contributions beyond the Site level. Sites can belong to more than one Domain; in some cases a Site belongs to a single Domain exclusively, but this is not required. By default, all installations support the DIRAC Domain.
- Resource Type: is the main category in which relevant IT Resources are grouped. At the moment the following types are relevant for this document: Computing, Storage, Catalog, FileTransfer, Database, CommunityManagement. (I have tried to make the names uniform, avoiding acronyms and meaningless suffixes like Service, Element, ...)
While the /Resources section contains all the information about the Sites, the services they provide and the resources behind those services, the /Registry section will include the information about the Communities and Domains relevant for the installation, and the /Operations section will allow the VOs to define the way their resources are used.
With all the above in mind, the new schema of the Resources section of the configuration becomes:
/Resources/[Site]/[Resource Type]/[Name Of Service]/[Type Of Access Point]/[Name Of Access Point]
The naming conventions and the usage of each of the levels, together with the relevant options that need to be defined, are described in the following sections of this document.
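As an illustration (the service and access point names below are hypothetical, chosen only to show how the levels nest), an entry following this schema could look like:
/Resources
  /CERN.ch                 # [Site]
    /Storage               # [Resource Type]
      /CERN-disk           # [Name Of Service]
        /SRM               # [Type Of Access Point]
          /srm-cern-disk   # [Name Of Access Point], hypothetical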
In the DIRAC configuration, Sites have a name resulting from the concatenation, with a middle ".", of the Site name and the country code according to the ISO 3166 standard: [Site].[co]. Together with the Domain information, in each particular context the full DIRAC Site name becomes of the form [Domain].[Site].[co]. Both the Site and the Domain names must be unique, case insensitive alphanumeric strings, possibly using the characters "-" and "_".
This convention will be enforced, and names not following it will not be usable.
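For example, taking the CERN.ch Site used elsewhere in this document and assuming an EGI Domain, the names would look like:
CERN.ch       # Site name as defined under /Resources: [Site].[co]
EGI.CERN.ch   # full DIRAC Site name in the context of the EGI Domain: [Domain].[Site].[co]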
Domains and VOs are defined, together with their contact information and other details, in the /Registry section of the configuration, but they are essential for the use of the information kept in the /Resources section. Therefore, at each level of the tree a list of supported "Domains" and "VOs" can be defined, if different from the full list defined under /Registry. This information is inherited by all subsections.
For multi-VO installations, at each level, sections named after each of the supported VOs can be used to overwrite the common options of the parent section. This makes it possible to define a different contact at a Site for a certain VO, a different port per VO for a VOMS server, or a different SpaceToken in an SRM Storage Element:
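A minimal sketch of how such lists could look at the Site level; the option names Domains and VOs and their values are assumed here purely for illustration:
/Resources
  /CERN.ch
    Domains = EGI, WLCG    # hypothetical restriction to two Domains
    VOs = lhcb, biomed     # hypothetical: only these VOs may use this Site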
/Resources
  /CERN.ch
  /IN2P3.fr
    /ContactEmail = someone@somewhere
    /biomed
      /ContactEmail = someoneelse@somewhereelse
  /PIC.es
(this is the end of the new proposal up to now)
Sites are usually grouped into larger infrastructures like Grids, Clouds, etc. This is reflected in /Resources as the top-level categorization. We propose to call this level Grid. Typical examples are:
/Resources/gLite
          /AMAZON
          /StratusLab
          /BOINC
          ...
The high level Grids are large administrative domains consisting typically of multiple Sites and having common top level services like information systems, problem tracking systems, etc. Usually (although not always) Sites belong to a single top level Grid, for example, a particular grid infrastructure. However, some Sites can belong to two or more "Providers" by participating in several grid infrastructures. For administration and accounting purposes, user payloads are usually executed on such Sites in the context of a particular Provider. It is likely that the Site contacts are different for different Provider infrastructures, and that Sites publish their resource data in different information systems. Therefore, the "Providers" list is part of the Site resources definition.
The Provider subsection can contain options specific to the Provider as a whole, for example, access details for the top level information system.
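A minimal sketch of such a Provider-level option; the option name and endpoint below are entirely hypothetical and only illustrate the idea of keeping top level information system details at this level:
/Resources
  /EGI
    TopInfoSystemURL = ldap://topbdii.example.org:2170   # hypothetical option name and URL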
Each Site has a dedicated subsection in the CS. For example:
/Resources/EGI/EGI.CERN.ch
              /EGI.IN2P3.fr
              ...
The Site subsection name is the same as the Site name itself (see the description of the Site naming rules below). The Site subsections contain several options describing the Site as a whole:
ContactEmail
Coordinates
MoUTierLevel
...
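Put together, a Site section could look like the sketch below; all option values are hypothetical and only illustrate the intended content:
/Resources/EGI/EGI.CERN.ch
  ContactEmail = grid-admin@example.ch   # hypothetical contact address
  Coordinates = 06.0458:46.2325          # hypothetical coordinates value
  MoUTierLevel = 0                       # hypothetical tier level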
The subsections in each Site section are Resource Types. The possible Resource Types are:
/Resources/EGI/EGI.CERN.ch/ComputingElements
                          /StorageElements
                          /FileCatalogs
                          /FTSServers
                          /ConditionsDatabases
                          /VOMSServers
                          ...
More Resource Types will be added as new services become available.
In DIRAC the Site naming convention is strict. The Site name consists of three mandatory parts separated by dots. The first part is the name of the Provider infrastructure. The second part is the unique name of the Site itself. The last part is the country code according to the ISO 3166 standard. The first and the last parts of the Site name are quite straightforward. The middle part must reflect the "physical" Site name. This means that the same Site participating in different Grid or Provider infrastructures must have the same "physical" middle part of the name. This is sometimes difficult to achieve, as there is no definite way to determine that a Site is the "same" in two different Provider infrastructures, although in most cases it is still possible by using some extra information, Site contacts, etc. Keeping the "physical" middle part of the Site name unique will eventually make it possible to treat Sites properly across Provider boundaries, for example, for accounting purposes.
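Taking the EGI.CERN.ch name used above as an example, the three parts decompose as follows:
EGI.CERN.ch
# EGI  -> Provider infrastructure name
# CERN -> "physical" Site name, kept identical across Provider infrastructures
# ch   -> country code according to ISO 3166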
This section contains information about the Computing Elements. It can have options common to all the CEs at the Site. Each subsection is dedicated to one distinct CE, and the subsection name is at the same time the name of the CE. The CE subsection contains the options describing the CE, for example:
.../ComputingElements/CREAM05/CEType
                             /Host
                             /Architecture
                             /OS
                             /SubmissionMode
                             /VOs
                             /Providers
                             ...
Note that unlike in the current CS the name of the CE section is not necessarily the CE host name. However, if the host name is not given as an explicit Host option, the name of the CE section is interpreted as the host name. These details should all be taken care of by the Resources helper utility.
It is important that the CE section can contain a VOs option, which is a comma separated list of the VOs allowed for the given CE. This list can be defined for the CE as a whole or for each Queue of the CE in the corresponding section. The VOs value defined for a Queue overrides the one in the CE section. Similarly to the VOs option, the Providers option is a list of providers of large scale computing infrastructures, for example, EGI, WLCG, GISELA, NDG, etc. A Site, CE or Queue can be contributing resources in the name of one or more providers. This information will eventually allow accounting per provider.
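A sketch of a CE section following these rules; the host name and all option values are hypothetical:
.../ComputingElements
  VOs = lhcb, biomed            # hypothetical default for all CEs at the Site
  /CREAM05
    CEType = CREAM              # assumed type for this example
    Host = cream05.example.org  # hypothetical; if omitted, "CREAM05" is taken as the host name
    VOs = lhcb                  # overrides the list defined at the ComputingElements level
    Providers = EGI, WLCG       # hypothetical list of providers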
Each ComputingElement section has a Queues subsection which in turn contains a subsection per named queue. The name of the queue section is interpreted as the queue name. The queue subsection contains queue options, for example:
.../ComputingElements/CREAM05/Queues/cream-sge-long/MaxCPUTime
                                                   /SI00
                                                   /MaxWaitingJobs
                                                   /MaxTotalJobs
                                                   /OutputURL
                                                   ...
The queue options can be different for different CE types.
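A sketch of a queue definition with purely hypothetical values:
.../ComputingElements/CREAM05/Queues
  /cream-sge-long
    MaxCPUTime = 2880       # hypothetical CPU time limit
    SI00 = 2500             # hypothetical benchmark value
    MaxWaitingJobs = 50     # hypothetical job limits
    MaxTotalJobs = 200
    OutputURL = gsiftp://cream05.example.org/output   # hypothetical URL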
The StorageElements section contains subsections per named Storage Element (SE). The name of the SE subsection is interpreted as the name of the SE (see the SE naming convention below). The SE section contains options applicable to the SE as a whole, for example:
.../StorageElements/CERN-disk/SEType
                             /BackendType
                             /ReadAccess
                             /WriteAccess
The SE section contains an AccessProtocols subsection in which each subsection is dedicated to the description of one access point. For example:
.../StorageElements/CERN-disk/AccessProtocols/SRM/Host
                                                 /Port
                                                 /Protocol
                                                 /Path
                                                 ...
The SE names are given as <SiteName>-<qualifier>. Examples: CERN-disk, IN2P3-USER. The <SiteName> part is the same as in the CS Site name (see above).
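Putting the pieces together, a complete SE definition could look like the following sketch; the type, backend, host, port and path values are hypothetical:
.../StorageElements/CERN-disk
  SEType = SRM2                        # hypothetical type
  BackendType = Castor                 # hypothetical backend
  ReadAccess = Active
  WriteAccess = Active
  /AccessProtocols
    /SRM
      Host = srm.example.ch            # hypothetical host
      Port = 8443                      # hypothetical port
      Protocol = srm
      Path = /castor/example.ch/grid   # hypothetical path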
This section contains the description of the configured File Catalogs. This includes third party catalogs, e.g. LcgFileCatalog, but also DIRAC File Catalogs. The name of each subsection is a unique File Catalog name (and not its type). The options of a File Catalog include, for example:
.../DIRACFileCatalog/CatalogType
                    /Host
                    /Port
                    ...
If the CatalogType option is not given, the section name is interpreted as the Catalog Type.
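For illustration, a catalog entry could look like the sketch below; the type, host and port values are hypothetical, and CatalogType is shown explicitly although it could be omitted when it matches the section name:
.../FileCatalogs
  /DIRACFileCatalog
    CatalogType = FileCatalog   # hypothetical explicit type
    Host = dfc.example.org      # hypothetical host
    Port = 9197                 # hypothetical port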
This section describes FTS server end points. The section options include, for example:
.../FTS-CERN/URL
The FTSServers section can have a subsection Channels with definitions of configured FTS channels.
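A minimal sketch of an FTS server definition together with an entirely hypothetical channel entry:
.../FTSServers
  /FTS-CERN
    URL = https://fts.example.ch:8443   # hypothetical endpoint
  /Channels
    /CERN-IN2P3                         # hypothetical channel name
      ...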
This section describes the VOMS servers. It has subsections per VO, since we have separate VOMS servers for each VO.
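A sketch of what a per-VO subsection could look like; the host and port values are hypothetical:
.../VOMSServers
  /biomed
    Host = voms.example.org   # hypothetical host
    Port = 15000              # hypothetical port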
This section describes instances of the Oracle Conditions databases. "Conditions" are LHCb-specific. In principle, there is nothing specific to "Conditions" in this subsection; it describes generic database access parameters, for example:
.../Databases/ConditionsDB/Connection
                          /User
                          /Password (?)
The proposed CS schema can be used directly by the RSS in its internal Resources mapping. In most cases it corresponds to the four levels of the resources hierarchy, which can be loosely described by the schema Grids -> Sites -> Resources -> Nodes. In this case Resources are CEs, SEs, etc.; Nodes are Queues, AccessProtocols, Channels, etc.