RFC #5: Resources CS section structure
Authors: A.Tsaregorodtsev, R.Graciani
Last Modified: 04.04.2012
The /Resources section of the DIRAC configuration contains a description of all IT Resources available for the users of a given DIRAC installation. By IT Resources we understand the services providing access to computing, storage and all other necessary services (catalogs, databases,...) to build a functional distributed computing system, as well as the description of the resources themselves (capacity, limits,...). This section does not include high level DIRAC services, but it does include DIRAC interfaces to some of the above resources.
The current /Resources section in the DIRAC configuration was originally dedicated to the description of the Computing Elements. It has naturally evolved as the functionality of DIRAC has increased. Now the schema shows some limitations that we would like to overcome:
- There is no clear dependency relation between some of the access points (resources) and the Sites responsible for their operation. This relation is essential for properly monitoring the status of the infrastructure.
- When several communities or virtual organizations share the same DIRAC installation, there is no well defined way to express which resources are accessible to each of them.
- When several funding agencies or providers support the resources at different sites, the current schema makes it hard to avoid defining the sites and the access points to the resources twice.
- Different communities, or even different setups for the same community, might want to define different access points for common resources like catalogs, FTS servers,... This is currently not possible.
The main idea is to structure the section by the logic of the resources provisioning. Therefore, it is based on the notion of a Site as the main body responsible for the services offered to the user communities. Access to each resource is provided by a Site. The managers of the Site are responsible for the availability of the resources as well as for defining the rules for accessing and sharing them. Organizing resources by Sites gives clear administrative information about whom to contact when needed. At the same time it provides a natural proximity relation between different types of resources that is essential for DIRAC in order to optimize the scheduling.
The key concepts in the new schema are:
- Community (or Virtual Organization, VO): is a community of users that use the resources in a coordinated way. They are essential pieces in the DIRAC functionality since each Community may define its own policies for accessing and using the resources, within the limits allowed by the Sites. In the grid world Communities are called VOs.
- Site: is the central piece of the new schema both from a functional and an administrative point of view. On the one hand it is the entity that collects access points to resources that are related by locality in a functional sense, i.e. the storage at a given Site is considered local to the CPU at the same Site and this relation will be used by DIRAC. On the other hand, a Site must provide a single entry point responsible for the availability of the resources that it encompasses. In the DIRAC sense, a Site can range from a fraction of a physical computer center to a whole regional grid. It is the responsibility of the DIRAC administrator of the installation to properly define the sites. Not all Sites need to provide access to all VOs supported in the DIRAC installation.
- Domain: is a supra-site organization meaningful in the context of a given DIRAC installation. Domains might be related to funding of the resources (GISELA, WLCG,...), to administrative domains (NGIs) or any other reason relevant for the installation. They have no functional meaning in DIRAC and are only used for reporting purposes to group contributions beyond the Site level. Sites can belong to more than one Domain; membership may be exclusive in some cases but need not be. By default, all installations support the DIRAC domain.
- Resource Type: is the main category in which relevant IT Resources are grouped. At the moment the following types are relevant for this document: Computing, Storage, Catalog, FileTransfer, Database, CommunityManagement.
The /Resources section contains all the information about the Sites, the services they provide and the resources behind those services. This section will also include the information about the Communities supported by the installation. The Domains are described in the same /Resources section. Another top level /Operations section (not described in this document) will allow the VOs to define how their resources are used.
With all the above in mind, the new schema of the /Resources section of the configuration is proposed as follows:
/Resources/Sites/[Site Name]/[Resource Type]/[Name Of Service]/[Type Of Access Point]/[Name Of Access Point]
/Resources/Domains/[Domain Name]
The naming conventions and the usage of each of the levels and the relevant options that need to be defined are described in the following sections of this document.
In the DIRAC configuration Sites have names formed by concatenating the Site name and the country code according to the ISO 3166 standard, with a dot as a separator: [Site].[co]. Together with the "Domain" information, in each particular context, the Full DIRAC Site Name becomes of the form: [Domain].[Site].[co]. Both the Site and the Domain names must be unique case-insensitive alphanumeric strings with a possible use of the characters "-" and "_".
This convention will be enforced and Names not following it will not be usable.
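As an illustration, the enforcement described above could be sketched as a small validator. `validate_site_name`, `full_site_name` and the regular expressions are hypothetical helpers written for this document, not part of the DIRAC code base:

```python
import re

# Sketch of the naming convention above: alphanumeric names with "-" and "_",
# a two-letter ISO 3166 country code, and a dot as separator.
_NAME = re.compile(r"^[A-Za-z0-9_-]+$")
_CODE = re.compile(r"^[A-Za-z]{2}$")

def validate_site_name(name):
    """Return True if name follows the [Site].[co] convention."""
    parts = name.split(".")
    if len(parts) != 2:
        return False
    site, co = parts
    return bool(_NAME.match(site)) and bool(_CODE.match(co))

def full_site_name(domain, site_name):
    """Build the Full DIRAC Site Name: [Domain].[Site].[co]."""
    if not (_NAME.match(domain) and validate_site_name(site_name)):
        raise ValueError("name does not follow the convention")
    return "%s.%s" % (domain, site_name)
```

A name such as `CERN.ch` passes the check and, with the `LCG` Domain, would yield the full name `LCG.CERN.ch`.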
They are defined, together with their contact information and other details, in the /Resources/Domains and /Registry/Communities sections of the configuration, respectively. They are essential for the use of the information kept in the /Resources/Sites section. Therefore, at each level of the tree under this section a list of supported "Domains" and served "Communities" can be defined, if different from the full list defined in the corresponding section. This information is inherited by all subsections.
For multi-VO installations, at each level, sections named after each of the supported Communities can be used to overwrite the common options of the parent section. This makes it possible to define a different contact at a Site for a certain Community, or a different Port for a VOMS Server, or a different SpaceToken in a SRM Storage Element:
/Resources/Sites
  /CERN.ch
    ...
  /IN2P3.fr
    /ContactEmail = someone@somewhere
    /biomed
      /ContactEmail = someoneelse@somewhereelse
  /PIC.es
    ...
/Resources/Domains
  /EGI
  /GISELA
  ...
/Registry
  /Communities
    ....
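The lookup rule implied above, where a per-Community subsection overrides the common options of its parent section at every level of the tree, could be sketched as follows. `get_option` and the nested-dictionary layout are illustrative only, not the actual DIRAC configuration API:

```python
def get_option(cs, path, option, community=None):
    """Walk the CS tree along `path` and return the most specific value
    of `option`. At each level, a subsection named after the Community,
    if present, overrides the common value of the parent section
    (e.g. /IN2P3.fr/biomed/ContactEmail overrides /IN2P3.fr/ContactEmail).
    """
    value = None
    node = cs
    for key in path:
        node = node.get(key, {})
        if option in node:
            value = node[option]
        if community and option in node.get(community, {}):
            value = node[community][option]
    return value

# Toy configuration mirroring the example above.
cs = {"Resources": {"Sites": {"IN2P3.fr": {
    "ContactEmail": "someone@somewhere",
    "biomed": {"ContactEmail": "someoneelse@somewhereelse"},
}}}}
```

With this sketch, the generic contact is returned for most Communities, while `biomed` gets its own value.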
Sites are usually grouped in larger infrastructures like Grids, Clouds, etc. or provisioned by different funding bodies like national or international grid projects. This grouping might not be exclusive and Sites might belong to more than one of these groups (or to none). We propose to call these groups Domains; the information related to them is kept in the /Resources/Domains section, with one subsection for each Domain. Typical examples are:
/Resources/Domains
  /gLite
  /AMAZON
  /StratusLab
  /BOINC
The use of these Domains is mostly for reporting purposes, Accounting and Monitoring, and it is the responsibility of the administrator of the DIRAC installation to choose them in such a way that they are meaningful for the communities and for the computing resources served by the installation. In any case, DIRAC will always be the default Domain if nothing else is specified for a given resource.
Each resource section can have a Domains option which is a list of Domains providing this resource or to which the resource belongs by some other relation. The Domains option is not mandatory. If it is not present, the resource will be assigned to the DIRAC Domain when used.
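The defaulting rule just described could look like the following; the helper and the assumption that the option may be stored either as a list or as a comma-separated string are illustrative:

```python
def resource_domains(section):
    """Return the list of Domains for a resource section, falling back to
    ["DIRAC"] when no Domains option is present, as specified above.
    Accepts either a list or a comma-separated string (an assumption about
    how the option may be stored in the CS)."""
    domains = section.get("Domains")
    if not domains:
        return ["DIRAC"]
    if isinstance(domains, str):
        return [d.strip() for d in domains.split(",")]
    return list(domains)
```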
Sites provide access to the resources, therefore the /Resources/Sites section is the main place where the resources description is stored. At the next level there is a list of sections representing each of the Sites, named following the short [Site].[co] convention as defined above:
/Resources/Sites
  /CERN.ch
  /IN2P3.fr
  /PIC.es
The subsection for each Site contains several options describing the Site as a whole, and a number of Sections describing the type of Resources it provides access to. At the moment this list includes Computing, Storage, Catalog, FileTransfer, Database, CommunityManagement, but it can be extended in the future if necessary. The resulting section will look as follows:
/ContactEmail = someone@somewhere
/WebURL = http://some.whe.re
/Coordinates =
/MoUTierLevel = 1
/Computing
  /...
/Storage
  /...
/...
This section contains information about the interfaces to access Computing resources at the Site. It can have options common to all Computing interfaces on the Site and each of these interfaces (ComputingElements) has its own subsection. The subsection name is at the same time the name of the ComputingElement ( CE ). The CE subsection contains the options describing the CE, for example:
.../Computing/some.cream.ce/
  /CEType = CREAM
  /Host = some.cream.ce
  /Communities = VO1, VO2
  /Domains = Grid1, Grid2
  /Queues
    ...
Note that unlike the current CS, the name of the CE section is not necessarily the CE host name. However, if the host name is not given as an explicit Host option, the name of the CE section is interpreted as the host name. These details should all be taken care of by the Resources helper utility.
Notice that the CE section can contain a Communities option which is a comma-separated list of the Communities allowed to use the given CE. This list can be defined for the CE as a whole or for each Queue of the CE in the corresponding section. The Communities value defined for the Queue overrides the one in the CE section. Similarly to the Communities option, the Domains option is a list of Domains. A Site or CE or Queue can contribute resources in the name of one or more Domains. This information will eventually make it possible to provide accounting per Domain.
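The two resolution rules above, the section name standing in for a missing Host option and the Queue-level Communities overriding the CE-level ones, could be sketched as follows. These are hypothetical helpers of the kind the Resources helper utility would provide, not its actual API:

```python
def ce_host(ce_name, ce_section):
    """The Host defaults to the CE section name when no explicit
    Host option is present."""
    return ce_section.get("Host", ce_name)

def queue_communities(ce_section, queue_name):
    """Communities defined on a Queue override the CE-level value;
    otherwise the CE-level list (possibly empty) applies."""
    queue = ce_section.get("Queues", {}).get(queue_name, {})
    return queue.get("Communities", ce_section.get("Communities", []))
```

For the example CE above, a queue with no Communities option of its own inherits `VO1, VO2` from the CE section.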
Each ComputingElement section has a Queues subsection which in turn contains a subsection per named queue. The name of the queue section is interpreted as the queue name. The queue subsection contains queue options, for example:
.../Queues/cream-sge-long
  /MaxCPUTime =
  /SI00 =
  /MaxWaitingJobs =
  /MaxTotalJobs =
  /OutputURL =
  ...
The format of the names of the queues and the queue options can be different for different CE types.
The Storage section contains subsections per named Storage Element (SE). A named SE is an SE in the DIRAC sense, in other words, as seen by the DIRAC users. Different named SEs can point to the same StorageElement server, and make use of different options to upload/retrieve data from different backend storages. For instance a different base path or a different SRM Space token for different types of data. Different named SEs can also be used with identical definitions for the purpose of accounting classification. See below for SE naming convention. The SE section contains the options describing the SE as a whole, for example:
.../Storage/disk
  /BackendType =
  /ReadAccess =
  /WriteAccess =
  /AccessProtocols
    ...
In general the SE name is a logical name and not a hostname. See below for SE naming conventions. As for CEs, the SE section can contain Communities and Domains options that apply to each of the AccessProtocols. In turn the SEs can inherit those options from their Site.
The SE section contains an AccessProtocols subsection in which each subsection is dedicated to one access point description. For example:
.../AccessProtocols/SRM
  /Host =
  /Port =
  /Protocol =
  /Path =
  ...
The options might be different for different protocols.
In DIRAC, SE names follow the convention [SiteName]-[SEQualifier]. Examples: CERN-disk, IN2P3-USER. To avoid typos and to enforce this convention, only the SEQualifier is used in the Storage section of the CS. Based on the parent Site name, the full name is built by the Helper tools. The full SE name must be used everywhere the SE is referenced.
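The name-building step performed by the Helper tools could be sketched as below. The function name is hypothetical, and the assumption that the Site part drops the country code is inferred from the CERN-disk and IN2P3-USER examples above:

```python
def full_se_name(site_name, se_qualifier):
    """Build the full SE name [SiteName]-[SEQualifier] from the parent
    Site name and the subsection name under /Storage, e.g. CERN-disk.
    Assumes the Site part is the [Site].[co] name with the country
    code stripped, consistent with the examples in the text."""
    site = site_name.split(".")[0]
    return "%s-%s" % (site, se_qualifier)
```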
This section contains the description of the configured File Catalogs. This includes third party catalogs, e.g. LcgFileCatalog, but also DIRAC File Catalogs. The name of the section is a unique File Catalog name (and not its type). The options of the FileCatalog include, for example:
.../DIRACFileCatalog
  /CatalogType =
  /CatalogMode =
  /Host =
  /Port =
  ...
If the CatalogType option is not given, the section name is interpreted as the Catalog Type. The CatalogMode refers to Replica or Metadata or both. In this way the FileCatalog class can properly choose which one to use for different methods, and getReplicasWithMetadata() can be issued with AMGA/LFC, DFC/LFC or DFC installations.
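The type fallback and the mode-based selection could look like the following sketch. The helpers and the "Both" spelling for a catalog serving both roles are assumptions, not the actual FileCatalog class logic:

```python
def catalog_type(name, section):
    """The section name is interpreted as the Catalog Type when the
    CatalogType option is absent."""
    return section.get("CatalogType", name)

def catalogs_for_mode(catalogs, mode):
    """Select the catalogs usable for a given mode ("Replica" or
    "Metadata"). A CatalogMode of "Both" (an assumed spelling, defaulted
    here when the option is missing) matches either mode."""
    selected = []
    for name, section in catalogs.items():
        if section.get("CatalogMode", "Both") in (mode, "Both"):
            selected.append(name)
    return selected
```

Under this sketch, a DFC configured with mode "Both" would be picked for metadata queries while an LFC configured as "Replica" would not.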
We will have to find a way to discover the configuration for a LcgFileCatalogCombined. Probably at this level they are all Catalogs of Type LcgFileCatalog, and the Combined is defined in the Operations section of the VO. As for the SEs, the full name to refer to catalogs will be [Site]-[CatalogName].
This section includes the description of third party transfer services like FTS. It contains a number of sections, one per available server (currently only FTS is available, but there are long term plans to provide a service with similar functionality in DIRAC, and other file transfer services also exist on the market). We follow the same convention as in the previous section; subsections are given a unique name:
.../FileTransfer/FTS
  /TransferType = FTS
  /Status =
  /URL =
  /Channels
    ...
If the TransferType option is not given, the section name is interpreted as the TransferType. Each type of server will have its own set of options, apart from the default ones to report the Status. For each server there can be a Channels section defining the transfer channels that are supported by the server. As for the SEs, the full name to refer to a transfer server will be [Site]-[FileTransferName].
This section describes Community Management services like the VOMS servers:
.../CommunityManagement/VOMS
  /Type = VOMS
  /Host =
  /Status =
  /Port =
  ...
For VOMS servers and multi-VO installations, there is a Section per VO that holds the specific Port for each VO.
This section describes instances of the database servers available in the installation, like the Oracle Conditions database. "Conditions" is LHCb specific. In principle, there is nothing specific to "Conditions" in this subsection and it describes generic database access parameters, for example:
.../Database/ConditionsDB
  /Type = ConditionsDB
  /Connection =
  /User =
  /Password =
We can foresee a DIRAC Service exposing access to a MySQL server using the DISET protocol with a similar functionality as the current MySQL class. As for the SEs, the full name to refer to a database server will be [Site]-[DBName].
The proposed CS schema can be used directly by the RSS in its internal Resources mapping. In most cases it corresponds to the three levels of the resources hierarchy, which can be loosely described by the schema Sites->Resources->Nodes. In this case Resources are CEs, SEs, etc.; Nodes are Queues, AccessProtocols, Channels, etc.
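The three-level mapping could be enumerated directly from the proposed tree, as in the sketch below. The function and the per-resource-type node container names are assumptions consistent with the sections above, not the actual RSS code:

```python
def walk_hierarchy(sites):
    """Enumerate the (Site, Resource, Node) triples implied by the
    Sites->Resources->Nodes schema: Resources are CEs, SEs, ...;
    Nodes are Queues, AccessProtocols, Channels, ...
    The node container name per resource type follows the sections above."""
    node_keys = {"Computing": "Queues",
                 "Storage": "AccessProtocols",
                 "FileTransfer": "Channels"}
    triples = []
    for site, site_sec in sites.items():
        for rtype, nkey in node_keys.items():
            for resource, res_sec in site_sec.get(rtype, {}).items():
                for node in res_sec.get(nkey, {}):
                    triples.append((site, resource, node))
    return triples
```

For instance, a CERN.ch site with one CREAM CE and one queue yields a single (Site, CE, Queue) triple.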