RFC #5: Resources CS section structure
(New Proposal)
The /Resources section of the DIRAC configuration contains a description of all IT Resources available for the users of a given DIRAC installation. By IT Resources we understand the services providing access to computing, storage and all other necessary services (catalogs, databases,...) to build a functional distributed computing system, as well as the description of the resources themselves (capacity, limits,...). This section does not include high level DIRAC services, but it does include DIRAC interfaces to some of the above resources.
The current /Resources section in the DIRAC configuration was originally dedicated to the description of the Computing Elements. It has naturally evolved as the functionality of DIRAC has increased. Now the schema shows some limitations that we would like to overcome:
- There is no clear dependency relation between some of the access points (resources) and the sites responsible for their operation. This relation is essential for proper monitoring of the status of the infrastructure.
- When several communities or virtual organizations share the same DIRAC installation, there is no well-defined way to express which resources are accessible to each of them.
- When several funding bodies or providers support the resources at different sites, the current schema makes it hard to avoid defining the sites and the access points to the resources twice.
- Different communities, or even different setups for the same community, might want to define different access points for transversal resources like catalogs, FTS servers,... This is currently not possible.
The main idea is to structure the section by the logic of resource provisioning. Therefore, it is based on the notion of Site as the main body responsible for the services offered to the user communities. Access to each resource is provided by a Site. The managers of the Site are responsible for the availability of the resources as well as for the rules for accessing and sharing them. Organizing resources by Sites gives clear administrative information about whom to contact when needed. At the same time it provides a natural proximity relation between different types of resources, which is essential for DIRAC in order to optimize the scheduling.
The key concepts in the new schema are:
- Virtual Organization (VO): represents a community of users that use the resources in a coordinated way. They are essential pieces in the DIRAC functionality since each VO may define its own policies for accessing and using the resources, within the limits allowed by the Sites.
- Site: is the central piece of the new schema both from a functional and an administrative point of view. On the one hand it is the entity that collects access points to resources that are related by locality in a functional sense, i.e. the storage at a given Site is considered local to the CPU at the same Site and this relation will be used by DIRAC. On the other hand, a Site must provide a single entry point responsible for the availability of the resources that it encompasses. In the DIRAC sense, a Site can be anything from a fraction of a physical computer center to a whole regional grid. It is the responsibility of the DIRAC administrator of the installation to properly define the Sites. Not all Sites need to provide access to all VOs supported in the DIRAC installation.
- Domain: is a supra-site organization meaningful in the context of a given DIRAC installation. Domains might be related to funding of the resources (GISELA, WLCG,...), to administrative domains (NGIs) or any other reason relevant for the installation. They have no functional meaning in DIRAC and are only used for reporting purposes to group contributions beyond the Site level. Sites can belong to more than one Domain, in some cases exclusively but not necessarily. By default, all installations support the DIRAC Domain.
- Resource Type: is the main category in which relevant IT Resources are grouped. At the moment the following types are relevant for this document: Computing, Storage, Catalog, FileTransfer, Database, CommunityManagement. (I have tried to make the names uniform, avoiding acronyms and meaningless suffixes like Service, Element,...)
The /Resources section contains all the information about the Sites, the services they provide and the resources behind those services; the Domains are described in the same /Resources section. The /Registry section will include the information about the Communities relevant for the installation, and the /Operations section will allow the VOs to define the way their resources are used.
With all the above in mind, the new schema of the Resources section of the configuration becomes:
/Resources/Sites/[Site Name]/[Resource Type]/[Name Of Service]/[Type Of Access Point]/[Name Of Access Point]
/Resources/Domains/[Domain Name]
The naming conventions and the usage of each of the levels and the relevant options that need to be defined are described in the following sections of this document.
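The levels of the schema can be made explicit with a small helper that splits a configuration path into its named parts. This is only an illustrative sketch: the function `describe_path`, the level labels in `LEVELS`, and the queue name `long` in the usage note are hypothetical and not part of DIRAC.

```python
# Hypothetical labels for the levels under /Resources/Sites, in schema order.
LEVELS = ("site", "resource_type", "service", "access_point_type", "access_point")


def describe_path(path):
    """Map each component of a /Resources/Sites/... path to its schema level."""
    parts = [part for part in path.split("/") if part]
    if parts[:2] != ["Resources", "Sites"]:
        raise ValueError(f"not a /Resources/Sites path: {path!r}")
    components = parts[2:]
    if len(components) > len(LEVELS):
        raise ValueError(f"path is deeper than the schema allows: {path!r}")
    # Shorter paths are valid: they simply stop at a higher level of the tree.
    return dict(zip(LEVELS, components))
```

For instance, `describe_path("/Resources/Sites/CERN.ch/Computing/some.cream.ce/Queues/long")` labels `CERN.ch` as the site, `Computing` as the resource type, `some.cream.ce` as the service, `Queues` as the type of access point and `long` as the access point.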
In the DIRAC configuration, Sites have a name formed by joining the name of the Site and the country code according to the ISO 3166 standard with a ".": [Site].[co]. Together with the "Domain" information, in each particular context, the Full DIRAC Site Name takes the form: [Domain].[Site].[co]. Both the Site and the Domain names must be unique, case-insensitive alphanumeric strings, with possible use of the characters "-" and "_".
This convention will be enforced and Names not following it will not be usable.
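A check of this naming convention could look like the following sketch. It assumes that the country code is a two-letter ISO 3166 alpha-2 code, which the text does not state explicitly; the function names `parse_site_name` and `same_site` are hypothetical.

```python
import re

# Alphanumerics plus "-" and "_", as described in the naming convention.
_NAME = r"[A-Za-z0-9_-]+"
# Assumption: the country code is a two-letter ISO 3166 alpha-2 code.
_SHORT = re.compile(rf"^(?P<site>{_NAME})\.(?P<country>[A-Za-z]{{2}})$")
_FULL = re.compile(
    rf"^(?P<domain>{_NAME})\.(?P<site>{_NAME})\.(?P<country>[A-Za-z]{{2}})$"
)


def parse_site_name(name):
    """Return (domain, site, country); domain is None for the short form."""
    match = _FULL.match(name) or _SHORT.match(name)
    if match is None:
        raise ValueError(f"invalid Site name: {name!r}")
    parts = match.groupdict()
    return parts.get("domain"), parts["site"], parts["country"].lower()


def same_site(first, second):
    """Compare the [Site].[co] parts case-insensitively, ignoring any Domain."""
    _, site_a, country_a = parse_site_name(first)
    _, site_b, country_b = parse_site_name(second)
    return (site_a.lower(), country_a) == (site_b.lower(), country_b)
```

Comparing only the lower-cased [Site].[co] part reflects the requirement that names are unique regardless of case, so `EGI.CERN.ch` and `cern.CH` refer to the same Site here.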
Domains and Communities are defined, together with their contact information and other details, in the /Resources/Domains and /Registry/Communities sections of the configuration, respectively. They are essential for the use of the information kept in the /Resources/Sites section. Therefore, at each level of the tree under this section a list of supported "Domains" and served "Communities" can be defined, if different from the full list defined in the corresponding section. This information is inherited by all subsections.
For multi-VO installations, at each level, sections named after each of the supported Communities can be used to override the common options of the parent section. This makes it possible to define a different contact at a Site for a certain Community, a different Port for a VOMS Server, or a different SpaceToken in an SRM Storage Element:
/Resources/Sites
  /CERN.ch
    ...
  /IN2P3.fr
    /ContactEmail = someone@somewhere
    /biomed
      /ContactEmail = someoneelse@somewhereelse
  /PIC.es
    ...
/Resources/Domains
  /EGI
  /GISELA
  ...
/Registry
  /Communities
    ....
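The override rule for Community subsections can be sketched with nested dictionaries standing in for configuration sections. The function `effective_options` is hypothetical and only illustrates the lookup order for a single section; it is not the DIRAC configuration API.

```python
def effective_options(section, community=None):
    """Options of a section, with the Community subsection taking precedence."""
    # Plain values are options; nested dictionaries are subsections.
    options = {k: v for k, v in section.items() if not isinstance(v, dict)}
    override = section.get(community) if community is not None else None
    if isinstance(override, dict):
        options.update(
            {k: v for k, v in override.items() if not isinstance(v, dict)}
        )
    return options


# The IN2P3.fr example above, expressed as nested dictionaries.
sites = {
    "IN2P3.fr": {
        "ContactEmail": "someone@somewhere",
        "biomed": {"ContactEmail": "someoneelse@somewhereelse"},
    },
}
```

With this data, `effective_options(sites["IN2P3.fr"], "biomed")` yields the biomed-specific contact, while any Community without its own subsection falls back to the common `ContactEmail`.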
Sites are usually grouped in larger infrastructures like Grids, Clouds, etc. or provisioned by different funding bodies like national or international grid projects. This grouping might not be exclusive and Sites might belong to one or more of these groups (or none). We propose to call these groups Domains. The information related to them is kept in the /Resources/Domains section, with one subsection for each Domain. Typical examples are:
/Resources/Domains
  /gLite
  /AMAZON
  /StratusLab
  /BOINC
These Domains are used mostly for reporting purposes (Accounting and Monitoring), and it is the responsibility of the administrator of the installation to choose them in such a way that they are meaningful for the communities that are using the installation. In any case, DIRAC will always be the default Domain if nothing else is specified for a given resource. This section is not mandatory: if it is not present, all resources will be assigned to the DIRAC Domain when being used. The Domains list is part of the Site resources definition.
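The fallback to the DIRAC Domain can be expressed as a short sketch. It assumes that the Domains option holds a comma-separated string, as in the "Domains = Grid1, Grid2" example elsewhere in this document; the function name `domains_of` is hypothetical.

```python
DEFAULT_DOMAIN = "DIRAC"


def domains_of(options):
    """Return the Domains of a resource, or the default Domain if none are set."""
    raw = options.get("Domains", "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    # An absent or empty option means the resource is accounted to DIRAC.
    return names or [DEFAULT_DOMAIN]
```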
Sites provide the access to the Resources, therefore the /Resources/Sites section is the main placeholder of information. At the next level there is a list of sections, one for each Site, named following the short form [Site].[co] defined above:
/Resources/Sites
  /CERN.ch
  /IN2P3.fr
  /PIC.es
The subsection for each Site contains several options describing the Site as a whole, and a number of Sections describing the type of Resources it provides access to. At the moment this list includes Computing, Storage, Catalog, FileTransfer, Database, CommunityManagement, but it can be extended in the future if necessary. The resulting section will look as follows:
/ContactEmail = someone@somewhere
/WebURL = http://some.whe.re
/Coordinates =
/MoUTierLevel = 1
/Computing
  /...
/Storage
  /...
/...
The Computing section contains information about the interfaces to access Computing resources at the Site. It can have options common to all Computing interfaces on the Site, and each of these interfaces (ComputingElement, CE) has its own subsection. The subsection name is at the same time the name of the CE. The CE subsection contains the options describing the CE, for example:
.../Computing/some.cream.ce/
  /CEType = CREAM
  /HostName = some.cream.ce
  /Communities = VO1, VO2
  /Domains = Grid1, Grid2
  /Queues
    ...
Note that, unlike in the current CS, the name of the CE section is not necessarily the CE host name. However, if the host name is not given as an explicit HostName option, the name of the CE section is interpreted as the host name. These details should all be taken care of by the Resources helper utility.
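The host name rule amounts to a one-line fallback. The sketch below is not the Resources helper utility itself; `ce_host_name` is a hypothetical function, and the section name `CREAM05` is borrowed from the examples later in this document.

```python
def ce_host_name(section_name, ce_options):
    """Host name of a CE: the HostName option if set, else the section name."""
    return ce_options.get("HostName") or section_name
```

For example, `ce_host_name("CREAM05", {"HostName": "some.cream.ce"})` gives `some.cream.ce`, while `ce_host_name("some.cream.ce", {"CEType": "CREAM"})` falls back to the section name.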
It is important that the CE section can contain a Communities option, which is a comma-separated list of the Communities allowed for the given CE. This list can be defined for the CE as a whole or for each Queue of the CE in the corresponding section. The Communities value defined for the Queue overrides the one in the CE section. Similarly to the Communities option, the Domains option is a list of Domains. A Site, CE or Queue can contribute resources in the name of one or more Domains. This information will eventually make it possible to provide accounting per Domain.
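The precedence between the CE and Queue levels can be sketched as a lookup that returns the value from the innermost level defining the option. The function `inherited_list` and the queue dictionaries in the usage note are hypothetical, and the sketch assumes that an option missing at every level means "no restriction".

```python
def inherited_list(option, *levels):
    """Value of a list option from the innermost level that defines it.

    `levels` are option dictionaries ordered from outermost (Site) to
    innermost (Queue).
    """
    for level in reversed(levels):
        if option in level:
            return [item.strip() for item in level[option].split(",") if item.strip()]
    return None  # assumption: not restricted at any level
```

With `ce = {"Communities": "VO1, VO2"}`, a Queue defined as `{"Communities": "VO1"}` resolves to `["VO1"]`, whereas a Queue without the option inherits `["VO1", "VO2"]` from the CE. The same lookup applies to the Domains option.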
(this is the end of the new proposal up to now)
The /Resources section of the DIRAC configuration contains descriptions of all the computing resources available for a given DIRAC installation. By computing resources we understand the services of various providers giving access to computing and storage capacity as well as other services needed to build distributed computing systems. The current /Resources section in the DIRAC configuration was originally dedicated to the description of the Computing Elements only. In the new proposal, it is meant to provide the configuration information about all kinds of computing resources and services offered by third party providers.
The main idea is to structure the section by the logic of resource provisioning. Therefore, it is based on the notion of Site as the main body responsible for the services offered to the user community. Each resource (a computing resource or a service, e.g. a file catalog) belongs to a Site. The Site managers are responsible for the availability of the resources as well as for the rules for accessing and sharing them. Organizing resources by Sites gives clear administrative information about whom to contact in case of problems.
Sites are usually grouped in larger infrastructures like Grids, Clouds, etc. This is reflected in /Resources as the top level categorization. We propose to call this level Grid. Typical examples are:
/Resources/gLite
          /AMAZON
          /StratusLab
          /BOINC
          ...
The high level Grids are large administrative domains typically consisting of multiple sites and having common top level services like information systems, problem tracking systems, etc. Usually (although not always) Sites belong to a single top level Grid, for example, a particular grid infrastructure. However, some Sites can belong to two or more "Providers" by participating in several grid infrastructures. For administration and accounting purposes, user payloads are usually executed on such Sites in the context of a particular Provider. The Site contacts are likely to differ between Provider infrastructures, and Sites publish their resource data in different information systems. Therefore, the "Providers" list is part of the Site resources definition.
A Provider subsection can contain options specific to the Provider as a whole, for example, access details for the top level information system.
Each Site has a dedicated subsection in the CS. For example:
/Resources/EGI/EGI.CERN.ch
              /EGI.IN2P3.fr
              ...
The Site subsection name is the same as the Site name itself (see the description of the Site naming rules below). The Site subsections contain several options describing the Site as a whole:
ContactEmail
Coordinates
MoUTierLevel
...
The subsections in each Site section are Resource Types. The possible Resource Types are:
/Resources/EGI/EGI.CERN.ch/ComputingElements
                          /StorageElements
                          /FileCatalogs
                          /FTSServers
                          /ConditionsDatabases
                          /VOMSServers
                          ...
More Resource Types will be added as new services become available.
In DIRAC the Site naming convention is strict. The Site name consists of three mandatory parts separated by dots. The first part is the name of the Provider infrastructure. The second part is the unique name of the Site itself. The last part is the country code according to the ISO 3166 standard. The first and the last parts of the Site name are quite straightforward. The middle part must reflect the "physical" Site name. This means that the same Site participating in different Grid or Provider infrastructures must have the same "physical" middle part. This is sometimes difficult to achieve, as there is no definite way to determine that a Site is the "same" in two different Provider infrastructures, although in most cases it is still possible by using some extra information, Site contacts, etc. Keeping the "physical" middle part of the Site name unique will eventually make it possible to treat Sites properly across Provider boundaries, for example, for accounting purposes.
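To illustrate why a shared "physical" middle part helps accounting, the sketch below groups full Site names by their Site and country parts and lists the Providers each physical Site participates in. The function `providers_by_physical_site` is hypothetical, and the sample names in the usage note are made up for illustration from Provider and Site names mentioned in this document.

```python
from collections import defaultdict


def providers_by_physical_site(full_names):
    """Map each physical Site ("Site.co") to the Providers it belongs to."""
    groups = defaultdict(list)
    for full_name in full_names:
        # Exactly three dot-separated parts are mandatory; anything else
        # raises ValueError when unpacking.
        provider, site, country = full_name.split(".")
        groups[f"{site}.{country}"].append(provider)
    return dict(groups)
```

For the input `["EGI.CERN.ch", "WLCG.CERN.ch", "EGI.IN2P3.fr"]`, the physical Site `CERN.ch` is reported under both `EGI` and `WLCG`, which is the kind of cross-Provider view the naming rule is meant to enable.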
The ComputingElements section contains information about the ComputingElements (CEs). It can have options common to all the CEs on the Site. The subsections are dedicated to each distinct CE. The subsection name is at the same time the name of the CE. The CE subsection contains the options describing the CE, for example:
.../ComputingElements/CREAM05/CEType
                             /Host
                             /Architecture
                             /OS
                             /SubmissionMode
                             /VOs
                             /Providers
                             ...
Note that, unlike in the current CS, the name of the CE section is not necessarily the CE host name. However, if the host name is not given as an explicit Host option, the name of the CE section is interpreted as the host name. These details should all be taken care of by the Resources helper utility.
It is important that the CE section can contain a VOs option, which is a comma-separated list of the VOs allowed for the given CE. This list can be defined for the CE as a whole or for each Queue of the CE in the corresponding section. The VOs value defined for the Queue overrides the one in the CE section. Similarly to the VOs option, the Providers option is a list of providers of large scale computing infrastructures, for example, EGI, WLCG, GISELA, NDG, etc. A Site, CE or Queue can contribute resources in the name of one or more providers. This information will eventually make it possible to provide accounting per provider.
Each ComputingElement section has a Queues subsection, which in turn contains one subsection per named queue. The name of the queue section is interpreted as the queue name. The queue subsection contains queue options, for example:
.../ComputingElements/CREAM05/CEType
                             /Queues/cream-sge-long/MaxCPUTime
                                                   /SI00
                                                   /MaxWaitingJobs
                                                   /MaxTotalJobs
                                                   /OutputURL
                                                   ...
The queue options can be different for different CE types.
The StorageElements section contains one subsection per named Storage Element (SE). The name of the SE subsection is interpreted as the name of the SE (see the SE naming convention below). The SE section contains options applicable to the SE as a whole, for example:
.../StorageElements/CERN-disk/SEType
                             /BackendType
                             /ReadAccess
                             /WriteAccess
The SE section contains an AccessProtocols subsection, in which each subsection is dedicated to the description of one access point. For example:
.../StorageElements/CERN-disk/AccessProtocols/SRM/Host
                                                 /Port
                                                 /Protocol
                                                 /Path
                                                 ...
The SE names are given as <SiteName>-<qualifier>. Examples: CERN-disk, IN2P3-USER. The <SiteName> is the same as in the CS Site name ( see above ).
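A small sketch of splitting an SE name into its two parts follows. The text does not say how to treat names that contain more than one "-", so this sketch assumes that the qualifier is whatever follows the last hyphen; the function `split_se_name` is hypothetical.

```python
def split_se_name(se_name):
    """Split "<SiteName>-<qualifier>" into (site_name, qualifier)."""
    # Assumption: the qualifier follows the LAST hyphen, so that Site names
    # containing "-" are still handled.
    site_name, separator, qualifier = se_name.rpartition("-")
    if not separator or not site_name or not qualifier:
        raise ValueError(f"invalid SE name: {se_name!r}")
    return site_name, qualifier
```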
The FileCatalogs section contains descriptions of the configured File Catalogs. This includes third party catalogs, e.g. LcgFileCatalog, but also DIRAC File Catalogs. The name of each subsection is a unique File Catalog name (and not its type). The options of a File Catalog include, for example:
.../DIRACFileCatalog/CatalogType
                    /Host
                    /Port
                    ...
If the CatalogType option is not given, the section name is interpreted as the Catalog Type.
The FTSServers section describes FTS server endpoints. The section options include, for example:
.../FTS-CERN/URL
The FTSServers section can have a subsection Channels with definitions of configured FTS channels.
The VOMSServers section describes the VOMS servers. It has one subsection per VO, since there are separate VOMS servers for each VO.
The ConditionsDatabases section describes instances of the Oracle Conditions databases. "Conditions" are LHCb specific. In principle, there is nothing specific to "Conditions" in this subsection and it describes generic database access parameters, for example:
.../Databases/ConditionsDB/Connection
                          /User
                          /Password (?)
The proposed CS schema can be used directly by the RSS in its internal Resources mapping. In most cases it corresponds to the four levels of the resources hierarchy, which can be loosely described by the schema: Grids->Sites->Resources->Nodes. In this case Resources are CEs, SEs, etc.; Nodes are Queues, AccessProtocols, Channels, etc.
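The four-level mapping can be sketched as a walk over a nested dictionary that labels each element with its level. This is not RSS code: the function `walk` is hypothetical, and the sample tree in the usage note is simplified so that intermediate grouping sections such as ComputingElements or Queues are omitted.

```python
LEVEL_NAMES = ("Grid", "Site", "Resource", "Node")


def walk(tree, depth=0, prefix=()):
    """Yield (level_name, path) for every element down to the Node level."""
    if depth >= len(LEVEL_NAMES):
        return
    for name, child in tree.items():
        path = prefix + (name,)
        yield LEVEL_NAMES[depth], path
        if isinstance(child, dict):
            yield from walk(child, depth + 1, path)


# Simplified sample tree built from names used in the examples above.
sample = {
    "EGI": {
        "EGI.CERN.ch": {
            "CREAM05": {"cream-sge-long": {}},
            "CERN-disk": {"SRM": {}},
        },
    },
}
```

Iterating over `walk(sample)` visits the Grid `EGI`, the Site `EGI.CERN.ch`, then each Resource followed by its Nodes, which mirrors the Grids->Sites->Resources->Nodes order.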