
WMStats Server REST APIs

Alan Malta Rodrigues edited this page Nov 14, 2022 · 1 revision

This wiki provides some instructions and documentation on the WMStats Server 2 RESTful APIs.

A general requirement of the WMCore REST framework is that clients must provide an Accept HTTP header in their request. Hence, a client that wants to retrieve data in JSON format needs to provide the following HTTP request header: Accept: application/json.

IMPORTANT: WMStats Server serves data from its local cache instead of contacting the database backend (CouchDB) on every request. The cache is refreshed by polling every 10 minutes, so repeated queries within a 10-minute window will most likely return the same data and are highly discouraged.
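Since the server-side cache only changes every 10 minutes, a client can keep its own copy of the last response and reuse it during that window. The following is a minimal sketch of that idea, not part of WMCore: the TTLCache class and the fetch callable are hypothetical names, and the 600 second value simply mirrors the polling interval stated above.

```python
import time

CACHE_TTL_SECONDS = 600  # assumption: mirrors the 10 min server-side cache refresh


class TTLCache:
    """Keep the last response per URL and reuse it until it is older than the TTL."""

    def __init__(self, fetch, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._fetch = fetch    # hypothetical callable: url -> response data
        self._ttl = ttl
        self._clock = clock
        self._entries = {}     # url -> (timestamp, data)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]    # still fresh: do not contact the server again
        data = self._fetch(url)
        self._entries[url] = (now, data)
        return data
```

With this wrapper, calling get() repeatedly for the same URL results in at most one server request per TTL window.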

How to request compressed data through the WMCore REST APIs

Starting with the WMCore release HG2211 (November 2022, WMCore version 2.1.4), the WMCore REST framework supports gzip-compressed responses. End users are invited to request compressed data, especially from heavy APIs transferring (many) megabytes of data, which includes most of the wmstatsserver RESTful APIs.

When the user is creating their HTTP request, an extra HTTP header has to be provided to communicate to the WMCore server that gzip'ed content is accepted by the client. The user has to provide this key/value parameter in their HTTP request: Accept-Encoding: gzip.

This does not necessarily mean that the server will compress the response, so the user must check the HTTP response headers to decide how to read the response body, which might or might not be compressed. If the server has responded with compressed data, the following HTTP response header will be sent back to the client: Content-Encoding: gzip, flagging that the response body is in a binary/compressed format.

In order to decompress the body data, the client can use the gzip module from the Python standard library and decompress the data as:

gzip.decompress(body)
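Putting the request and response headers together, a client could look roughly like the sketch below. It uses only the Python standard library; the helper names build_request and decode_body are hypothetical, and client certificate authentication is left out, so this is an illustration of the header handling rather than a complete client.

```python
import gzip
import json
import urllib.request


def build_request(url):
    """Build a GET request asking for JSON and allowing a gzip-compressed body."""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "Accept-Encoding": "gzip"},
    )


def decode_body(headers, body):
    """Return the parsed JSON payload, decompressing only if the server says so.

    `headers` is any object with a .get() method, for example the `headers`
    attribute of the response returned by urllib.request.urlopen.
    """
    encoding = (headers.get("Content-Encoding") or "").lower()
    if encoding == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))
```

The check on Content-Encoding matters because the server is free to answer with an uncompressed body even when Accept-Encoding: gzip was sent.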

If HTTP requests are made with the curl Unix tool, the same header has to be provided and the output data can be redirected to a file, for example:

curl -L -k --cert $X509_USER_CERT --key $X509_USER_KEY --cacert $X509_USER_CERT https://cmsweb.cern.ch/wmstatsserver/data/info -vvv -H "Accept: application/json" -H "Accept-Encoding: gzip" > out.data

Now, to see the content of out.data, one can use the zcat tool, for example:

zcat out.data

To summarize the use of gzip, the client needs to provide the correct Accept-Encoding HTTP request header and when parsing the HTTP response object, a check for the HTTP response header Content-Encoding is required to decide how to deal with that object.

Retrieving data from active requests (active requests are requests whose status is not "*-archived")

GET /wmstatsserver/data/filtered_requests?[key]=[value]&mask=[key] HTTP/1.1
Accept: application/json
Host: cmsweb.cern.ch
  • 'key' is a property in the request document (e.g. RequestStatus, PrepID, etc.) and 'value' is the value that this property must match in the request document. Both key and value are case sensitive.

  • Repeating the same key works as an 'or' operator, e.g. RequestStatus=running-open&RequestStatus=running-closed will select requests where the RequestStatus is "running-open" OR "running-closed".

  • Different keys are combined with an 'and' operator, e.g. RequestStatus=running-open&PrepID=ABC will select requests where the RequestStatus is "running-open" AND the PrepID is ABC.

  • mask controls which properties are returned, e.g. mask=Campaign&mask=PrepID will return only Campaign and PrepID (RequestName is always returned, even without being set in the mask explicitly).

  • If a key specified in mask does not exist in the request document, null is returned for that key.

  • example

https://cmsweb.cern.ch/wmstatsserver/data/filtered_requests?RequestStatus=new&RequestStatus=assignment-approved&Campaign=PhaseIIFall16LHEGS82&mask=MCPileup&mask=Campaign

will return something like this

{"result": [
 {
  "MCPileup": null, 
  "RequestStatus": "new", 
  "RequestName": "pdmvserv_task_HIG-PhaseIIFall16LHEGS82-00018__v1_T_170228_162325_5100", 
  "Campaign": "PhaseIIFall16LHEGS82"
},{
  "MCPileup": [
    "/MinBias_TuneCUETP8M1_14TeV-pythia8/PhaseIIFall16GS82-90X_upgrade2023_realistic_v1-v1/GEN-SIM"
  ], 
  "RequestStatus": "assignment-approved", 
  "RequestName": "pdmvserv_task_HIG-PhaseIIFall16LHEGS82-00021__v1_T_170126_092925_1311", 
  "Campaign": "PhaseIIFall16LHEGS82"
},{
  "MCPileup": null, 
  "RequestStatus": "assignment-approved", 
  "RequestName": "pdmvserv_task_HIG-PhaseIIFall16LHEGS82-00018__v1_T_170228_165502_3011", 
  "Campaign": "PhaseIIFall16LHEGS82"
},{
  "MCPileup": null, 
  "RequestStatus": "assignment-approved", 
  "RequestName": "pdmvserv_task_HIG-PhaseIIFall16LHEGS82-00018__v1_T_170228_170033_676", 
  "Campaign": [
    "PhaseIIFall16DR82", 
    "PhaseIIFall16LHEGS82"
  ]
}]}
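The query-string rules for filtered_requests (repeated keys for 'or', different keys for 'and', repeated mask entries) can be generated programmatically. The sketch below uses urllib.parse.urlencode from the Python standard library; filtered_requests_url is a hypothetical helper name, and the function only builds the URL without sending any request.

```python
from urllib.parse import urlencode

BASE_URL = "https://cmsweb.cern.ch/wmstatsserver/data/filtered_requests"


def filtered_requests_url(filters, mask=()):
    """Build a filtered_requests URL.

    filters: dict mapping a request property to one value or a list of values
             (several values for the same key are OR-ed by the server).
    mask:    iterable of property names to include in the output.
    """
    params = []
    for key, values in filters.items():
        if isinstance(values, str):
            values = [values]
        # One key=value pair per value, so the key is repeated in the query string.
        params.extend((key, value) for value in values)
    params.extend(("mask", name) for name in mask)
    return BASE_URL + "?" + urlencode(params)
```

For instance, passing {"RequestStatus": ["new", "assignment-approved"], "Campaign": "PhaseIIFall16LHEGS82"} with mask=["MCPileup", "Campaign"] builds a query with the same filters and mask as the example above.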

Retrieving a list of protected LFNs

This API is meant to provide a list of unmerged LFNs that are currently in use in the workload management system (by retrieving the workflow property: OutputModulesLFNBases). It includes transient output LFNs as well as the final unmerged LFNs.

Workflows in one of the following statuses are considered as active in the system:

['assignment-approved', 'assigned', 'staging', 'staged', 'acquired', 'failed',
 'running-open', 'running-closed', 'force-complete', 'completed', 'closed-out']

The REST API is protectedlfns

GET /wmstatsserver/data/protectedlfns HTTP/1.1
Accept: application/json
Host: cmsweb.cern.ch

protectedlfns will return a 503 error when the WMStats data cache is empty.
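Because a 503 response here indicates an empty cache rather than a malformed request, a client may choose to wait and retry. The sketch below shows one possible way to do that with the Python standard library; call_with_retry and its fetch argument are hypothetical names, and the retry count and delay are arbitrary assumptions, not values recommended by WMCore.

```python
import time
import urllib.error


def call_with_retry(fetch, retries=3, delay=60.0, sleep=time.sleep):
    """Call fetch() and retry only when the server answers with HTTP 503.

    fetch is a hypothetical zero-argument callable that performs the request
    (for instance a wrapper around urllib.request.urlopen) and raises
    urllib.error.HTTPError on HTTP error statuses.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Any other status, or the last attempt, is passed on to the caller.
            if err.code != 503 or attempt == retries:
                raise
            sleep(delay)  # give the server time to populate its cache
```

Errors other than 503 are re-raised immediately, so authentication or URL problems are not hidden by the retry loop.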

Retrieving a final list of protected LFNs

This API behaves very similarly to protectedlfns; the only difference is that protectedlfns_final does not yield transient output LFNs (those defined by KeepOutput=True, and/or TransientOutputModules). It relies on the workflow property OutputDatasets and builds the unmerged LFNs based on the output dataset names.

The REST API is protectedlfns_final

GET /wmstatsserver/data/protectedlfns_final HTTP/1.1
Accept: application/json
Host: cmsweb.cern.ch

Retrieving a list of locked datasets

This API returns a list of datasets that are in use by workflows with the following statuses:

['assignment-approved', 'assigned', 'staging', 'staged', 'acquired', 'failed',
 'running-open', 'running-closed', 'force-complete', 'completed', 'closed-out']

The REST API is globallocks

GET /wmstatsserver/data/globallocks HTTP/1.1
Accept: application/json
Host: cmsweb.cern.ch

Example Output

{"result": [
 "/Cosmics/Commissioning2015-PromptReco-v1/RECO","/Cosmics/CMSSW_7_3_2-CosmicSP-DQMHLTonRAWAOD_2017_TaskChain_InclParents_reqmgr2-v11/RAW-RECO"]}

globallocks will return a 503 error when the WMStats data cache is empty.
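As an illustration of how a consumer might use the globallocks output, the sketch below filters a list of candidate datasets against the "result" list shown in the example output. The function name removable_datasets is hypothetical, and the check covers globallocks only; it is not a complete data-locking decision.

```python
import json


def removable_datasets(candidates, globallocks_body):
    """Return the candidates that do not appear in the globallocks result."""
    locked = set(json.loads(globallocks_body)["result"])
    return sorted(set(candidates) - locked)
```

A dataset name that appears in the globallocks result is dropped from the returned list, while any name that is absent is kept.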

API Usage by Dynamo for Data Locking

After the retirement of Unified input/output data placement, Dynamic Data Management (DDM) will use a combination of the WMStats globallocks and protectedlfns APIs plus the ReqMgr2 parentlocks API to determine the set of global datasets and unmerged files that are in use and should not be removed.
