WMAgent
Alan Malta Rodrigues edited this page Sep 8, 2023 · 19 revisions
The WMAgent software is a distributed component of the CMS production system. In a nutshell, its functions are:
- Splitting WorkQueue elements into smaller basic work units, known as jobs.
- Creating jobs and controlling the flow of work according to the tasks defined in the workload of a request.
- Submitting jobs to a batch system (e.g. HTCondor, LSF).
- Tracking the submitted jobs and keeping tabs on their outcome.
- Registering the produced data into the CMS catalogs (i.e. DBS2/3, PhEDEx).
The WMAgent relies on two services for its operation:
- A relational database, known as WMBS, which keeps the WMAgent state.
- A non-relational database for monitoring and document storage; the current implementation uses CouchDB.
The WMAgent is made up of threaded WMComponents, which work independently and use WMBS and CouchDB as their sources of information. Some of them also interact with external services such as PhEDEx, ReqMgr, DBS, WorkQueue, or SiteDB.
- Initial login to the machine:
[user@vocms0290]$ cmst1
cmst1@vocms0290:/afs/cern.ch/user$ agentenv
- Check the status of, stop, or start the agent services and components:
cmst1@vocms0290:/data/srv/wmagent/current$ $manage status
cmst1@vocms0290:/data/srv/wmagent/current$ $manage stop-services
cmst1@vocms0290:/data/srv/wmagent/current$ $manage start-services
cmst1@vocms0290:/data/srv/wmagent/current$ $manage stop-agent
cmst1@vocms0290:/data/srv/wmagent/current$ $manage start-agent
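A full restart can be scripted from the commands above. The sketch below is hypothetical (full_restart_wmagent is not part of the agent tooling) and assumes that $manage points at the agent's manage script, as set up by agentenv, and that the agent components should be stopped before, and started after, the services (presumably MySQL and CouchDB) they depend on.

```shell
# Hypothetical helper, not part of the agent tooling: full restart in dependency order.
# Assumptions: $manage points at the agent's manage script (set by agentenv), and
# components are stopped before, and started after, the services they rely on.
full_restart_wmagent() {
    local step
    for step in stop-agent stop-services start-services start-agent; do
        echo ">>> \$manage $step"
        # Abort at the first failing step so its output can be inspected.
        if ! "$manage" "$step"; then
            echo "step '$step' failed, aborting" >&2
            return 1
        fi
    done
    # Final check that services and components are back up.
    "$manage" status
}
```

Source the function in the agent shell and run full_restart_wmagent; it stops at the first step that fails instead of carrying on with a half-started agent.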
- Restart a subset of the agent's components:
cmst1@vocms0290:/data/srv/wmagent/current$ $manage execute-agent wmcoreD --restart --component JobAccountant,RucioInjector
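When restarting several components it is easy to mistype a name. The hypothetical helper below checks each name against the component work directories under install/wmagentpy3 (see the directory tree further down this page) and then joins the names with commas for the --component option. It assumes $manage is set by agentenv; WMA_INSTALL_DIR is an illustrative variable, not an agent setting.

```shell
# Hypothetical helper: validate component names, then restart only those.
# Assumes $manage points at the agent's manage script (set by agentenv).
# WMA_INSTALL_DIR is illustrative; it defaults to the layout shown on this page.
WMA_INSTALL_DIR="${WMA_INSTALL_DIR:-/data/srv/wmagent/current/install/wmagentpy3}"

restart_components() {
    local comp
    if [ "$#" -eq 0 ]; then
        echo "usage: restart_components Component [Component ...]" >&2
        return 2
    fi
    # Each component has its own work directory under the install area.
    for comp in "$@"; do
        if [ ! -d "$WMA_INSTALL_DIR/$comp" ]; then
            echo "unknown component: $comp" >&2
            return 1
        fi
    done
    # "$*" joins the arguments with the first character of IFS, i.e. commas.
    local IFS=,
    "$manage" execute-agent wmcoreD --restart --component "$*"
}
```

For example, restart_components JobAccountant RucioInjector issues the same command as the line above.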
- Unregister an agent from WMCore central services:
cmst1@vocms0290:/data/srv/wmagent/current$ $manage execute-agent wmagent-unregister-wmstats `hostname -f`
- Check the resources in, or add a site to, the agent's resource control database:
cmst1@vocms0290:/data/srv/wmagent/current$ $manage execute-agent wmagent-resource-control --site-name=T2_CH_CERN_HLT -p
cmst1@vocms0290:/data/srv/wmagent/current$ $manage execute-agent wmagent-resource-control --plugin=SimpleCondorPlugin --opportunistic --pending-slots=1000 --running-slots=2000 --add-one-site T3_ES_PIC_BSC
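To review the resource control settings of several sites at once, the read-only print command can be looped over site names. A minimal sketch, assuming $manage is set by agentenv; print_site_thresholds is a hypothetical name and only the --site-name/-p invocation shown above is used.

```shell
# Hypothetical helper: print the resource control entries of several sites.
# Assumes $manage points at the agent's manage script (set by agentenv).
print_site_thresholds() {
    local site rc=0
    for site in "$@"; do
        echo "=== $site ==="
        # Same read-only command as above, one site at a time; remember failures.
        "$manage" execute-agent wmagent-resource-control --site-name="$site" -p || rc=1
    done
    return "$rc"
}
```

For example: print_site_thresholds T2_CH_CERN_HLT T3_ES_PIC_BSC.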
- Use the internal configuration and SQL client to connect to the current agent's database:
cmst1@vocms0290:/data/srv/wmagent/current$ $manage db-prompt wmagent
Optionally, you may use the rlwrap tool, if available on the agent, to get proper line editing and command history in the database console, e.g.:
cmst1@vocms0290:/data/srv/wmagent/current$ rlwrap -m -pgreen -H /data/tmp/.sqlplus.hist $manage db-prompt wmagent
- Kill a workflow in the agent:
cmst1@vocms0290:/data/srv/wmagent/current$ $manage execute-agent kill-workflow-in-agent <workflow-name>
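When many workflows need to be killed, the command above can be fed from a list. The hypothetical function below reads one workflow name per line, skips blank lines and '#' comments, and reports how many kills failed. It assumes $manage is set by agentenv and that kill-workflow-in-agent takes a single workflow name, as in the example above.

```shell
# Hypothetical helper: kill every workflow listed in a text file, one per line.
# Assumes $manage points at the agent's manage script (set by agentenv).
kill_workflows_from_file() {
    local list_file="$1" wf failed=0
    if [ ! -r "$list_file" ]; then
        echo "cannot read workflow list: $list_file" >&2
        return 2
    fi
    # With the default IFS, 'read' strips leading/trailing blanks; the '||'
    # clause also processes a last line that lacks a trailing newline.
    while read -r wf || [ -n "$wf" ]; do
        case "$wf" in
            ''|'#'*) continue ;;  # skip blank lines and comments
        esac
        echo "Killing $wf"
        # </dev/null keeps the command from consuming the rest of the list.
        "$manage" execute-agent kill-workflow-in-agent "$wf" </dev/null || failed=$((failed + 1))
    done < "$list_file"
    echo "done, $failed failure(s)"
    [ "$failed" -eq 0 ]
}
```

Redirecting stdin of the inner command matters: without it, a command that reads from stdin would silently swallow the remaining workflow names.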
- A view of the WMAgent directory tree, limited to 3 levels and following symbolic links, starting from the current deployment directory:
cmst1@vocms0290:/data/srv/wmagent/current$ tree -lL 3
.
├── apps -> apps.sw
│   ├── wmagent -> ../sw/slc7_amd64_gcc630/cms/wmagentpy3/2.1.1.pre3
│   │   ├── bin
│   │   ├── data
│   │   ├── doc
│   │   ├── etc
│   │   ├── lib
│   │   ├── xbin
│   │   ├── xdata
│   │   ├── xdoc
│   │   └── xlib
│   └── wmagentpy3 -> ../sw/slc7_amd64_gcc630/cms/wmagentpy3/2.1.1.pre3 [recursive, not followed]
├── apps.sw
│   ├── wmagent -> ../sw/slc7_amd64_gcc630/cms/wmagentpy3/2.1.1.pre3 [recursive, not followed]
│   └── wmagentpy3 -> ../sw/slc7_amd64_gcc630/cms/wmagentpy3/2.1.1.pre3 [recursive, not followed]
├── auth
├── bin
├── config
│   ├── couchdb
│   │   └── local.ini
│   ├── mysql
│   │   └── my.cnf
│   ├── rucio
│   │   └── etc
│   ├── wmagent -> ../config/wmagentpy3
│   │   ├── config.py
│   │   ├── config.py~
│   │   ├── config-template.py
│   │   ├── deploy
│   │   ├── local.ini
│   │   ├── manage
│   │   ├── my.cnf
│   │   ├── __pycache__
│   │   └── rucio.cfg
│   └── wmagentpy3
│       ├── config.py
│       ├── config.py~
│       ├── config-template.py
│       ├── deploy
│       ├── local.ini
│       ├── manage
│       ├── my.cnf
│       ├── __pycache__
│       └── rucio.cfg
├── install
│   ├── couchdb
│   │   ├── certs
│   │   ├── database
│   │   └── logs
│   ├── mysql
│   │   ├── database
│   │   └── logs
│   └── wmagentpy3
│       ├── AgentStatusWatcher
│       ├── AnalyticsDataCollector
│       ├── ArchiveDataReporter
│       ├── DBS3Upload
│       ├── ErrorHandler
│       ├── JobAccountant
│       ├── JobArchiver
│       ├── JobCreator
│       ├── JobStatusLite
│       ├── JobSubmitter
│       ├── JobTracker
│       ├── JobUpdater
│       ├── RetryManager
│       ├── RucioInjector
│       ├── TaskArchiver
│       └── WorkQueueManager
└── sw
    ├── bin
    │   ├── cmsarch -> ../common/cmsarch
    │   ├── cmsos -> ../common/cmsarch
    │   └── scramv1 -> ../common/scramv1
    ├── bootstrap.sh
    ├── bootstrap-slc7_amd64_gcc630.log
    ├── bootstraptmp
    ├── cmsset_default.csh
    ├── cmsset_default.sh
    ├── common
    │   ├── cmsarch
    │   ├── cmsos
    │   ├── cmspkg
    │   ├── migrate-cvsroot
    │   ├── scram
    │   ├── scramv0 -> scram
    │   └── scramv1 -> scram
    ├── data -> /data
    │   ├── admin
    │   ├── certs
    │   ├── khurtado
    │   ├── lost+found
    │   ├── srv
    │   └── tmp
    ├── etc
    │   └── cms-common
    ├── share
    │   └── cms
    └── slc7_amd64_gcc630
        ├── cms
        ├── etc
        ├── external
        ├── tmp
        └── var
- Each component writes its log to install/wmagentpy3/<component>/ComponentLog:
cmst1@vocms0290:/data/srv/wmagent/current$ ls -ls /data/srv/wmagent/current/install/wmagentpy3/*/ComponentLog
827896 -rw-r--r--. 1 cmst1 zh 847759271 Aug 24 19:54 /data/srv/wmagent/current/install/wmagentpy3/AgentStatusWatcher/ComponentLog
13484 -rw-r--r--. 1 cmst1 zh 13799746 Oct 19 08:38 /data/srv/wmagent/current/install/wmagentpy3/AnalyticsDataCollector/ComponentLog
4244 -rw-r--r--. 1 cmst1 zh 4337901 Oct 19 08:40 /data/srv/wmagent/current/install/wmagentpy3/ArchiveDataReporter/ComponentLog
4092 -rw-r--r--. 1 cmst1 zh 4182158 Sep 1 16:23 /data/srv/wmagent/current/install/wmagentpy3/DBS3Upload/ComponentLog
11412 -rw-r--r--. 1 cmst1 zh 11680500 Oct 19 08:44 /data/srv/wmagent/current/install/wmagentpy3/ErrorHandler/ComponentLog
3560 -rw-r--r--. 1 cmst1 zh 3640859 Oct 19 08:42 /data/srv/wmagent/current/install/wmagentpy3/JobAccountant/ComponentLog
17716 -rw-r--r--. 1 cmst1 zh 18136882 Oct 19 08:43 /data/srv/wmagent/current/install/wmagentpy3/JobArchiver/ComponentLog
11240 -rw-r--r--. 1 cmst1 zh 11504668 Oct 19 08:44 /data/srv/wmagent/current/install/wmagentpy3/JobCreator/ComponentLog
21708 -rw-r--r--. 1 cmst1 zh 22220852 Oct 19 08:44 /data/srv/wmagent/current/install/wmagentpy3/JobStatusLite/ComponentLog
49336 -rw-r--r--. 1 cmst1 zh 50512403 Oct 19 08:43 /data/srv/wmagent/current/install/wmagentpy3/JobSubmitter/ComponentLog
26964 -rw-r--r--. 1 cmst1 zh 27606966 Oct 19 08:44 /data/srv/wmagent/current/install/wmagentpy3/JobTracker/ComponentLog
16576 -rw-r--r--. 1 cmst1 zh 16966263 Oct 19 08:43 /data/srv/wmagent/current/install/wmagentpy3/JobUpdater/ComponentLog
14368 -rw-r--r--. 1 cmst1 zh 14707697 Oct 19 08:45 /data/srv/wmagent/current/install/wmagentpy3/RetryManager/ComponentLog
55756 -rw-r--r--. 1 cmst1 zh 57089235 Oct 19 08:41 /data/srv/wmagent/current/install/wmagentpy3/RucioInjector/ComponentLog
22684 -rw-r--r--. 1 cmst1 zh 23221159 Oct 19 08:42 /data/srv/wmagent/current/install/wmagentpy3/TaskArchiver/ComponentLog
600168 -rw-r--r--. 1 cmst1 zh 614565975 Oct 19 08:44 /data/srv/wmagent/current/install/wmagentpy3/WorkQueueManager/ComponentLog
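Components are expected to write to their ComponentLog regularly while they run, so a log that has not been modified for a long time may point to a stuck or stopped component. The sketch below, which assumes GNU find (for -printf) and the default install path shown above, lists the components whose ComponentLog is older than a given number of minutes; stale_component_logs is a hypothetical helper name.

```shell
# Hypothetical helper: list components whose ComponentLog has not been
# modified in the last N minutes (default 60). Requires GNU find (-printf).
stale_component_logs() {
    local install_dir="${1:-/data/srv/wmagent/current/install/wmagentpy3}"
    local minutes="${2:-60}"
    # -mmin +N matches files last modified more than N minutes ago;
    # %h prints the directory that contains each match.
    find "$install_dir" -mindepth 2 -maxdepth 2 -type f -name ComponentLog \
         -mmin "+$minutes" -printf '%h\n' |
    while IFS= read -r dir; do
        printf '%s\n' "${dir##*/}"   # keep only the component name
    done | sort
}
```

For example, stale_component_logs /data/srv/wmagent/current/install/wmagentpy3 120. In the listing above, AgentStatusWatcher (Aug 24) and DBS3Upload (Sep 1) were last modified well before the other components (Oct 19) and would be reported; whether that is a real problem should be confirmed with $manage status.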