Pilot Architecture
The Pilot is component-based, with each component responsible for different tasks. The main tasks are handled by controller components such as Job Control, Payload Control and Data Control. There is also a set of components with auxiliary functionality, e.g. the Pilot Monitor and the Job Monitor: the former is for internal use and monitors the pilot's own threads, while the latter is tied to the job and checks parameters relevant for the payload (e.g. size checks). The Information System component presents an interface to a database containing knowledge about the resource where the Pilot is running (e.g. which copy tool to use and where to read and write data).
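As a rough illustration of the kind of lookup the Information System enables, consider the following sketch. The `InfoService` class, its keys and values are hypothetical stand-ins, not the pilot's actual interface:

```python
class InfoService:
    """Toy stand-in for the Information System component (illustrative only)."""

    def __init__(self, queue_data):
        self._data = queue_data  # settings for the resource the pilot runs on

    def get(self, key, default=None):
        return self._data.get(key, default)


# Example resource description, of the kind the real component reads from a database
info = InfoService({"copytool": "rucio"})
print(info.get("copytool"))  # which copy tool to use for stage-in/out -> "rucio"
```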
The pilot workflows are described in the corresponding section.
Each pilot component runs as an independent thread in the pilot, and each spawns additional subthreads, described below. Most of the threads manipulate Job objects, which contain the full information for a job downloaded from the PanDA server or read from file. The Job objects are stored in globally available Python queues. A Job object is passed between queues until its processing has completed. The various threads monitor these queues and act on a Job object as it arrives.
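This queue-based hand-off can be sketched as follows; the class, queue and thread names are simplified illustrations of the pattern, not the pilot's actual code:

```python
import queue
import threading

# Globally available queues holding Job objects (names simplified for illustration)
jobs = queue.Queue()
validated_jobs = queue.Queue()

class Job:
    """Minimal stand-in for the pilot's Job object."""
    def __init__(self, pandaid, definition):
        self.pandaid = pandaid
        self.definition = definition  # the full job information from the server or a file
        self.state = "starting"

def validate():
    """Toy 'validate' thread: acts on Job objects as they arrive in the jobs queue."""
    while True:
        job = jobs.get()          # blocks until a Job object arrives
        job.state = "validated"   # the real thread would run user-defined verification here
        validated_jobs.put(job)   # hand the Job object on to the next queue
        jobs.task_done()

threading.Thread(target=validate, daemon=True).start()
jobs.put(Job(12345, {"jobPars": "--some-payload-options"}))
job = validated_jobs.get()        # a downstream thread would pick this up
print(job.pandaid, job.state)     # -> 12345 validated
```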
The Job Control component spawns five subthreads for various tasks:
- `retrieve`: Retrieves a job definition from any source and places it in the "jobs" queue. The job definition is a JSON dictionary that is either preplaced in the launch directory or downloaded from a server specified by `args.url` (pilot option)
- `validate`: Retrieves a Job object from the "jobs" queue. If it passes the user-defined verification, the main payload work directory is created (`PanDA_Pilot-<pandaid>`) in the main pilot work directory. The Job object is passed on to the "validated_jobs" queue, or to the "failed_jobs" queue in case of failure
- `create_data_payload`: Gets a Job object from the "validated_jobs" queue. If the job has defined input files, moves the Job object to the "data_in" queue and sets the internal pilot state to "stagein". If there are no input files, places the Job object in the "finished_data_in" queue. In either case, the thread also places the Job object in the "payloads" queue, where another thread will retrieve it and wait for any stage-in to finish (see the sketch after this list)
- `queue_monitor`: Monitors the internal Python queues, in particular whether a job has finished or failed, and reports to the server. A completed job is moved to the "completed_jobs" queue
- `job_monitor`: Monitors certain job parameters, such as job looping, at various time intervals. The main loop is executed once a minute, while individual verifications may be executed at any time interval (>= 1 minute). E.g. looping jobs are checked once per ten minutes (default), the heartbeat is sent once per 30 minutes, and memory usage is checked once a minute
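As referenced in the `create_data_payload` item above, the routing it performs can be sketched like this. The queue names follow the list above, while the `indata` attribute and the helper function are assumptions for illustration, not the pilot's actual code:

```python
import queue
from types import SimpleNamespace

# Illustrative queue set; the real pilot keeps its queues in a shared namespace
queues = SimpleNamespace(
    validated_jobs=queue.Queue(),
    data_in=queue.Queue(),
    finished_data_in=queue.Queue(),
    payloads=queue.Queue(),
)

def create_data_payload_once(queues):
    """One iteration of a create_data_payload-like thread (simplified sketch)."""
    job = queues.validated_jobs.get()
    if getattr(job, "indata", None):      # the job defines input files (assumed attribute)
        job.state = "stagein"             # internal pilot state while input is staged in
        queues.data_in.put(job)           # data control will perform the stage-in
    else:
        queues.finished_data_in.put(job)  # no input files, nothing to stage in
    queues.payloads.put(job)              # payload control waits for any stage-in to finish
    return job

# Usage: route a job that has one input file
job = SimpleNamespace(pandaid=12345, indata=["EVNT.pool.root"], state="validated")
queues.validated_jobs.put(job)
print(create_data_payload_once(queues).state)  # -> "stagein"
```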