Error codes

When the Pilot detects a fatal problem, it assigns an error code and informs the server. Along with the numerical code, it reports the meaning of the error and more detailed error diagnostics. The current range of error codes is listed in the [Pilot 2 wiki](https://twiki.cern.ch/twiki/bin/view/PanDA/Pilot2ErrorCodes).
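As a rough illustration of what such a report carries (numerical code, meaning, and diagnostics), the sketch below builds a small report object from a subset of the codes on this page. The names `ErrorReport`, `ERROR_CODES` and `build_error_report` are hypothetical and are not taken from the Pilot source; the fallback to the general error code for unknown codes is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, asdict

# Subset of the error codes documented on this page: code -> (acronym, meaning).
ERROR_CODES = {
    1008: ("GENERALERROR", "General pilot error, consult batch log"),
    1098: ("NOLOCALSPACE", "Not enough local space"),
    1099: ("STAGEINFAILED", "Failed to stage-in file"),
    1137: ("STAGEOUTFAILED", "Failed to stage-out file"),
    1150: ("LOOPINGJOB", "Looping job killed by pilot"),
}


@dataclass
class ErrorReport:
    """Hypothetical container for what gets reported to the server."""
    code: int
    acronym: str
    meaning: str
    diagnostics: str


def build_error_report(code, diagnostics=""):
    """Combine a numerical code with its meaning and free-text diagnostics."""
    if code not in ERROR_CODES:
        # Assumption of this sketch: unknown codes map to the general error.
        diagnostics = f"unknown error code {code}: {diagnostics}"
        code = 1008
    acronym, meaning = ERROR_CODES[code]
    return ErrorReport(code, acronym, meaning, diagnostics)


report = build_error_report(1099, "example diagnostics text")
print(asdict(report))
```

Keeping the short meaning separate from the diagnostics string mirrors the distinction made above: the meaning is fixed per code, while the diagnostics vary from failure to failure.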

| Error code | Acronym | Meaning | Notes |
|---|---|---|---|
| 1008 | GENERALERROR | General pilot error, consult batch log | |
| 1098 | NOLOCALSPACE | Not enough local space | Set e.g. by job monitoring, and also if a copytool command fails with "No space left on device" in the command output |
| 1099 | STAGEINFAILED | Failed to stage-in file | |
| 1100 | REPLICANOTFOUND | The Rucio API function list_replicas() did not return any replicas | Check the log for details |
| 1103 | NOSUCHFILE | No such file or directory | Error thrown by the open_file() function. Also set if the copytool fails and "No such file or directory" is found in the command output |
| 1104 | USERDIRTOOLARGE | User work directory too large | Set if the user work directory exceeds the maximum allowed size, as defined by schedconfig.maxwdir (default: 14 GB) |
| 1106 | STDOUTTOOBIG | Payload log or stdout file too big | Set if stdout exceeds the maximum allowed size of 2 GB, which is defined in the Pilot's default config file |
| 1110 | SETUPFAILURE | Failed during payload setup | |
| 1115 | NFSSQLITE | NFS SQLite locking problems | The Pilot identifies this error by grepping for the strings "prepare 5 database is locked" and "Error SQLiteStatement" in the payload stdout |
| 1116 | QUEUEDATA | Pilot could not download queuedata | |
| 1117 | QUEUEDATANOTOK | Pilot found non-valid queuedata | |
| 1124 | OUTPUTFILETOOLARGE | Output file too large | |
| 1133 | NOSTORAGE | Fetching default storage failed: no activity related storage defined | |
| 1137 | STAGEOUTFAILED | Failed to stage-out file | |
| 1141 | PUTMD5MISMATCH | md5sum mismatch on output file | Error acronym should be renamed |
| 1143 | CHMODTRF | Failed to chmod trf | After downloading a trf, the Pilot tries to do a chmod 0755 on it. If this fails, the Pilot sets this error |
| 1144 | PANDAKILL | This job was killed by panda server | |
| 1145 | GETMD5MISMATCH | md5sum mismatch on input file | Error acronym should be renamed |
| 1149 | TRFDOWNLOADFAILURE | Transform could not be downloaded | |
| 1150 | LOOPINGJOB | Looping job killed by pilot | The Pilot kills the payload (or stops stage-in/out) if there is no activity within the allowed time, i.e. no files are touched in the work directory or the file transfer is stuck. The default looping job time limit is 12\*3600 s for production jobs and 3\*3600 s for user analysis jobs. The limit can be overridden in the Pilot's config file (or set by the user using the maxCPUCount variable) |
| 1151 | STAGEINTIMEOUT | File transfer timed out during stage-in | Currently only identified for rucio file transfers (unless "Operation timed out" is in stderr) |
| 1152 | STAGEOUTTIMEOUT | File transfer timed out during stage-out | Currently only identified for rucio file transfers (unless "Operation timed out" is in stderr) |
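Several of the notes above describe errors that are identified by searching command output for known strings. The following is a minimal sketch of that idea, not the Pilot's actual implementation: the function names, the order of the checks, and the fallback to the generic stage-in/stage-out codes are assumptions made here for illustration. For the NFS SQLite case the sketch treats either string as sufficient, which is also an assumption.

```python
def classify_copytool_output(output, stage_in=True):
    """Map the output of a failed copytool command to an error code.

    The matched strings come from the Notes column of the table; the
    order of the checks and the fallback codes are assumptions.
    """
    if "No space left on device" in output:
        return 1098  # NOLOCALSPACE
    if "No such file or directory" in output:
        return 1103  # NOSUCHFILE
    if "Operation timed out" in output:
        return 1151 if stage_in else 1152  # STAGEINTIMEOUT / STAGEOUTTIMEOUT
    # Nothing more specific was recognised: use the generic transfer codes.
    return 1099 if stage_in else 1137  # STAGEINFAILED / STAGEOUTFAILED


def has_nfs_sqlite_problem(payload_stdout):
    """Return True if the payload stdout shows NFS SQLite locking problems."""
    markers = ("prepare 5 database is locked", "Error SQLiteStatement")
    # Assumption: finding either marker is enough to flag the problem.
    return any(marker in payload_stdout for marker in markers)


print(classify_copytool_output("cp: write error: No space left on device"))
print(classify_copytool_output("Operation timed out", stage_in=False))
print(has_nfs_sqlite_problem("Error SQLiteStatement: prepare 5 database is locked"))
```

Plain substring checks keep the sketch close to the grep-style matching described in the table; a real implementation would also need to decide which stream (stdout or stderr) to inspect for each copytool.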
