This repository has been archived by the owner on Jan 30, 2024. It is now read-only.
Error codes
Paul Nilsson edited this page Mar 5, 2021
When it detects a fatal problem, the Pilot assigns an error code and informs the server. Along with the numerical code, it reports the error meaning and more detailed error diagnostics. The current range of error codes is listed in the [Pilot 2 wiki](https://twiki.cern.ch/twiki/bin/view/PanDA/Pilot2ErrorCodes).
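
As a rough illustration of the three pieces of information mentioned above (numerical code, meaning, diagnostics), the following sketch builds a small report from a lookup table. It is not the Pilot's actual implementation: the names `PilotError`, `ERROR_CODES` and `build_error_report`, as well as the dictionary field names, are hypothetical, and only three entries from the table on this page are included.

```python
from typing import NamedTuple


class PilotError(NamedTuple):
    """Acronym and meaning for one error code (hypothetical container)."""
    acronym: str
    meaning: str


# A few entries copied from the table on this page; not the full list.
ERROR_CODES = {
    1008: PilotError("GENERALERROR", "General pilot error, consult batch log"),
    1099: PilotError("STAGEINFAILED", "Failed to stage-in file"),
    1137: PilotError("STAGEOUTFAILED", "Failed to stage-out file"),
}


def build_error_report(code: int, diagnostics: str = "") -> dict:
    """Combine the code, its meaning and the detailed diagnostics.

    Hypothetical helper: the field names are illustrative only.
    """
    entry = ERROR_CODES.get(code)
    if entry is None:
        # Codes missing from the lookup are flagged rather than guessed at.
        entry = PilotError("UNKNOWN", "Unknown error code")
    return {
        "code": code,
        "acronym": entry.acronym,
        "meaning": entry.meaning,
        "diagnostics": diagnostics,
    }


if __name__ == "__main__":
    print(build_error_report(1099, "copy command returned non-zero exit code"))
```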
Error code | Acronym | Meaning | Notes |
---|---|---|---|
1008 | GENERALERROR | General pilot error, consult batch log | |
1098 | NOLOCALSPACE | Not enough local space | Error code is set e.g. by job monitoring, also if copytool command fails (if "No space left on device" is found in command output) |
1099 | STAGEINFAILED | Failed to stage-in file | |
1100 | REPLICANOTFOUND | The rucio API function list_replicas() did not return any replicas. Check log for details. | |
1103 | NOSUCHFILE | No such file or directory | Error thrown by open_file() function. Also set if copytool fails and "No such file or directory" is found in output |
1104 | USERDIRTOOLARGE | User work directory too large | The error is set if the user work directory exceeds the maximum allowed limit, as defined by schedconfig.maxwdir (default: 14 GB) |
1106 | STDOUTTOOBIG | Payload log or stdout file too big | Set if stdout exceeds the maximum allowed size of 2 GB, as defined in the Pilot's default config file |
1110 | SETUPFAILURE | Failed during payload setup | |
1115 | NFSSQLITE | NFS SQLite locking problems | Pilot identifies this error by doing a grep on the strings "prepare 5 database is locked" and "Error SQLiteStatement" in the payload stdout |
1116 | QUEUEDATA | Pilot could not download queuedata | |
1117 | QUEUEDATANOTOK | Pilot found non-valid queuedata | |
1124 | OUTPUTFILETOOLARGE | Output file too large | |
1133 | NOSTORAGE | Fetching default storage failed: no activity related storage defined | |
1137 | STAGEOUTFAILED | Failed to stage-out file | |
1141 | PUTMD5MISMATCH | md5sum mismatch on output file | Error acronym should be renamed |
1143 | CHMODTRF | Failed to chmod trf | |
1144 | PANDAKILL | This job was killed by panda server | |
1145 | GETMD5MISMATCH | md5sum mismatch on input file | Error acronym should be renamed |
1149 | TRFDOWNLOADFAILURE | Transform could not be downloaded | |
1150 | LOOPINGJOB | Looping job killed by pilot | The pilot will kill the payload (or stop stage-in/out) if there is no activity within the allowed time, i.e. if no files are touched in the work directory or the file transfer is stuck. The default looping job time limit is 12*3600 s for production jobs and 3*3600 s for user analysis jobs. The limit can be overridden in the pilot's config file (or set by the user using the maxCPUCount variable) |
1151 | STAGEINTIMEOUT | File transfer timed out during stage-in | Currently only identified for rucio file transfer (unless "Operation timed out" is in stderr) |
1152 | STAGEOUTTIMEOUT | File transfer timed out during stage-out | Currently only identified for rucio file transfer (unless "Operation timed out" is in stderr) |
1163 | NOPROXY | Grid proxy not valid | Set if grid-proxy-info fails or if "Could not establish context" is found in copytool command output |
1165 | MISSINGOUTPUTFILE | Local output file is missing | |
1168 | SIZETOOLARGE | Total file size too large | Before stage-in, the pilot verifies that the sum of the input file sizes does not exceed maxwdir (set in schedconfig or in pilot config file). Any files that are to be accessed directly/remotely are excluded |
1171 | GETADMISMATCH | adler32 mismatch on input file | Error acronym should be renamed |
1172 | PUTADMISMATCH | adler32 mismatch on output file | Error acronym should be renamed |
1177 | NOVOMSPROXY | Voms proxy not valid | Set if arcproxy fails |
1180 | GETGLOBUSSYSERR | Globus system error during stage-in | Pilot identifies this error if "globus_xio:" is found in command output |
1181 | PUTGLOBUSSYSERR | Globus system error during stage-out | Pilot identifies this error if "globus_xio:" is found in command output |
1186 | NOSOFTWAREDIR | Software directory does not exist | |
1187 | NOPAYLOADMETADATA | Payload metadata does not exist | This error can be caused by a previous uncaught error that leads to missing metadata, i.e. the error label can be misleading (when discovered, the pilot is usually patched) |
1190 | LFNTOOLONG | LFN too long (exceeding limit of 255 characters) | When validating a job definition, before executing the payload, the Pilot makes sure that no output file has an LFN that is longer than 255 characters (which is not supported by the DDM system) |
1191 | ZEROFILESIZE | File size cannot be zero | Before executing the stage-out command, the Pilot verifies that the size of the file is not zero (which will not be accepted by any storage system) |
1199 | MKDIR | Failed to create local directory | |
1200 | KILLSIGNAL | Job terminated by unknown kill signal | |
1201 | SIGTERM | Job killed by signal: SIGTERM | |
1202 | SIGQUIT | Job killed by signal: SIGQUIT | |
1203 | SIGSEGV | Job killed by signal: SIGSEGV | |
1204 | SIGXCPU | Job killed by signal: SIGXCPU | |
1205 | USERKILL | Job killed by user | Reserved error code for user defined kill instructions. Currently not implemented |
1206 | SIGBUS | Job killed by signal: SIGBUS | |
1207 | SIGUSR1 | Job killed by signal: SIGUSR1 | |
1211 | MISSINGINSTALLATION | Missing installation | Assigned error code if the payload fails to execute the transform |
1212 | PAYLOADOUTOFMEMORY | Payload ran out of memory | Assigned error code if the pilot finds the string "FATAL out of memory: taking the application down" in the stderr and "St9bad_alloc", "std::bad_alloc" in the stdout |
1213 | REACHEDMAXTIME | Reached batch system time limit | Pilot aborts automatically when 10 minutes remain of the maximum allowed running time, as set by 1) schedconfig.maxtime or 2) Pilot option -l <maxtime> (both values are in seconds) |
1214 | UNKNOWNPAYLOADFAILURE | Job failed due to unknown reason (consult log file) | |
2222 | SINGULARITYRESOURCEUNAVAILABLE |
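
Several Notes entries in the table describe error codes that are set when a particular string is found in copytool output. The sketch below shows one way such string matching could be arranged. It is an illustration under stated assumptions, not the Pilot's own code: the names `OUTPUT_PATTERNS` and `classify_copytool_output` are hypothetical, stdout and stderr are treated as a single string, and falling back to STAGEINFAILED (1099) or STAGEOUTFAILED (1137) when nothing matches is an assumption made for the example.

```python
# Substring -> (stage-in code, stage-out code), taken from the Notes column.
# Where the table lists a single code, it is used for both directions.
OUTPUT_PATTERNS = [
    ("No space left on device", (1098, 1098)),      # NOLOCALSPACE
    ("No such file or directory", (1103, 1103)),    # NOSUCHFILE
    ("Could not establish context", (1163, 1163)),  # NOPROXY
    ("Operation timed out", (1151, 1152)),          # STAGEINTIMEOUT / STAGEOUTTIMEOUT
]


def classify_copytool_output(output: str, stage_in: bool = True) -> int:
    """Return an error code for failed copytool output (hypothetical helper).

    The first matching pattern wins. If no pattern matches, the generic
    stage-in/stage-out failure code is returned (an assumption of this sketch).
    """
    for pattern, (in_code, out_code) in OUTPUT_PATTERNS:
        if pattern in output:
            return in_code if stage_in else out_code
    return 1099 if stage_in else 1137


if __name__ == "__main__":
    print(classify_copytool_output("cp: write error: No space left on device"))
    print(classify_copytool_output("Operation timed out", stage_in=False))
```

Keeping the patterns in an ordered list rather than a chain of `if` statements makes the precedence between overlapping messages explicit and keeps each rule next to the acronym it maps to.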