This repository has been archived by the owner on Jan 30, 2024. It is now read-only.
Error codes
Paul Nilsson edited this page Mar 5, 2021
When it detects a fatal problem, the Pilot assigns an error code and informs the server. Along with the numerical code, it reports the error meaning and more detailed error diagnostics. The current range of error codes is listed in the [Pilot 2 wiki](https://twiki.cern.ch/twiki/bin/view/PanDA/Pilot2ErrorCodes).
Error code | Acronym | Meaning | Notes |
---|---|---|---|
1008 | GENERALERROR | General pilot error, consult batch log | |
1098 | NOLOCALSPACE | Not enough local space | Set e.g. by job monitoring, and also when a copytool command fails and "No space left on device" is found in the command output |
1099 | STAGEINFAILED | Failed to stage-in file | |
1100 | REPLICANOTFOUND | The rucio API function list_replicas() did not return any replicas. Check log for details. | |
1103 | NOSUCHFILE | No such file or directory | Error thrown by the open_file() function. Also set if the copytool fails and "No such file or directory" is found in the command output |
1104 | USERDIRTOOLARGE | User work directory too large | The error is set if the user work directory exceeds the maximum allowed limit, as defined by schedconfig.maxwdir (default: 14 GB) |
1106 | STDOUTTOOBIG | Payload log or stdout file too big | Set if stdout exceeds the maximum allowed size of 2 GB, as defined in the Pilot's default config file |
1110 | SETUPFAILURE | Failed during payload setup | |
1115 | NFSSQLITE | NFS SQLite locking problems | The Pilot identifies this error by grepping for the strings "prepare 5 database is locked" and "Error SQLiteStatement" in the payload stdout |
1116 | QUEUEDATA | Pilot could not download queuedata | |
1117 | QUEUEDATANOTOK | Pilot found non-valid queuedata | |
1124 | OUTPUTFILETOOLARGE | Output file too large | |
1133 | NOSTORAGE | Fetching default storage failed: no activity related storage defined | |
1137 | STAGEOUTFAILED | Failed to stage-out file | |
1141 | PUTMD5MISMATCH | md5sum mismatch on output file | Error acronym should be renamed |
1143 | CHMODTRF | Failed to chmod trf | After downloading a trf, the pilot tries to do a chmod 0755 on it. If this fails, the pilot will set this error |
1144 | PANDAKILL | This job was killed by panda server | |
1145 | GETMD5MISMATCH | md5sum mismatch on input file | Error acronym should be renamed |
1149 | TRFDOWNLOADFAILURE | Transform could not be downloaded | |
1150 | LOOPINGJOB | Looping job killed by pilot | The pilot will kill the payload (or stop stage-in/out) if there is no activity within the allowed time, i.e. no files are touched in the work directory or the file transfer is stuck. The default looping job time limit is 12*3600 s for production jobs and 3*3600 s for user analysis jobs. The limit can be overridden in the pilot's config file (or set by the user using the maxCPUCount variable) |
1151 | STAGEINTIMEOUT | File transfer timed out during stage-in | Currently only identified for rucio file transfers (unless "Operation timed out" is found in stderr) |
1152 | STAGEOUTTIMEOUT | File transfer timed out during stage-out | Currently only identified for rucio file transfers (unless "Operation timed out" is found in stderr) |
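
Several of the Notes entries above describe simple detection rules (string matches on copytool output, time limits for looping jobs). The following is a minimal sketch of how such rules could be expressed. It is not the Pilot's actual implementation: the function names `classify_copytool_output` and `is_looping`, the `limit_override` parameter, the order of the checks, and the fallback to the generic stage-in/stage-out codes are all assumptions made for illustration. Only the numerical codes, the matched strings, and the default time limits come from the table.

```python
from typing import Optional

# Error codes copied from the table above.
NOLOCALSPACE = 1098
STAGEINFAILED = 1099
NOSUCHFILE = 1103
STAGEOUTFAILED = 1137
STAGEINTIMEOUT = 1151
STAGEOUTTIMEOUT = 1152

# Default looping-job limits from the table, in seconds.
LOOPING_LIMIT_PRODUCTION = 12 * 3600
LOOPING_LIMIT_ANALYSIS = 3 * 3600


def classify_copytool_output(output: str, stage_in: bool) -> int:
    """Map the output of a failed copytool command to an error code.

    Hypothetical helper: the check order and the fallback are assumptions.
    """
    if "No space left on device" in output:
        return NOLOCALSPACE
    if "No such file or directory" in output:
        return NOSUCHFILE
    if "Operation timed out" in output:
        return STAGEINTIMEOUT if stage_in else STAGEOUTTIMEOUT
    # Assumption: anything unrecognized becomes the generic transfer failure.
    return STAGEINFAILED if stage_in else STAGEOUTFAILED


def is_looping(seconds_since_last_activity: float,
               is_analysis_job: bool,
               limit_override: Optional[int] = None) -> bool:
    """Return True if the job has been inactive longer than the allowed time.

    `limit_override` is a hypothetical stand-in for a limit taken from the
    config file or from the user; when it is None the table defaults apply.
    """
    if limit_override is not None:
        limit = limit_override
    elif is_analysis_job:
        limit = LOOPING_LIMIT_ANALYSIS
    else:
        limit = LOOPING_LIMIT_PRODUCTION
    # Assumption: the limit is exceeded only when strictly greater.
    return seconds_since_last_activity > limit
```

For example, under these assumptions a user analysis job with four hours of inactivity would be flagged by `is_looping(4 * 3600, is_analysis_job=True)`, while a production job with the same inactivity would not.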