3.6.7.10 #101

Merged
merged 16 commits into from
Sep 18, 2023


PalNilsson
Collaborator

  • Improved reporting of CPU consumption time
    • For I/O-bound payloads, the CPU consumption time was not reported correctly. The pilot now makes sure that no zero values are reported
    • Reported by R. Walker
  • Migration towards using psutil module has started
    • Until now, the pilot has relied on executing the ps command for process information, but this is heavy on the system if many ps commands are executed in a short time
    • A. De Silva has made the psutil module available via ALRB, and P. Love has set it up in the wrapper with ‘lsetup psutil’
      Currently, psutil is not required: the pilot falls back to other sources of process info if psutil fails to import. This will change soon
    • The pilot currently uses psutil only to determine whether a certain process is running, with a fallback to /proc/{pid}
  • Added protection for failed writing of info dictionary to disk before server update
    • Curl normally uses this dictionary as written to disk; if the write fails, it should now use the dictionary explicitly (converted to a string) instead
    • Previously, the pilot would fail to inform the server in this case, i.e. the job would end up as a lost heartbeat
    • The pilot might still fail before reaching this point, since it essentially relies on the disk having free space
  • Moved the import of google cloud logging to the beginning of the real-time logging module to prevent an unexplained problem seen in Rubin jobs
    • Previously, the module was only imported when it was needed, but for some reason this would occasionally lead to Python locking up
    • Requested by Z. Yang (Rubin)
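
The process check described under the psutil migration could look roughly like the sketch below. It is not the pilot's actual code: the function name `is_process_running` is hypothetical, and the only psutil call assumed is `psutil.pid_exists()`. The `/proc/{pid}` fallback only works on systems that provide a Linux-style `/proc` file system.

```python
import os

try:
    import psutil  # optional for now; may be missing on some systems
except ImportError:
    psutil = None


def is_process_running(pid):
    """Return True if a process with the given PID appears to exist.

    Hypothetical helper: prefers psutil, falls back to /proc/{pid}.
    """
    if psutil is not None:
        try:
            return psutil.pid_exists(pid)
        except Exception:
            pass  # any psutil problem: fall through to the /proc check
    # Fallback: on Linux, every live process has a /proc/<pid> directory
    return os.path.isdir(f"/proc/{pid}")


if __name__ == "__main__":
    print(is_process_running(os.getpid()))
```

A single call like this avoids spawning a `ps` subprocess for every check, which is the load problem the migration is meant to address.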
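
The protection against a failed write of the info dictionary can be illustrated with a minimal sketch. The function name `prepare_update_payload` and the `("file", ...)` / `("string", ...)` return convention are assumptions made for illustration and are not taken from the pilot; how the result is then handed to curl is left out.

```python
import json
import os


def prepare_update_payload(info, path):
    """Try to write the info dictionary to disk for the server update.

    Hypothetical helper. Returns ("file", path) if the write succeeded,
    otherwise ("string", serialized) so that the caller can still send
    the dictionary explicitly instead of losing the update.
    """
    serialized = json.dumps(info)
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(serialized)
    except OSError:
        # e.g. disk full or missing directory: use the in-memory string
        return "string", serialized
    return "file", path


if __name__ == "__main__":
    # A path inside a non-existent directory triggers the fallback
    bad_path = os.path.join(os.sep, "nonexistent-dir-for-demo", "info.json")
    mode, payload = prepare_update_payload({"state": "running"}, bad_path)
    print(mode)
```

With this shape the server update no longer depends on the write succeeding, although, as noted above, the pilot may still fail earlier if the disk has no free space.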

@PalNilsson PalNilsson merged commit f903d9e into master Sep 18, 2023
4 checks passed