-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hq as light scheduler #795
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #795 +/- ##
==========================================
- Coverage 67.72% 67.53% -0.20%
==========================================
Files 50 50
Lines 4598 4620 +22
==========================================
+ Hits 3114 3120 +6
- Misses 1484 1500 +16
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
670e586
to
12784a0
Compare
4bb50e5
to
1e2bded
Compare
Just find a cache when pre-setting the computer, the issue exist also for the current In this PR, the same problem faced for the hq computer setup, therefore, I skip the setup for the default N_PROCS and MEM. |
7f1f129
to
2f956fd
Compare
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
@danielhollas all requests addressed. |
I convert back to draft, since test on the production deployment reveal a problem that untar not happened for the presistent volume since it checks home is empty but there will be a |
Ready for another review. |
if [ -d $AIIDALAB_APPS/quantum-espresso ]; then | ||
echo "Quantum ESPRESSO app does exist" | ||
else | ||
echo "Copying directory '$QE_APP_FOLDER' to '$AIIDALAB_APPS'" | ||
cp -r "$QE_APP_FOLDER" "$AIIDALAB_APPS" | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This do the trick, because in the k8s deployment, taring for the empty files like work
for example has permission issue. Which not cause any problem I can see but it prevent the qeapp folder copy after. I remove it out as a independent operation.
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a super quick review, LGTM. (btw: will be away for the next three weeks)
@@ -4,7 +4,7 @@ set -eux | |||
home="/home/${NB_USER}" | |||
|
|||
# Untar home archive file to restore home directory if it is empty | |||
if [[ $(ls -A ${home} | wc -l) = "0" ]]; then | |||
if [ ! -e $home/.FLAG_HOME_INITIALIZED ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a huge fan of this solution since it seems brittle (user can remove this file).
I am somewhat confused, why does the previous one not work anymore?
(feel free to ignore, this is just me rambling :D)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two problems with previous one
- the left side is
0
and right side is\0
as str. - On the k8s deployment, if the persistent volume is used, it has a
lost+found
folder exist before this script is running.
So another solution I did was if [ $(ls -A ${home} | wc -l ) -lt 1 ]; then
, but it is more brittle I assume :-p
before-notebook.d/43_start-hq.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look into this file too deeply, might want to get other eyes on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, please do it, I am not in hurry to get this merged.
No problem, and congrats with the atomspec paper! |
Thanks! I've been meaning to announce it on the forum and thank the team but I did not get to it yet. : 🙈 |
I merge this now, since in the demo server checkin meeting we set the next test from @cpignedoli and @giovannipizzi at next Wednesday. It is good to have an alpha release include this one and many new fixes and use version tag to be deployed in the server. |
In the docker image, the hyperqueue is pre-configured and replace the local.direct scheduler to limit the number of CPUs to be used when there are multiple calculations. The number of CPU and memory information are read from cgroups and set for the computer as default. These information can later be used for set up the default number of resource to be used for the QeApp. The number of CPU is set to be floor(ncpus) of the container, the goal is to have some amount of cpus to deal with system response specifically. - also include a small refactoring on the computer/code setup.
local-hq
or replacelocalhost
? (uselocalhost-hq
and hidelocalhost
for old calculation backward compatibility)