Implement a workflow allowing both git-oriented and JupyterLab-oriented notebook development #16

Open
AdamOlech opened this issue Sep 29, 2022 · 1 comment

@AdamOlech

To reiterate our short email discussion (between me, @PiotrZierhoffer and @proppy):

While working on some additional notebooks, we've started running into an issue where subsequent pushes to the rad-lab-deploy repository do not trigger a code update on the already running JupyterLab instance.

In other words, the code located in examples is only ever used during the initial setup phase. The secondary build uploads these files to a staging bucket, and the script generated in the notebooks-build step is then attached to the notebook instance as a post-startup script.
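To make the pipeline above concrete, here is a minimal sketch of what a generated post-startup script could look like. It is not the script the notebooks-build step actually produces: the function name `render_post_startup_script`, the bucket name and the `/home/jupyter` destination are hypothetical. It only shows the shape of the step, a one-shot copy from the staging bucket into the notebook's working directory, which is why later pushes never reach the instance on their own.

```python
import shlex


def render_post_startup_script(bucket: str, prefix: str, dest: str) -> str:
    """Render a shell script that copies staged notebooks onto the instance.

    Hypothetical sketch: the real notebooks-build step may generate
    a different script.
    """
    # Quoting the wildcard keeps the local shell from expanding it,
    # so gsutil performs the wildcard matching itself.
    src = f"gs://{bucket}/{prefix.strip('/')}/*"
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        f"mkdir -p {shlex.quote(dest)}",
        # -m runs the copy in parallel; -r recurses into subdirectories.
        f"gsutil -m cp -r {shlex.quote(src)} {shlex.quote(dest)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_post_startup_script("my-staging-bucket", "examples", "/home/jupyter"))
```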

As per the official documentation, this script runs only once, upon the first boot of a newly created notebook instance. This creates an inconvenient situation where developers get the false impression that updating the repository will update the code on the notebook. This obviously doesn't happen, as the code is synced only once, during the initial deployment.

Our development workflow mainly consists of prototyping in JupyterLab, committing the changes back to git, and prototyping again starting from the copy derived from git. This ensures reproducibility, in the sense that anybody can grab the repository, deploy the code to their project and start working on it.

It seems that an early attempt to address this issue has already been proposed in #5. Personally, I quite like this approach!

So far I see the following possible solutions on how to address this:

  • Make the post-startup script run on every boot. This would of course require restarting the notebook instance on each push, but that could either be added to Terraform or simply added as another step in the secondary build pipeline.
  • Add an explicit step that SSHes into the notebook instance and runs the bucket synchronization script.
  • Add gcsfuse for the staging directory (#5).
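The second and third options both boil down to syncing the staging contents into the JupyterLab working directory after the initial deployment. Below is a minimal sketch of such a one-way sync. It assumes the staging bucket is already visible as a local directory (for example mounted with gcsfuse as in #5); the function name `sync_staging` and the directory layout are hypothetical, not part of the existing build.

```python
import shutil
from pathlib import Path


def sync_staging(staging: Path, workdir: Path) -> list[str]:
    """One-way sync: copy new or changed files from staging into workdir.

    Files that exist only in workdir are never touched, so notebooks
    created by end users survive a sync. Returns the copied paths.
    """
    copied = []
    for src in sorted(p for p in staging.rglob("*") if p.is_file()):
        rel = src.relative_to(staging)
        dst = workdir / rel
        # Compare full contents; notebooks are small, so this stays cheap.
        if dst.is_file() and dst.read_bytes() == src.read_bytes():
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

Note the trade-off this sketch makes: a staged notebook that was edited in JupyterLab gets overwritten by the git version on the next sync, which is exactly the tension between git-oriented and JupyterLab-oriented development that this issue is about.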
@proppy
Owner

proppy commented Sep 29, 2022

> As per official documentation, this script gets run only once – upon first boot of the newly created notebook instance.

With proppy/rad-lab-deploy#4, a change to a notebook should appropriately re-trigger the Cloud Build null_resource, which, through the transitive dependency, should re-create the notebook instance resource.
This should cause the script to run again, but I think there might be a bad interaction with d3f43e0:

  • the data disk gets reused because the name of the notebook resource doesn't change
  • this avoids losing notebooks created by end users on infrastructure updates (as requested by @HelgeGehring)
  • I suspect there might still be a marker file on that disk that prevents the startup script from running again.
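The suspected interaction can be illustrated with a minimal run-once guard. This is a sketch under an assumption: the thread only suspects that a marker file exists, and the name `.post_startup_done` and the helper `run_once` are hypothetical. If such a marker lives on the persistent data disk, re-creating the instance while reusing that disk makes the guard skip the script on every later boot.

```python
from pathlib import Path
from typing import Callable


def run_once(marker: Path, action: Callable[[], None]) -> bool:
    """Run action only if marker does not exist yet; return True if it ran."""
    if marker.exists():
        return False  # a previous boot (possibly of an older instance) already ran it
    action()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()  # persisted on the data disk, so it outlives the instance
    return True
```

Simulating two boots that share the same reused disk: the first call runs the action, while the second (the re-created instance) returns `False` without running it.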

> So far I see the following possible solutions on how to address this:

As discussed in proppy/rad-lab-deploy#4 (comment), I'd rather have those notebooks live in a separate repo altogether (I was thinking of a fork of https://github.com/chipsalliance/silicon-notebooks in a Cloud Source repo), with some git integration on the JupyterLab side (maybe using managed notebooks? or with #13), so that we simply meet users' familiar expectation that they can manage them with git.
