Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Devcontainer #5010

Closed
wants to merge 2 commits into from
Closed

Added Devcontainer #5010

wants to merge 2 commits into from

Conversation

5had3z
Copy link
Contributor

@5had3z 5had3z commented Aug 21, 2023

Category:

New Feature

Description:

Devcontainer is a fancy docker-compose that intergrates tightly with VSCode also enables the automatic installing of commonly used extensions. This enables contributors to simply run the command "Dev Containers: Rebuild and Reopen in Container" to get working in a bare-metal like manner with all dependencies compiled and installed. From there you can compile, run and debug DALI in the single environment.

The only steps to do for a new developer are:

  • git clone recursive
  • "Rebuild and Reopen in Container"
  • run init_repo.sh in cvcuda
  • select GCC-11.4 from cmake extension + click build button
  • cd build && pip instsall dali/python
  • run tests and debug etc.

Additional information:

Affected modules and functionalities:

New dockerfile in .devcontainer folder based on standard cuda development container. Dependencies included have sufficied for me to develop with and run operator_2 test suite #5007.

Extensions added to devcontainer include C++, CMake, Nsight, Python. By default the devcontainer will try to run with a GPU.

Key points relevant for the review:

DALI_deps is built, installed then removed to save space on final image size. There is potentially a better way to automatically insert the DALI_DEPS_VERSION_SHA to the Dockerfile than an Arg so the correct commit is used at build time. DALI_extra on the otherhand is needed for testing so that is cloned and retained, so the target SHA doesn't really need to be inserted during image build.

I didn't attempt setting up the environment for working with the tensorflow plugin part of the library.

A few extra packages included from the test-toolchain ppa like gcc-13 and tbb since I may probe a C++23 port later, but this can be removed (doesn't harm anything for now).

Need to discuss how to update compilation.rst, I personally prefer this workflow over docker/build.sh.

Tests:

The only test perse is the user experience to be able to quickly get a development environment up from scratch with this system.

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@klecki
Copy link
Contributor

klecki commented Aug 22, 2023

Thanks for the contribution.
This looks cool, please allow me a few days to evaluate and test the devcontainer, I never used it before.

I think after we evaluate this, we will try to accept this, and than proceed with adjustments/generalization as next steps. One thing we need to figure out is how to plug this into CI, so we can verify that the container allows for correct compilation of latest DALI and isn't getting out of date.

When we have that streamlined + possibly add a handling for the TF plugin & custom ops, we will adjust the documentation advertising this as an option for development.

As a side note, I am thinking if we can utilize any of the dockerfiles that are used for the CI/build.sh and their versioning here, if you didn't see it, here is an example of the latest one: https://github.com/NVIDIA/DALI/blob/main/docker/Dockerfile.cuda122.x86_64.deps. To be clear, I am not asking for changes in this PR or generalization of the dockerfile, but if you have any suggestions in that area I would be happy to discuss what may work best for a developer using this feature.

One more note regarding the plugin list, we have a config file for yapf formatter: https://github.com/NVIDIA/DALI/blob/main/.style.yapf and our current formatting would probably generate less formatting changes using yapf instead of black.

@5had3z
Copy link
Contributor Author

5had3z commented Aug 23, 2023

utilize any of the dockerfiles that are used for the CI/build.sh

This should be possible as the dockerfile and context can be changed to anything relative to the devcontainer.json i.e. ../docker/Dockerfile. The build container still needs to be extended as it can't actually run anything, which is why build.sh also has the separate runner image with "CREATE_RUNNER" arg. I have minimal experience with "run everywhere" binaries, certainly anything built in my dockerfile will probably result in the classic "built with newer GLIBC version than system" error.

using yapf instead of black

VSCode-wise yapf seems to be currently in the "depricated/unsupported" camp. Of course I'm not going to suggest changing formatting over one IDE not being well-supported, but just isn't as straight forward to get format-on-save up and running (just one of the propsed alternatives needs to be appied).

@JanuszL
Copy link
Contributor

JanuszL commented Aug 24, 2023

Hi @5had3z,

Thank you for pointing out the gap and providing the initial implementation.
We discussed this internally and we are afraid that this approach may be hard to maintain in the long run. That is why we started working on something that can reuse the parts that are changing most frequently in our setup and have it tested in our CI. The initial PR we want to extend further has been posted #5017. Feel free to watch it and share your comments when it becomes more mature.

@5had3z
Copy link
Contributor Author

5had3z commented Aug 28, 2023

@JanuszL
I'm wondering if the dockerfile system can be simplified a bit with cuda minor version compatibility? Could all images be just based on the latest minor version, so there's only one 11 and 12 image (and use ngc base as I have done here)? Correct me if I'm wrong with thinking how it works. Also, I'm not sure why 9 and 10 are still hanging around since they're not supported anymore.

If toolchain and dependency things don't change too often, you could build the dev image in CI itself and host the developer image in NGC, so the devcontainer.json becomes

{
    "image": "nvcr.io/nvidia/dali-dev:cu122",
    "customizations": {
            ....

Maybe save a bit of energy so many people can just download instead of download + compile.

I'll close this here so any future conversation can happen in #5017.

@5had3z 5had3z closed this Aug 28, 2023
@JanuszL
Copy link
Contributor

JanuszL commented Aug 28, 2023

Hi @5had3z,

I'm wondering if the dockerfile system can be simplified a bit with cuda minor version compatibility? Could all images be just based on the latest minor version, so there's only one 11 and 12 image (and use ngc base as I have done here)

We already follow that.

Also, I'm not sure why 9 and 10 are still hanging around since they're not supported anymore.

We keep them for eager hackers who may want to build DALI with their own setup. It might work although there is no guarantee as we do not test this.

If toolchain and dependency things don't change too often, you could build the dev image in CI itself and host the developer image in NGC, so the devcontainer.json becomes

Not that often usually means once every month or two. It is technically possible although this adds significant engineering effort to test and validate provided images. On top of that, we need to patch all CVEs discovered in such images. For now, we would prefer to let everyone build this image on their own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants