build: update to CUDA 12.3 #3956

esteve · 2023-11-02T14:20:38Z

Description

This PR updates the CUDA version in the Ansible scripts to 12.3.

Fixes #3943

Tests performed

Notes for reviewers

Interface changes

Effects on system behavior

Pre-review checklist for the PR author

The PR author must check the checkboxes below when creating the PR.

I've confirmed the contribution guidelines.
The PR follows the pull request guidelines.

In-review checklist for the PR reviewers

The PR reviewers must check the checkboxes below before approval.

The PR follows the pull request guidelines.
The PR has been properly tested.
The PR has been reviewed by the code owners.

Post-review checklist for the PR author

The PR author must check the checkboxes below before merging.

There are no open discussions or they are tracked via tickets.
The PR is ready for merge.

After all checkboxes are checked, anyone who has write access can merge the PR.

Signed-off-by: Esteve Fernandez <[email protected]>

esteve · 2023-11-02T14:26:06Z

Back to draft, I hadn't realized the TensorRT packages are in a separate Ansible script

…e script Signed-off-by: Esteve Fernandez <[email protected]>

esteve · 2023-11-02T14:29:49Z

@xmfcx this is ready for review now.

ansible/roles/cuda/tasks/main.yaml

ansible/roles/cuda/README.md

xmfcx · 2023-11-02T15:38:51Z

I will try to test it tomorrow, thank you for your efforts!

Signed-off-by: Esteve Fernandez <[email protected]>

esteve · 2023-11-02T15:42:27Z

@xmfcx thanks for the prompt review. I've incorporated your feedback and pushed a new commit. I'm running this locally, though I won't be able to fully test it, only make sure that it can be built.

xmfcx · 2023-11-02T15:59:45Z

There also seems to be an arm64.env file:

https://github.com/autowarefoundation/autoware/blob/main/arm64.env

It contains only a single-line comment:

# Override amd64's settings

We might need to override the TensorRT version.

I've checked that sbsa repository contains:

libcudnn8-dev_8.9.5.29-1+cuda12.2 (same as in the x86_64 repository)

But it doesn't contain:

tensorrt-dev_8.6.1.6-1+cuda12.0

Instead, it contains:

tensorrt-dev_8.6.2.2-1+cuda12.0

So, we might need to add:

tensorrt_version=8.6.2.2-1+cuda12.0
to the arm64.env file.

Also we need to check where in the code it does the overriding.
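
As a rough illustration of the override being proposed here, the sketch below merges two env files in order so that a later file wins. It is only a model of the idea, not how Autoware's setup scripts actually apply arm64.env; `parse_env` and `layered_settings` are hypothetical helpers, and the file contents are assembled from values quoted in this thread.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def layered_settings(*env_texts: str) -> dict[str, str]:
    """Merge env files in order; keys in later files override earlier ones."""
    merged = {}
    for text in env_texts:
        merged.update(parse_env(text))
    return merged


# Assumed file contents, built from versions mentioned in this thread.
AMD64_ENV = """\
cuda_version=12.3
cudnn_version=8.9.5.29-1+cuda12.2
tensorrt_version=8.6.1.6-1+cuda12.0
"""

ARM64_ENV = """\
# Override amd64's settings
tensorrt_version=8.6.2.2-1+cuda12.0
"""

settings = layered_settings(AMD64_ENV, ARM64_ENV)
print(settings["tensorrt_version"])
```

With this layering, only the keys listed in the arm64 file change; everything else falls through from the amd64 defaults.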

esteve · 2023-11-02T16:03:35Z

@xmfcx do we need the packages from the sbsa repository? 8.6.1.6-1+cuda12.0 is in the x86_64 repository.

esteve · 2023-11-02T16:05:00Z

@xmfcx do we need the packages from the sbsa repository? 8.6.1.6-1+cuda12.0 is in the x86_64 repository.

Just to answer my own question: I think we do need the packages from sbsa for ARM-based non-Jetson platforms.

xmfcx · 2023-11-02T16:06:47Z

Yes, it is referenced in https://github.com/autowarefoundation/autoware/blob/main/ansible/roles/cuda/tasks/main.yaml#L6

Signed-off-by: Esteve Fernandez <[email protected]>

esteve · 2023-11-03T14:32:00Z

@xmfcx I'm building this branch on an ARM virtual machine. So far it seems to be using the correct version of TensorRT for ARM (8.6.2.2-1+cuda12.0).

esteve · 2023-11-03T14:58:24Z

Spoke too soon, the cuda-nvprof-12-3 package doesn't seem to be available for ARM, only cuda-nvprof-11-7
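
The actual fix lives in the Ansible role (see the commit "build: only install cuda-nvprof on x86_64"). Purely to illustrate the selection logic, here is a minimal Python sketch; `optional_cuda_packages` is a hypothetical name, and the only package name used is the one mentioned above.

```python
def optional_cuda_packages(arch: str, cuda_version: str) -> list[str]:
    """Return architecture-specific extra CUDA packages.

    Assumption for illustration: cuda-nvprof is requested only on x86_64,
    because the matching package was not found in the ARM repository.
    """
    # CUDA apt package names use dashes instead of dots, e.g. 12.3 -> 12-3.
    dashed = cuda_version.replace(".", "-")
    if arch == "x86_64":
        return [f"cuda-nvprof-{dashed}"]
    return []


print(optional_cuda_packages("x86_64", "12.3"))
print(optional_cuda_packages("aarch64", "12.3"))
```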

Signed-off-by: Esteve Fernandez <[email protected]>

esteve · 2023-11-06T13:57:01Z

@xmfcx I've built Autoware in an ARM VM. CUDA gets detected and the perception packages are built, so I assume the result is correct.

xmfcx · 2023-11-13T09:53:11Z

Thanks for your efforts @esteve on this PR.
The changes look alright. To make sure nothing is wrong with the CI, I've started:

and will merge when they are done.

esteve · 2023-11-13T12:22:38Z

@xmfcx it seems that both jobs passed! 🥳

xmfcx

🎊🪩🎊🪩🎊

kaspermeck-arm · 2023-12-07T18:09:10Z

Is it possible to downgrade to 12.2 to match https://developer.nvidia.com/embedded/jetpack?

xmfcx · 2023-12-11T09:24:22Z

@kaspermeck-arm we could, this PR didn't make any changes in the code.
Only changed the dependencies.

For Jetson devices, I assume they won't install CUDA separately since it would come with the JetPack.

And even looking at the changes here, as long as it is CUDA 12, they seem to play well with each other.
So keeping it the same shouldn't be a problem.

cuda_version=12.3
cudnn_version=8.9.5.29-1+cuda12.2
tensorrt_version=8.6.1.6-1+cuda12.0

Do you think we should still change the version?
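
The "as long as it is CUDA 12" observation above can be checked mechanically from the `+cudaX.Y` suffix in the package versions. The sketch below is an illustration only; `cuda_tag` and `same_cuda_major` are hypothetical helpers, and matching on the major version reflects the informal rule stated in this comment rather than an official compatibility guarantee.

```python
import re


def cuda_tag(package_version: str) -> tuple[int, int]:
    """Extract the (major, minor) CUDA version from a '+cudaX.Y' suffix."""
    match = re.search(r"\+cuda(\d+)\.(\d+)$", package_version)
    if match is None:
        raise ValueError(f"no +cudaX.Y suffix in {package_version!r}")
    return int(match.group(1)), int(match.group(2))


def same_cuda_major(cuda_version: str, *package_versions: str) -> bool:
    """True if every package was built against the same CUDA major version."""
    major = int(cuda_version.split(".")[0])
    return all(cuda_tag(v)[0] == major for v in package_versions)


print(cuda_tag("8.9.5.29-1+cuda12.2"))
print(same_cuda_major("12.3", "8.9.5.29-1+cuda12.2", "8.6.1.6-1+cuda12.0"))
```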

kaspermeck-arm · 2023-12-11T15:49:09Z

@xmfcx
That's good to know, so technically setting cuda_version to 12.2 wouldn't interfere with any CUDA applications in Autoware.

We wouldn't install the driver on the Jetson platforms, but we would deploy CUDA capable containers on it. The driver is backwards compatible, i.e., you can run CUDA 12.1 on the 12.2 driver. Doing it the other way around can lead to complications.

Yes, I do think we should downgrade to 12.2, but only if we aren't explicitly using any features from 12.3. This should simplify deployment on the Jetson platforms.

@oguzkaganozt @ambroise-arm
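
The driver compatibility rule described in the comment above can be sketched as a simple version comparison. This is a deliberate simplification under the assumption that "toolkit version no newer than the driver's CUDA version" is the safe case; NVIDIA's actual compatibility rules cover more cases than this, and `toolkit_within_driver` is a hypothetical helper.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Turn 'MAJOR.MINOR' (extra components ignored) into an integer tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def toolkit_within_driver(toolkit: str, driver_cuda: str) -> bool:
    """True if the container's CUDA toolkit is no newer than the driver's CUDA version."""
    return parse_version(toolkit) <= parse_version(driver_cuda)


# CUDA 12.1 container on a 12.2 driver: the safe direction per the comment above.
print(toolkit_within_driver("12.1", "12.2"))
# CUDA 12.3 container on a 12.2 driver: the direction that "can lead to complications".
print(toolkit_within_driver("12.3", "12.2"))
```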

build: update to CUDA 12.3

8e932e1

Signed-off-by: Esteve Fernandez <[email protected]>

esteve requested a review from xmfcx November 2, 2023 14:20

esteve mentioned this pull request Nov 2, 2023

Upgrade to CUDA 12 #3943

Closed

6 tasks

esteve marked this pull request as draft November 2, 2023 14:25

build: cuDNN and TensorRT packages are installed in a separate Ansibl…

495e85b

…e script Signed-off-by: Esteve Fernandez <[email protected]>

esteve marked this pull request as ready for review November 2, 2023 14:29

esteve enabled auto-merge (squash) November 2, 2023 15:26

xmfcx reviewed Nov 2, 2023

ansible/roles/cuda/tasks/main.yaml (outdated, resolved)

xmfcx reviewed Nov 2, 2023

ansible/roles/cuda/tasks/main.yaml (outdated, resolved)

xmfcx reviewed Nov 2, 2023

ansible/roles/cuda/README.md (outdated, resolved)

build: keep /usr/local/cuda as path

a35c308

Signed-off-by: Esteve Fernandez <[email protected]>

esteve force-pushed the update-cuda-12.3 branch from 67d9220 to a35c308 Compare November 2, 2023 15:41

build: add arm64-specific version of the TensortRT package

bc6a64e

Signed-off-by: Esteve Fernandez <[email protected]>

esteve added 2 commits November 3, 2023 16:35

build: only install cuda-nvprof on x86_64

f5275c5

Signed-off-by: Esteve Fernandez <[email protected]>

build: update cudnn version for ARM

a4310e2

Signed-off-by: Esteve Fernandez <[email protected]>

xmfcx approved these changes Nov 13, 2023

esteve merged commit a08fc46 into main Nov 13, 2023
19 checks passed

esteve deleted the update-cuda-12.3 branch November 13, 2023 13:01

tier4-autoware-public-bot bot mentioned this pull request Nov 14, 2023

chore: sync upstream tier4/autoware#5

Open

xmfcx mentioned this pull request Nov 15, 2023

build(CUDA): update CUDA repo OS, and use nvidia/cuda image as a cuda_base_image #3684

Closed

7 tasks
