CUDA not used #70

Open

webfrank opened this issue Oct 22, 2024 · 1 comment

@webfrank

Hi,
great work, flawless integration with Go.

I was trying to move inference to a CUDA device. This is the code I use to initialize the runtime:

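// InitYolo8Session loads the onnxruntime shared library and creates an
// inference session for a YOLOv8 model, optionally appending the CoreML
// or CUDA execution providers.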
func InitYolo8Session(input []float32) (ModelSession, error) {
	lib := getSharedLibPath()
	log.Printf("Loading ONNX runtime %s\n", lib)

	ort.SetSharedLibraryPath(lib)
	err := ort.InitializeEnvironment()
	if err != nil {
		return ModelSession{}, err
	}

	inputShape := ort.NewShape(1, 3, 640, 640)
	inputTensor, err := ort.NewTensor(inputShape, input)
	if err != nil {
		return ModelSession{}, err
	}

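	// 4 box coordinates plus one score per class, for 8400 candidate boxes.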
	outputShape := ort.NewShape(1, int64(len(yolo_classes)+4), 8400)
	outputTensor, err := ort.NewEmptyTensor[float32](outputShape)
	if err != nil {
		return ModelSession{}, err
	}

	options, err := ort.NewSessionOptions()
	if err != nil {
		return ModelSession{}, err
	}
	// The options are only needed until NewAdvancedSession returns.
	defer options.Destroy()

	if UseCoreML { // If CoreML is enabled, append the CoreML execution provider
		err = options.AppendExecutionProviderCoreML(0)
		if err != nil {
			return ModelSession{}, err
		}
	}

	if UseCUDA { // If CUDA is enabled, append the CUDA execution provider
		cudaOptions, err := ort.NewCUDAProviderOptions()
		if err != nil {
			return ModelSession{}, err
		}
		defer cudaOptions.Destroy()

		// Run on GPU 0.
		err = cudaOptions.Update(map[string]string{"device_id": "0"})
		if err != nil {
			return ModelSession{}, err
		}

		err = options.AppendExecutionProviderCUDA(cudaOptions)
		if err != nil {
			return ModelSession{}, err
		}
	}

	session, err := ort.NewAdvancedSession(
		ModelPath,
		[]string{"images"},
		[]string{"output0"},
		[]ort.ArbitraryTensor{inputTensor},
		[]ort.ArbitraryTensor{outputTensor},
		options,
	)

	if err != nil {
		return ModelSession{}, err
	}

	modelSes := ModelSession{
		Session: session,
		Input:   inputTensor,
		Output:  outputTensor,
	}

	log.Printf("ONNX runtime (%s) initialized [%v]", ort.GetVersion(), ort.IsInitialized())
	return modelSes, err
}
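
Inference afterwards is essentially just a Run() call on the session. Sketching it from memory, it looks something like this (RunInference is my own wrapper name, not part of the library):

func RunInference(ms ModelSession) ([]float32, error) {
	// AdvancedSession binds its input/output tensors at creation time,
	// so Run() takes no arguments; results land in the output tensor.
	if err := ms.Session.Run(); err != nil {
		return nil, err
	}
	return ms.Output.GetData(), nil
}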

The library is the latest runtime (1.19.2) from the official repo, the GPU variant.

Inference works, but takes about the same time as on the CPU. The output from nvidia-smi is this:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P0              32W /  70W |   1841MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
@yalue (Owner) commented Oct 25, 2024

Thanks, and I'm glad it's working, at least somewhat!

I would expect any of the initialization functions to return an error if CUDA was not actually initialized correctly... odd. Have you verified that the library runs correctly by running go test -v -bench=. from its source directory? You'd need to set the ONNXRUNTIME_SHARED_LIBRARY_PATH environment variable to point to your GPU-enabled copy of onnxruntime.so in order to run the tests, but this should give you good information on whether CUDA is enabled and working properly. (You'd specifically want to look at the BenchmarkCUDASession output and make sure it's faster than the BenchmarkOpMultiThreaded output.)
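
For reference, the invocation would look something like this (the library path here is just a placeholder for wherever your GPU-enabled copy lives):

	ONNXRUNTIME_SHARED_LIBRARY_PATH=/path/to/gpu/onnxruntime.so go test -v -bench=.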

Depending on the size of the yolov8 network, it's possible that it's just not large enough to see a significant benefit from CUDA, especially with CUDA's higher overheads. However, it is indeed puzzling that nvidia-smi isn't showing anything. I've seen the current version of onnxruntime_go interact correctly with CUDA on several different systems, so I wonder if you're somehow just loading a wrong copy of the library? Let me know if the tests pass.
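
One quick sanity check, going from the layout of the official GPU packages (which, as far as I know, ship the CUDA provider as separate shared objects): make sure the provider libraries sit next to the main library you're loading.

	ls /path/to/onnxruntime-gpu/lib
	# Expect libonnxruntime_providers_shared.so and
	# libonnxruntime_providers_cuda.so alongside libonnxruntime.so.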

And sorry for the slow update, I haven't had much time to look at this project recently.
