CUDA not used #70

Open

webfrank opened this issue Oct 22, 2024 · 1 comment

@webfrank

Hi,
great work, flawless integration with Go.

I was trying to move inference to a CUDA device. This is the code I use to initialize the runtime:

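// InitYolo8Session loads the onnxruntime shared library and creates an
// inference session for a YOLOv8 model, optionally appending the CoreML
// or CUDA execution providers.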
func InitYolo8Session(input []float32) (ModelSession, error) {
	lib := getSharedLibPath()
	log.Printf("Loading ONNX runtime %s\n", lib)

	ort.SetSharedLibraryPath(lib)
	err := ort.InitializeEnvironment()
	if err != nil {
		return ModelSession{}, err
	}

	inputShape := ort.NewShape(1, 3, 640, 640)
	inputTensor, err := ort.NewTensor(inputShape, input)
	if err != nil {
		return ModelSession{}, err
	}

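	// 4 box coordinates plus one score per class, for 8400 candidate boxes.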
	outputShape := ort.NewShape(1, int64(len(yolo_classes)+4), 8400)
	outputTensor, err := ort.NewEmptyTensor[float32](outputShape)
	if err != nil {
		return ModelSession{}, err
	}

	options, err := ort.NewSessionOptions()
	if err != nil {
		return ModelSession{}, err
	}
	// The options are only needed until NewAdvancedSession returns.
	defer options.Destroy()

	if UseCoreML { // If CoreML is enabled, append the CoreML execution provider
		err = options.AppendExecutionProviderCoreML(0)
		if err != nil {
			return ModelSession{}, err
		}
	}

	if UseCUDA { // If CUDA is enabled, append the CUDA execution provider
		cudaOptions, err := ort.NewCUDAProviderOptions()
		if err != nil {
			return ModelSession{}, err
		}
		defer cudaOptions.Destroy()

		// Run on GPU 0.
		err = cudaOptions.Update(map[string]string{"device_id": "0"})
		if err != nil {
			return ModelSession{}, err
		}

		err = options.AppendExecutionProviderCUDA(cudaOptions)
		if err != nil {
			return ModelSession{}, err
		}
	}

	session, err := ort.NewAdvancedSession(
		ModelPath,
		[]string{"images"},
		[]string{"output0"},
		[]ort.ArbitraryTensor{inputTensor},
		[]ort.ArbitraryTensor{outputTensor},
		options,
	)

	if err != nil {
		return ModelSession{}, err
	}

	modelSes := ModelSession{
		Session: session,
		Input:   inputTensor,
		Output:  outputTensor,
	}

	log.Printf("ONNX runtime (%s) initialized [%v]", ort.GetVersion(), ort.IsInitialized())
	return modelSes, err
}
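
Inference afterwards is essentially just a Run() call on the session. Sketching it from memory, it looks something like this (RunInference is my own wrapper name, not part of the library):

func RunInference(ms ModelSession) ([]float32, error) {
	// AdvancedSession binds its input/output tensors at creation time,
	// so Run() takes no arguments; results land in the output tensor.
	if err := ms.Session.Run(); err != nil {
		return nil, err
	}
	return ms.Output.GetData(), nil
}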

The library is the latest runtime (1.19.2) from the official repo, the GPU variant.

Inference works, but takes about the same time as on the CPU. The output from nvidia-smi is this:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P0              32W /  70W |   1841MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
@yalue (Owner) commented Oct 25, 2024

Thanks, and I'm glad it's working, at least somewhat!

I would expect any of the initialization functions to return an error if CUDA was not actually initialized correctly... odd. Have you verified that the library runs correctly by running go test -v -bench=. from its source directory? You'd need to set the ONNXRUNTIME_SHARED_LIBRARY_PATH environment variable to point to your GPU-enabled copy of onnxruntime.so in order to run the tests, but this should give you good information on whether CUDA is enabled and working properly. (You'd specifically want to look at the BenchmarkCUDASession output and make sure it's faster than the BenchmarkOpMultiThreaded output.)
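
For reference, the invocation would look something like this (the library path here is just a placeholder for wherever your GPU-enabled copy lives):

	ONNXRUNTIME_SHARED_LIBRARY_PATH=/path/to/gpu/onnxruntime.so go test -v -bench=.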

Depending on the size of the yolov8 network, it's possible that it's just not large enough to see a significant benefit from CUDA, especially with CUDA's higher overheads. However, it is indeed puzzling that nvidia-smi isn't showing anything. I've seen the current version of onnxruntime_go interact correctly with CUDA on several different systems, so I wonder if you're somehow just loading a wrong copy of the library? Let me know if the tests pass.
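
One quick sanity check, going from the layout of the official GPU packages (which, as far as I know, ship the CUDA provider as separate shared objects): make sure the provider libraries sit next to the main library you're loading.

	ls /path/to/onnxruntime-gpu/lib
	# Expect libonnxruntime_providers_shared.so and
	# libonnxruntime_providers_cuda.so alongside libonnxruntime.so.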

And sorry for the slow update, I haven't had much time to look at this project recently.
