
First load time in Nvidia Jetson AGX Xavier and Orin is more than 10 minutes #2402

Closed
deeprobo-dev opened this issue Sep 2, 2024 · 2 comments

Comments

deeprobo-dev commented Sep 2, 2024

Hi,
I am trying to use whisper.cpp inference for production purposes and have tested it on:

  1. NVIDIA Jetson AGX Xavier:
    GPU compute capability: 7.2
    Architecture: aarch64
    Shared GPU RAM: 32 GB
  2. NVIDIA Jetson AGX Orin:
    GPU compute capability: 8.7
    Architecture: aarch64
    Shared GPU RAM: 64 GB

On both devices the first load takes more than 10 minutes, which is becoming a bottleneck for using it in production even though the inference time is good enough. I am running the base model and compiled the whisper server with CUDA for the corresponding compute capabilities.

However, the same setup loads much faster on my laptop with the following configuration:
Processor: Intel i7, 11th gen
GPU: NVIDIA RTX 2060 (6 GB)
CPU RAM: 32 GB
Architecture: x86_64

Could you please help me figure out and resolve this issue? Thanks in advance.
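In case it helps narrow things down, a small standalone probe along the lines of the sketch below (not part of whisper.cpp; the file name and build command are assumptions) can show whether the time is spent creating the CUDA context rather than loading the model file, and prints the device's compute capability for comparison with the build target:

```cpp
// load_probe.cpp: rough diagnostic sketch, not whisper.cpp code.
// Build (assumed): nvcc -o load_probe load_probe.cpp
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

int main() {
    using clock = std::chrono::steady_clock;

    // The first CUDA runtime call creates the CUDA context. If most of the
    // 10 minutes is spent here, the bottleneck is CUDA initialization, not
    // whisper.cpp's model loading.
    auto t0 = clock::now();
    cudaError_t err = cudaFree(0);  // forces context creation
    auto t1 = clock::now();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA context init: %.1f s\n",
           std::chrono::duration<double>(t1 - t0).count());

    // Report the device's compute capability so it can be compared with the
    // architecture the whisper server binary was compiled for.
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, 0);
    printf("device: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);
    return 0;
}
```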

deeprobo-dev changed the title from "First load time in Nvidia Jetson and Orin is more than 10 minutes" to "First load time in Nvidia Jetson AGX Xavier and Orin is more than 10 minutes" on Sep 2, 2024
deeprobo-dev (Author) commented:

Found it to be a CUDA issue. With CUDA 11.4 the first load was slow irrespective of system architecture. After upgrading CUDA it worked fine both locally and inside a Docker container. Tested with CUDA 11.7 and 12.2.
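For anyone who runs into the same thing, a quick way to confirm which CUDA runtime and driver a build actually picks up (especially inside a Docker container) is a small check like the sketch below (the file name and build command are assumptions):

```cpp
// cuda_versions.cpp: prints the CUDA runtime and driver versions visible
// to the process, to verify that an upgrade really took effect.
// Build (assumed): nvcc -o cuda_versions cuda_versions.cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int runtime_ver = 0, driver_ver = 0;
    cudaRuntimeGetVersion(&runtime_ver);  // e.g. 11040 means CUDA 11.4
    cudaDriverGetVersion(&driver_ver);
    printf("runtime: %d.%d, driver: %d.%d\n",
           runtime_ver / 1000, (runtime_ver % 1000) / 10,
           driver_ver  / 1000, (driver_ver  % 1000) / 10);
    return 0;
}
```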


aleksas commented Sep 28, 2024

@deeprobo-dev It would be great if you could run the benchmark on both the Xavier and the Orin and post the results here. It would be helpful to get an idea of how the performance compares to other hardware.
