
[Feature Request] Reimplement Load Model of Triton and MLServer #53

Open
WaterKnight1998 opened this issue Jul 25, 2023 · 0 comments
Labels: enhancement (New feature or request)

Comments


WaterKnight1998 commented Jul 25, 2023

Good afternoon,

Thank you very much for creating this amazing framework.

I have spotted a potentially very valuable improvement for inference with GPU models. The Triton and MLServer adapters currently use the CalcMemCapacity method to report model size.

This method derives the model size from its size on disk. However, for models executed on GPU it would be better to report the increase in VRAM usage caused by loading the model. Do you think this is doable? @tjohnson31415 @rafvasq @njhill @pvaneck
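For illustration, here is a rough Go sketch of what I have in mind (untested; loadModel is only a placeholder for whatever call the adapter makes to the serving runtime, and it assumes nvidia-smi is available on the node): sample the used VRAM before and after the load and report the delta.

```go
package gpumem

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// usedVRAMMiB returns the VRAM currently in use on GPU 0, in MiB,
// as reported by nvidia-smi.
func usedVRAMMiB() (uint64, error) {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=memory.used",
		"--format=csv,noheader,nounits",
		"--id=0").Output()
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
}

// modelVRAMDeltaMiB estimates a model's GPU footprint as the growth in
// used VRAM across the load call. loadModel is a placeholder callback.
func modelVRAMDeltaMiB(loadModel func() error) (uint64, error) {
	before, err := usedVRAMMiB()
	if err != nil {
		return 0, err
	}
	if err := loadModel(); err != nil {
		return 0, err
	}
	after, err := usedVRAMMiB()
	if err != nil {
		return 0, err
	}
	if after < before {
		return 0, fmt.Errorf("used VRAM decreased during load; cannot estimate size")
	}
	return after - before, nil
}
```

One caveat with this approach: other processes sharing the GPU can change VRAM usage concurrently, so the measured delta would only be a best-effort estimate.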

I am glad to help if you think it is doable. I don't have experience in Go, but I can learn.

ckadner added the enhancement (New feature or request) label on Jan 19, 2024