I would like to check whether there are any use cases in the community that use the open-inference-protocol with LLMs. Is there a roadmap to natively support, or extend, the open-inference-protocol to better support LLMs?
The good thing about the open-inference-protocol is that it standardizes the way people interact with models, and it is very useful when developing a transformer (pre/post-processing) and integrating different transformers and predictors into an inference graph. A standard protocol also makes it easy to develop a serving runtime that supports different kinds of LLMs.
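For context, a minimal sketch of what calling a text model through the Open Inference Protocol (V2) `infer` endpoint looks like today; the host, port, and model name are placeholders, and the prompt has to be packed into a generic BYTES tensor because the protocol has no LLM-specific fields:

```python
import requests

# Placeholder endpoint and model name, for illustration only.
BASE_URL = "http://localhost:8080"
MODEL = "my-llm"

# Open Inference Protocol (V2) infer request: the prompt is sent as a
# generic BYTES input tensor rather than an LLM-specific field.
payload = {
    "inputs": [
        {
            "name": "prompt",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is the Open Inference Protocol?"],
        }
    ]
}

resp = requests.post(f"{BASE_URL}/v2/models/{MODEL}/infer", json=payload)
resp.raise_for_status()

# The generated text comes back as another generic output tensor.
print(resp.json()["outputs"][0]["data"])
```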
Thanks.
I captured some of my thoughts on the API Spec discussed here.
However, I want to bring up a topic on the API Spec. Currently the schema follows HF (Hugging Face). From my experience so far of playing around with LLMs and some of the third-party toolkits built on top of LLMs/ChatGPT, I feel the OpenAI spec would be the better option, as most third-party LLM applications support it out of the box, which means a better ecosystem and user experience. If a user deploys an LLM model in KServe but its API cannot be used with those LLM toolkits, it will be hard to promote it.
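To make the ecosystem point concrete, here is a rough sketch, assuming a hypothetical KServe deployment that exposes an OpenAI-compatible endpoint (the base URL and model name below are made up): existing toolkits such as the `openai` Python client could point at it just by overriding the base URL.

```python
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint served by KServe; only the
# base URL changes, the client code is the same as for api.openai.com.
client = OpenAI(
    base_url="http://my-kserve-host/v1",
    api_key="not-used-by-a-local-deployment",
)

completion = client.chat.completions.create(
    model="my-llm",
    messages=[{"role": "user", "content": "Summarize KServe in one sentence."}],
)
print(completion.choices[0].message.content)
```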
With the increasing popularity of LLMs, many companies have started to look into deploying LLMs.
Instead of `infer`/`predict`, `completions` and `embeddings` are being used, and most of these APIs support `stream`. Example API spec:
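As a rough illustration of the shape of those APIs (the host and model names below are placeholders, not a proposed KServe spec), a `completions` call with `stream` enabled and an `embeddings` call look roughly like this:

```python
import json
import requests

BASE_URL = "http://my-llm-host/v1"  # placeholder host

# Completions-style request with streaming: the server returns
# server-sent events, one "data: {...}" line per generated chunk.
with requests.post(
    f"{BASE_URL}/completions",
    json={"model": "my-llm", "prompt": "Hello", "max_tokens": 32, "stream": True},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line.startswith(b"data: ") and line != b"data: [DONE]":
            chunk = json.loads(line[len(b"data: "):])
            print(chunk["choices"][0]["text"], end="")

# Embeddings-style request: a batch of texts in, one vector per text out.
emb = requests.post(
    f"{BASE_URL}/embeddings",
    json={"model": "my-embedding-model", "input": ["Hello world"]},
).json()
print(len(emb["data"][0]["embedding"]))
```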