🚪 AI Gateway:
- Unified API Signature: If you've used OpenAI, you already know how to use Numexa with any other provider.
- Interoperability: Write once, run with any provider. Switch between any model from any provider seamlessly.
- Automated Fallbacks & Retries: Ensure your application remains functional even if a primary service fails.
- Load Balancing: Efficiently distribute incoming requests among multiple models.
- Semantic Caching: Reduce costs and latency by intelligently caching results.
🔬 Observability:
- Logging: Keep track of all requests for monitoring and debugging.
- Requests Tracing: Understand the journey of each request for optimization.
- Custom Tags: Segment and categorize requests for better insights.
4️⃣ Steps to Integrate the SDK
- Get your Numexa API key and your virtual key for AI providers.
- Construct your LLM, add Numexa features, provider features, and prompt.
- Construct the Numexa client and set your usage mode.
- Now call Numexa regularly, just like you would call your OpenAI client.
Let's dive in! If you are an advanced user and want to directly jump to various full-fledged examples, click here.
Numexa API Key: Log into Numexa here, then click the "API Keys" link on the left and click "Generate".
```python
import os
os.environ["NUMEXA_API_KEY"] = "NUMEXA_API_KEY"  # replace with your Numexa API key
```
Numexa without proxy:
```python
import os
os.environ["NUMEXA_PROXY"] = "disable"
```
Virtual Keys: Navigate to the "API Keys" page on Numexa and hit the "Generate" button. Choose your AI provider and assign a unique name to your key. Your virtual key is ready!
Numexa Features: You can find a comprehensive list of Numexa features here. This includes settings for caching, retries, metadata, and more.
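For instance, here is a minimal sketch using two of those settings. The config keys (`cache_status`, `retry`) are the ones listed in the feature table at the end of this guide; the values are illustrative and the exact accepted formats may differ.

```python
from numexa import LLMOptions

# Sketch: enable semantic caching and up to 3 retries on a single LLM.
# Keys taken from the feature table below; values are illustrative.
llm = LLMOptions(
    provider="openai",
    virtual_key="key_a",
    model="gpt-4",
    cache_status="semantic",  # "simple" or "semantic"
    retry=3,                  # integer in [0, 5]
)
```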
Provider Features:
Numexa is designed to be flexible. All the features you're familiar with from your LLM provider, like `top_p`, `top_k`, and `temperature`, can be used seamlessly. Check out the complete list of provider features here.
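As a hedged sketch: `temperature` is used later in this guide, and `top_p` is assumed to pass through to the provider the same way.

```python
from numexa import LLMOptions

# Sketch: provider sampling parameters passed straight through LLMOptions.
llm = LLMOptions(
    provider="openai",
    virtual_key="key_a",
    model="gpt-4",
    temperature=0.7,
    top_p=0.9,
)
```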
Setting the Prompt Input:
This param lets you override any prompt that is passed during the completion call - set a model-specific prompt here to optimise model performance. You can set the input in two ways: for models like Claude and GPT-3, use `prompt = (str)`, and for models like GPT-3.5 & GPT-4, use `messages = [array]`.
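For example, a sketch of both input styles (the model names and prompt text are illustrative):

```python
from numexa import LLMOptions

# Completion-style models (e.g. Claude, GPT-3): pass a plain string prompt.
llm_completion = LLMOptions(
    provider="openai",
    virtual_key="key_a",
    model="text-davinci-003",  # illustrative completion-style model
    prompt="Translate 'hello' into French.",
)

# Chat models (GPT-3.5 / GPT-4): pass a messages array instead.
llm_chat = LLMOptions(
    provider="openai",
    virtual_key="key_a",
    model="gpt-4",
    messages=[{"role": "user", "content": "Translate 'hello' into French."}],
)
```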
Here's how you can combine everything:
```python
from numexa import LLMOptions

# Numexa Config
provider = "openai"
virtual_key = "key_a"
trace_id = "numexa_sdk_test"

# Model Settings
model = "gpt-4"
temperature = 1

# User Prompt
messages = [{"role": "user", "content": "Who are you?"}]

# Construct LLM
llm = LLMOptions(provider=provider, virtual_key=virtual_key, trace_id=trace_id, model=model, temperature=temperature)
```
The Numexa client's config takes 3 params: `api_key`, `mode`, `llms`.

- `api_key`: You can set your Numexa API key here or with `os.environ` as done above.
- `mode`: There are 3 modes - Single, Fallback, Loadbalance.
  - Single - This is the standard mode. Use it if you do not want the Fallback or Loadbalance features.
  - Fallback - Set this mode if you want to enable the Fallback feature.
  - Loadbalance - Set this mode if you want to enable the Loadbalance feature.
- `llms`: This is an array where we pass our LLMs constructed using the `LLMOptions` constructor.
```python
import asyncio
import os

# For observability (mandatory)
os.environ["NUMEXA_API_KEY"] = "Your Key"
# The proxy is enabled by default; set this if you do not want any proxy
os.environ["NUMEXA_PROXY"] = "disable"
# Set OPEN_API_KEY for zero proxy overhead, or if the Numexa free version has expired
os.environ["OPEN_API_KEY"] = "Bearer YOURKEY"

import numexa
from numexa import Config, LLMOptions

llm = LLMOptions(provider="openai", model="gpt-4", virtual_key="a")
numexa.config = Config(mode="single", llms=[llm])
```
The Numexa client can do `ChatCompletions` and `Completions`.

Since our LLM is GPT-4, we will use `ChatCompletions`:
```python
# noinspection PyUnresolvedReferences
async def jarvis():
    response = await numexa.ChatCompletions.create(
        messages=[{
            "role": "user",
            "content": "Capital Of India?"
        }]
    )
    print(response)
```
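If your LLM were a completion-style model instead, you would use `Completions`. A hedged sketch, assuming its `create` call mirrors `ChatCompletions` but takes a prompt string:

```python
# Sketch: Completions is assumed to mirror ChatCompletions,
# taking a prompt string instead of a messages array.
async def jarvis_completion():
    response = await numexa.Completions.create(
        prompt="Capital of India?"
    )
    print(response)
```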
You have integrated the Numexa Python SDK in just 4 steps!
```python
import asyncio
import os

# For observability (mandatory)
os.environ["NUMEXA_API_KEY"] = "Your Key"
# The proxy is enabled by default; set this if you do not want any proxy
os.environ["NUMEXA_PROXY"] = "disable"
# Set OPEN_API_KEY for zero proxy overhead, or if the Numexa free version has expired
os.environ["OPEN_API_KEY"] = "Bearer YOURKEY"

import numexa
from numexa import Config, LLMOptions

# Let's construct our LLMs.
llm1 = LLMOptions(provider="openai", model="gpt-3.5-turbo-16k-0613", virtual_key="a")
llm2 = LLMOptions(provider="openai", model="gpt-4", virtual_key="b")

# In case of a single LLM
numexa.config = Config(mode="single", llms=[llm1])
```

OR

```python
# In case of multiple LLMs
numexa.config = Config(mode="fallback", llms=[llm1, llm2])


async def jarvis():
    response = await numexa.ChatCompletions.create(
        messages=[{
            "role": "user",
            "content": "Who is Anu Kapoor?"
        }]
    )
    print(response)


async def main():
    await asyncio.gather(jarvis())


if __name__ == '__main__':
    asyncio.run(main())
```
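For Loadbalance mode, the setup is assumed to be identical apart from the mode, with requests distributed across the configured LLMs. A minimal sketch:

```python
# Sketch: same LLMs as above, but requests are load-balanced between them.
numexa.config = Config(mode="loadbalance", llms=[llm1, llm2])
```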
| Feature | Config Key | Value (Type) | Required |
|---|---|---|---|
| Provider Name | `provider` | string | ✅ Required |
| Model Name | `model` | string | ✅ Required |
| Virtual Key OR API Key | `virtual_key` or `api_key` | string | ✅ Required (can be set externally) |
| Cache Type | `cache_status` | `simple`, `semantic` | ❔ Optional |
| Force Cache Refresh | `cache_force_refresh` | `True`, `False` (Boolean) | ❔ Optional |
| Cache Age | `cache_age` | integer (in seconds) | ❔ Optional |
| Trace ID | `trace_id` | string | ❔ Optional |
| Retries | `retry` | integer [0,5] | ❔ Optional |
| Metadata | `metadata` | JSON object (More info) | ❔ Optional |
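Putting several of these keys together, a hedged sketch with illustrative values (the metadata fields shown are hypothetical; see the More info link for the supported format):

```python
from numexa import LLMOptions

# Illustrative values only; config keys come from the table above.
llm = LLMOptions(
    provider="openai",
    virtual_key="key_a",
    model="gpt-4",
    cache_status="simple",
    cache_force_refresh=False,
    cache_age=3600,                      # in seconds
    trace_id="numexa_sdk_test",
    retry=2,
    metadata={"_user": "docs-example"},  # hypothetical metadata fields
)
```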