The Portkey x LlamaIndex integration brings advanced AI gateway capabilities, full-stack observability, and prompt management to apps built on LlamaIndex.
In a nutshell, Portkey extends the familiar OpenAI schema to make LlamaIndex work with 200+ LLMs, without importing a different class for each provider or configuring your code separately for each one. Portkey makes your LlamaIndex apps reliable, fast, and cost-efficient.
Getting Started
1. Install the Portkey SDK
pip install -U portkey-ai
2. Import the necessary classes and functions
Import the OpenAI class from LlamaIndex as you normally would, along with Portkey's createHeaders helper and the PORTKEY_GATEWAY_URL constant.
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
3. Configure model details
Configure your model details using Portkey's Config object. This is where you define the provider and model name, set model parameters, and set up fallbacks, retries, and more.
Here are basic integration examples that use the complete and chat methods, with streaming on and off.
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"openai",
"api_key":"YOUR_OPENAI_API_KEY",
"override_params": {
"model":"gpt-4o",
"max_tokens":64
}
}
#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
##### Streaming Mode #####
resp = portkey.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
assistant: Arrr, matey! They call me Captain Barnacle Bill, the most colorful pirate to ever sail the seven seas! With a parrot on me shoulder and a treasure map in me hand, I'm always ready for adventure! What be yer name, landlubber?
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"openai",
"api_key":"YOUR_OPENAI_API_KEY",
"override_params": {
"model":"gpt-4o",
"max_tokens":64
}
}
#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
resp = portkey.complete("Paul Graham is ")
print(resp)
##### Streaming Mode #####
resp = portkey.stream_complete("Paul Graham is ")
for r in resp:
    print(r.delta, end="")
a computer scientist, entrepreneur, and venture capitalist. He is best known for co-founding the startup accelerator Y Combinator and for his work on programming languages and web development. Graham is also a prolific writer and has published essays on a wide range of topics, including startups, technology, and education.
import asyncio
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"openai",
"api_key":"YOUR_OPENAI_API_KEY",
"override_params": {
"model":"gpt-4o",
"max_tokens":64
}
}
#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"
async def main():
    portkey = OpenAI(
        api_base=PORTKEY_GATEWAY_URL,
        default_headers=createHeaders(
            api_key="YOUR_PORTKEY_API_KEY",
            config=config
        )
    )
    resp = await portkey.acomplete("Paul Graham is ")
    print(resp)
    ##### Streaming Mode #####
    resp = await portkey.astream_complete("Paul Graham is ")
    async for delta in resp:
        print(delta.delta, end="")

asyncio.run(main())
Enabling Portkey Features
By routing your LlamaIndex requests through Portkey, you get access to the production-grade features described in the numbered sections below.
Many of these features are driven by Portkey's Config architecture. The Portkey app makes it easy to create, manage, and version your Configs so that you can reference them in LlamaIndex.
Saving Configs in the Portkey App
Head over to the Configs tab in the Portkey app, where you can save provider Configs along with their reliability and caching settings. Each Config has an associated slug that you can reference in your LlamaIndex code.
Overriding a Saved Config
If you want to use a saved Config from the Portkey app in your LlamaIndex code but need to modify certain parts of it before making a request, you can easily achieve this using Portkey's Configs API. This approach allows you to leverage the convenience of saved Configs while still having the flexibility to adapt them to your specific needs.
Here's an example of how you can fetch a saved Config using the Configs API and override the model parameter:
We define a helper function get_customized_config that takes a config_slug and a model as parameters.
Inside the function, we make a GET request to the Portkey Configs API endpoint to fetch the saved Config using the provided config_slug.
We extract the config object from the API response.
We update the model parameter in the override_params section of the Config with the provided model.
Finally, we return the customized Config.
We can then use this customized Config when initializing the OpenAI client from LlamaIndex, ensuring that our specific model override is applied to the saved Config.
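The steps above can be sketched roughly as follows. This is a minimal illustration rather than Portkey's reference implementation: the endpoint URL, the x-portkey-api-key header name, and the shape of the response body are assumptions to verify against Portkey's API reference, and fetch_saved_config and apply_model_override are hypothetical helper names introduced here.

```python
import json
import os
import urllib.request

# Assumption: endpoint and header name are illustrative; confirm them
# against Portkey's API reference before relying on this sketch.
CONFIGS_API_URL = "https://api.portkey.ai/v1/configs/{slug}"


def fetch_saved_config(config_slug: str) -> dict:
    """GET a saved Config by slug and return its config object as a dict."""
    request = urllib.request.Request(
        CONFIGS_API_URL.format(slug=config_slug),
        headers={"x-portkey-api-key": os.environ["PORTKEY_API_KEY"]},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    config = body["config"]
    # Assumption: the config object may arrive as a JSON string; decode if so.
    return json.loads(config) if isinstance(config, str) else config


def apply_model_override(config: dict, model: str) -> dict:
    """Return a copy of config with override_params.model set to model."""
    customized = dict(config)
    customized["override_params"] = {
        **config.get("override_params", {}),
        "model": model,
    }
    return customized


def get_customized_config(config_slug: str, model: str) -> dict:
    """Fetch the saved Config, then override its model."""
    return apply_model_override(fetch_saved_config(config_slug), model)
```

Under these assumptions, calling get_customized_config("pc-xxxx", "gpt-4o") returns a dict that can be passed as config to createHeaders in place of the slug. Keeping the override in a separate pure function leaves the saved Config untouched and makes the change easy to check without a network call.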
1. Interoperability - Calling Anthropic, Gemini, Mistral, and more
Switching providers just requires changing 3 lines of code:
Change the provider name
Change the API key, and
Change the model name
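To make the three changing values explicit, here is a small sketch that builds the inline Config from them. make_config is a hypothetical helper introduced for illustration; the dict it returns has the same shape as the inline Configs used throughout this page.

```python
def make_config(provider: str, api_key: str, model: str, max_tokens: int = 64) -> dict:
    """Build an inline Config; only the three positional values differ per provider."""
    return {
        "provider": provider,
        "api_key": api_key,
        "override_params": {"model": model, "max_tokens": max_tokens},
    }


# Same structure, three different values.
openai_config = make_config("openai", "YOUR_OPENAI_API_KEY", "gpt-4o")
anthropic_config = make_config(
    "anthropic", "YOUR_ANTHROPIC_API_KEY", "claude-3-opus-20240229"
)
```

Either dict can be passed as config to createHeaders; the rest of the LlamaIndex code stays the same.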
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"anthropic",
"api_key":"YOUR_ANTHROPIC_API_KEY",
"override_params": {
"model":"claude-3-opus-20240229",
"max_tokens":64
}
}
#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"google",
"api_key":"YOUR_GOOGLE_GEMINI_API_KEY",
"override_params": {
"model":"gemini-1.5-flash-latest",
"max_tokens":64
}
}
#### You can also reference a saved Config instead ####
#### config = "pc-gemini-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"mistral-ai",
"api_key":"YOUR_MISTRAL_AI_API_KEY",
"override_params": {
"model":"codestral-latest",
"max_tokens":64
}
}
#### You can also reference a saved Config instead ####
#### config = "pc-mistral-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
Calling Azure, Google Vertex, AWS Bedrock
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"virtual_key":"AZURE_OPENAI_PORTKEY_VIRTUAL_KEY"
}
#### You can also reference a saved Config instead ####
#### config = "pc-azure-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"virtual_key":"AWS_BEDROCK_PORTKEY_VIRTUAL_KEY"
}
#### You can also reference a saved Config instead ####
#### config = "pc-bedrock-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
Vertex AI uses OAuth2 to authenticate its requests, so you also need to send the access token along with the request. You can do this by passing it as the api_key in the OpenAI client.
Run gcloud auth print-access-token in your terminal to get your Vertex AI access token.
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"virtual_key":"VERTEX_AI_PORTKEY_VIRTUAL_KEY"
}
#### You can also reference a saved Config instead ####
#### config = "pc-vertex-xx"
portkey = OpenAI(
api_key="YOUR_VERTEX_AI_ACCESS_TOKEN", # Get by running gcloud auth print-access-token in terminal
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
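If you would rather not paste the Vertex AI token by hand, one option is to read it from the gcloud CLI at startup. The sketch below assumes gcloud is installed and authenticated on the machine running the code; get_vertex_access_token is a hypothetical helper name, and its run parameter exists only so the command runner can be swapped out in tests.

```python
import subprocess


def get_vertex_access_token(run=subprocess.run) -> str:
    """Return the token printed by `gcloud auth print-access-token`."""
    result = run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gcloud exits with an error
    )
    return result.stdout.strip()
```

The returned string can be passed as api_key when constructing the OpenAI client. These access tokens are short-lived, so a long-running app would need to fetch a fresh one periodically.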
Calling Local or Privately Hosted Models like Ollama
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
"provider":"ollama",
"custom_host":"https://7cc4-3-235-157-146.ngrok-free.app", # Your Ollama ngrok URL
"override_params": {
"model":"llama3"
}
}
#### You can also reference a saved Config instead ####
#### config = "pc-ollama-xx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
2. Caching
You can speed up your requests and save money on your LLM requests by storing past responses in the Portkey cache. There are 2 cache modes:
Simple: Matches requests verbatim. Perfect for repeated, identical prompts. Works on all models including image generation models.
Semantic: Matches responses for requests that are semantically similar. Ideal for denoising requests with extra prepositions, pronouns, etc.
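As a rough sketch, choosing a cache mode could look like the following. The "cache" key with "mode" and "max_age" fields is an assumption about the Config schema that this page does not spell out, so check it against Portkey's Config reference; with_cache is a hypothetical helper.

```python
from typing import Optional


def with_cache(config: dict, mode: str = "simple", max_age: Optional[int] = None) -> dict:
    """Return a copy of config with cache settings added."""
    if mode not in ("simple", "semantic"):
        raise ValueError("mode must be 'simple' or 'semantic'")
    cache = {"mode": mode}
    if max_age is not None:
        # Assumption: max_age is the cache lifetime in seconds.
        cache["max_age"] = max_age
    return {**config, "cache": cache}


base = {"provider": "openai", "api_key": "YOUR_OPENAI_API_KEY"}
cached_config = with_cache(base, mode="semantic", max_age=60)
```

The resulting dict is passed as config to createHeaders, exactly like the uncached Configs elsewhere on this page.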
3. Reliability
Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, set automatic retries, or set request timeouts. All of these are configured through Configs.
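A Config that falls back from one provider to another and retries failed requests might be shaped like the sketch below. The "strategy", "targets", and "retry" keys are assumptions about the Config schema, not something this page confirms, and fallback_config is a hypothetical helper; the virtual key names are placeholders.

```python
def fallback_config(primary_key: str, backup_key: str, retry_attempts: int = 3) -> dict:
    """Build a Config that tries primary_key first and backup_key on failure."""
    return {
        "strategy": {"mode": "fallback"},  # assumed schema key
        "retry": {"attempts": retry_attempts},  # assumed schema key
        "targets": [
            {"virtual_key": primary_key},
            {"virtual_key": backup_key},
        ],
    }


reliable_config = fallback_config(
    "OPENAI_PORTKEY_VIRTUAL_KEY", "ANTHROPIC_PORTKEY_VIRTUAL_KEY"
)
```

Because the routing lives in the Config, the LlamaIndex code that sends the request does not change when targets are added or reordered.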
4. Observability
Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more.
Using Portkey, you can also send custom metadata with each of your requests to further segment your logs for better analytics. Similarly, you can also trace multiple requests to a single trace ID and filter or view them separately in Portkey logs.
Custom Metadata and Trace ID information is sent in default_headers.
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = "pc-xxxx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config,
metadata={
"_user": "USER_ID",
"environment": "production",
"session_id": "1729"
}
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = "pc-xxxx"
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
config=config,
trace_id="YOUR_TRACE_ID_HERE"
)
)
messages = [
ChatMessage(role="system", content="You are a pirate with a colorful personality"),
ChatMessage(role="user", content="What is your name"),
]
resp = portkey.chat(messages)
print(resp)
Portkey shows these details separately for each log.
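To group several requests under one trace, one approach is to generate a trace ID once and reuse the same header arguments for every request in that workflow. The sketch below only builds the keyword arguments; it assumes createHeaders accepts metadata and trace_id together, as the two examples above suggest, and shared_header_kwargs is a hypothetical helper.

```python
import uuid


def shared_header_kwargs(api_key: str, config, user_id: str) -> dict:
    """Keyword arguments for createHeaders that tie requests to one trace."""
    return {
        "api_key": api_key,
        "config": config,
        "trace_id": str(uuid.uuid4()),  # one ID shared by the whole workflow
        "metadata": {"_user": user_id},
    }


kwargs = shared_header_kwargs("YOUR_PORTKEY_API_KEY", "pc-xxxx", "USER_ID")
# Reuse the same kwargs for every client in this workflow, for example:
# OpenAI(api_base=PORTKEY_GATEWAY_URL, default_headers=createHeaders(**kwargs))
```

Keeping the trace ID around also lets you attach feedback to the same trace later.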
5. Prompt Management
Portkey features an advanced Prompts platform tailor-made for better prompt engineering. With Portkey, you can:
Store Prompts with Access Control and Version Control: Keep all your prompts organized in a centralized location, easily track changes over time, and manage edit/view permissions for your team.
Experiment in a Sandbox Environment: Quickly iterate on different LLMs and parameters to find the optimal combination for your use case, without modifying your LlamaIndex code.
Here's how you can leverage Portkey's Prompt Management in your LlamaIndex application:
Create your prompt template on the Portkey app, and save it to get an associated Prompt ID
Before making a LlamaIndex request, render the prompt template using the Portkey SDK
Transform the retrieved prompt to be compatible with LlamaIndex and send the request!
Example: Using a Portkey Prompt Template in LlamaIndex
import json
import os
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey
### Initialize Portkey client with API key
client = Portkey(api_key=os.environ.get("PORTKEY_API_KEY"))
### Render the prompt template with your prompt ID and variables
prompt_template = client.prompts.render(
prompt_id="pp-prompt-id",
variables={ "movie":"Dune 2" }
).data.dict()
config = {
"virtual_key":"GROQ_VIRTUAL_KEY", # You need to send the virtual key separately
"override_params":{
"model":prompt_template["model"], # Set the model name based on the value in the prompt template
"temperature":prompt_template["temperature"] # Similarly, you can also set other model params
}
}
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
api_key=os.environ.get("PORTKEY_API_KEY"),
config=config
)
)
### Transform the rendered prompt into LlamaIndex-compatible format
messages = [ChatMessage(content=msg["content"], role=msg["role"]) for msg in prompt_template["messages"]]
resp = portkey.chat(messages)
print(resp)
6. Continuous Improvement
Now that you know how to trace and log your LlamaIndex requests to Portkey, you can also start capturing user feedback to improve your app!
You can append qualitative as well as quantitative feedback to any trace ID with the portkey.feedback.create method:
from portkey_ai import Portkey
portkey = Portkey(
api_key="PORTKEY_API_KEY"
)
feedback = portkey.feedback.create(
trace_id="YOUR_LLAMAINDEX_TRACE_ID",
value=5, # Integer between -10 and 10
weight=1, # Optional
metadata={
# Pass any additional context here like comments, _user and more
}
)
print(feedback)
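Since value must be an integer between -10 and 10, ratings collected on another scale need converting first. The helper below is a hypothetical sketch of one way to do that with a linear rescale; the 1 to 5 star scale in the example is an assumption about your app, not something Portkey requires.

```python
def to_feedback_value(score: float, low: float, high: float) -> int:
    """Linearly rescale score from [low, high] to an integer in [-10, 10]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = min(max(score, low), high)  # keep out-of-range scores in bounds
    scaled = -10 + 20 * (clamped - low) / (high - low)
    return int(round(scaled))


# Example: a 1-5 star rating, where 3 stars is neutral.
neutral = to_feedback_value(3, low=1, high=5)
```

The result can be passed as value to portkey.feedback.create, with the original rating kept in metadata if you want to preserve it.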
7. Security & Compliance
When you onboard more team members to help out on your LlamaIndex app, permissioning, budgeting, and access management can become a mess! Using Portkey, you can set budget limits on provider API keys and implement fine-grained user roles and permissions to:
Control access: Restrict team members' access to specific features, Configs, or API endpoints based on their roles and responsibilities.
Manage costs: Set budget limits on API keys to prevent unexpected expenses and ensure that your LLM usage stays within your allocated budget.
Ensure compliance: Implement strict security policies and audit trails to maintain compliance with industry regulations and protect sensitive data.
Simplify onboarding: Streamline the onboarding process for new team members by assigning them appropriate roles and permissions, eliminating the need to share sensitive API keys or secrets.
Monitor usage: Gain visibility into your team's LLM usage, track costs, and identify potential security risks or anomalies through comprehensive monitoring and reporting.
Join Portkey Community
Join the Portkey Discord to connect with other practitioners, discuss your LlamaIndex projects, and get help troubleshooting your queries.
Interoperability: Call various LLMs like Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, and AWS Bedrock with minimal code changes.
Caching: Speed up your requests and save money on LLM calls by storing past responses in the Portkey cache. Choose between Simple and Semantic cache modes.
Reliability: Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, and set automatic retries and request timeouts.
Observability: Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more. Send custom metadata and trace IDs for better analytics and debugging.
Prompt Management: Use Portkey as a centralized hub to store, version, and experiment with prompts across multiple LLMs, and retrieve them in your LlamaIndex app.
Continuous Improvement: Improve your LlamaIndex app by capturing qualitative and quantitative user feedback on your requests.
Security & Compliance: Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
For more details on working with Configs in Portkey, refer to the Portkey documentation.
Now that we have the OpenAI code up and running, let's see how you can use Portkey to send the request across multiple LLMs. We'll show Anthropic, Gemini, and Mistral. For the full list of supported providers and LLMs, see the Portkey documentation.
We recommend saving your cloud details to Portkey and getting a corresponding Virtual Key.
To enable the Portkey cache, just add the cache params to your Config.
Parameterize Prompts: Define variables within your prompts, allowing for dynamic value insertion when calling LLMs. This makes your prompts more flexible and reusable.
For more detailed information on each feature and how to use it, please refer to the Portkey documentation.