LlamaIndex (Python)


The Portkey x LlamaIndex integration brings advanced AI gateway capabilities, full-stack observability, and prompt management to apps built on LlamaIndex.

In a nutshell, Portkey extends the familiar OpenAI schema so that LlamaIndex works with 200+ LLMs without importing a different class for each provider or configuring your code separately. Portkey makes your LlamaIndex apps reliable, fast, and cost-efficient.

Getting Started

1. Install the Portkey SDK

pip install -U portkey-ai

2. Import the necessary classes and functions

Import the OpenAI class in LlamaIndex as you normally would, along with Portkey's createHeaders helper and the PORTKEY_GATEWAY_URL constant:

from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

3. Configure model details

Configure your model details using Portkey's Config object. This is where you define the provider, model name, and model parameters, and set up fallbacks, retries, and more.

config = {
    "provider":"openai",
    "api_key":"YOUR_OPENAI_API_KEY",
    "override_params": {
        "model":"gpt-4o",
        "max_tokens":64
    }
}

4. Pass Config details to OpenAI client with necessary headers

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

Example: OpenAI

Here are basic integration examples using the complete and chat methods, with streaming on and off.

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"openai",
    "api_key":"YOUR_OPENAI_API_KEY",
    "override_params": {
        "model":"gpt-4o",
        "max_tokens":64
    }
}

#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

##### Streaming Mode #####

resp = portkey.stream_chat(messages)

for r in resp:
    print(r.delta, end="")

assistant: Arrr, matey! They call me Captain Barnacle Bill, the most colorful pirate to ever sail the seven seas! With a parrot on me shoulder and a treasure map in me hand, I'm always ready for adventure! What be yer name, landlubber?

from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"openai",
    "api_key":"YOUR_OPENAI_API_KEY",
    "override_params": {
        "model":"gpt-4o",
        "max_tokens":64
    }
}

#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

resp=portkey.complete("Paul Graham is ")
print(resp)

##### Streaming Mode #####

resp=portkey.stream_complete("Paul Graham is ")
for r in resp:
    print(r.delta, end="")

a computer scientist, entrepreneur, and venture capitalist. He is best known for co-founding the startup accelerator Y Combinator and for his work on programming languages and web development. Graham is also a prolific writer and has published essays on a wide range of topics, including startups, technology, and education.

import asyncio
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"openai",
    "api_key":"YOUR_OPENAI_API_KEY",
    "override_params": {
        "model":"gpt-4o",
        "max_tokens":64
    }
}

#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"

async def main():
    portkey = OpenAI(
        api_base=PORTKEY_GATEWAY_URL,
        default_headers=createHeaders(
            api_key="PORTKEY_API_KEY",
            config=config
        )
    )
    
    resp = await portkey.acomplete("Paul Graham is ")
    print(resp)
    
    ##### Streaming Mode #####
    
    resp = await portkey.astream_complete("Paul Graham is ")
    async for delta in resp:
        print(delta.delta, end="")

asyncio.run(main())

Enabling Portkey Features

By routing your LlamaIndex requests through Portkey, you get access to the following production-grade features:

  • Interoperability: Call various LLMs like Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, and AWS Bedrock with minimal code changes.

  • Caching: Speed up your requests and save money on LLM calls by storing past responses in the Portkey cache. Choose between Simple and Semantic cache modes.

  • Reliability: Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, set automatic retries, and request timeouts.

  • Observability: Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more. Send custom metadata and trace IDs for better analytics and debugging.

  • Prompt Management: Use Portkey as a centralized hub to store, version, and experiment with prompts across multiple LLMs, and seamlessly retrieve them in your LlamaIndex app for easy integration.

  • Continuous Improvement: Improve your LlamaIndex app by capturing qualitative & quantitative user feedback on your requests.

  • Security & Compliance: Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.

Most of these features are driven by Portkey's Config architecture. The Portkey app makes it easy to create, manage, and version your Configs so that you can reference them easily in LlamaIndex.

Saving Configs in the Portkey App

Head over to the Configs tab in the Portkey app, where you can save various provider Configs along with reliability and caching settings. Each Config has an associated slug that you can reference in your LlamaIndex code.
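
Once saved, you can point your LlamaIndex client at a Config by passing its slug to createHeaders instead of an inline Config object. Here's a minimal sketch (the slug pc-openai-xx is a placeholder for your own Config's slug):

from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

# Reference a saved Config by its slug instead of defining it inline
portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config="pc-openai-xx"  # slug of the Config saved in the Portkey app
    )
)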

Overriding a Saved Config

If you want to use a saved Config from the Portkey app in your LlamaIndex code but need to modify certain parts of it before making a request, you can easily achieve this using Portkey's Configs API. This approach allows you to leverage the convenience of saved Configs while still having the flexibility to adapt them to your specific needs.

Here's an example of how you can fetch a saved Config using the Configs API and override the model parameter:

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
import requests
import os
import json

def create_config(config_slug,model):
    url = f'https://api.portkey.ai/v1/configs/{config_slug}'
    headers = {
        'x-portkey-api-key': os.environ.get("PORTKEY_API_KEY"),
        'content-type': 'application/json'
    }
    response = requests.get(url, headers=headers).json()
    config = json.loads(response['config'])
    config['override_params']['model']=model
    return config

config=create_config("pc-llamaindex-xx","gpt-4-turbo")

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key=os.environ.get("PORTKEY_API_KEY"),
        config=config
    )
)

messages = [ChatMessage(role="user", content="1729")]

resp = portkey.chat(messages)
print(resp)

In this example:

  1. We define a helper function create_config that takes a config_slug and a model as parameters.

  2. Inside the function, we make a GET request to the Portkey Configs API endpoint to fetch the saved Config using the provided config_slug.

  3. We extract the config object from the API response.

  4. We update the model parameter in the override_params section of the Config with the provided model.

  5. Finally, we return the customized Config.

We can then use this customized Config when initializing the OpenAI client from LlamaIndex, ensuring that our specific model override is applied to the saved Config.


1. Interoperability - Calling Anthropic, Gemini, Mistral, and more

Now that we have the OpenAI code up and running, let's see how you can use Portkey to send the request across multiple LLMs - we'll show Anthropic, Gemini, and Mistral below. Switching providers just requires changing 3 lines of code:

  1. Change the provider name

  2. Change the API key, and

  3. Change the model name

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"anthropic",
    "api_key":"YOUR_ANTHROPIC_API_KEY",
    "override_params": {
        "model":"claude-3-opus-20240229",
        "max_tokens":64
    }
}

#### You can also reference a saved Config ####
#### config = "pc-anthropic-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

#### Gemini ####

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"google",
    "api_key":"YOUR_GOOGLE_GEMINI_API_KEY",
    "override_params": {
        "model":"gemini-1.5-flash-latest",
        "max_tokens":64
    }
}

#### You can also reference a saved Config instead ####
#### config = "pc-gemini-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

#### Mistral AI ####

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"mistral-ai",
    "api_key":"YOUR_MISTRAL_AI_API_KEY",
    "override_params": {
        "model":"codestral-latest",
        "max_tokens":64
    }
}

#### You can also reference a saved Config instead ####
#### config = "pc-mistral-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

Calling Azure, Google Vertex, AWS Bedrock

We recommend saving your cloud details to the Portkey vault and getting a corresponding Virtual Key.

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "virtual_key":"AZURE_OPENAI_PORTKEY_VIRTUAL_KEY"
}

#### You can also reference a saved Config instead ####
#### config = "pc-azure-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

#### AWS Bedrock ####

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "virtual_key":"AWS_BEDROCK_PORTKEY_VIRTUAL_KEY"
}

#### You can also reference a saved Config instead ####
#### config = "pc-bedrock-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

Vertex AI uses OAuth2 to authenticate its requests, so you need to send the access token along with the request - you can do this by passing it as the api_key in the OpenAI client. Run gcloud auth print-access-token in your terminal to get your Vertex AI access token.

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "virtual_key":"VERTEX_AI_PORTKEY_VIRTUAL_KEY"
}

#### You can also reference a saved Config instead ####
#### config = "pc-vertex-xx"

portkey = OpenAI(
    api_key="YOUR_VERTEX_AI_ACCESS_TOKEN", # Get by running gcloud auth print-access-token in terminal
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

Calling Local or Privately Hosted Models like Ollama

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider":"ollama",
    "custom_host":"https://7cc4-3-235-157-146.ngrok-free.app", # Your Ollama ngrok URL
    "override_params": {
        "model":"llama3"
    }    
}

#### You can also reference a saved Config instead ####
#### config = "pc-ollama-xx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

2. Caching

You can speed up your requests and save money on LLM calls by storing past responses in the Portkey cache. There are two cache modes:

  • Simple: Matches requests verbatim. Perfect for repeated, identical prompts. Works on all models including image generation models.

  • Semantic: Matches responses for requests that are semantically similar. Ideal for denoising requests with extra prepositions, pronouns, etc.

To enable the Portkey cache, just add the cache params to your config object:

config = {
    "provider":"mistral-ai",
    "api_key":"YOUR_MISTRAL_AI_API_KEY",
    "override_params": {
        "model":"codestral-latest",
        "max_tokens":64
    },
    "cache": {
        "mode": "simple",
        "max_age": 60000
    }
}

config = {
    "provider":"mistral-ai",
    "api_key":"YOUR_MISTRAL_AI_API_KEY",
    "override_params": {
        "model":"codestral-latest",
        "max_tokens":64
    },
    "cache": {
        "mode": "semantic",
        "max_age": 60000
    }
}
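
A cache-enabled Config is applied the same way as any other Config - here's a minimal sketch reusing the simple-cache config above:

from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config  # the cache-enabled Config defined above
    )
)

# Repeating an identical prompt within max_age should now be served from the Portkey cache
resp = portkey.complete("What is the capital of France?")
print(resp)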

3. Reliability

Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, set automatic retries, or set request timeouts - all set through Configs.

config = {
    "strategy": {
      "mode": "fallback"
    },
    "targets": [
    {
      "virtual_key": "openai-virtual-key",
      "override_params": {
        "model": "gpt-4o"
      }
    },
    {
      "virtual_key": "anthropic-virtual-key",
      "override_params": {
          "model": "claude-3-opus-20240229",
          "max_tokens":64
      }
    }
  ]
}

config = {
    "strategy": {
      "mode": "loadbalance"
    },
    "targets": [
    {
      "virtual_key": "openai-virtual-key-1",
      "weight":1
    },
    {
      "virtual_key": "openai-virtual-key-2",
      "weight":1
    }
  ]
}

config = {
    "retry": {
        "attempts": 5
    },
    "virtual_key": "virtual-key-xxx"
}

config = {
  "strategy": { "mode": "fallback" },
  "request_timeout": 10000,
  "targets": [
    { "virtual_key": "open-ai-xxx" },
    { "virtual_key": "azure-open-ai-xxx" }
  ]
}
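
Because these reliability settings live entirely in the Config, applying them doesn't change your LlamaIndex code - here's a minimal sketch that passes one of the Configs above (the fallback Config, for example) to the client:

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config  # e.g. the fallback Config defined above
    )
)

# With the fallback Config, Portkey retries on the Anthropic target if the OpenAI target fails
resp = portkey.chat([ChatMessage(role="user", content="Hello!")])
print(resp)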

4. Observability

Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more.

Using Portkey, you can also send custom metadata with each of your requests to further segment your logs for better analytics. Similarly, you can group multiple requests under a single trace ID and filter or view them separately in Portkey logs.

Custom Metadata and Trace ID information is sent in default_headers.

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = "pc-xxxx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config,
        metadata={
            "_user": "USER_ID",
            "environment": "production",
            "session_id": "1729"
        }
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = "pc-xxxx"

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config,
        trace_id="YOUR_TRACE_ID_HERE"
    )
)

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name"),
]

resp = portkey.chat(messages)
print(resp)

Portkey shows these details separately for each log in the dashboard.


5. Prompt Management

Portkey features an advanced Prompts platform tailor-made for better prompt engineering. With Portkey, you can:

  • Store Prompts with Access Control and Version Control: Keep all your prompts organized in a centralized location, easily track changes over time, and manage edit/view permissions for your team.

  • Experiment in a Sandbox Environment: Quickly iterate on different LLMs and parameters to find the optimal combination for your use case, without modifying your LlamaIndex code.

  • Parameterize Prompts: Define variables and mustache-approved tags within your prompts, allowing for dynamic value insertion when calling LLMs. This enables greater flexibility and reusability of your prompts.

Here's how you can leverage Portkey's Prompt Management in your LlamaIndex application:

  1. Create your prompt template on the Portkey app, and save it to get an associated Prompt ID

  2. Before making a LlamaIndex request, render the prompt template using the Portkey SDK

  3. Transform the retrieved prompt to be compatible with LlamaIndex and send the request!

Example: Using a Portkey Prompt Template in LlamaIndex

import json
import os
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey

###  Initialize Portkey client with API key

client = Portkey(api_key=os.environ.get("PORTKEY_API_KEY"))

### Render the prompt template with your prompt ID and variables

prompt_template = client.prompts.render(
    prompt_id="pp-prompt-id",
    variables={ "movie":"Dune 2" }
).data.dict()

config = {
    "virtual_key":"GROQ_VIRTUAL_KEY", # You need to send the virtual key separately
    "override_params":{
        "model":prompt_template["model"], # Set the model name based on the value in the prompt template
        "temperature":prompt_template["temperature"] # Similarly, you can also set other model params
    }
}

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key=os.environ.get("PORTKEY_API_KEY"),
        config=config
    )
)

### Transform the rendered prompt into LlamaIndex-compatible format

messages = [ChatMessage(content=msg["content"], role=msg["role"]) for msg in prompt_template["messages"]]

resp = portkey.chat(messages)
print(resp)

6. Continuous Improvement

Now that you know how to trace & log your LlamaIndex requests to Portkey, you can also start capturing user feedback to improve your app!

You can append qualitative as well as quantitative feedback to any trace ID with the portkey.feedback.create method:

from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY"
)

feedback = portkey.feedback.create(
    trace_id="YOUR_LLAMAINDEX_TRACE_ID",
    value=5,  # Integer between -10 and 10
    weight=1,  # Optional
    metadata={
        # Pass any additional context here like comments, _user and more
    }
)

print(feedback)

7. Security & Compliance

As you onboard more team members to help out on your LlamaIndex app, permissioning, budgeting, and access management can become a mess. With Portkey, you can set budget limits on provider API keys and implement fine-grained user roles and permissions to:

  • Control access: Restrict team members' access to specific features, Configs, or API endpoints based on their roles and responsibilities.

  • Manage costs: Set budget limits on API keys to prevent unexpected expenses and ensure that your LLM usage stays within your allocated budget.

  • Ensure compliance: Implement strict security policies and audit trails to maintain compliance with industry regulations and protect sensitive data.

  • Simplify onboarding: Streamline the onboarding process for new team members by assigning them appropriate roles and permissions, eliminating the need to share sensitive API keys or secrets.

  • Monitor usage: Gain visibility into your team's LLM usage, track costs, and identify potential security risks or anomalies through comprehensive monitoring and reporting.


Join Portkey Community

Join the Portkey Discord to connect with other practitioners, discuss your LlamaIndex projects, and get help troubleshooting your queries.
