Setting up resilient Load balancers with failure-mitigating Fallbacks

Companies often face the challenge of scaling their services efficiently as traffic to their applications grows. When you consume APIs, the first point of failure is rate limiting: hit the API too often and your requests start getting rejected. Loadbalancing is a proven way to scale usage horizontally without overburdening any one provider, keeping you within rate limits.

For AI apps, rate limits are even more stringent, and once you hit a provider's limits there is nothing you can do except wait for the cooldown and try again. Portkey helps you solve this very easily.

This cookbook will teach you how to use Portkey to distribute traffic across multiple LLMs, and how to make that loadbalancer robust by setting up backups for requests. You will learn how to load balance across OpenAI and Anthropic, leveraging Anthropic's powerful Claude-3 models, with Azure OpenAI serving as the fallback layer.

Prerequisites:

You should have a Portkey API key. Please sign up to obtain it. Additionally, you should have stored your OpenAI, Azure OpenAI, and Anthropic provider details in the Portkey vault.

1. Import the SDK and authenticate Portkey

Start by installing the portkey-ai package in your NodeJS project.

npm i --save portkey-ai

Once installed, import it and instantiate the client with the API key of your Portkey account.

import { Portkey } from 'portkey-ai';

const portkey = new Portkey({
  apiKey: process.env['PORTKEYAI_API_KEY']
});

2. Create Configs: Loadbalance with Nested Fallbacks

Portkey acts as an AI gateway for all of your requests to LLMs. It follows the OpenAI SDK signature in all of its methods and interfaces, making it easy to use and to switch between providers. Here is an example of a chat completions request made through Portkey.

const response = await portkey.chat.completions.create({
  messages,
  model: 'gpt-3.5-turbo'
});
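
Since the signature mirrors the OpenAI SDK, other methods work the same way through the same client. For instance, an embeddings request looks like this (a sketch, assuming an embeddings-capable provider is configured for this client; the model name is illustrative):

const embedding = await portkey.embeddings.create({
  model: 'text-embedding-3-small', // illustrative model name
  input: 'Portkey follows the OpenAI SDK signature for embeddings too'
});

console.log(embedding.data[0].embedding.length);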

The Portkey AI gateway can apply our desired behavior to requests across various LLMs. In a nutshell, the desired behavior is this: split traffic 50/50 between Anthropic and OpenAI, and have any request routed to OpenAI fall back to Azure OpenAI on failure.

Luckily for us, all of this can be implemented by passing a config, which lets us express what behavior to apply to every request through the Portkey AI gateway.

const config = {
  strategy: {
    mode: 'loadbalance' // split traffic across the targets below
  },
  targets: [
    {
      // 50% of traffic goes straight to Anthropic
      virtual_key: process.env['ANTHROPIC_VIRTUAL_KEY'],
      weight: 0.5,
      override_params: {
        max_tokens: 200, // Anthropic requires max_tokens
        model: 'claude-3-opus-20240229'
      }
    },
    {
      // the other 50% goes to this nested fallback group:
      // OpenAI first, Azure OpenAI if OpenAI fails
      strategy: {
        mode: 'fallback'
      },
      targets: [
        {
          virtual_key: process.env['OPENAI_VIRTUAL_KEY']
        },
        {
          virtual_key: process.env['AZURE_OPENAI_VIRTUAL_KEY']
        }
      ],
      weight: 0.5
    }
  ]
};

const portkey = new Portkey({
  apiKey: process.env['PORTKEYAI_API_KEY'],
  config // pass configs as argument
});

We apply the loadbalance strategy across Anthropic and OpenAI. weight describes how the traffic should be split (50/50 between the two LLM providers), while override_params overrides the default request parameters for that target.

Let's take this a step further and apply a fallback mechanism, so that requests routed to OpenAI fall back to Azure OpenAI. This nested mechanism among the targets ensures our app stays reliable in production.

3. Make a Request

Now that the config is concrete and passed as an argument when instantiating the Portkey client instance, all subsequent requests acquire the desired behavior automatically, with no additional changes to the codebase.

const messages = [
  {
    role: 'system',
    content: 'You are a very helpful assistant.'
  },
  {
    role: 'user',
    content: 'What are 7 wonders in the world?'
  }
];

const response = await portkey.chat.completions.create({
  messages,
  model: 'gpt-3.5-turbo'
});

console.log(response.choices[0].message.content);
// The Seven Wonders of the Ancient World are:

Next, we will examine how to identify load-balanced requests or those that have been executed as fallbacks.

4. Trace the request from the logs

Identifying particular requests among the thousands received every day can feel like finding a needle in a haystack. Portkey solves this by letting us attach a trace ID of our choosing to each request; here we use request-loadbalance-fallback.

const response = await portkey.chat.completions.create(
  {
    messages,
    model: 'gpt-3.5-turbo'
  },
  {
    traceID: 'request-loadbalance-fallback'
  }
);

This trace ID can be used to easily filter requests from the Portkey Dashboard (> Logs).

In addition to indicating that Loadbalance was active, the logs provide essential observability information, including tokens, cost, and model.
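
If you want to slice the logs by more than a trace ID, the same request options can also carry custom metadata. A minimal sketch, assuming the metadata option of the Node SDK; the keys shown are illustrative and you can define your own:

const response = await portkey.chat.completions.create(
  {
    messages,
    model: 'gpt-3.5-turbo'
  },
  {
    traceID: 'request-loadbalance-fallback',
    metadata: { _user: 'user_12345', env: 'production' } // illustrative keys to filter on in Logs
  }
);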

5. Advanced: Canary Testing

With new models coming out every day and your app already in production, what is the best way to test their quality? Canary testing allows you to gradually roll out a change to a small subset of users before making it available to everyone.

Consider this scenario: You have been using OpenAI as your LLM provider for a while now, but are considering trying an open-source Llama model for your app through Anyscale.

const config = {
  strategy: {
    mode: 'loadbalance'
  },
  targets: [
    {
      virtual_key: process.env['OPENAI_VIRTUAL_KEY'],
      weight: 0.9
    },
    {
      virtual_key: process.env['ANYSCALE_VIRTUAL_KEY'],
      weight: 0.1,
      override_params: {
        model: 'meta-llama/Llama-2-70b-chat-hf'
      }
    }
  ]
};

const portkey = new Portkey({
  apiKey: process.env['PORTKEYAI_API_KEY'],
  config
});

const response = await portkey.chat.completions.create(
  {
    messages,
    model: 'gpt-3.5-turbo'
  },
  {
    traceID: 'canary-testing'
  }
);

console.log(response.choices[0].message.content);

The weight indicates how traffic is split: 10% of your user base is served by Anyscale's Llama model. You are now set up to gather feedback, observe the performance of your app, and progressively release to a larger user base.
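
As you gain confidence in the new model, widening the canary is just a matter of adjusting the weights in the config. A sketch of a later stage of the rollout (the exact ratios are your call):

// Later stage of the canary: shift more traffic to the candidate model
const config = {
  strategy: { mode: 'loadbalance' },
  targets: [
    { virtual_key: process.env['OPENAI_VIRTUAL_KEY'], weight: 0.7 },
    {
      virtual_key: process.env['ANYSCALE_VIRTUAL_KEY'],
      weight: 0.3,
      override_params: { model: 'meta-llama/Llama-2-70b-chat-hf' }
    }
  ]
};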

Considerations

You can implement production-grade loadbalancing and nested fallback mechanisms with just a few lines of code. Now that you are equipped with the tools for your next GenAI app, here are a few considerations:

  • Every request has to adhere to the LLM provider’s requirements for it to be successful. For instance, max_tokens is required for Anthropic and not for OpenAI.

  • While loadbalance helps reduce the load on any one LLM, it is recommended to pair it with a fallback strategy to ensure that your app stays reliable.

  • On Portkey, you can also set a target's loadbalance weight to 0. This effectively stops routing requests to that target, and you can raise it again when required (see the sketch after this list).

  • Loadbalance has no hard limit on the number of targets, so you can add multiple accounts from the same provider and effectively multiply your available rate limits.

  • Loadbalance does not alter the outputs or the latency of the requests in any way.
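
To illustrate the two points above about zero weights and multiple accounts from the same provider, here is a minimal sketch (the second OpenAI virtual key, OPENAI_VIRTUAL_KEY_2, is a hypothetical extra account stored in the Portkey vault):

// Two OpenAI accounts share the traffic; the Anthropic target is paused with weight 0
const config = {
  strategy: { mode: 'loadbalance' },
  targets: [
    { virtual_key: process.env['OPENAI_VIRTUAL_KEY'], weight: 0.5 },
    { virtual_key: process.env['OPENAI_VIRTUAL_KEY_2'], weight: 0.5 }, // hypothetical second OpenAI account
    {
      virtual_key: process.env['ANTHROPIC_VIRTUAL_KEY'],
      weight: 0, // no traffic is routed here until the weight is raised
      override_params: { max_tokens: 200, model: 'claude-3-opus-20240229' }
    }
  ]
};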

Happy Coding!

See the entire code
import { Portkey } from 'portkey-ai';

const config = {
  strategy: {
    mode: 'loadbalance'
  },
  targets: [
    {
      virtual_key: process.env['ANTHROPIC_VIRTUAL_KEY'],
      weight: 0.5,
      override_params: {
        max_tokens: 200,
        model: 'claude-3-opus-20240229'
      }
    },
    {
      strategy: {
        mode: 'fallback'
      },
      targets: [
        {
          virtual_key: process.env['OPENAI_VIRTUAL_KEY']
        },
        {
          virtual_key: process.env['AZURE_OPENAI_VIRTUAL_KEY']
        }
      ],
      weight: 0.5
    }
  ]
};

const portkey = new Portkey({
  apiKey: process.env['PORTKEYAI_API_KEY'],
  config
});

const messages = [
  {
    role: 'system',
    content: 'You are a very helpful assistant.'
  },
  {
    role: 'user',
    content: 'What are 7 wonders in the world?'
  }
];

const response = await portkey.chat.completions.create(
  {
    messages,
    model: 'gpt-3.5-turbo'
  },
  {
    traceID: 'request-loadbalance-fallback'
  }
);

console.log(response.choices[0].message.content);

See the Portkey documentation for Fallbacks and Loadbalancing.

Are the configs growing and becoming harder to manage in code? Try creating them from the Portkey UI and reference the config ID in your code. It will make them significantly easier to maintain.
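
A minimal sketch of what that looks like, assuming you have saved the config in the Portkey UI and it has the (hypothetical) ID 'pc-loadbalance-fallback':

const portkey = new Portkey({
  apiKey: process.env['PORTKEYAI_API_KEY'],
  config: 'pc-loadbalance-fallback' // hypothetical ID of a config saved via the Portkey UI
});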
