Portkey Docs
HomeAPIIntegrationsChangelog
  • Introduction
    • What is Portkey?
    • Make Your First Request
    • Feature Overview
  • Integrations
    • LLMs
      • OpenAI
        • Structured Outputs
        • Prompt Caching
      • Anthropic
        • Prompt Caching
      • Google Gemini
      • Groq
      • Azure OpenAI
      • AWS Bedrock
      • Google Vertex AI
      • Bring Your Own LLM
      • AI21
      • Anyscale
      • Cerebras
      • Cohere
      • Fireworks
      • Deepbricks
      • Deepgram
      • Deepinfra
      • Deepseek
      • Google Palm
      • Huggingface
      • Inference.net
      • Jina AI
      • Lingyi (01.ai)
      • LocalAI
      • Mistral AI
      • Monster API
      • Moonshot
      • Nomic
      • Novita AI
      • Ollama
      • OpenRouter
      • Perplexity AI
      • Predibase
      • Reka AI
      • SambaNova
      • Segmind
      • SiliconFlow
      • Stability AI
      • Together AI
      • Voyage AI
      • Workers AI
      • ZhipuAI / ChatGLM / BigModel
      • Suggest a new integration!
    • Agents
      • Autogen
      • Control Flow
      • CrewAI
      • Langchain Agents
      • LlamaIndex
      • Phidata
      • Bring Your own Agents
    • Libraries
      • Autogen
      • DSPy
      • Instructor
      • Langchain (Python)
      • Langchain (JS/TS)
      • LlamaIndex (Python)
      • LibreChat
      • Promptfoo
      • Vercel
        • Vercel [Depricated]
  • Product
    • Observability (OpenTelemetry)
      • Logs
      • Tracing
      • Analytics
      • Feedback
      • Metadata
      • Filters
      • Logs Export
      • Budget Limits
    • AI Gateway
      • Universal API
      • Configs
      • Multimodal Capabilities
        • Image Generation
        • Function Calling
        • Vision
        • Speech-to-Text
        • Text-to-Speech
      • Cache (Simple & Semantic)
      • Fallbacks
      • Automatic Retries
      • Load Balancing
      • Conditional Routing
      • Request Timeouts
      • Canary Testing
      • Virtual Keys
        • Budget Limits
    • Prompt Library
      • Prompt Templates
      • Prompt Partials
      • Retrieve Prompts
      • Advanced Prompting with JSON Mode
    • Guardrails
      • List of Guardrail Checks
        • Patronus AI
        • Aporia
        • Pillar
        • Bring Your Own Guardrails
      • Creating Raw Guardrails (in JSON)
    • Autonomous Fine-tuning
    • Enterprise Offering
      • Org Management
        • Organizations
        • Workspaces
        • User Roles & Permissions
        • API Keys (AuthN and AuthZ)
      • Access Control Management
      • Budget Limits
      • Security @ Portkey
      • Logs Export
      • Private Cloud Deployments
        • Architecture
        • AWS
        • GCP
        • Azure
        • Cloudflare Workers
        • F5 App Stack
      • Components
        • Log Store
          • MongoDB
    • Open Source
    • Portkey Pro & Enterprise Plans
  • API Reference
    • Introduction
    • Authentication
    • OpenAPI Specification
    • Headers
    • Response Schema
    • Gateway Config Object
    • SDK
  • Provider Endpoints
    • Supported Providers
    • Chat
    • Embeddings
    • Images
      • Create Image
      • Create Image Edit
      • Create Image Variation
    • Audio
      • Create Speech
      • Create Transcription
      • Create Translation
    • Fine-tuning
      • Create Fine-tuning Job
      • List Fine-tuning Jobs
      • Retrieve Fine-tuning Job
      • List Fine-tuning Events
      • List Fine-tuning Checkpoints
      • Cancel Fine-tuning
    • Batch
      • Create Batch
      • List Batch
      • Retrieve Batch
      • Cancel Batch
    • Files
      • Upload File
      • List Files
      • Retrieve File
      • Retrieve File Content
      • Delete File
    • Moderations
    • Assistants API
      • Assistants
        • Create Assistant
        • List Assistants
        • Retrieve Assistant
        • Modify Assistant
        • Delete Assistant
      • Threads
        • Create Thread
        • Retrieve Thread
        • Modify Thread
        • Delete Thread
      • Messages
        • Create Message
        • List Messages
        • Retrieve Message
        • Modify Message
        • Delete Message
      • Runs
        • Create Run
        • Create Thread and Run
        • List Runs
        • Retrieve Run
        • Modify Run
        • Submit Tool Outputs to Run
        • Cancel Run
      • Run Steps
        • List Run Steps
        • Retrieve Run Steps
    • Completions
    • Gateway for Other API Endpoints
  • Portkey Endpoints
    • Configs
      • Create Config
      • List Configs
      • Retrieve Config
      • Update Config
    • Feedback
      • Create Feedback
      • Update Feedback
    • Guardrails
    • Logs
      • Insert a Log
      • Log Exports [BETA]
        • Retrieve a Log Export
        • Update a Log Export
        • List Log Exports
        • Create a Log Export
        • Start a Log Export
        • Cancel a Log Export
        • Download a Log Export
    • Prompts
      • Prompt Completion
      • Render
    • Virtual Keys
      • Create Virtual Key
      • List Virtual Keys
      • Retrieve Virtual Key
      • Update Virtual Key
      • Delete Virtual Key
    • Analytics
      • Graphs - Time Series Data
        • Get Requests Data
        • Get Cost Data
        • Get Latency Data
        • Get Tokens Data
        • Get Users Data
        • Get Requests per User
        • Get Errors Data
        • Get Error Rate Data
        • Get Status Code Data
        • Get Unique Status Code Data
        • Get Rescued Requests Data
        • Get Cache Hit Rate Data
        • Get Cache Hit Latency Data
        • Get Feedback Data
        • Get Feedback Score Distribution Data
        • Get Weighted Feeback Data
        • Get Feedback Per AI Models
      • Summary
        • Get All Cache Data
      • Groups - Paginated Data
        • Get User Grouped Data
        • Get Model Grouped Data
        • Get Metadata Grouped Data
    • API Keys [BETA]
      • Update API Key
      • Create API Key
      • Delete an API Key
      • Retrieve an API Key
      • List API Keys
    • Admin
      • Users
        • Retrieve a User
        • Retrieve All Users
        • Update a User
        • Remove a User
      • User Invites
        • Invite a User
        • Retrieve an Invite
        • Retrieve All User Invites
        • Delete a User Invite
      • Workspaces
        • Create Workspace
        • Retrieve All Workspaces
        • Retrieve a Workspace
        • Update Workspace
        • Delete a Workspace
      • Workspace Members
        • Add a Workspace Member
        • Retrieve All Workspace Members
        • Retrieve a Workspace Member
        • Update Workspace Member
        • Remove Workspace Member
  • Guides
    • Getting Started
      • A/B Test Prompts and Models
      • Tackling Rate Limiting
      • Function Calling
      • Image Generation
      • Getting started with AI Gateway
      • Llama 3 on Groq
      • Return Repeat Requests from Cache
      • Trigger Automatic Retries on LLM Failures
      • 101 on Portkey's Gateway Configs
    • Integrations
      • Llama 3 on Portkey + Together AI
      • Introduction to GPT-4o
      • Anyscale
      • Mistral
      • Vercel AI
      • Deepinfra
      • Groq
      • Langchain
      • Mixtral 8x22b
      • Segmind
    • Use Cases
      • Few-Shot Prompting
      • Enforcing JSON Schema with Anyscale & Together
      • Detecting Emotions with GPT-4o
      • Build an article suggestion app with Supabase pgvector, and Portkey
      • Setting up resilient Load balancers with failure-mitigating Fallbacks
      • Run Portkey on Prompts from Langchain Hub
      • Smart Fallback with Model-Optimized Prompts
      • How to use OpenAI SDK with Portkey Prompt Templates
      • Setup OpenAI -> Azure OpenAI Fallback
      • Fallback from SDXL to Dall-e-3
      • Comparing Top10 LMSYS Models with Portkey
      • Build a chatbot using Portkey's Prompt Templates
  • Support
    • Contact Us
    • Developer Forum
    • Common Errors & Resolutions
    • December '23 Migration
    • Changelog
Powered by GitBook
On this page
  • Here's an overview of rate limits imposed by various providers:
  • 1. Install Portkey SDK
  • 2. Fallback to Alternative LLMs
  • 3. Load Balance Among Multiple LLMs
  • That's it!

Was this helpful?

Edit on GitHub
  1. Guides
  2. Getting Started

Tackling Rate Limiting

LLMs are costly to run. As their usage increases, the providers have to balance serving user requests v/s straining their GPU resources too thin. They generally deal with this by putting rate limits on how many requests a user can send in a minute or in a day.

For example, for the text-to-speech model tts-1-hd from OpenAI, you can not send more than 7 requests in minute. Any extra request automatically fails.

There are many real-world use cases where it's possilbe to run into rate limits:

  • When your requests have very high input-tokens count or a very long context, you can hit token thresholds

  • When you are running a complex and long prompts pipeline that fires hundreds of requests at once, you can hit both token & request limits

Here's an overview of rate limits imposed by various providers:

LLM Provider
Example Model
Rate Limits

gpt-4

Tier 1:

  • 500 Requests per Minute

  • 10,000 Tokens per Minute

  • 10,000 Requests per Day

All models

Tier 1:

  • 50 RPM

  • 50,000 TPM

  • 1 Million Tokens per Day

Co.Generate models

Production Key:

  • 10,000 RPM

All models

Endpoints:

  • 30 concurrent requests

mixtral-8x7b-instruct

  • 24 RPM

  • 16,000 TPM

All models

Paid:

  • 100 RPM

Generally, developers tackle getting rate limiting with a few tricks: caching common responses, queuing the requests, reducing the number of requests sent, etc.

Using Portkey, you can solve this by just adding a few lines of code to your app once.

In this cookbook, we'll show you 2 ways of ensuring that your app never gets rate limited again:

1. Install Portkey SDK

npm i portkey-ai

Let's start by making a generic chat.completions call using the Portkey SDK:

import Portkey from "portkey-ai";

const portkey = new Portkey({
  apiKey: process.env.PORTKEYAI_API_KEY,
  virtualKey: process.env.OPENAI_VIRTUAL_KEY,
});

response = await portkey.chat.completions.create({
  messages: [ {role: "user", content: "Hello!"} ],
  model: "gpt-4",
});
console.log(response.choices[0].message.content);

To ensure your request doesn't get rate limited, we'll utilise Portkey's fallback & loadbalance features:

2. Fallback to Alternative LLMs

With Portkey, you can write a call routing strategy that helps you fallback from one provider to another provider in case of rate limit errors. This is done by passing a Config object while instantiating your Portkey client:

import Portkey from "portkey-ai";

const portkey = new Portkey({
   apiKey: process.env.PORTKEY_API_KEY,
   config: JSON.stringify({
      strategy: {
         mode: "fallback",
         on_status_codes:[429]
      },
      targets: [
         {
            provider: "openai",
            api_key: "OPENAI_API_KEY"
         },
         {
            provider: "anthropic",
            api_key: "ANTHROPIC_API_KEY",
            override_params: { "max_tokens":254 }
         }
      ],
   }),
});

In this Config object,

  • The routing strategy is set as fallback

  • on_status_codes param ensures that the fallback is only triggered on the 429 error code, which is generated for rate limit errors

  • targets array contains the details of the LLMs and the order of the fallback

  • The override_params in the second target lets you add more params for the specific provider. (max_tokens for Anthropic in this case)

That's it! The rest of your code remains the same. Based on the Config, Portkey will do the orchestration and ensure that Anthropic is called as a fallback option whenever you get rate limit errors on OpenAI.

Fallback is still reactive - it only gets triggered once you actually get an error. Portkey also lets you tackle rate limiting proactively:

3. Load Balance Among Multiple LLMs

Instead of sending all your requests to a single provider on a single account, you can split your traffic across multiple provider accounts using Portkey - this ensures that a single account does not get overburdened with requests and thus avoids rate limits. It is very easy to setup this "loadbalancing" using Portkey - just write the relevant loadbalance Config and pass it while instantiating your Portkey client once:

import Portkey from "portkey-ai";

const portkey = new Portkey({
   apiKey: process.env.PORTKEY_API_KEY,
   config: JSON.stringify({
      strategy: {
         mode: "loadbalance",
      },
      targets: [
         {
            provider: "openai", api_key: "OPENAI_KEY_1",
            weight: 1,
         },
         {
            provider: "openai", api_key: "OPENAI_KEY_2",
            weight: 1,
         },
         {
            provider: "openai", api_key: "OPENAI_KEY_3",
            weight: 1,
         },         
      ],
   }),
});

In this Config object:

  • The routing strategy is set as loadbalance

  • targets contain 3 different OpenAI API keys from 3 different accounts, all with equal weight - which means Portkey will split the traffic 1/3rd equally among the 3 keys

With Loadbalance on different accounts, you can effectively multiply your rate limits and make your app much more robust.

That's it!

PreviousA/B Test Prompts and ModelsNextFunction Calling

Last updated 9 months ago

Was this helpful?

You can read more on , and on . If you want a deep dive on how Configs work on Portkey, .

fallbacks here
loadbalance here
check out the docs here
OpenAI
Anthropic
Cohere
Anyscale
Perplexity AI
Together AI