Comparing Top10 LMSYS Models with Portkey


Last updated 1 year ago


The LMSYS Chatbot Arena, with over 1,000,000 human comparisons, is the gold standard for evaluating LLM performance.

But testing multiple LLMs is a pain: you have to juggle APIs that all work differently, each with its own authentication and dependencies.

Enter Portkey: a unified, open-source API for accessing over 200 LLMs. Portkey makes it a breeze to call the models on the LMSYS leaderboard with minimal setup.


In this notebook, you'll see how Portkey streamlines LLM evaluation for the Top 10 LMSYS Models, giving you valuable insights into cost, performance, and accuracy metrics.

Let's dive in!


Video Guide

The notebook comes with a video guide that you can follow along with.

Setting up Portkey

To get started, install the necessary packages:

!pip install -qU portkey-ai openai tabulate

Next, sign up for a Portkey API key at https://app.portkey.ai/. Navigate to "Settings" -> "API Keys" and create an API key with the appropriate scope.
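
Rather than pasting the key into the notebook, you may prefer to read it from the environment. The following is a minimal sketch, assuming you have exported the key under the name PORTKEY_API_KEY (that variable name is a convention chosen for this example, not something the original notebook defines):

```python
import os

def get_portkey_api_key(env=None):
    """Return the Portkey API key from the environment, or raise a clear error.

    PORTKEY_API_KEY is an assumed variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("PORTKEY_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set PORTKEY_API_KEY before running the notebook")
    return key
```

The returned value can be passed as api_key to createHeaders in place of the "YOUR_PORTKEY_API_KEY" placeholder.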

Defining the Top 10 LMSYS Models

Let's define the list of Top 10 LMSYS models and their corresponding providers.

top_10_models = [
    ["gpt-4o-2024-05-13", "openai"],
    ["gemini-1.5-pro-latest", "google"],
##  ["gemini-advanced-0514","google"],             # This model is not available on a public API
    ["gpt-4-turbo-2024-04-09", "openai"],
    ["gpt-4-1106-preview","openai"],
    ["claude-3-opus-20240229", "anthropic"],
    ["gpt-4-0125-preview","openai"],
##  ["yi-large-preview","01-ai"],                  # This model is not available on a public API
    ["gemini-1.5-flash-latest", "google"],
    ["gemini-1.0-pro", "google"],
    ["meta-llama/Llama-3-70b-chat-hf", "together"],
    ["claude-3-sonnet-20240229", "anthropic"],
    ["reka-core-20240501","reka-ai"],
    ["command-r-plus", "cohere"],
    ["gpt-4-0314", "openai"],
    ["glm-4","zhipu"],
##  ["qwen-max-0428","qwen"]                       # This model is not available outside of China
]

Add Provider API Keys to Portkey Vault

All of the providers above are integrated with Portkey, which means you can add their API keys to the Portkey vault and get a corresponding Virtual Key for each, streamlining API key management.

| Provider | Link to get API Key | Payment Mode |
|---|---|---|
| openai | https://platform.openai.com/ | Wallet Top Up |
| anthropic | https://console.anthropic.com/ | Wallet Top Up |
| google | https://aistudio.google.com/ | 💰 Free to Use |
| cohere | https://dashboard.cohere.com/ | 💰 Free Credits |
| together-ai | https://api.together.ai/ | 💰 Free Credits |
| reka-ai | https://platform.reka.ai/ | Wallet Top Up |
| zhipu | https://open.bigmodel.cn/ | 💰 Free to Use |

## Replace the virtual keys below with your own

virtual_keys = {
    "openai": "openai-new-c99d32",
    "anthropic": "anthropic-key-a0b3d7",
    "google": "google-66c0ed",
    "cohere": "cohere-ab97e4",
    "together": "together-ai-dada4c",
    "reka-ai":"reka-54f5b5",
    "zhipu":"chatglm-ba1096"
}
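
Before making any calls, it can help to confirm that every provider named in the model list has a virtual key. The helper below is a hypothetical addition, not part of the original notebook; it takes the model list and the key mapping as arguments so it can be checked on its own:

```python
def missing_virtual_keys(models, keys):
    """Return the sorted provider names used in `models` with no entry in `keys`.

    `models` is a list of [model_name, provider] pairs and `keys` maps a
    provider name to its virtual key. Empty-string keys count as missing.
    """
    providers = {provider for _model, provider in models}
    return sorted(p for p in providers if not keys.get(p))


# Small self-contained example with made-up data
sample_models = [["model-a", "openai"], ["model-b", "cohere"]]
sample_keys = {"openai": "my-openai-virtual-key"}
print(missing_virtual_keys(sample_models, sample_keys))
```

Calling missing_virtual_keys(top_10_models, virtual_keys) should return an empty list once every provider is covered.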

Running the Models with Portkey

Now, let's create a function that runs the Top 10 LMSYS models using the OpenAI SDK with the Portkey Gateway:

from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

def run_top10_lmsys_models(prompt):
    outputs = {}

    for model, provider in top_10_models:
        portkey = OpenAI(
            api_key = "dummy_key",
            base_url = PORTKEY_GATEWAY_URL,
            default_headers = createHeaders(
                api_key="YOUR_PORTKEY_API_KEY",                 # Grab from https://app.portkey.ai/
                virtual_key = virtual_keys[provider],
                trace_id="COMPARING_LMSYS_MODELS"
            )
        )

        response = portkey.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model=model,
            max_tokens=256
        )

        outputs[model] = response.choices[0].message.content

    return outputs
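
As written, the loop above stops at the first provider error (an expired key or a rate limit, for example), and the outputs gathered so far are lost. One possible variant is sketched below. It assumes you wrap the per-model request in a function of your own: call_model is a hypothetical callable taking a model name, a provider name, and the prompt. Instead of raising, the sketch records failures and wall-clock latency per model:

```python
import time

def run_models_safely(models, call_model, prompt):
    """Call every model, recording its output or error plus wall-clock latency.

    `call_model(model, provider, prompt)` is a hypothetical callable that you
    supply; it should return the response text or raise on failure.
    """
    results = {}
    for model, provider in models:
        start = time.perf_counter()
        try:
            output, error = call_model(model, provider, prompt), None
        except Exception as exc:  # keep going so one failure does not abort the run
            output, error = None, f"{type(exc).__name__}: {exc}"
        results[model] = {
            "output": output,
            "error": error,
            "seconds": time.perf_counter() - start,
        }
    return results
```

The measured time includes network overhead on your side, so treat it as a rough comparison rather than a precise benchmark.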

Comparing Model Outputs

To display the model outputs in a tabular format for easy comparison, we define the print_model_outputs function:

from tabulate import tabulate

def print_model_outputs(prompt):
    outputs = run_top10_lmsys_models(prompt)

    table_data = []
    for model, output in outputs.items():
        table_data.append([model, output.strip()])

    headers = ["Model", "Output"]
    table = tabulate(table_data, headers, tablefmt="grid")
    print(table)
    print()
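
Model answers often run to several sentences, which can make the grid very wide. A minimal sketch of one way to handle this, using only the standard library's textwrap module to wrap each answer before it is tabulated (the wrap_rows name and the default width of 60 are choices made for this example):

```python
import textwrap

def wrap_rows(outputs, width=60):
    """Turn {model: text} into [[model, wrapped_text], ...] rows for tabulate."""
    rows = []
    for model, text in outputs.items():
        # Wrap each line separately so existing line breaks are preserved.
        wrapped = "\n".join(
            textwrap.fill(line, width=width) for line in text.strip().splitlines()
        )
        rows.append([model, wrapped])
    return rows
```

The resulting rows can be passed to tabulate in place of table_data; with the grid format, embedded newlines are rendered as multi-line cells.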

Example: Evaluating LLMs for a Specific Task

Let's run the notebook with a specific prompt to showcase the differences in responses from various LLMs:

prompt = "If 20 shirts take 5 hours to dry, how much time will 100 shirts take to dry?"

print_model_outputs(prompt)

Conclusion

With minimal setup and code modifications, Portkey enables you to streamline your LLM evaluation process and easily call 200+ LLMs to find the best model for your specific use case.

Explore Portkey further and integrate it into your own projects. Visit the Portkey documentation at https://docs.portkey.ai/ for more information on how to leverage Portkey's capabilities in your workflow.

On the Portkey dashboard, you can view the logs for every model call, all tagged with the trace ID COMPARING_LMSYS_MODELS set in the code above.
