Request Timeouts
This feature is available on all Portkey plans.
Manage unpredictable LLM latencies effectively with Portkey's Request Timeouts. This feature allows automatic termination of requests that exceed a specified duration, letting you gracefully handle errors or make another, faster request.
You can enable request timeouts while making your request or you can set them in Configs.
Set the request timeout while instantiating your Portkey client, or, if you're using the REST API, send the x-portkey-request-timeout header.
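A minimal sketch of sending this header over the REST API, using Python's standard library. The endpoint path and payload here are illustrative placeholders, and the timeout value is assumed to be in milliseconds:

```python
import json
import urllib.request

# Build the request; nothing is sent until urlopen() is called.
req = urllib.request.Request(
    "https://api.portkey.ai/v1/chat/completions",  # assumed endpoint path
    data=json.dumps(
        {"messages": [{"role": "user", "content": "Hello"}]}
    ).encode(),
    headers={
        "x-portkey-api-key": "YOUR_PORTKEY_API_KEY",  # placeholder credential
        "x-portkey-request-timeout": "10000",         # 10 s, assuming milliseconds
        "Content-Type": "application/json",
    },
)

# urllib.request.urlopen(req)  # would perform the actual call
```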
In Configs, request timeouts are set at either (1) strategy level, or (2) target level.
For a 10-second timeout, it will be:
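A strategy-level sketch might look like the following, assuming request_timeout is specified in milliseconds and using placeholder virtual keys:

```json
{
  "strategy": { "mode": "fallback" },
  "request_timeout": 10000,
  "targets": [
    { "virtual_key": "open-ai-xxx" },
    { "virtual_key": "azure-open-ai-xxx" }
  ]
}
```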
Here, the request timeout of 10 seconds will be applied to *all* the targets in this Config.
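At the target level, each target carries its own request_timeout instead. A sketch with the same assumptions (milliseconds, placeholder virtual keys):

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "virtual_key": "open-ai-xxx", "request_timeout": 10000 },
    { "virtual_key": "azure-open-ai-xxx", "request_timeout": 2000 }
  ]
}
```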
Here, for the first target, a request timeout of 10s will be set, while for the second target, a request timeout of 2s will be set.
You can also combine the two levels in a nested Config:
A global timeout of 2s is set at the strategy level.
The first target has a nested fallback strategy of its own, with a target-level request timeout of 5s.
For the first virtual key inside that nested strategy, the target-level timeout of 5s will be applied.
For the second virtual key (i.e. open-ai-1-2), there is a timeout override, set at 10s, which will be applied only to this target.
For the last target (i.e. virtual key azure-open-ai-1), the top strategy-level timeout of 2s will be applied.
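The nested setup described above can be sketched as the Config below. The virtual keys open-ai-1-2 and azure-open-ai-1 are from the description; the first virtual key's name is a placeholder, and milliseconds are assumed:

```json
{
  "strategy": { "mode": "fallback" },
  "request_timeout": 2000,
  "targets": [
    {
      "strategy": { "mode": "fallback" },
      "request_timeout": 5000,
      "targets": [
        { "virtual_key": "open-ai-1-1" },
        { "virtual_key": "open-ai-1-2", "request_timeout": 10000 }
      ]
    },
    { "virtual_key": "azure-open-ai-1" }
  ]
}
```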
Portkey issues a standard 408 error for timed-out requests. You can leverage this by setting up fallback or retry strategies through the on_status_codes parameter, ensuring robust handling of these scenarios.
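A fallback-on-timeout sketch, with placeholder virtual keys and milliseconds assumed:

```json
{
  "strategy": { "mode": "fallback", "on_status_codes": [408] },
  "targets": [
    { "virtual_key": "open-ai-xxx", "request_timeout": 2000 },
    { "virtual_key": "azure-open-ai-xxx" }
  ]
}
```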
Here, fallback from OpenAI to Azure OpenAI will only be triggered if the first request times out after 2 seconds, otherwise the request will fail with a 408 error code.
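Similarly, a retry-on-timeout sketch, again with a placeholder virtual key and milliseconds assumed:

```json
{
  "retry": { "attempts": 3, "on_status_codes": [408] },
  "request_timeout": 1000,
  "virtual_key": "open-ai-xxx"
}
```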
Here, a retry is triggered up to 3 times whenever the request takes more than 1s to return a response. After 3 unsuccessful retries, it will fail with a 408 code.
While the request timeout is a powerful feature to help you gracefully handle unruly models & their latencies, there are a few things to consider:
Ensure that you are setting reasonable timeouts - for example, models like gpt-4 often have sub-10-second response times.
Ensure that you gracefully handle 408 errors whenever a request does get timed out - you can inform the user to rerun their query and set up some neat interactions on your app.
For streaming requests, the timeout will not be triggered if at least one chunk is received before the specified duration.