
Introduction

Load balancing is a paid feature. You can enable it through a paid SaaS subscription or an Enterprise license.
Model providers typically enforce rate limits on API access within a specific time window to ensure stability and fair use. In enterprise applications, a high volume of concurrent requests through a single credential can easily trigger these limits and disrupt user access. Load balancing addresses this by distributing request traffic across multiple model credentials. This lowers the chance of exhausting any one credential's rate limit and removes the single point of failure, which helps keep the service available and responsive for all users.
Dify uses a round-robin strategy for load balancing: it routes model requests to each credential in the load balancing pool in turn. If a credential hits a rate limit, it is temporarily removed from rotation for one minute to avoid futile retries.
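The rotation-with-cooldown behavior described above can be sketched in Python as follows. This is a minimal, hypothetical illustration: `RoundRobinPool`, `next_credential`, and `report_rate_limited` are made-up names, not Dify's internal API. The sketch keeps its state in memory and takes an injectable clock so the one-minute cooldown can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class RoundRobinPool:
    """Round-robin credential pool with a temporary cooldown after rate limits.

    Hypothetical sketch: names are illustrative, not Dify's internal API.
    """

    def __init__(
        self,
        credentials: list[str],
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if not credentials:
            raise ValueError("the pool needs at least one credential")
        self._credentials = list(credentials)
        self._cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the cooldown can be tested
        self._next_index = 0
        # credential -> clock time at which it may rejoin the rotation
        self._cooling_until: dict[str, float] = {}

    def next_credential(self) -> Optional[str]:
        """Return the next usable credential, or None if all are cooling down."""
        now = self._clock()
        for _ in range(len(self._credentials)):
            credential = self._credentials[self._next_index]
            # Advance the cursor whether or not this credential is usable.
            self._next_index = (self._next_index + 1) % len(self._credentials)
            until = self._cooling_until.get(credential)
            if until is None or until <= now:
                return credential
        return None

    def report_rate_limited(self, credential: str) -> None:
        """Take a credential out of rotation for the cooldown period."""
        self._cooling_until[credential] = self._clock() + self._cooldown_seconds


# Usage with a fake clock (seconds):
now = [0.0]
pool = RoundRobinPool(["key-a", "key-b", "key-c"], clock=lambda: now[0])
first = [pool.next_credential() for _ in range(4)]   # ['key-a', 'key-b', 'key-c', 'key-a']
pool.report_rate_limited("key-b")                     # cooling until t = 60
during = [pool.next_credential() for _ in range(3)]  # ['key-c', 'key-a', 'key-c']
now[0] = 61.0                                         # cooldown has expired
after = [pool.next_credential() for _ in range(2)]   # ['key-a', 'key-b']
```

Skipping a cooling credential, rather than retrying it, is what keeps requests flowing to the remaining credentials during the cooldown. A deployment with several worker processes would need to share the cursor and cooldown state between them, which this sketch does not attempt.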

Procedure

To configure load balancing for a model, follow these steps:
  1. In the model list, find the target model, click the corresponding Config, and select Load balancing.
  2. In the load balancing pool, click Add credential to select from existing credentials or add a new one.
     Default Config refers to the default credential currently specified for that model.
     If a credential has a higher quota or better performance, you can add it multiple times to increase its weight in the load balancing rotation, allowing it to handle a larger share of the request load.
  3. Enable at least two credentials in the load balancing pool, then click Save. Models with load balancing enabled are marked with a special icon.
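The effect of adding a credential more than once, together with the two-credential minimum, can be illustrated with a small simulation. In this hypothetical Python sketch, `PoolEntry` and `build_rotation` are made-up names, not part of Dify, and rate-limit cooldowns are ignored. The rotation is plain round-robin over the enabled entries, so an entry listed twice receives twice the traffic. The sketch counts enabled entries toward the minimum; the steps above do not say whether two copies of the same credential satisfy that requirement.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import cycle, islice


@dataclass
class PoolEntry:
    """One row of a load balancing pool (hypothetical structure)."""
    credential_name: str
    enabled: bool = True


def build_rotation(entries: list[PoolEntry]) -> list[str]:
    """Return the round-robin order formed by the enabled entries.

    Assumption: the two-credential minimum is checked against enabled
    entries, so a credential added twice counts twice here.
    """
    rotation = [entry.credential_name for entry in entries if entry.enabled]
    if len(rotation) < 2:
        raise ValueError("enable at least two credentials before saving")
    return rotation


entries = [
    PoolEntry("high-quota-key"),
    PoolEntry("high-quota-key"),  # added a second time to double its weight
    PoolEntry("standard-key"),
    PoolEntry("spare-key", enabled=False),  # in the pool but not in rotation
]
rotation = build_rotation(entries)
# Route 300 requests in round-robin order and count each credential's share.
share = Counter(islice(cycle(rotation), 300))
# rotation -> ['high-quota-key', 'high-quota-key', 'standard-key']
# share    -> 200 requests for high-quota-key, 100 for standard-key
```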
When you switch from load-balancing mode back to the default single-credential mode, your load-balancing configuration is preserved for future use.
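One way to picture this behavior is to store the mode as a flag alongside the pool, so that turning load balancing off changes which credentials serve requests without deleting the pool. The sketch below is hypothetical: `ModelCredentialSettings` and its fields are illustrative names, not Dify's data model.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCredentialSettings:
    """Per-model credential settings (hypothetical field names)."""
    default_credential: str
    load_balancing_enabled: bool = False
    pool: list[str] = field(default_factory=list)

    def active_credentials(self) -> list[str]:
        """Return the credentials that currently serve requests."""
        if self.load_balancing_enabled:
            return list(self.pool)
        # Single-credential mode: the pool is ignored but left intact.
        return [self.default_credential]


settings = ModelCredentialSettings(
    default_credential="default-key",
    load_balancing_enabled=True,
    pool=["default-key", "key-b", "key-b"],
)
settings.load_balancing_enabled = False     # switch back to the default mode
single_mode = settings.active_credentials()  # ['default-key']
settings.load_balancing_enabled = True      # switch load balancing on again
restored = settings.active_credentials()     # ['default-key', 'key-b', 'key-b']
```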