Rate Limits

1. Overview of Rate Limits

1.1 What are Rate Limits?

Rate Limits are the rules that govern how often an account may call SiliconFlow platform APIs within a given time window.

1.2 Why Implement Rate Limits?

Rate Limits are a common practice for APIs, implemented for the following reasons:

Ensuring fairness and optimal resource utilization: Prevents excessive requests from some users from degrading the experience of others.
Preventing request overload: Improves service reliability by managing the platform’s overall load and avoiding performance problems caused by sudden surges in requests.
Security protection: Guards against malicious attacks that could overload the platform or interrupt service.

1.3 Rate Limit Metrics

Currently, Rate Limits are measured using seven metrics:

RPM (Requests per minute): Maximum number of requests allowed per minute.
RPH (Requests per hour): Maximum number of requests allowed per hour.
RPD (Requests per day): Maximum number of requests allowed per day.
TPM (Tokens per minute): Maximum number of tokens allowed per minute.
TPD (Tokens per day): Maximum number of tokens allowed per day.
IPM (Images per minute): Maximum number of images that can be generated per minute.
IPD (Images per day): Maximum number of images that can be generated per day.

1.4 Rate Limits for Different Models

Model Name	Rate Limit Metrics	Current Metrics
Language Models (Chat)	RPM, TPM	RPM: 1000-10000, TPM: 50000-5000000
Vector Models (Embedding)	RPM, TPM	RPM: 2000-10000, TPM: 500000-10000000
Reranking Models (Reranker)	RPM, TPM	RPM: 2000, TPM: 500000
Image Generation Models (Image)	IPM, IPD	IPM: 2-, IPD: 400-
Multimodal Models	-	-

Rate Limits may be triggered as soon as any applicable metric (RPM, RPH, RPD, TPM, TPD, IPM, IPD) reaches its limit, whichever comes first. For example, if the RPM limit is 20 and the TPM limit is 200K, an account that sends 20 requests to ChatCompletions within one minute, each containing 100 tokens, triggers the limit even though the 20 requests total only 2,000 tokens, far below 200K.
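The "whichever comes first" rule can be checked with a few lines of arithmetic. The sketch below reuses the numbers from the example; the function name `reached_limits` is illustrative and not part of any SiliconFlow SDK, and it assumes a limit counts as triggered once usage reaches the cap, as in the example.

```python
from typing import Dict, List


def reached_limits(usage: Dict[str, int], limits: Dict[str, int]) -> List[str]:
    """Return every metric whose usage in the window has reached its cap."""
    return [metric for metric, cap in limits.items() if usage.get(metric, 0) >= cap]


# Numbers from the example: RPM limit 20, TPM limit 200K.
limits = {"RPM": 20, "TPM": 200_000}

requests_sent = 20
tokens_per_request = 100
usage = {"RPM": requests_sent, "TPM": requests_sent * tokens_per_request}

hit = reached_limits(usage, limits)
print(usage["TPM"])  # 2000
print(hit)           # ['RPM']
```

Only RPM is reported, because 2,000 tokens is well below the 200K TPM cap.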

1.5 Rate Limit Scope

Rate Limits are defined at the user account level, not at the API key level.
Each model has separate Rate Limits, meaning that exceeding the Rate Limits for one model does not affect the normal usage of other models.
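Because limits apply per account and per model, a client that wants to stay under them can keep one counter per model and share it across all of the account's API keys. The following is a minimal client-side sketch, assuming a rolling 60-second window; how the platform itself measures the window is not stated here, and the class name, model names, and caps are hypothetical.

```python
import time
from collections import defaultdict, deque


class ModelWindowTracker:
    """Client-side count of requests sent per model within a rolling window."""

    def __init__(self, rpm_limits, window_seconds=60.0, clock=time.monotonic):
        self.rpm_limits = rpm_limits    # model name -> RPM cap (hypothetical values)
        self.window = window_seconds
        self.clock = clock
        self.sent = defaultdict(deque)  # model name -> timestamps of recent requests

    def try_acquire(self, model):
        """Record a request and return True, or return False if the cap is reached."""
        now = self.clock()
        stamps = self.sent[model]
        # Drop timestamps that have left the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.rpm_limits[model]:
            return False
        stamps.append(now)
        return True


# A frozen clock keeps the demonstration deterministic.
tracker = ModelWindowTracker({"model-a": 2, "model-b": 2}, clock=lambda: 0.0)
results = [tracker.try_acquire("model-a") for _ in range(3)]
other = tracker.try_acquire("model-b")
print(results, other)  # [True, True, False] True
```

Exhausting the budget for `model-a` leaves `model-b` unaffected, mirroring the per-model scope described above.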

2. Rate Limit Rules

The Rate Limits for free models are fixed, while paid models have different Rate Limit metrics based on account usage tiers.
Within the same usage tier, the peak Rate Limits vary depending on the model type and parameter size.

2.1 Model Rate Limits

Pay-as-you-go: API calls are counted towards the account’s billing statement.
Rate Limits are tiered based on account usage levels. Peak Rate Limits increase as the usage tier rises.
Within the same usage tier, the peak Rate Limits vary depending on the model type and the size of the model’s parameters.

2.3 User Usage Levels and Rate Limits

The platform categorizes accounts into different usage tiers based on their monthly spending, with each tier having its own Rate Limit standards. When the monthly spending reaches the standard for a higher tier, the account is automatically upgraded to the corresponding usage tier. The upgrade takes effect immediately, providing more lenient Rate Limits.

Monthly Spending: The account’s total spending for the month, counting both recharged and gifted amounts.
Tier Settings: The platform compares the previous calendar month’s spending with the spending from the 1st of the current month to today, and uses the higher value to determine the usage tier. New users start at the initial usage tier, L0.

Tier	RPM	TPM
L0	1,000	40,000
L1	1,200	60,000
L2	2,000	80,000
L3	4,000	160,000
L4	8,000	500,000
L5	10,000	2,000,000
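The tier rule and the table above can be combined into a small lookup. This is a sketch only: the RPM and TPM values are copied from the table, but the spending thresholds in `EXAMPLE_THRESHOLDS` are hypothetical placeholders, since the amounts that separate the tiers are not listed here.

```python
# RPM / TPM per tier, copied from the table above.
TIER_LIMITS = {
    "L0": {"RPM": 1_000, "TPM": 40_000},
    "L1": {"RPM": 1_200, "TPM": 60_000},
    "L2": {"RPM": 2_000, "TPM": 80_000},
    "L3": {"RPM": 4_000, "TPM": 160_000},
    "L4": {"RPM": 8_000, "TPM": 500_000},
    "L5": {"RPM": 10_000, "TPM": 2_000_000},
}

# HYPOTHETICAL thresholds (minimum spending, tier), for illustration only.
EXAMPLE_THRESHOLDS = [(0, "L0"), (10, "L1"), (100, "L2")]


def effective_spending(previous_month, month_to_date):
    """The higher of last month's spending and this month's spending so far."""
    return max(previous_month, month_to_date)


def tier_for(spending, thresholds):
    """Pick the highest tier whose minimum spending has been reached."""
    tier = thresholds[0][1]
    for minimum, name in sorted(thresholds):
        if spending >= minimum:
            tier = name
    return tier


spending = effective_spending(previous_month=120, month_to_date=30)
tier = tier_for(spending, EXAMPLE_THRESHOLDS)
print(spending, tier, TIER_LIMITS[tier])  # 120 L2 {'RPM': 2000, 'TPM': 80000}
```

Taking the maximum means an account keeps the tier earned last month even when its spending early in the current month is still low.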

2.4 Specific Model Rate Limits

The platform currently provides models in five categories: text generation, image generation, vectorization, reranking, and speech. Specific Rate Limit metrics for each model can be found on the Models page.

2.5 Rate Limits for `deepseek-ai/DeepSeek-R1` and `deepseek-ai/DeepSeek-V3`:

Added RPH Limit (Requests Per Hour):
- Applicable Models: deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-V3
- Applicable Users: All users
- Limit: 30 requests/hour
Added RPD Limit (Requests Per Day):
- Applicable Models: deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-V3
- Applicable Users: Users who have not completed identity verification
- Limit: 100 requests/day
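A client can check these two limits before sending a request. The sketch below assumes rolling one-hour and 24-hour windows and counts requests for a single model; whether the platform uses rolling or calendar windows, and whether the two models share one counter, is not stated here. The function name `request_allowed` is illustrative.

```python
RPH_LIMIT = 30   # requests per hour, all users
RPD_LIMIT = 100  # requests per day, users without identity verification

HOUR = 3600
DAY = 86400


def request_allowed(request_times, now, verified):
    """Decide whether one more request fits under the RPH and RPD limits.

    request_times: timestamps (in seconds) of earlier requests to the model.
    """
    last_hour = sum(1 for t in request_times if now - t < HOUR)
    if last_hour >= RPH_LIMIT:
        return False
    if not verified:
        last_day = sum(1 for t in request_times if now - t < DAY)
        if last_day >= RPD_LIMIT:
            return False
    return True


now = 1_000_000
# 100 requests earlier today, all more than an hour ago.
earlier_today = [now - 4000 - i for i in range(100)]
print(request_allowed(earlier_today, now, verified=True))   # True
print(request_allowed(earlier_today, now, verified=False))  # False
```

The verified account is held back only by the hourly limit, while the unverified account has already used its 100 requests for the day.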

The strategy may be adjusted periodically based on traffic and load changes. SiliconFlow reserves the right of final interpretation.

3. Handling Exceeding Rate Limits

3.1 Error Messages for Exceeding Rate Limits

API requests that exceed the Rate Limits are rejected. Users must wait until usage falls back within the limits before sending additional requests. The corresponding HTTP error message is:

    HTTP/1.1 429 Too Many Requests
    Content-Type: application/json

    Request was rejected due to rate limiting. If you want more, please contact contact@siliconflow.com

3.2 Methods to Handle Exceeding Rate Limits

Within the existing Rate Limits, users can refer to Handling Rate Limit Exceedance for error mitigation.
Alternatively, users can move to a higher usage tier to raise their peak Rate Limits when business needs require it.
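One common way to stay within existing limits is to retry rejected requests with exponential backoff. The sketch below is a generic illustration, not an official SiliconFlow client: `send` is a hypothetical zero-argument callable that performs the request and returns `(status_code, body)`, and the delay values are arbitrary defaults.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or retries run out."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Double the wait on each retry, cap it, and add up to 10% jitter.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    return status, body


# Demonstration with a fake sender: two rejections, then success.
responses = iter([(429, "rate limited"), (429, "rate limited"), (200, "ok")])
waits = []  # collect delays instead of actually sleeping
result = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(result)      # (200, 'ok')
print(len(waits))  # 2
```

The jitter spreads out retries from clients that were rejected at the same moment, so they do not all return at once and trigger the limit again.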
