1. Overview of Rate Limits

1.1 What are Rate Limits?

Rate Limits refer to the rules governing the frequency of API access to the SiliconCloud platform services within a specified time frame.

1.2 Why Implement Rate Limits?

Rate Limits are a common practice for APIs, implemented for the following reasons:

  • Ensuring fairness and optimal resource utilization: To ensure fair use of resources and prevent excessive requests from some users that may affect the experience of others.
  • Preventing request overload: To enhance service reliability by managing the platform’s overall load and avoiding performance issues caused by sudden surges in requests.
  • Security protection: To prevent malicious attacks that could overload the platform or even cause service interruptions.

1.3 Rate Limit Metrics

Currently, Rate Limits are measured using seven metrics:

  • RPM (Requests per minute): Maximum number of requests allowed per minute.
  • RPH (Requests per hour): Maximum number of requests allowed per hour.
  • RPD (Requests per day): Maximum number of requests allowed per day.
  • TPM (Tokens per minute): Maximum number of tokens allowed per minute.
  • TPD (Tokens per day): Maximum number of tokens allowed per day.
  • IPM (Images per minute): Maximum number of images that can be generated per minute.
  • IPD (Images per day): Maximum number of images that can be generated per day.
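For quick reference, the seven metrics can be encoded as a small lookup table. This is a minimal Python sketch. The `Metric` class, the `METRICS` mapping and the `per_second_ceiling` helper are hypothetical names. It also assumes that a "day" is a fixed 86,400-second window, because the text does not say whether windows are rolling or calendar-aligned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One rate-limit metric: what it counts and over what time window."""
    unit: str            # "requests", "tokens" or "images"
    window_seconds: int  # window length (assumption: fixed-length windows)


# The seven metrics from the list above, keyed by abbreviation.
METRICS = {
    "RPM": Metric("requests", 60),
    "RPH": Metric("requests", 3_600),
    "RPD": Metric("requests", 86_400),
    "TPM": Metric("tokens", 60),
    "TPD": Metric("tokens", 86_400),
    "IPM": Metric("images", 60),
    "IPD": Metric("images", 86_400),
}


def per_second_ceiling(metric: str, limit: int) -> float:
    """Average rate per second implied by a limit (a pacing target only)."""
    return limit / METRICS[metric].window_seconds
```

For example, `per_second_ceiling("RPM", 1200)` gives `20.0` requests per second. This figure is only an average and says nothing about how the platform treats bursts within the window.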

1.4 Rate Limits for Different Models

Model Type                       | Rate Limit Metrics | Current Limits
Language Models (Chat)           | RPM, TPM           | RPM: 1,000-10,000; TPM: 50,000-5,000,000
Vector Models (Embedding)        | RPM, TPM           | RPM: 2,000-10,000; TPM: 500,000-10,000,000
Reranking Models (Reranker)      | RPM, TPM           | RPM: 2,000; TPM: 500,000
Image Generation Models (Image)  | IPM, IPD           | IPM: 2-; IPD: 400-
Multimodal Models                | -                  | -

A Rate Limit is triggered as soon as any one of the metrics (RPM, RPH, RPD, TPM, TPD, IPM, IPD) reaches its limit, whichever comes first. For example, suppose the RPM limit is 20 and the TPM limit is 200K, and an account sends 20 requests to ChatCompletions within one minute, each containing 100 tokens. The limit is triggered even though those 20 requests total only 2,000 tokens, far below 200K.
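The "whichever comes first" rule can be sketched as a client-side limiter. The limiter tracks request and token counts over a sliding one-minute window and refuses a request if either count would go over its limit. This is a hypothetical Python illustration, not the platform's server-side algorithm, and `MinuteLimiter` and `try_acquire` are invented names. It also assumes that the token count of each request is known before sending. In practice, completion tokens can only be estimated ahead of time.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class MinuteLimiter:
    """Hypothetical client-side check of RPM and TPM, whichever is hit first.

    Keeps (timestamp, tokens) for each accepted request in the last window
    and rejects a new request if it would exceed either limit.
    """

    def __init__(self, rpm: int, tpm: int, window: float = 60.0) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self._events: Deque[Tuple[float, int]] = deque()
        self._tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests that have slid out of the window.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens: int, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self._events) + 1 > self.rpm:
            return False  # RPM would be exceeded
        if self._tokens_in_window + tokens > self.tpm:
            return False  # TPM would be exceeded
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True
```

Replaying the example above with `MinuteLimiter(rpm=20, tpm=200_000)`, twenty 100-token requests within the minute are accepted. A 21st request in the same minute is refused by the RPM check, even though only 2,000 tokens have been used.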

1.5 Rate Limit Scope

  1. Rate Limits are defined at the user account level, not at the API key level.
  2. Each model has separate Rate Limits, meaning that exceeding the Rate Limits for one model does not affect the normal usage of other models.
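The two scope rules amount to keying usage by (account, model) rather than by API key. The minimal Python sketch below illustrates this. `UsageLedger`, `record`, and the key/account/model strings are hypothetical. API keys are resolved to their owning account first, so all keys of one account share a counter, while each model keeps its own.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple


class UsageLedger:
    """Hypothetical illustration of rate-limit scope: usage pooled per (account, model)."""

    def __init__(self, key_to_account: Dict[str, str]) -> None:
        self.key_to_account = key_to_account
        self.counts: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def record(self, api_key: str, model: str) -> int:
        # Resolve the key to its account: limits are per account, not per key.
        account = self.key_to_account[api_key]
        # Each model has its own counter, so one model's usage never affects another's.
        self.counts[(account, model)] += 1
        return self.counts[(account, model)]
```

One consequence is that creating additional API keys under the same account does not provide extra headroom.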

2. Rate Limit Rules

  • The Rate Limits for free models are fixed, while paid models have different Rate Limit metrics based on account usage tiers.
  • Within the same usage tier, the peak Rate Limits vary depending on the model type and parameter size.

2.1 Model Rate Limits

  1. Pay-as-you-go: API calls are counted towards the account’s billing statement.
  2. Rate Limits are tiered based on account usage levels. Peak Rate Limits increase as the usage tier rises.
  3. Within the same usage tier, the peak Rate Limits vary depending on the model type and the size of the model’s parameters.

2.3 User Usage Levels and Rate Limits

The platform assigns each account to a usage tier based on its monthly spending, and each tier has its own Rate Limits. When monthly spending reaches the threshold for a higher tier, the account is automatically upgraded to that tier. The upgrade takes effect immediately and provides more lenient Rate Limits.

  • Monthly Spending: The account's total spending in the month, including both recharged and gifted balances.
  • Tier Determination: The platform compares the previous calendar month's spending with the spending from the 1st of the current month to date, and uses the higher of the two to assign the usage tier. New users start at the initial tier, L0.

Tier | RPM    | TPM
L0   | 1,000  | 40,000
L1   | 1,200  | 60,000
L2   | 2,000  | 80,000
L3   | 4,000  | 160,000
L4   | 8,000  | 500,000
L5   | 10,000 | 2,000,000
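The tier rule can be sketched in Python as follows. The RPM/TPM values are copied from the tier table. This section does not give the spending thresholds that separate the tiers, so the hypothetical `determine_tier` function takes them as a caller-supplied `thresholds` argument instead of guessing amounts. `TIER_LIMITS` is also an illustrative name.

```python
from bisect import bisect_right
from typing import Dict, Sequence, Tuple

# (RPM, TPM) per usage tier, copied from the table above.
TIER_LIMITS: Dict[str, Tuple[int, int]] = {
    "L0": (1_000, 40_000),
    "L1": (1_200, 60_000),
    "L2": (2_000, 80_000),
    "L3": (4_000, 160_000),
    "L4": (8_000, 500_000),
    "L5": (10_000, 2_000_000),
}


def determine_tier(prev_month: float, month_to_date: float,
                   thresholds: Sequence[float]) -> str:
    """Pick a usage tier from spending.

    thresholds[i] is the minimum monthly spending for tier L(i+1). These
    amounts are hypothetical and supplied by the caller, because the document
    does not list them.
    """
    if len(thresholds) != len(TIER_LIMITS) - 1 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected 5 ascending thresholds for L1..L5")
    # Use the higher of last calendar month and the current month to date.
    spending = max(prev_month, month_to_date)
    # Count how many tier thresholds the spending has reached (>=).
    return f"L{bisect_right(thresholds, spending)}"
```

Because the higher of the two months is used, a busy previous month keeps the account in its tier early in the new month. Spending that ramps up during the current month upgrades the account right away.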

2.4 Specific Model Rate Limits

The platform currently offers five categories of models: text generation, image generation, vectorization (embedding), reranking, and speech. The specific Rate Limits for each model are listed on the Models page.

2.5 Rate Limits for deepseek-ai/DeepSeek-R1 and deepseek-ai/DeepSeek-V3

  1. Added RPH Limit (Requests Per Hour):

    • Applicable Models: deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-V3
    • Applicable Users: All users
    • Limit: 30 requests/hour
  2. Added RPD Limit (Requests Per Day):

    • Applicable Models: deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-V3
    • Applicable Users: Users who have not completed identity verification
    • Limit: 100 requests/day
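A client might check these two extra caps before sending a request, as in the minimal Python sketch below. `extra_limits` and `allowed` are hypothetical helpers. The hourly and daily request counts are assumed to be tracked by the caller. The values are a snapshot, since the policy may be adjusted periodically.

```python
from typing import Dict

# Models covered by the extra caps in this section.
DEEPSEEK_MODELS = {"deepseek-ai/DeepSeek-R1", "deepseek-ai/DeepSeek-V3"}


def extra_limits(model: str, identity_verified: bool) -> Dict[str, int]:
    """Extra request caps for a model/user combination (snapshot of current values)."""
    if model not in DEEPSEEK_MODELS:
        return {}
    limits = {"RPH": 30}  # applies to all users
    if not identity_verified:
        limits["RPD"] = 100  # only users without identity verification
    return limits


def allowed(model: str, identity_verified: bool,
            requests_this_hour: int, requests_today: int) -> bool:
    """True if one more request stays within the extra caps (counts tracked by caller)."""
    limits = extra_limits(model, identity_verified)
    if "RPH" in limits and requests_this_hour + 1 > limits["RPH"]:
        return False
    if "RPD" in limits and requests_today + 1 > limits["RPD"]:
        return False
    return True
```

For an unverified user, the daily cap is the binding one over a full day. At 30 requests per hour, 24 hours would allow up to 720 requests, but RPD stops them at 100.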

The strategy may be adjusted periodically based on traffic and load changes. SiliconFlow reserves the right of final interpretation.

3. Handling Exceeding Rate Limits

3.1 Error Messages for Exceeding Rate Limits

API requests that exceed the Rate Limits are rejected. Users must wait until their usage falls back within the limits before sending further requests. The corresponding HTTP response is:

    HTTP/1.1 429 Too Many Requests
    Content-Type: application/json

    Request was rejected due to rate limiting. If you want more, please contact contact@siliconflow.com

3.2 Methods to Handle Exceeding Rate Limits

  • To work within the current Rate Limits, refer to Handling Rate Limit Exceedance for mitigation techniques.
  • Alternatively, raise the account's usage tier (by increasing monthly spending) to obtain higher peak Rate Limits that meet business needs.
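A common client-side mitigation for 429 responses is to retry with exponential backoff and random jitter. The sketch below is a generic Python illustration that uses only the standard library. It is not necessarily the approach described in Handling Rate Limit Exceedance. `call_with_backoff` and its parameters are hypothetical, and the retry count and delays are arbitrary defaults, not platform recommendations.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    send: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying with exponential backoff while it raises HTTP 429.

    send is any zero-argument callable that performs the request, e.g. a
    wrapper around urllib.request.urlopen. Other HTTP errors are re-raised
    immediately.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise  # not a rate-limit error, or retries exhausted
            # Double the wait on each attempt (capped), then pick a random
            # point in [delay/2, delay] so clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(delay / 2, delay))
    raise AssertionError("unreachable")
```

In use, `send` could be `lambda: urllib.request.urlopen(req).read()` for a `urllib.request.Request` you build yourself. `urlopen` raises `HTTPError` for a 429 status, which is the condition the loop retries on. The `sleep` parameter is injectable so the retry logic can be tested without real waiting.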