1. Garbled Output from the Model

Currently, some models may produce garbled output when sampling parameters are left unset. In that case, try setting parameters such as temperature, top_k, top_p, and frequency_penalty.

Modify the request payload as shown below, adapting the syntax to your programming language as needed:

    payload = {
        "model": "Qwen/Qwen2.5-Math-72B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "1+1=?",
            }
        ],
        "max_tokens": 200,  # Add as needed
        "temperature": 0.7, # Add as needed
        "top_k": 50,        # Add as needed
        "top_p": 0.7,       # Add as needed
        "frequency_penalty": 0 # Add as needed
    }
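As a minimal sketch of sending this payload, the example below assumes the OpenAI-compatible chat-completions endpoint URL shown in `API_URL` (verify it against your account's API documentation) and uses only the parameters from the payload above; `build_payload` and `send_chat` are illustrative helper names, not part of the API:

```python
import json
import urllib.request

API_URL = "https://api.siliconflow.cn/v1/chat/completions"  # assumed endpoint; confirm in the API docs

def build_payload(model, user_content, **sampling):
    """Build a chat-completions payload, merging in any sampling
    parameters (max_tokens, temperature, top_k, top_p, frequency_penalty)
    the caller explicitly set."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    for key in ("max_tokens", "temperature", "top_k", "top_p", "frequency_penalty"):
        if key in sampling:
            payload[key] = sampling[key]
    return payload

def send_chat(payload, api_key):
    """POST the payload with a Bearer token and return the parsed JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Omitting a sampling parameter from the call leaves it out of the payload entirely, so the server's defaults apply.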

2. Explanation of max_tokens

For the LLM models provided by the platform:

  • Models with a max_tokens limit of 16384:

    • deepseek-ai/DeepSeek-R1
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • Models with a max_tokens limit of 8192:

    • Qwen/QwQ-32B-Preview
  • Models with a max_tokens limit of 4096:

    • All other LLM models not mentioned above
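The limits above can be enforced client-side before sending a request; the table and helper below are illustrative (only the values are from this document):

```python
# Per-model max_tokens ceilings from the list above; 4096 applies to
# all other LLM models.
MAX_TOKENS_LIMITS = {
    "deepseek-ai/DeepSeek-R1": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B": 16384,
    "Qwen/QwQ-32B-Preview": 8192,
}
DEFAULT_MAX_TOKENS = 4096

def clamp_max_tokens(model: str, requested: int) -> int:
    """Cap a requested max_tokens at the model's documented limit."""
    return min(requested, MAX_TOKENS_LIMITS.get(model, DEFAULT_MAX_TOKENS))
```

For example, `clamp_max_tokens("Qwen/QwQ-32B-Preview", 20000)` caps the value at 8192 rather than letting the request fail.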

If you have special requirements, it is recommended to send feedback via email to contact@siliconflow.com.

3. Explanation of context_length

The context_length varies across LLM models. Search for a specific model on the Models page to view its details.

4. Explanation of 429 Error for DeepSeek-R1 and DeepSeek-V3 Models

  1. Unverified Users: Limited to 100 requests per day. If more than 100 requests are sent in a day, a 429 error is returned with the message: “Details: RPD limit reached. Could only send 100 requests per day without real name verification.” You can unlock higher rate limits by completing real-name verification.

  2. Verified Users: Have higher rate limits. Refer to the Models page for specific values.

    If the request limit is exceeded, a 429 error will also be returned.
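One common way to cope with 429 responses is to retry with exponential backoff. The sketch below is generic, not specific to this API: the request function and `RateLimitError` are injected/illustrative, and your HTTP client would raise the exception when it sees status 429:

```python
import time

class RateLimitError(Exception):
    """Illustrative exception for an HTTP 429 response."""

def with_backoff(request_fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call request_fn(); on RateLimitError, wait with exponentially
    growing delays (base_delay * 1, 2, 4, ...) and retry, re-raising
    after max_retries attempts."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Note that backoff only smooths over transient bursts; if you are consistently hitting the daily limit, the fix is real-name verification or a higher tier, not retries.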

5. Differences Between Pro and Non-Pro Models

  1. For some models, the platform provides both free and paid versions. The free version retains its original name, while the paid version is prefixed with “Pro/” to distinguish them. The rate limits for the free version are fixed, while those for the paid version are variable. Refer to Rate Limits for specific rules.

  2. For the DeepSeek R1 and DeepSeek V3 models, the “Pro/” prefix instead distinguishes the supported payment method: the Pro version can only be paid for with “recharge balance,” while the non-Pro version can be paid for with both “gifted balance” and “recharge balance.”

6. Are There Any Time or Quality Requirements for Custom Voice Uploads in Speech Models?

  • cosyvoice2: The uploaded voice sample must be shorter than 30 seconds.
  • GPT-SoVITS: The uploaded voice sample should be 3–10 seconds long.
  • fishaudio: No specific restrictions.

To ensure optimal voice generation results, it is recommended to upload a voice sample of around 8–10 seconds, with clear pronunciation and no noise or background sounds.
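These duration limits can be checked before uploading. The sketch below encodes only the constraints stated above; the function name and model-family keys are illustrative:

```python
# Duration rules per model family, as predicates over seconds:
# cosyvoice2 must be under 30 s, GPT-SoVITS should be 3-10 s,
# fishaudio has no specific restriction.
UPLOAD_DURATION_OK = {
    "cosyvoice2": lambda s: s < 30.0,
    "GPT-SoVITS": lambda s: 3.0 <= s <= 10.0,
    "fishaudio": lambda s: True,
}

def check_sample_duration(model_family: str, seconds: float) -> bool:
    """Return True if a voice sample of `seconds` satisfies the
    stated limits for the given model family (unknown families pass)."""
    return UPLOAD_DURATION_OK.get(model_family, lambda s: True)(seconds)
```

A sample of around 8–10 seconds, as recommended above, passes all three checks.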

7. Troubleshooting Truncated Model Output

You can troubleshoot truncated output from the following perspectives:

  • For API requests:
    • max_tokens setting: Set max_tokens to an appropriate value. If the output exceeds the max_tokens value, it will be truncated. The maximum max_tokens value for the DeepSeek R1 series is 16384.
    • Enable streaming output: non-streaming requests with long outputs may hit a 504 timeout before the response completes.
    • Set client timeout: Increase the client timeout to prevent truncation due to the timeout being reached before output completion.
  • For third-party client requests:
    • CherryStdio: The default max_tokens is 4096. Users can adjust this by enabling the “Message Length Limit” switch and setting max_tokens to an appropriate value.
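To illustrate the streaming recommendation above, the sketch below assumes an OpenAI-compatible server-sent-events format (`data: {...}` lines, terminated by `data: [DONE]`) and the endpoint URL shown in `API_URL`; verify both against your API documentation, and note that `parse_sse_line` and `stream_chat` are illustrative names:

```python
import json
import urllib.request

API_URL = "https://api.siliconflow.cn/v1/chat/completions"  # assumed endpoint

def parse_sse_line(line: str):
    """Extract the content delta from one 'data: {...}' SSE line,
    or return None for keep-alives, [DONE], and empty deltas."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data == "[DONE]":
        return None
    delta = json.loads(data)["choices"][0].get("delta", {})
    return delta.get("content") or None

def stream_chat(payload: dict, api_key: str):
    """Yield content chunks from a streaming chat-completions response."""
    payload = dict(payload, stream=True)  # enable streaming output
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        for raw in resp:
            chunk = parse_sse_line(raw.decode("utf-8").strip())
            if chunk is not None:
                yield chunk
```

Because each chunk arrives as soon as it is generated, the connection stays active throughout a long generation instead of sitting idle until a gateway timeout fires.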

For any other issues, please send an email to help@siliconflow.com.