1. Core Capabilities of the Model

1.1 Basic Functions

  • Text Generation: Generate coherent natural language text based on context, supporting various writing styles and tones.
  • Semantic Understanding: Deeply analyze user intent and manage multi-turn conversations, ensuring coherence and accuracy across the dialogue.
  • Knowledge Q&A: Cover a wide range of knowledge domains, including science, technology, culture, history, etc., providing accurate answers.
  • Code Assistance: Support code generation, explanation, and debugging for mainstream programming languages such as Python, Java, and C++.

1.2 Advanced Capabilities

  • Long-text Processing: Support context windows ranging from 4k to 64k tokens, suitable for generating long documents and handling complex conversational scenarios.
  • Instruction Following: Accurately understand complex task instructions, such as “Compare Plan A and Plan B using a Markdown table.”
  • Style Control: Adjust output style through system prompts, supporting academic, conversational, poetic, and other styles.
  • Multi-modal Support: In addition to text generation, support tasks like image description and speech-to-text.

2. API Call Specifications

2.1 Basic Request Structure

You can perform end-to-end API requests using the OpenAI SDK.
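
A minimal end-to-end request might look like the sketch below. The API key is a placeholder, the base URL follows the examples in Section 6, and any chat model from the Models page can be substituted:

from openai import OpenAI

# Placeholder key; the base URL matches the examples in Section 6.
client = OpenAI(api_key="YOUR_KEY", base_url="https://api.ap.siliconflow.com/v1")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # placeholder; any model listed on the Models page
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)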

2.2 Message Structure Explanation

| Message Type | Function Description | Example Content |
| --- | --- | --- |
| system | Model instructions that define the AI’s role and general behavior | “You are a pediatrician with 10 years of experience.” |
| user | User input that passes the end user’s message to the model | “How should a persistent fever in a toddler be treated?” |
| assistant | Model-generated historical responses that show how the model should respond to the current request | “I suggest measuring the temperature first…” |

When you want the model to follow hierarchical instructions, message roles can help you achieve better outputs. Their effect is not deterministic, however, so the best approach is to experiment with different arrangements and see which yields the best results, as in the sketch below.
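
As an illustration, here is a minimal sketch that combines all three roles in one request. It reuses the base URL and SDK from the examples in this document; the model name and the follow-up question are placeholders:

from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.ap.siliconflow.com/v1")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # placeholder; any chat model from the Models page works
    messages=[
        # system: defines the AI's role and general behavior
        {"role": "system", "content": "You are a pediatrician with 10 years of experience."},
        # user + assistant: one prior exchange, kept as conversation history
        {"role": "user", "content": "How should a persistent fever in a toddler be treated?"},
        {"role": "assistant", "content": "I suggest measuring the temperature first…"},
        # the current request the model should answer (placeholder content)
        {"role": "user", "content": "The temperature is 38.5 °C. What should I do next?"},
    ],
)
print(response.choices[0].message.content)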

3. Model Selection Guide

Visit the Models page and use the filter options on the left to find language models that support the functionality you need. Each model’s details page lists specifics such as pricing, model size, and maximum context length.

You can also try the models in the Playground. Note that the Playground is for model testing only and does not retain conversation history; if you wish to keep a conversation, save it manually. For more usage details, refer to the API Documentation.

4. Detailed Explanation of Core Parameters

4.1 Creativity Control

# Temperature parameter (0.0~2.0)
temperature=0.5  # Balances creativity and reliability; lower is more deterministic, higher is more creative

# Nucleus sampling (top_p)
top_p=0.9  # Samples only from the smallest token set whose cumulative probability reaches 90%

4.2 Output Limits

max_tokens=1000  # Maximum number of tokens generated per request
stop=["\n##", "<|end|>"]  # Stop sequences; generation halts when any of these strings is produced
frequency_penalty=0.5  # Penalizes repeated tokens (range -2.0~2.0)
stream=True  # Stream the output; recommended for lengthy outputs to prevent timeouts
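
Taken together, these parameters are passed directly to the chat-completions call. The sketch below (model name and prompt are placeholders) shows them in context, including how a streamed response is consumed chunk by chunk:

from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.ap.siliconflow.com/v1")

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # placeholder; any chat model works
    messages=[{"role": "user", "content": "Summarize the history of containers."}],
    temperature=0.5,
    top_p=0.9,
    max_tokens=1000,
    stop=["\n##", "<|end|>"],
    frequency_penalty=0.5,
    stream=True,  # chunks arrive incrementally instead of one final response
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)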

4.3 Common Issues with Language Model Scenarios

1. Model Output Garbled

Some models may produce garbled output when sampling parameters are left unset. To address this, explicitly set parameters such as temperature, top_k, top_p, and frequency_penalty.

An example payload with these parameters set:

payload = {
    "model": "Qwen/Qwen2.5-Math-72B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "1+1=?",
        }
    ],
    "max_tokens": 200,  # Adjust as needed
    "temperature": 0.7, # Adjust as needed
    "top_k": 50,        # Adjust as needed
    "top_p": 0.7,       # Adjust as needed
    "frequency_penalty": 0 # Adjust as needed
}
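
The payload above can be sent either through the OpenAI SDK, as in the other examples, or as a raw HTTP request. A minimal sketch of the raw form, reusing the payload dict defined above and assuming the standard OpenAI-compatible /chat/completions path on the same base URL:

import requests

url = "https://api.ap.siliconflow.com/v1/chat/completions"  # base URL from the SDK examples
headers = {
    "Authorization": "Bearer YOUR_KEY",  # placeholder key
    "Content-Type": "application/json",
}

# `payload` is the dict defined above
response = requests.post(url, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])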

2. Explanation of max_tokens

For LLM models provided by the platform:

  • Models with a max_tokens limit of 16384 include:
    • deepseek-ai/DeepSeek-R1
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
    • deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • Models with a max_tokens limit of 8192 include:
    • Qwen/QwQ-32B-Preview
  • Models with a max_tokens limit of 4096 include:
    • All other LLM models not mentioned above.
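
If you pick max_tokens programmatically, a small lookup that mirrors the limits above can keep requests in range. This is only a sketch; the clamp_max_tokens helper is hypothetical, and the limits should be re-checked against the Models page:

# max_tokens limits mirrored from the list above (verify against the Models page)
MAX_TOKENS_LIMITS = {
    "deepseek-ai/DeepSeek-R1": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": 16384,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B": 16384,
    "Qwen/QwQ-32B-Preview": 8192,
}
DEFAULT_LIMIT = 4096  # all other LLM models

def clamp_max_tokens(model: str, requested: int) -> int:
    """Clamp a requested max_tokens value to the model's documented limit."""
    return min(requested, MAX_TOKENS_LIMITS.get(model, DEFAULT_LIMIT))

print(clamp_max_tokens("Qwen/QwQ-32B-Preview", 16000))  # -> 8192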

3. Explanation of context_length

The context_length varies across LLM models. You can search for a specific model on the Models page to view its details.

4. Model Output Truncation Issues

To troubleshoot truncation issues:

  • For API requests:
    • Adjust max_tokens to an appropriate value. Outputs exceeding max_tokens will be truncated. The DeepSeek R1 series supports up to 16384 tokens.
    • Enable streaming output for lengthy responses to prevent 504 timeouts.
    • Increase the client timeout duration so that long generations are not cut off before they finish (see the sketch after this list).
  • For third-party clients:
    • Cherry Studio defaults to max_tokens=4096. Enable the “Message Length Limit” switch in settings to raise the value.
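
For the timeout point above, the OpenAI Python SDK accepts a timeout argument when constructing the client. A minimal sketch; the value is illustrative:

from openai import OpenAI

# Raise the client-side timeout so long generations are not cut off mid-read.
client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.ap.siliconflow.com/v1",
    timeout=300.0,  # seconds; illustrative value for long outputs
)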

5. Error Code Handling

| Error Code | Common Cause | Solution |
| --- | --- | --- |
| 400 | Parameter format error | Check the range of parameters such as temperature |
| 401 | API key not correctly set | Verify the API key |
| 403 | Insufficient permissions | Usually requires real-name authentication; refer to the error message for other cases |
| 429 | Request rate limit exceeded | Implement an exponential backoff retry mechanism (see the sketch below) |
| 503/504 | Model overload | Switch to a backup model node |
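
For 429 (and transient 503/504) responses, a simple exponential backoff loop is usually enough. The sketch below assumes the SDK client used elsewhere in this document; the create_with_backoff helper, retry count, and delays are illustrative:

import time
from openai import OpenAI, APIStatusError

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.ap.siliconflow.com/v1")

def create_with_backoff(max_retries: int = 5, **kwargs):
    """Retry on 429/503/504 with exponentially growing delays."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except APIStatusError as e:
            if e.status_code not in (429, 503, 504) or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # 1 s, 2 s, 4 s, ...

response = create_with_backoff(
    model="deepseek-ai/DeepSeek-R1",  # placeholder model
    messages=[{"role": "user", "content": "Hello!"}],
)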

5. Billing and Quota Management

5.1 Billing Formula

Total Cost = (Input Tokens × Input Unit Price) + (Output Tokens × Output Unit Price)
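
For example, with hypothetical unit prices (real prices are listed per model on the Models page), a request that consumes 1,200 input tokens and 800 output tokens would cost:

# Hypothetical unit prices, for illustration only; see the Models page for real prices.
input_price = 2.0 / 1_000_000   # e.g., $2 per 1M input tokens
output_price = 8.0 / 1_000_000  # e.g., $8 per 1M output tokens

total_cost = 1200 * input_price + 800 * output_price
print(f"${total_cost:.4f}")  # $0.0088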

5.2 Example Pricing for Different Series

Specific model prices can be found on the Models page, under each model’s details.

6. Application Scenarios

6.1 Technical Documentation Generation

from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.ap.siliconflow.com/v1")
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[{
        "role": "user",
        "content": "Write a Python tutorial on asynchronous web scraping, including code examples and precautions."
    }],
    temperature=0.7,
    max_tokens=4096
)
print(response.choices[0].message.content)

6.2 Data Analysis Reports

from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.ap.siliconflow.com/v1")
response = client.chat.completions.create(
    model="Qwen/QVQ-72B-Preview",
    messages=[
        {"role": "system", "content": "You are a data analysis expert. Output results in Markdown."},
        {"role": "user", "content": "Analyze the sales trends of new energy vehicles in 2023."}
    ],
    temperature=0.7,
    max_tokens=4096
)
print(response.choices[0].message.content)

Model capabilities are continuously updated. It is recommended to visit the Models page regularly for the latest information.