Large Language Model (LLM) User Manual

1. Model Core Capabilities

1.1 Basic Functions

• Text Generation: Generate coherent natural language text based on context, supporting various styles and genres.
• Semantic Understanding: Deeply parse user intent, with multi-turn dialogue management to keep conversations coherent and accurate.
• Knowledge Q&A: Cover a wide range of knowledge domains, including science, technology, culture, and history, providing accurate answers.
• Code Assistance: Support code generation, explanation, and debugging for mainstream programming languages such as Python, Java, and C++.

1.2 Advanced Capabilities

• Long Text Processing: Support context windows from 4K to 64K tokens, suitable for long-document generation and complex dialogue scenarios.
• Instruction Following: Precisely understand complex task instructions, such as "compare the A/B schemes using a Markdown table."
• Style Control: Adjust the output style through system prompts, supporting academic, conversational, poetic, and other styles.
• Multimodal Support: Beyond text generation, support tasks such as image description and speech-to-text.

2. API Call Specifications

2.1 Basic Request Structure

You can make end-to-end API requests using the OpenAI SDK, as shown in the sketch below.
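A minimal sketch, assuming an OpenAI-compatible chat completions endpoint at the base URL used in section 6; the model name is a placeholder, so substitute your own key and any model from the Models page:

from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.siliconflow.com/v1")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce yourself in one sentence."},
    ],
)
print(response.choices[0].message.content)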

2.2 Message Structure Explanation

| Message Type | Function Description | Example Content |
| --- | --- | --- |
| system | Model instructions, defining the AI's role and general behavior | e.g., "You are a pediatrician with 10 years of experience." |
| user | User input, passing the end user's message to the model | e.g., "How should a persistent fever in a toddler be treated?" |
| assistant | Model-generated historical responses, providing examples of how the model should respond to the current request | e.g., "I suggest measuring the temperature first…" |
When you want the model to follow hierarchical instructions, message roles can help you achieve better outputs. Role handling is not deterministic, however, so it is best to experiment and see which structure yields the best results. The sketch below shows all three roles combined in one request.
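A sketch of how these roles combine into a single messages list for a multi-turn request; the contents come from the table above, and the final user turn is a hypothetical follow-up:

messages = [
    {"role": "system", "content": "You are a pediatrician with 10 years of experience."},
    {"role": "user", "content": "How should a persistent fever in a toddler be treated?"},
    {"role": "assistant", "content": "I suggest measuring the temperature first..."},
    # The latest user turn; the model answers this with the earlier turns as context.
    {"role": "user", "content": "The temperature is 38.5°C. Should we see a doctor?"},
]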

3. Model Selection Guide

Visit the Models page and use the filter options on the left to find language models that support different functionalities. Each model's details page lists specifics such as pricing, model size, and maximum context length. You can also try the models in the Playground. Note that the Playground is only for model testing and does not retain conversation history; if you wish to keep a conversation, save it manually. For more usage details, refer to the API Documentation.

4. Detailed Explanation of Core Parameters

4.1 Creativity Control

# Temperature parameter (0.0~2.0)
temperature=0.5  # Balances creativity and reliability

# Nucleus sampling (top_p)
top_p=0.9  # Samples only from the smallest token set whose cumulative probability reaches 90%

4.2 Output Limits

max_tokens=1000  # Maximum number of tokens generated per request
stop=["\n##", "<|end|>"]  # Stop sequences; generation halts when any of these strings appears
frequency_penalty=0.5  # Penalizes repeated tokens (-2.0~2.0)
stream=True  # Streams the output; recommended for long outputs to avoid timeouts
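Since a streamed response arrives as incremental chunks, the client must iterate over them. A minimal sketch using the OpenAI SDK, combining the parameters from sections 4.1 and 4.2 with a placeholder model name:

from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.siliconflow.com/v1")

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Write a long essay on renewable energy."}],
    temperature=0.5,
    top_p=0.9,
    max_tokens=1000,
    stop=["<|end|>"],
    stream=True,
)
for chunk in stream:
    if chunk.choices:  # guard against chunks that carry no choices
        # Each chunk carries an incremental delta of the generated text.
        print(chunk.choices[0].delta.content or "", end="", flush=True)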

4.3 Common Issues with Language Model Scenarios

1. Garbled Model Output
Some models may produce garbled output if sampling parameters are not set. To address this, try setting parameters such as temperature, top_k, top_p, and frequency_penalty explicitly, adjusting the request payload as follows:
payload = {
    "model": "Qwen/Qwen2.5-Math-72B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "1+1=?",
        }
    ],
    "max_tokens": 200,  # Adjust as needed
    "temperature": 0.7, # Adjust as needed
    "top_k": 50,        # Adjust as needed
    "top_p": 0.7,       # Adjust as needed
    "frequency_penalty": 0 # Adjust as needed
}
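A minimal sketch of sending the payload above over raw HTTP with the requests library; the chat completions path under the base URL from section 6 is an assumption based on the OpenAI-compatible API shape:

import requests

url = "https://api.siliconflow.com/v1/chat/completions"  # assumed OpenAI-compatible path
headers = {
    "Authorization": "Bearer YOUR_KEY",
    "Content-Type": "application/json",
}
response = requests.post(url, json=payload, headers=headers)
print(response.json()["choices"][0]["message"]["content"])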
2. Explanation of max_tokens
The upper limit of max_tokens is the model's context length. Since some model inference services are still being updated, do not set max_tokens all the way to that maximum when making a request; it is recommended to reserve around 10K tokens of space for the input content.
3. Explanation of context_length
The context_length varies across LLM models. Search for a specific model on the Models page to view its details.
4. Output Truncation Issues in Model Inference
Troubleshoot the issue from the following aspects:
  • When encountering output truncation through API requests:
    • Max Tokens Setting: Set max_tokens to an appropriate value; output beyond max_tokens is truncated.
    • Stream Request Setting: Non-streaming requests are prone to 504 timeouts when the output is long; prefer stream=True.
    • Client Timeout Setting: Increase the client timeout so the connection is not closed before the output completes (see the sketch after this list).
  • When encountering output truncation through third-party client requests:
    • Cherry Studio has a default max_tokens of 4,096. Enable the “Enable Message Length Limit” switch to set max_tokens to an appropriate value.
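A minimal sketch of raising the client-side timeout with the OpenAI SDK; the 300-second value is an illustrative assumption, not an official recommendation:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.siliconflow.com/v1",
    timeout=300,  # seconds; illustrative value
)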
5. Error Code Handling

| Error Code | Common Cause | Solution |
| --- | --- | --- |
| 400 | Parameter format error | Check the range of parameters such as temperature |
| 401 | API key not correctly set | Verify the API key |
| 403 | Insufficient permissions | Commonly requires real-name authentication; refer to error messages for other cases |
| 429 | Request rate limit exceeded | Implement an exponential backoff retry mechanism (see the sketch below) |
| 503/504 | Model overload | Switch to a backup model node |
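For the 429 case, a minimal exponential-backoff sketch; the retry count and delays are illustrative assumptions, and the bare Exception catch stands in for the SDK's specific rate-limit error:

import time
import random

def call_with_backoff(make_request, max_retries=5):
    # Retries make_request with exponentially growing waits plus jitter.
    for attempt in range(max_retries):
        try:
            return make_request()
        except Exception:  # ideally catch the SDK's RateLimitError instead
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter

It can wrap any request, e.g. call_with_backoff(lambda: client.chat.completions.create(...)).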

5. Billing and Quota Management

5.1 Billing Formula

Total Cost = (Input Tokens × Input Unit Price) + (Output Tokens × Output Unit Price)
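For example, with hypothetical unit prices of ¥2 per million input tokens and ¥6 per million output tokens, a request consuming 1,200 input tokens and 800 output tokens would cost:

Total Cost = 1,200 × (2 ÷ 1,000,000) + 800 × (6 ÷ 1,000,000) = ¥0.0024 + ¥0.0048 = ¥0.0072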

5.2 Example Pricing for Different Series

Specific model prices can be found on the Models page, under each model's details.

6. Application Scenarios

6.1 Technical Documentation Generation

from openai import OpenAI
client = OpenAI(api_key="YOUR_KEY", base_url="https://api.siliconflow.com/v1")
response = client.chat.completions.create(  
    model="Qwen/Qwen2.5-Coder-32B-Instruct",  
    messages=[{  
        "role": "user",  
        "content": "Write a Python tutorial on asynchronous web scraping, including code examples and precautions."  
    }],  
    temperature=0.7,  
    max_tokens=4096  
)  
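The generated text can then be read from the response object:

print(response.choices[0].message.content)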

6.2 Data Analysis Reports

from openai import OpenAI
client = OpenAI(api_key="YOUR_KEY", base_url="https://api.siliconflow.com/v1")
response = client.chat.completions.create(  
    model="Qwen/QVQ-72B-Preview",  
    messages=[    
        {"role": "system", "content": "You are a data analysis expert. Output results in Markdown."},  
        {"role": "user", "content": "Analyze the sales trends of new energy vehicles in 2023."}  
    ],  
    temperature=0.7,  
    max_tokens=4096  
)  
Model capabilities are continuously updated. It is recommended to visit the Models page regularly for the latest information.