Overview

DeepSeek-R1 is a series of advanced language models developed by deepseek-ai. It improves the accuracy of final answers by also emitting its reasoning chain (reasoning_content) alongside the answer. The interface is compatible with the DeepSeek API, and upgrading the OpenAI SDK is recommended so that the new parameters are supported when using these models.

Supported Models:

  • deepseek-ai/DeepSeek-R1
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Installation and Upgrade

Before using DeepSeek-R1, ensure that the latest version of the OpenAI SDK is installed. You can upgrade it using the following command:

pip3 install -U openai
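
To confirm that the upgrade took effect, you can print the installed SDK version:

python3 -c "import openai; print(openai.__version__)"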

API Parameters

  • Input Parameters:

    • max_tokens: The maximum length of the response, including the reasoning chain output. The maximum value is 16K tokens.
  • Return Parameters:

    • reasoning_content: The reasoning chain content, at the same level as content.
    • content: The final answer content.
  • Usage Recommendations:

    • Set temperature between 0.5 and 0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
    • Set top_p to 0.95.
    • Avoid adding a system prompt; include all instructions in the user prompt.
    • For mathematical problems, include an instruction in the prompt, such as: “Please reason step by step, and put your final answer within \boxed{}.”
    • When evaluating model performance, run multiple tests and average the results.
    • The DeepSeek-R1 series tends to bypass the reasoning pattern (i.e., it outputs “\n\n” directly) for certain queries, which can degrade performance. To ensure adequate reasoning, it is recommended to force the model to start each output with “\n”. A sketch applying these settings follows this list.
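
As a reference, the snippet below sketches how these recommendations map onto request parameters. It reuses the client setup from the request examples later on this page; the math prompt itself is illustrative.

from openai import OpenAI

client = OpenAI(
    base_url='https://api.ap.siliconflow.com/v1/',
    api_key='your api_key'
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    # No system prompt: all instructions go into the user message
    messages=[{
        "role": "user",
        "content": "Solve x^2 - 5x + 6 = 0. "
                   "Please reason step by step, and put your final answer within \\boxed{}."
    }],
    temperature=0.6,  # recommended range: 0.5 to 0.7
    top_p=0.95,
    max_tokens=4096
)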

Context Concatenation

In each round of conversation, the model outputs the reasoning chain (reasoning_content) and the final answer (content). In the next round, the reasoning chain from the previous round is not concatenated into the context: only the final answer is sent back, as the sketch below and the Round 2 steps in the examples illustrate.
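
In practice this means that only content is fed back into messages. A minimal sketch, where response stands for the result of the previous chat.completions.create call:

# Append only the final answer to the conversation history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})
# reasoning_content is deliberately not added to the context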

OpenAI Request Example

Streaming Output Request

from openai import OpenAI

url = 'https://api.ap.siliconflow.com/v1/'
api_key = 'your api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a request with streaming output
content = ""
reasoning_content=""
messages = [
    {"role": "user", "content": "Who are the legendary athletes in the Olympics?"}
]
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=True,  # Enable streaming output
    max_tokens=4096
)
# Receive and process the response incrementally
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        content += delta.content
    # reasoning_content is an extension field, so read it defensively
    reasoning_delta = getattr(delta, "reasoning_content", None)
    if reasoning_delta:
        reasoning_content += reasoning_delta

# Round 2: append only the final answer; the reasoning chain from
# round 1 is intentionally left out of the context
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Continue"})
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=True
)
# Consume the second streamed response the same way as above
content2 = ""
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        content2 += chunk.choices[0].delta.content

Non-Streaming Output Request

from openai import OpenAI
url = 'https://api.ap.siliconflow.com/v1/'
api_key = 'your api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a request with non-streaming output
messages = [
    {"role": "user", "content": "Who are the legendary athletes in the Olympics?"}
]
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=False,  # Disable streaming output
    max_tokens=4096
)
content = response.choices[0].message.content
# reasoning_content is an extension field, so read it defensively
reasoning_content = getattr(response.choices[0].message, "reasoning_content", None)

# Round 2: append only the final answer; the reasoning chain from
# round 1 is intentionally left out of the context
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Continue"})
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=False
)

Notes

  • API Key: Ensure you are using the correct API key for authentication.
  • Streaming Output: Streaming output suits scenarios that need to display the response incrementally; non-streaming output is better when you want the complete response in one shot.

FAQs

  • How to obtain an API key?

    Visit SiliconFlow to register and obtain an API key.

  • How to handle long texts?

    You can control the output length with the max_tokens parameter, but note that the maximum is 16K tokens (including the reasoning chain). The sketch below shows how to detect a truncated answer.
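
    If you need to detect that an answer was cut off by the max_tokens limit, you can check the standard finish_reason field. A minimal sketch, reusing the non-streaming client and messages from the examples above:

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",
        messages=messages,
        max_tokens=4096
    )
    if response.choices[0].finish_reason == "length":
        # The reply hit max_tokens (reasoning chain included);
        # raise the limit (up to 16K) or shorten the prompt
        print("Answer was truncated by max_tokens")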