Reasoning models are deep-learning-based AI systems that solve complex tasks through logical deduction, knowledge association, and context analysis. Typical applications include mathematical problem solving, code generation, logical judgment, and multi-step reasoning. The currently supported reasoning models come from the following series:
- tencent
- MiniMaxAI
- Qwen Series
- THUDM Series
- deepseek-ai Series
These models typically expose the following length-related parameters (a request sketch follows the list):

- Maximum chain-of-thought length (`thinking_budget`): the number of tokens the model may use for internal reasoning. Adjusting `thinking_budget` controls how long the chain-of-thought can run.
- Maximum response length (`max_tokens`): limits the number of tokens in the final output returned to the user, excluding the chain-of-thought. Set it as usual to cap the response length.
- Maximum context length (`context_length`): the maximum total length of user input, chain-of-thought, and output combined. It is not a request parameter and does not need to be set.
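As a concrete illustration, the sketch below sends a request through the OpenAI-compatible Python SDK. The base URL, the example model name, and passing `thinking_budget` via `extra_body` are assumptions made for illustration, not confirmed parameter spellings:

```python
from openai import OpenAI

# Assumes the SiliconFlow OpenAI-compatible endpoint; the base_url and the
# thinking_budget field passed via extra_body are assumptions for illustration.
client = OpenAI(
    api_key="YOUR_API_KEY",  # obtained from the SiliconFlow platform
    base_url="https://api.siliconflow.cn/v1",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",      # example reasoning model name
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
    max_tokens=4096,                       # caps the final answer only
    extra_body={"thinking_budget": 8192},  # caps the chain-of-thought
)
print(response.choices[0].message.content)
```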
The maximum response length, chain-of-thought length, and context length supported by each model are listed below (all values in tokens):

| Model | Max Response Length | Max Chain-of-Thought Length | Max Context Length |
| --- | --- | --- | --- |
| DeepSeek-R1 | 16384 | 32768 | 98304 |
| DeepSeek-R1-Distill Series | 16384 | 32768 | 131072 |
| Qwen3 Series | 8192 | 32768 | 131072 |
| QwQ-32B | 32768 | 16384 | 131072 |
| GLM-Z1 Series | 16384 | 32768 | 131072 |
| MiniMax-M1-80k | 40000 | 40000 | 80000 |
| Hunyuan-A13B-Instruct | 8192 | 38912 | 131072 |
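Since user input, chain-of-thought, and output must together fit within the context length, a quick pre-flight check can catch misconfigured budgets. The helper below is a hypothetical illustration (not part of the API), using the DeepSeek-R1 row from the table:

```python
# Hypothetical helper: check that a request's token budgets fit the model's
# context window, using the DeepSeek-R1 limits from the table above.
DEEPSEEK_R1_LIMITS = {
    "max_tokens": 16384,       # max response length
    "thinking_budget": 32768,  # max chain-of-thought length
    "context_length": 98304,   # max total: input + chain-of-thought + output
}

def fits_context(prompt_tokens: int, thinking_budget: int, max_tokens: int,
                 limits: dict = DEEPSEEK_R1_LIMITS) -> bool:
    """Return True if input, reasoning, and output fit in the context window."""
    if max_tokens > limits["max_tokens"]:
        return False
    if thinking_budget > limits["thinking_budget"]:
        return False
    return prompt_tokens + thinking_budget + max_tokens <= limits["context_length"]

# Example: a 4096-token prompt with the full reasoning and response budgets.
print(fits_context(prompt_tokens=4096, thinking_budget=32768, max_tokens=16384))  # True
```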
Once the reasoning model's chain-of-thought is decoupled from the response length, the output behavior follows a fixed set of rules.
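One practical consequence of this decoupling is that the chain-of-thought and the final answer arrive as separate fields in the response. The sketch below reuses the `client` from the earlier example and assumes a DeepSeek-style `reasoning_content` attribute alongside the standard `content`; since the field name is an assumption, it is read defensively:

```python
# Sketch: reading the chain-of-thought and the final answer as separate fields.
# Reuses `client` from the request sketch above. The DeepSeek-style
# `reasoning_content` field name is an assumption, so it is read defensively.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # example reasoning model name
    messages=[{"role": "user", "content": "Is 97 prime?"}],
    max_tokens=512,
)
message = response.choices[0].message

reasoning = getattr(message, "reasoning_content", None)  # chain-of-thought, if returned
answer = message.content                                 # final answer, counted by max_tokens

if reasoning:
    print("chain-of-thought:", reasoning)
print("answer:", answer)
```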
How do I obtain an API key?
Visit the SiliconFlow platform, register an account, and obtain your API key there.
How do I handle long text?
You can adjust the `max_tokens` parameter to control the output length, but note that the maximum response length is 16K tokens.
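For long outputs it can also help to stream the response so partial text arrives as it is generated. A minimal sketch, reusing the `client` from the earlier example with an assumed model name and `max_tokens` set to the 16K ceiling:

```python
# Sketch for long outputs: raise max_tokens toward the 16K ceiling and stream
# the response so partial text arrives as it is generated.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # example reasoning model name
    messages=[{"role": "user", "content": "Write a detailed survey of sorting algorithms."}],
    max_tokens=16384,  # the documented 16K response ceiling
    stream=True,
)

chunks = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:              # final-answer tokens only; may be None
        chunks.append(delta.content)
long_text = "".join(chunks)
```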