
Important Notice

⚠️ Some models introduce new Interleaved Thinking behaviors.
On SiliconFlow, DeepSeek V3.2 and GLM-4.7 may emit Interleaved Thinking structured output via the serverless model API, most commonly in tool-calling flows.
To ensure correctness, stability, and optimal performance, please follow the guidelines below.

Overview

On SiliconFlow, Interleaved Thinking is currently supported by:
  • DeepSeek V3.2
  • GLM-4.7
Interleaved Thinking is especially useful for:
  • Agent-style orchestration
  • Tool-calling scenarios
  • Coding and debugging
  • Multi-step tasks that benefit from intermediate tool outputs

1. What is Interleaved Thinking?

With Interleaved Thinking, a model can:
  1. Decide whether it needs to call a tool
  2. Call a tool
  3. Receive tool results
  4. Continue from intermediate outputs
  5. Decide the next step (call another tool or produce a final answer)
This enables robust multi-step execution where tool outputs can influence subsequent steps.
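The decision loop above can be sketched as a small driver function. This is a minimal sketch, not SiliconFlow's implementation: `call_model` and `run_tool` are hypothetical stand-ins for a real chat-completions call and a local tool executor, and the message dicts follow the tool-calling shapes used later in this document.

```python
# Sketch of the decide -> call tool -> receive result -> continue loop.
# `call_model` and `run_tool` are hypothetical stand-ins.
def agent_turn(messages, call_model, run_tool, max_steps=8):
    for _ in range(max_steps):
        reply = call_model(messages)        # one Step of the Turn
        messages.append(reply)              # replay reasoning_content verbatim
        if not reply.get("tool_calls"):
            return reply["content"]         # no tool call: final answer
        for tc in reply["tool_calls"]:      # feed every tool result back
            messages.append({
                "role": "tool",
                "tool_call_id": tc["id"],
                "content": run_tool(tc),
            })
    raise RuntimeError("step limit reached")
```

The key property is that the full message history, including every intermediate reasoning segment, is replayed on each step.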

Diagram: Turn/Step structure with tool results

The diagram below illustrates how a single Turn can contain multiple Steps, and how the model may continue producing reasoning_content after tool results (i.e., after you send role="tool" messages).
For correct tool-calling behavior, preserve and replay reasoning_content exactly as received across the entire sequence.
[Diagram: Interleaved Thinking (Turn/Step, tool call/result, reasoning_content)]
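The Turn/Step structure can also be written out as a message history. This is an abridged, hypothetical example: the reasoning strings are placeholders, and in practice you replay the model's output byte-for-byte.

```python
# Abridged, hypothetical message history for one Turn with two Steps.
turn = [
    {"role": "user", "content": "What's the weather in Paris?"},
    # Step 1: reasoning, then a tool call
    {"role": "assistant", "content": "",
     "reasoning_content": "<step-1 reasoning, replayed verbatim>",
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "get_weather",
                                  "arguments": "{\"city\": \"Paris\"}"}}]},
    # Tool result you send back
    {"role": "tool", "tool_call_id": "call_1",
     "content": "{\"weather\": \"Sunny\"}"},
    # Step 2: NEW reasoning produced after the tool result, then the answer
    {"role": "assistant", "content": "It's sunny in Paris.",
     "reasoning_content": "<step-2 reasoning, also replayed verbatim>"},
]
```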

2. Tool Calling: The Non-Negotiable Rule

When using tool calling with DeepSeek V3.2 or GLM-4.7, the API may return additional structured output in a dedicated field:
  • reasoning_content
You must preserve reasoning_content exactly as received, and send it back unchanged in subsequent requests.

What must be preserved (including after tool results)

To be explicit: preservation applies to every reasoning_content segment the model produces in a tool-enabled flow, not only reasoning that follows a user message. You must preserve and replay all of the following exactly as generated:
  • Content emitted before any tool call
  • Content emitted between tool calls (multi-step tool chaining)
  • Content emitted after tool results (i.e., after you send role="tool" messages and the model continues)
  • Any reasoning_content segments produced across turns (keep the original order)
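One way to make the rule hard to violate is to centralize message construction in a single helper. This is a hypothetical helper, not part of any SDK; it simply passes all three fields through untouched.

```python
# Hypothetical helper: assemble the assistant message to replay,
# passing all three fields through untouched (no trimming, merging,
# splitting, or reordering of reasoning_content).
def preserved_assistant_message(content, reasoning_content, tool_calls):
    msg = {"role": "assistant", "content": content}
    if reasoning_content:
        msg["reasoning_content"] = reasoning_content  # verbatim
    if tool_calls:
        msg["tool_calls"] = tool_calls                # as received
    return msg
```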

What you must NOT do

Do NOT:
  • Modify the text
  • “Clean up” or post-process it
  • Merge or split segments
  • Reorder segments
  • Drop it while keeping only normal assistant text
If you do, you may see:
  • Broken multi-step behavior around tools
  • Instability across tool calls
  • Reduced cache efficiency and degraded output quality
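As a concrete illustration of the failure mode, using hypothetical accumulated variables:

```python
# Hypothetical accumulated fields from a model response.
content = "final text"
reasoning_content = "<model reasoning, exactly as received>"
tool_calls = [{"id": "call_1", "type": "function",
               "function": {"name": "get_weather", "arguments": "{}"}}]

# WRONG: reasoning_content is dropped; only normal text is replayed.
bad = {"role": "assistant", "content": content, "tool_calls": tool_calls}

# RIGHT: all three fields go back unchanged, in the original order.
good = {"role": "assistant", "content": content,
        "reasoning_content": reasoning_content,  # verbatim, unedited
        "tool_calls": tool_calls}                # as received
```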

3. Handling Responses: Streaming and Non-Streaming

When receiving responses:
  • Accumulate normal assistant text from content (or delta.content if streaming)
  • Accumulate Interleaved Thinking text from reasoning_content (or delta.reasoning_content if streaming)
  • Collect tool requests from tool_calls (or delta.tool_calls if streaming)
When sending the assistant message back to the model, include all of them:
  • content
  • reasoning_content (verbatim, complete, in the exact original order)
  • tool_calls (as received)
Note: This rule applies to both streaming and non-streaming. Streaming only affects how you read fields (delta.*), not what you must preserve.
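In the non-streaming case, the same three fields are read from `message` rather than accumulated from `delta.*`. A minimal sketch (the function name is an assumption, not SDK API):

```python
# Non-streaming sketch: read all three fields from `message`.
# The preservation rule is identical to the streaming case.
def read_assistant_message(response):
    msg = response.choices[0].message
    return {
        "role": "assistant",
        "content": msg.content or "",
        "reasoning_content": getattr(msg, "reasoning_content", "") or "",
        "tool_calls": getattr(msg, "tool_calls", None) or [],
    }
```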

4. Example: Interleaved Thinking + Tool Calling (DeepSeek V3.2)

This example demonstrates DeepSeek V3.2 on SiliconFlow.
The same pattern also applies to GLM-4.7.
from openai import OpenAI
import json

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.siliconflow.com/v1/"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

messages = [
    {"role": "system", "content": "You are an assistant"},
    {"role": "user", "content": "What's the weather like in New York?"}
]

# Round 1: model reasons, then calls a tool
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=messages,
    tools=tools,
    stream=True  # optional; the same preservation rule applies to non-streaming
)

reasoning_content = ""
content = ""
tool_calls = []

for chunk in response:
    delta = chunk.choices[0].delta

    if getattr(delta, "reasoning_content", None):
        reasoning_content += delta.reasoning_content

    if getattr(delta, "content", None):
        content += delta.content

    if getattr(delta, "tool_calls", None):
        # Streamed tool calls arrive as partial deltas; merge by index so
        # each call's id, name, and argument string are assembled correctly.
        for tc in delta.tool_calls:
            while len(tool_calls) <= tc.index:
                tool_calls.append({"id": "", "type": "function",
                                   "function": {"name": "", "arguments": ""}})
            entry = tool_calls[tc.index]
            if tc.id:
                entry["id"] = tc.id
            if tc.function and tc.function.name:
                entry["function"]["name"] = tc.function.name
            if tc.function and tc.function.arguments:
                entry["function"]["arguments"] += tc.function.arguments

# 🔑 Preserve reasoning_content verbatim (Round 1)
messages.append({
    "role": "assistant",
    "content": content,
    "reasoning_content": reasoning_content,
    "tool_calls": tool_calls
})

# Tool execution result (example)
messages.append({
    "role": "tool",
    "tool_call_id": tool_calls[0]["id"],
    "content": json.dumps({"weather": "Sunny", "temp": "25°C"})
})

# Round 2: model may produce NEW reasoning_content AFTER tool results
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=messages,
    tools=tools,
    stream=True  # optional
)

reasoning_content_2 = ""
content_2 = ""
tool_calls_2 = []

for chunk in response:
    delta = chunk.choices[0].delta

    if getattr(delta, "reasoning_content", None):
        reasoning_content_2 += delta.reasoning_content

    if getattr(delta, "content", None):
        content_2 += delta.content

    if getattr(delta, "tool_calls", None):
        # Merge partial tool-call deltas by index, as in Round 1
        for tc in delta.tool_calls:
            while len(tool_calls_2) <= tc.index:
                tool_calls_2.append({"id": "", "type": "function",
                                     "function": {"name": "", "arguments": ""}})
            entry = tool_calls_2[tc.index]
            if tc.id:
                entry["id"] = tc.id
            if tc.function and tc.function.name:
                entry["function"]["name"] = tc.function.name
            if tc.function and tc.function.arguments:
                entry["function"]["arguments"] += tc.function.arguments

# 🔑 Preserve reasoning_content verbatim (Round 2, i.e., after tool results)
messages.append({
    "role": "assistant",
    "content": content_2,
    "reasoning_content": reasoning_content_2,
    "tool_calls": tool_calls_2
})
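The two explicit rounds above generalize to a loop that keeps calling the model until it stops requesting tools. This is a sketch only (non-streaming for brevity); `execute_tool` is a hypothetical local dispatcher that runs one tool call and returns its result as a string.

```python
# Generalization of the two rounds above (non-streaming for brevity).
# `execute_tool` is a hypothetical local tool dispatcher.
def run_until_done(client, model, messages, tools, execute_tool):
    while True:
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools)
        msg = resp.choices[0].message
        messages.append({                   # replay all three fields
            "role": "assistant",
            "content": msg.content or "",
            "reasoning_content": getattr(msg, "reasoning_content", "") or "",
            "tool_calls": getattr(msg, "tool_calls", None) or [],
        })
        if not getattr(msg, "tool_calls", None):
            return msg.content              # final answer: the Turn is over
        for tc in msg.tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": execute_tool(tc),
            })
```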

5. Using GLM-4.7 Instead

To switch the example to GLM-4.7, change:
model="deepseek-ai/DeepSeek-V3.2"
to:
model="zai-org/GLM-4.7"
All Interleaved Thinking preservation rules remain the same, including preserving reasoning_content after tool results.

Summary

For DeepSeek V3.2 and GLM-4.7 on SiliconFlow:
  1. Interleaved Thinking enables multi-step, tool-aware execution
  2. With tool calling, you must preserve and replay reasoning_content verbatim—including any reasoning produced after tool results
  3. Never modify, drop, merge/split, or reorder reasoning_content
Following these rules ensures stable tool use and consistent multi-step behavior.