Overview

DeepSeek-R1 is a series of advanced language models developed by deepseek-ai. It improves the accuracy of final answers by also emitting its reasoning chain (reasoning_content) alongside the answer. The interface is compatible with the DeepSeek API, and upgrading the OpenAI SDK is recommended so that the new parameters are supported when using these models.

Supported Models:

  • deepseek-ai/DeepSeek-R1
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Installation and Upgrade

Before using DeepSeek-R1, ensure that the latest version of the OpenAI SDK is installed. You can upgrade it using the following command:

pip3 install -U openai
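
To confirm that the upgrade took effect, you can print the installed SDK version:

python3 -c "import openai; print(openai.__version__)"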

API Parameters

  • Input Parameters:

    • max_tokens: The maximum length of the response, including the reasoning chain output. The maximum value is 16K tokens.
  • Return Parameters:

    • reasoning_content: The reasoning chain content, at the same level as content.
    • content: The final answer content.
  • Usage Recommendations:

    • Set temperature between 0.5 and 0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
    • Set top_p to 0.95.
    • Avoid adding a system prompt; include all instructions in the user prompt.
    • For mathematical problems, include an instruction in the prompt, such as: “Please reason step by step, and put your final answer within \boxed{}.”
    • When evaluating model performance, run multiple tests and average the results.
    • The DeepSeek-R1 series tends to bypass the reasoning pattern (i.e., it outputs “\n\n” directly) for certain queries, which can degrade performance. To ensure adequate reasoning, it is recommended to force the model to start each output with “\n”. A sketch applying these settings follows this list.
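
As a reference, the snippet below sketches how these recommendations map onto request parameters. It reuses the client setup from the request examples later on this page; the math prompt itself is illustrative.

from openai import OpenAI

client = OpenAI(
    base_url='https://api.ap.siliconflow.com/v1/',
    api_key='your api_key'
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    # No system prompt: all instructions go into the user message
    messages=[{
        "role": "user",
        "content": "Solve x^2 - 5x + 6 = 0. "
                   "Please reason step by step, and put your final answer within \\boxed{}."
    }],
    temperature=0.6,  # recommended range: 0.5 to 0.7
    top_p=0.95,
    max_tokens=4096
)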

Context Concatenation

In each round of conversation, the model outputs the reasoning chain (reasoning_content) and the final answer (content). In the next round, the reasoning chain from the previous round is not concatenated into the context: only the final answer is sent back, as the sketch below and the Round 2 steps in the examples illustrate.
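
In practice this means that only content is fed back into messages. A minimal sketch, where response stands for the result of the previous chat.completions.create call:

# Append only the final answer to the conversation history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})
# reasoning_content is deliberately not added to the context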

OpenAI Request Example

Streaming Output Request

from openai import OpenAI

url = 'https://api.ap.siliconflow.com/v1/'
api_key = 'your api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a request with streaming output
content = ""
reasoning_content=""
messages = [
    {"role": "user", "content": "Who are the legendary athletes in the Olympics?"}
]
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=True,  # Enable streaming output
    max_tokens=4096
)
# Receive and process the response incrementally
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        content += delta.content
    # reasoning_content is an extension field, so read it defensively
    reasoning_delta = getattr(delta, "reasoning_content", None)
    if reasoning_delta:
        reasoning_content += reasoning_delta

# Round 2: append only the final answer; the reasoning chain from
# round 1 is intentionally left out of the context
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Continue"})
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=True
)
# Consume the second streamed response the same way as above
content2 = ""
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        content2 += chunk.choices[0].delta.content

Non-Streaming Output Request

from openai import OpenAI
url = 'https://api.ap.siliconflow.com/v1/'
api_key = 'your api_key'

client = OpenAI(
    base_url=url,
    api_key=api_key
)

# Send a request with non-streaming output
messages = [
    {"role": "user", "content": "Who are the legendary athletes in the Olympics?"}
]
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=False,  # Disable streaming output
    max_tokens=4096
)
content = response.choices[0].message.content
# reasoning_content is an extension field, so read it defensively
reasoning_content = getattr(response.choices[0].message, "reasoning_content", None)

# Round 2: append only the final answer; the reasoning chain from
# round 1 is intentionally left out of the context
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "Continue"})
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    stream=False
)

Notes

  • API Key: Ensure you are using the correct API key for authentication.
  • Streaming Output: Streaming output suits scenarios that need to display the response incrementally; non-streaming output is better when you want the complete response in one shot.

FAQs

  • How to obtain an API key?

    Visit SiliconFlow to register and obtain an API key.

  • How to handle long texts?

    You can control the output length with the max_tokens parameter, but note that the maximum is 16K tokens (including the reasoning chain). The sketch below shows how to detect a truncated answer.
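
    If you need to detect that an answer was cut off by the max_tokens limit, you can check the standard finish_reason field. A minimal sketch, reusing the non-streaming client and messages from the examples above:

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",
        messages=messages,
        max_tokens=4096
    )
    if response.choices[0].finish_reason == "length":
        # The reply hit max_tokens (reasoning chain included);
        # raise the limit (up to 16K) or shorten the prompt
        print("Answer was truncated by max_tokens")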