Chat completions

Authorizations

Authorization

string

header

required

Use the following format for authentication: Bearer <your api key>

Body

application/json

model

enum<string>

required

Corresponding Model Name. To better enhance service quality, we will make periodic changes to the models provided by this service, including but not limited to model on/offlining and adjustments to model service capabilities. We will notify you of such changes through appropriate means such as announcements or message pushes where feasible.

Available options:

deepseek-ai/DeepSeek-R1,

deepseek-ai/DeepSeek-R1-Distill-Qwen-14B,

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B,

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B,

deepseek-ai/DeepSeek-V3,

deepseek-ai/DeepSeek-V3.1,

deepseek-ai/DeepSeek-V3.1-Terminus,

deepseek-ai/DeepSeek-V3.2-Exp,

deepseek-ai/deepseek-vl2,

baidu/ERNIE-4.5-300B-A47B,

THUDM/GLM-4-32B-0414,

THUDM/GLM-4-9B-0414,

THUDM/GLM-4.1V-9B-Thinking,

zai-org/GLM-4.5,

zai-org/GLM-4.5-Air,

zai-org/GLM-4.5V,

zai-org/GLM-4.6,

THUDM/GLM-Z1-32B-0414,

THUDM/GLM-Z1-9B-0414,

tencent/Hunyuan-A13B-Instruct,

tencent/Hunyuan-MT-7B,

moonshotai/Kimi-Dev-72B,

moonshotai/Kimi-K2-Instruct,

moonshotai/Kimi-K2-Instruct-0905,

inclusionAI/Ring-1T,

inclusionAI/Ling-1T,

inclusionAI/Ling-flash-2.0,

inclusionAI/Ling-mini-2.0,

inclusionAI/Ring-flash-2.0,

meta-llama/Meta-Llama-3.1-8B-Instruct,

MiniMaxAI/MiniMax-M1-80k,

Qwen/QwQ-32B,

Qwen/Qwen2.5-14B-Instruct,

Qwen/Qwen2.5-32B-Instruct,

Qwen/Qwen2.5-72B-Instruct,

Qwen/Qwen2.5-72B-Instruct-128K,

Qwen/Qwen2.5-7B-Instruct,

Qwen/Qwen2.5-Coder-32B-Instruct,

Qwen/Qwen2.5-VL-32B-Instruct,

Qwen/Qwen2.5-VL-72B-Instruct,

Qwen/Qwen2.5-VL-7B-Instruct,

Qwen/Qwen3-14B,

Qwen/Qwen3-235B-A22B,

Qwen/Qwen3-235B-A22B-Instruct-2507,

Qwen/Qwen3-235B-A22B-Thinking-2507,

Qwen/Qwen3-30B-A3B,

Qwen/Qwen3-30B-A3B-Instruct-2507,

Qwen/Qwen3-30B-A3B-Thinking-2507,

Qwen/Qwen3-32B,

Qwen/Qwen3-8B,

Qwen/Qwen3-Coder-30B-A3B-Instruct,

Qwen/Qwen3-Coder-480B-A35B-Instruct,

Qwen/Qwen3-Next-80B-A3B-Instruct,

Qwen/Qwen3-Next-80B-A3B-Thinking,

Qwen/Qwen3-Omni-30B-A3B-Captioner,

Qwen/Qwen3-Omni-30B-A3B-Instruct,

Qwen/Qwen3-Omni-30B-A3B-Thinking,

ByteDance-Seed/Seed-OSS-36B-Instruct,

openai/gpt-oss-120b,

openai/gpt-oss-20b,

stepfun-ai/step3

Example:

"Qwen/QwQ-32B"

messages

object[]

required

A list of messages comprising the conversation so far.

Required array length: 1 - 10 elements

Show child attributes

stream

boolean

If set, tokens are returned as Server-Sent Events as they are made available. Stream terminates with data: [DONE]

Example:

false

max_tokens

integer

The maximum number of tokens to generate. Ensure that input tokens + max_tokens do not exceed the model’s context window. As some services are still being updated, avoid setting max_tokens to the window’s upper bound; reserve ~10k tokens as buffer for input and system overhead. See Models(https://cloud.siliconflow.cn/models) for details.

Example:

4096

enable_thinking

boolean

Switches between thinking and non-thinking modes. Default is True. This field supports the following models:

- Qwen/Qwen3-8B
- Qwen/Qwen3-14B
- Qwen/Qwen3-32B
- wen/Qwen3-30B-A3B
- Qwen/Qwen3-235B-A22B
- tencent/Hunyuan-A13B-Instruct
- zai-org/GLM-4.5V
- deepseek-ai/DeepSeek-V3.1
- deepseek-ai/DeepSeek-V3.1-Terminus
- deepseek-ai/DeepSeek-V3.2-Exp

If you want to use the function call feature for deepseek-ai/DeepSeek-V3.1, you need to set enable_thinking to false.

Example:

false

thinking_budget

integer

default:4096

Maximum number of tokens for chain-of-thought output. This field applies to all Reasoning models.

Required range: 128 <= x <= 32768

Example:

4096

min_p

number

Dynamic filtering threshold that adapts based on token probabilities.This field only applies to Qwen3.

Required range: 0 <= x <= 1

Example:

0.05

stop

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Example:

null

temperature

number

Determines the degree of randomness in the response.

Example:

0.7

top_p

number

default:0.7

The top_p (nucleus) parameter is used to dynamically adjust the number of choices for each predicted token based on the cumulative probabilities.

Example:

0.7

top_k

number

Example:

50

frequency_penalty

number

Example:

0.5

integer

Number of generations to return

Example:

1

response_format

object

An object specifying the format that the model must output.

Show child attributes

tools

object[]

A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.

Show child attributes

Response

200

string

choices

object[]

Show child attributes

usage

object

Show child attributes

created

integer

model

string

object

enum<string>

Available options:

chat.completion

Chat

Completions

Image

Audio

Video

Platform

Authorizations

Body

Response