Example scenarios:

- Generate Dialogue
- Analyze an Image
- Generate JSON Data
| Message Type | Function Description | Example Content |
|---|---|---|
| `system` | Model instructions, defining the AI's role and general behavior | e.g., "You are a pediatrician with 10 years of experience." |
| `user` | User input, passing the end user's message to the model | e.g., "How should a persistent fever in a toddler be treated?" |
| `assistant` | Model-generated historical responses, providing examples of how it should respond to the current request | e.g., "I suggest measuring the temperature first…" |
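The three roles above can be combined into a single request body. A minimal sketch in Python, assuming an OpenAI-compatible chat completions payload; the model name is a placeholder, not from the original text:

```python
# Build a messages list using the three roles from the table above.
messages = [
    {"role": "system", "content": "You are a pediatrician with 10 years of experience."},
    {"role": "user", "content": "How should a persistent fever in a toddler be treated?"},
    {"role": "assistant", "content": "I suggest measuring the temperature first..."},
]

payload = {
    "model": "example-llm",  # hypothetical model name
    "messages": messages,
}
```

The `assistant` message here is prior conversation history, which the model uses as an example of how to continue.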
Common sampling parameters include `temperature`, `top_k`, `top_p`, and `frequency_penalty`.
Corresponding payload adjustments for different languages:
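As one example, a Python sketch of a request body with these sampling parameters set, assuming an OpenAI-compatible chat completions schema (the model name and parameter values are illustrative):

```python
import json

# Chat completion payload with the four sampling parameters discussed above.
payload = {
    "model": "example-llm",      # hypothetical model name
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,          # randomness of sampling; lower = more deterministic
    "top_k": 50,                 # sample only from the 50 most likely tokens
    "top_p": 0.9,                # nucleus sampling: keep tokens within 90% cumulative probability
    "frequency_penalty": 0.5,    # penalize tokens that have already appeared often
}

body = json.dumps(payload)  # serialized JSON request body
```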
2. Explanation of max_tokens

The maximum value of `max_tokens` equals the model's context length. Because some model inference services are still being updated, do not set `max_tokens` to this maximum (the context length) when making a request; it is recommended to reserve around 10k tokens as space for the input content.
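The headroom rule above can be sketched as a simple calculation; the context length and reserved amount below are illustrative values, not real model limits:

```python
# Cap max_tokens below the context length, reserving space for the input.
CONTEXT_LENGTH = 32768   # example context length for a hypothetical model
INPUT_HEADROOM = 10_000  # ~10k tokens reserved for the prompt, per the guidance above

max_tokens = CONTEXT_LENGTH - INPUT_HEADROOM  # value to send in the request
```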
3. Explanation of context_length
The `context_length` varies across different LLM models. You can search for a specific model on the Models page to view its detailed information.
4. Output Truncation Issues in Model Inference
Here are several aspects to check when troubleshooting truncated output:
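One quick check: OpenAI-compatible APIs report why generation stopped in the `finish_reason` field, and a value of `"length"` means the output hit the `max_tokens` limit and was cut off. A sketch, using a hand-built response dict for illustration:

```python
# Illustrative response shape; a real response would come from the API.
response = {
    "choices": [
        {"message": {"content": "Partial answer..."}, "finish_reason": "length"}
    ]
}

finish_reason = response["choices"][0]["finish_reason"]
truncated = finish_reason == "length"  # True -> raise max_tokens or shorten the input
```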
| Error Code | Common Cause | Solution |
|---|---|---|
| 400 | Parameter format error | Check the range of parameters like `temperature` |
| 401 | API Key not correctly set | Verify the API Key |
| 403 | Insufficient permissions | Commonly requires real-name authentication; refer to error messages for other cases |
| 429 | Request rate limit exceeded | Implement an exponential backoff retry mechanism |
| 503/504 | Model overload | Switch to backup model nodes |
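The exponential backoff suggested for 429 errors can be sketched as follows; `send_request` is a hypothetical stand-in for the real HTTP call:

```python
import random
import time

def retry_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry on HTTP 429, doubling the wait each attempt with random jitter."""
    for attempt in range(max_retries):
        status = send_request()
        if status != 429:
            return status
        # Wait base_delay, 2*base_delay, 4*base_delay, ... scaled by jitter.
        time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    return 429  # still rate-limited after all retries
```

Jitter spreads retries from concurrent clients apart so they do not all hit the rate limit again at the same instant.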
Total Cost = (Input Tokens × Input Unit Price) + (Output Tokens × Output Unit Price)
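A worked example of the billing formula above; the token counts and unit prices are illustrative only (assumed to be quoted per 1K tokens), not real pricing:

```python
# Total Cost = (Input Tokens x Input Unit Price) + (Output Tokens x Output Unit Price)
input_tokens = 1200
output_tokens = 800
input_price_per_1k = 0.002   # assumed price per 1K input tokens
output_price_per_1k = 0.006  # assumed price per 1K output tokens

total_cost = (input_tokens / 1000) * input_price_per_1k \
           + (output_tokens / 1000) * output_price_per_1k
```

Output tokens are typically priced higher than input tokens, so long generations dominate the bill.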