1. Model Core Capabilities
1.1 Basic Functions
- Text Generation: Generate coherent natural language text based on context, supporting various styles and genres.
- Semantic Understanding: Deeply parse user intent, with multi-turn dialogue management to keep conversations coherent and accurate.
- Knowledge Q&A: Cover a wide range of knowledge domains, including science, technology, culture, and history, providing accurate answers.
- Code Assistance: Support code generation, explanation, and debugging for mainstream programming languages such as Python, Java, and C++.

1.2 Advanced Capabilities
- Long Text Processing: Support context windows from 4k to 64k tokens, suitable for long-document generation and complex dialogue scenarios.
- Instruction Following: Precisely understand complex task instructions, such as "compare the A/B schemes using a Markdown table."
- Style Control: Adjust output style through system prompts, supporting academic, conversational, poetic, and other styles.
- Multimodal Support: In addition to text generation, support tasks such as image description and speech-to-text.

2. API Call Specifications
2.1 Basic Request Structure
You can make end-to-end API requests using the OpenAI SDK, for example to generate a dialogue, analyze an image, or generate JSON data.
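As a sketch of the basic request structure, assuming an OpenAI-compatible endpoint (the model name and base URL below are placeholders; pick a real model from the Models page):

```python
import json

# Minimal chat-completion request body for an OpenAI-compatible endpoint.
# "example-model" is a placeholder -- substitute a model from the Models page.
payload = {
    "model": "example-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce yourself in one sentence."},
    ],
}

# The OpenAI SDK sends the same structure:
#   client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.example.com/v1")
#   response = client.chat.completions.create(**payload)
#   print(response.choices[0].message.content)
body = json.dumps(payload)
```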
2.2 Message Structure Explanation
| Message Type | Function Description | Example Content |
| --- | --- | --- |
| system | Model instructions, defining the AI's role and general behavior | e.g., "You are a pediatrician with 10 years of experience." |
| user | User input, passing the end user's message to the model | e.g., "How should a persistent fever in a toddler be treated?" |
| assistant | Model-generated historical responses, providing examples of how it should respond to the current request | e.g., "I suggest measuring the temperature first…" |
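The three roles combine into a single `messages` list, with the newest user turn last. A sketch with illustrative contents only:

```python
# Multi-turn message history using the three roles described above.
messages = [
    {"role": "system", "content": "You are a pediatrician with 10 years of experience."},
    {"role": "user", "content": "How should a persistent fever in a toddler be treated?"},
    {"role": "assistant", "content": "I suggest measuring the temperature first..."},
    # Append the newest user turn last; the model answers with the
    # full history as context.
    {"role": "user", "content": "What temperature counts as a fever?"},
]

roles = [m["role"] for m in messages]
```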
3. Model Selection Guide
Visit the Models page and use the filter options on the left to find language models that support different functionalities, and to learn specific model details such as pricing, model size, and maximum context length. You can also try the models in the Playground. Note that the Playground is only for model testing and does not retain historical conversation records; if you wish to keep a conversation history, save it manually. For more usage details, refer to the API Documentation.

4. Detailed Explanation of Core Parameters
4.1 Creativity Control
4.2 Output Limits
4.3 Common Issues with Language Model Scenarios
1. Model Output Garbled
Some models may produce garbled output if sampling parameters are not set. To address this, try setting parameters such as temperature, top_k, top_p, and frequency_penalty.
Corresponding payload adjustments for different languages:
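For example, in Python the request body might carry these fields (a sketch for an OpenAI-compatible endpoint; which sampling fields are honored varies by model and provider):

```python
# Sampling parameters that can curb garbled output; the values shown are
# reasonable starting points, not universal defaults.
payload = {
    "model": "example-model",  # placeholder -- pick a model from the Models page
    "messages": [{"role": "user", "content": "Write a haiku about autumn."}],
    "temperature": 0.7,        # lower values make output more deterministic
    "top_p": 0.9,              # nucleus sampling threshold
    "top_k": 50,               # sample only from the 50 most likely tokens
    "frequency_penalty": 0.5,  # discourage verbatim repetition
}
```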
2. Explanation of max_tokens
The maximum value of max_tokens is the model's context length. Since some model inference services are still being updated, please do not set max_tokens to that maximum (the full context length) when making a request. It is recommended to reserve around 10k tokens as space for the input content.
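As a sketch of this headroom, assuming a model with a 32k context window (check the actual value on the model's detail page):

```python
# Leave ~10k tokens of the context window free for the input.
context_length = 32_768       # assumed context window for this example
reserved_for_input = 10_000   # recommended headroom for the prompt
max_tokens = context_length - reserved_for_input
```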
3. Explanation of context_length
The context_length varies across different models. You can search for a specific model on the Models page to view its detailed information.
4. Output Truncation Issues in Model Inference
Here are several aspects to troubleshoot the issue:
- When encountering output truncation through API requests:
- Max Tokens Setting: Set max_tokens to an appropriate value; output that exceeds max_tokens is truncated.
- Stream Request Setting: In non-stream requests, long output content is prone to 504 timeout issues, so prefer streaming requests for long outputs.
- Client Timeout Setting: Increase the client timeout to prevent truncation before the output is fully completed.
- When encountering output truncation through third-party client requests:
- CherryStdio has a default max_tokens of 4,096. Users can enable the "Enable Message Length Limit" switch to set max_tokens to an appropriate value.
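When streaming, the client assembles the reply from incremental fragments instead of waiting for one long response. A minimal sketch, assuming each streamed event carries an optional "delta" text fragment (the exact event shape depends on your SDK):

```python
def collect_stream(chunks):
    """Accumulate the text deltas of a streaming response into the full reply.

    `chunks` stands in for the events yielded by a stream=True request;
    each event is assumed to carry an optional "delta" text fragment.
    """
    parts = []
    for chunk in chunks:
        delta = chunk.get("delta") or ""  # events without text contribute nothing
        parts.append(delta)
    return "".join(parts)
```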
| Error Code | Common Cause | Solution |
| --- | --- | --- |
| 400 | Parameter format error | Check the range of parameters such as temperature |
| 401 | API Key not correctly set | Verify the API Key |
| 403 | Insufficient permissions | Commonly requires real-name authentication; refer to error messages for other cases |
| 429 | Request rate limit exceeded | Implement an exponential backoff retry mechanism |
| 503/504 | Model overload | Switch to backup model nodes |
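For 429 errors, a retry loop with exponential backoff is the standard remedy. A minimal sketch; `RateLimitError` here is a stand-in for whatever exception your SDK raises on a 429 response:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the SDK's 429 rate-limit error type."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Exponential backoff with a little jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```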
5. Billing and Quota Management
5.1 Billing Formula
Total Cost = (Input Tokens × Input Unit Price) + (Output Tokens × Output Unit Price)
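Applying the formula, a sketch with hypothetical per-million-token prices (real prices are on each model's detail page):

```python
def total_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Total Cost = input tokens x input unit price + output tokens x output unit price.

    Prices are given per one million tokens, the usual quoting convention.
    """
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Hypothetical pricing: $2 per 1M input tokens, $6 per 1M output tokens.
cost = total_cost(500_000, 100_000, 2.0, 6.0)
```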
5.2 Example Pricing for Different Series
Specific model prices can be found on the Models page under the model details.

6. Application Scenarios
6.1 Technical Documentation Generation
6.2 Data Analysis Reports
Model capabilities are continuously updated. It is recommended to visit the Models page regularly for the latest information.