Text Generation
1. Core Capabilities of the Model
1.1 Basic Functions
- Text Generation: Generate coherent natural language text based on context, supporting various writing styles and tones.
- Semantic Understanding: Analyze user intent in depth and manage multi-turn conversations, maintaining coherence and accuracy across the dialogue.
- Knowledge Q&A: Cover a wide range of knowledge domains, including science, technology, culture, history, etc., providing accurate answers.
- Code Assistance: Support code generation, explanation, and debugging for mainstream programming languages such as Python, Java, and C++.
1.2 Advanced Capabilities
- Long-text Processing: Support context windows ranging from 4k to 64k tokens, suitable for generating long documents and handling complex conversational scenarios.
- Instruction Following: Accurately understand complex task instructions, such as “Compare Plan A and Plan B using a Markdown table.”
- Style Control: Adjust output style through system prompts, supporting academic, conversational, poetic, and other styles.
- Multi-modal Support: In addition to text generation, support tasks like image description and speech-to-text.
2. API Call Specifications
2.1 Basic Request Structure
You can perform end-to-end API requests using the OpenAI SDK.
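For example, a minimal chat-completion request might look like the following sketch (the base_url, API key, and model name are placeholders, not platform-confirmed values):

```python
# A minimal sketch using the OpenAI Python SDK against an
# OpenAI-compatible endpoint; api_key and base_url are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                 # placeholder key
    base_url="https://api.example.com/v1",  # placeholder endpoint
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # any chat model from the Models page
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a context window is."},
    ],
)
print(response.choices[0].message.content)
```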
2.2 Message Structure Explanation
| Message Type | Function Description | Example Content |
|---|---|---|
| system | Model instructions that define the AI's role and general behavior | e.g., "You are a pediatrician with 10 years of experience." |
| user | User input, passing the end user's message to the model | e.g., "How should a persistent fever in a toddler be treated?" |
| assistant | Model-generated historical responses, giving the model examples of how it should respond to the current request | e.g., "I suggest measuring the temperature first…" |
When you want the model to follow hierarchical instructions, message roles can help you achieve better outputs. However, role handling is not deterministic, so the best approach is to experiment with different structures and see which yields the best results.
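For example, a multi-turn message list using all three roles might look like the following sketch (the content strings echo the table above):

```python
# A sketch of a multi-turn message list; the prior assistant turn
# shows the model the style it should continue in.
messages = [
    {"role": "system", "content": "You are a pediatrician with 10 years of experience."},
    {"role": "user", "content": "My toddler has had a persistent fever since yesterday."},
    {"role": "assistant", "content": "I suggest measuring the temperature first..."},
    {"role": "user", "content": "It reads 38.5 °C. What should I do next?"},
]
```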
3. Model Selection Guide
Visit the Models page and use the filter options on the left to find language models that support the functionality you need. The model details page lists specifics such as pricing, model size, and maximum context length.
You can also try the models in the Playground. Note that the Playground is only for model testing and does not retain historical conversation records; if you wish to keep a conversation, save it manually. For more usage details, refer to the API Documentation.
4. Detailed Explanation of Core Parameters
4.1 Creativity Control
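A minimal sketch, assuming the endpoint accepts the common temperature and top_p sampling parameters (reusing the client from the earlier example; check the API Documentation for the exact ranges your model supports):

```python
# Lower temperature -> more deterministic output; higher
# temperature and top_p -> more varied, creative output.
deterministic = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",   # placeholder model
    messages=[{"role": "user", "content": "Describe a sunset."}],
    temperature=0.2,   # conservative, repeatable phrasing
)
creative = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Describe a sunset."}],
    temperature=1.0,   # more diverse sampling
    top_p=0.95,        # nucleus sampling cutoff
)
```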
4.2 Output Limits
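A minimal sketch of the output-limit parameters, assuming OpenAI-compatible semantics: max_tokens caps the generated length, and stop ends generation early when a listed sequence appears.

```python
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",   # placeholder model
    messages=[{"role": "user", "content": "List the planets."}],
    max_tokens=256,    # hard cap on output tokens; excess is truncated
    stop=["\n\n"],     # optional early-stop sequence
)
```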
4.3 Common Issues with Language Model Scenarios
1. Model Output Garbled
Some models may produce garbled output if sampling parameters are not set. To address this, try setting parameters such as temperature, top_k, top_p, and frequency_penalty.
Corresponding payload adjustments for different languages:
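For example, a raw JSON payload with these parameters set explicitly might look like the following Python sketch (the endpoint URL, model, and parameter values are illustrative assumptions, not recommended defaults):

```python
# Sketch of a raw chat-completions payload with sampling parameters
# set explicitly to avoid garbled output; values are illustrative.
import requests

payload = {
    "model": "deepseek-ai/DeepSeek-R1",   # placeholder model
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.7,
    "frequency_penalty": 0.5,
}
resp = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```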
2. Explanation of max_tokens
For LLM models provided by the platform:
- Models with a max_tokens limit of 16384 include:
  - deepseek-ai/DeepSeek-R1
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Models with a max_tokens limit of 8192 include:
  - Qwen/QwQ-32B-Preview
  - deepseek-ai/DeepSeek-R1
- Models with a max_tokens limit of 4096 include:
  - All other LLM models not mentioned above.
3. Explanation of context_length
The context_length varies across LLM models. Search for a specific model on the Models page to view its details.
4. Model Output Truncation Issues
To troubleshoot truncation issues:
- For API requests:
  - Set max_tokens to an appropriate value; output exceeding max_tokens is truncated. The DeepSeek R1 series supports up to 16384 tokens.
  - Enable streaming output for lengthy responses to prevent 504 timeouts (see the streaming sketch after this list).
  - Increase the client timeout duration so the response can complete before the connection is closed.
- For third-party clients:
  - Cherry Studio defaults to max_tokens=4096. Enable the "Message Length Limit" switch in settings to adjust the value.
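A minimal streaming sketch, reusing the client from the earlier example (the model name is a placeholder):

```python
# Stream the response so long outputs arrive incrementally instead
# of risking a gateway timeout on one large reply.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",   # placeholder model
    messages=[{"role": "user", "content": "Write a long essay about rivers."}],
    max_tokens=16384,   # within the R1 series limit noted above
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```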
5. Error Code Handling
| Error Code | Common Cause | Solution |
|---|---|---|
| 400 | Parameter format error | Check the ranges of parameters such as temperature |
| 401 | API key not set correctly | Verify the API key |
| 403 | Insufficient permissions | Often requires real-name authentication; refer to the error message for other cases |
| 429 | Request rate limit exceeded | Implement an exponential backoff retry mechanism (see the sketch below) |
| 503/504 | Model overload | Switch to a backup model node |
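The 429 row above recommends exponential backoff. Here is a minimal sketch (the call_with_backoff helper is hypothetical, not part of any SDK; in practice, narrow the except clause to your SDK's rate-limit exception):

```python
# Hypothetical helper: retry a callable with exponentially growing
# waits between attempts, re-raising once retries are exhausted.
import time

def call_with_backoff(make_request, max_retries=5):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return make_request()
        except Exception:  # narrow this to your SDK's rate-limit error
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
            delay *= 2  # double the wait after each failure
```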
5. Billing and Quota Management
5.1 Billing Formula
Total Cost = (Input Tokens × Input Unit Price) + (Output Tokens × Output Unit Price)
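For example, under hypothetical unit prices of ¥2 per million input tokens and ¥8 per million output tokens, a request consuming 1,000 input tokens and 500 output tokens would cost 1,000 × ¥0.000002 + 500 × ¥0.000008 = ¥0.006. (These prices are illustrative only; see the model details page for actual rates.)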
5.2 Example Pricing for Different Series
Specific model prices are listed on the Models page, under each model's details.