model
: The model used for speech synthesis. See the supported model list.
input
: The text content to be converted into audio.
voice
: Reference voice. Supports system preset voices, user preset voices, and user dynamic voices. For detailed parameters, see Create Text-to-Speech Request.
speed
: Controls the audio speed. Type: float. Default value: 1.0. Range: [0.25, 4.0].
gain
: Audio gain in dB, which controls the volume of the audio. Type: float. Default value: 0.0. Range: [-10, 10].
response_format
: Controls the output format. Supported formats are mp3, opus, wav, and pcm. The sampling rate varies depending on the output format.
sample_rate
: Controls the output sampling rate. The default value and available range vary by output format.
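The parameters above can be assembled into a request body and checked against their documented ranges before sending. The following is a minimal sketch, not the service's own client: the helper name `build_speech_payload` is hypothetical, and the assumption that the request body is a flat JSON object keyed by the parameter names is not confirmed by this section. No network call is made.

```python
# Formats listed in the response_format description above.
SUPPORTED_FORMATS = {"mp3", "opus", "wav", "pcm"}


def build_speech_payload(model, text, voice, speed=1.0, gain=0.0,
                         response_format="mp3", sample_rate=None):
    """Build a request body dict (hypothetical helper, assumed flat JSON shape)."""
    # speed: float, default 1.0, documented range [0.25, 4.0]
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be within [0.25, 4.0]")
    # gain: float in dB, default 0.0, documented range [-10, 10]
    if not -10.0 <= gain <= 10.0:
        raise ValueError("gain must be within [-10, 10]")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError("unsupported response_format: " + repr(response_format))

    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "speed": float(speed),
        "gain": float(gain),
        "response_format": response_format,
    }
    # sample_rate is left out unless given, so the format's default applies.
    # Its valid values depend on the output format and are not checked here.
    if sample_rate is not None:
        payload["sample_rate"] = int(sample_rate)
    return payload
```

Range checks are done client-side only to fail early; the service remains the authority on which values it accepts.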
FunAudioLLM/CosyVoice2-0.5B:alex indicates the alex voice from the FunAudioLLM/CosyVoice2-0.5B model.
fishaudio/fish-speech-1.5:anna indicates the anna voice from the fishaudio/fish-speech-1.5 model.
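The two examples above follow a `model:voice` pattern for system preset voices. A small sketch of splitting such an identifier is shown below. The helper name `split_preset_voice` is hypothetical, and the sketch assumes the voice name is everything after the last colon; identifiers for custom voices may have a different shape and are not handled here.

```python
def split_preset_voice(voice_id):
    """Split a preset voice identifier of the form 'model:voice' (hypothetical helper)."""
    # rpartition splits on the LAST colon, so the model part keeps any
    # slashes or dots it contains.
    model, sep, name = voice_id.rpartition(":")
    if not sep or not model or not name:
        raise ValueError("expected 'model:voice', got " + repr(voice_id))
    return model, name


model, name = split_preset_voice("FunAudioLLM/CosyVoice2-0.5B:alex")
print(model)  # FunAudioLLM/CosyVoice2-0.5B
print(name)   # alex
```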
The uri field in the response is the ID of the custom voice, which can be used as the voice parameter in subsequent requests.
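As an illustration of reusing that ID, the sketch below takes an already-parsed response and places its uri value in the voice parameter of a later request body. The helper name `voice_from_upload_response`, the placeholder uri value, and the assumption that uri is a top-level field of a JSON object are all hypothetical; only the field names uri, voice, model, and input come from this section.

```python
def voice_from_upload_response(response):
    """Return the custom voice ID from a parsed response dict (hypothetical helper)."""
    uri = response.get("uri")
    if not isinstance(uri, str) or not uri:
        raise KeyError("response has no usable 'uri' field")
    return uri


# Placeholder response; a real uri value is issued by the service.
upload_response = {"uri": "example-custom-voice-id"}

# The uri is passed unchanged as the voice parameter of the next request.
request_body = {
    "model": "FunAudioLLM/CosyVoice2-0.5B",
    "input": "Hello from a custom voice.",
    "voice": voice_from_upload_response(upload_response),
}
```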
The uri field in the request parameters is the ID of the custom voice.