Authorizations
Use the following format for authentication: Bearer <your api key>
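The Bearer format above can be illustrated with a minimal sketch in Python. The helper name `auth_headers` is hypothetical and not part of the service; the only thing taken from this page is the `Bearer <your api key>` value format, and placing it in the standard HTTP `Authorization` header is an assumption based on common practice.

```python
def auth_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the API key as a Bearer token.

    Hypothetical helper: the header name "Authorization" is the usual
    convention for Bearer tokens, assumed here rather than quoted from the page.
    """
    if not api_key or api_key != api_key.strip():
        # Catch empty keys and stray whitespace from copy/paste early.
        raise ValueError("api_key must be a non-empty string without surrounding whitespace")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be merged into the headers of whichever HTTP client is in use.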
Body
- fish-speech-1.5
- CosyVoice2-0.5B
- IndexTTS-2
The name of the model to use. To improve service quality, we periodically change the models offered by this service, including but not limited to bringing models online or taking them offline and adjusting model capabilities. Where feasible, we will notify you of such changes through appropriate means such as announcements or message pushes.
Example: fishaudio/fish-speech-1.5
The text to generate audio for.
Length: 1 - 128000
Example: "The text to generate audio for"
Available options: fishaudio/fish-speech-1.5:alex, fishaudio/fish-speech-1.5:anna, fishaudio/fish-speech-1.5:bella, fishaudio/fish-speech-1.5:benjamin, fishaudio/fish-speech-1.5:charles, fishaudio/fish-speech-1.5:claire, fishaudio/fish-speech-1.5:david, fishaudio/fish-speech-1.5:diana
The output audio format. Supported formats are mp3, opus, wav, and pcm.
Available options: mp3, opus, wav, pcm
The output sample rate. Supported values and defaults differ by audio output format, as follows:
- opus: supports 48000 Hz.
- wav, pcm: support 8000, 16000, 24000, 32000, 44100 Hz, with a default of 44100 Hz.
- mp3: supports 32000, 44100 Hz, with a default of 44100 Hz.
Available options: 8000, 16000, 24000, 32000, 44100, 48000
Example: 32000
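The per-format sample rate rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: the function name `resolve_sample_rate` and both tables are hypothetical, built only from the values listed on this page. No default is stated for opus, so treating 48000 as its default (the only supported value) is an assumption.

```python
# Supported sample rates per output format, as listed on this page.
SUPPORTED_RATES = {
    "opus": (48000,),
    "wav": (8000, 16000, 24000, 32000, 44100),
    "pcm": (8000, 16000, 24000, 32000, 44100),
    "mp3": (32000, 44100),
}

# Defaults per format. The opus entry is an assumption: the page gives no
# explicit default, and 48000 is the only rate it supports.
DEFAULT_RATES = {"opus": 48000, "wav": 44100, "pcm": 44100, "mp3": 44100}


def resolve_sample_rate(audio_format, sample_rate=None):
    """Return a valid sample rate for the format, or raise ValueError."""
    if audio_format not in SUPPORTED_RATES:
        raise ValueError(f"unsupported format: {audio_format!r}")
    if sample_rate is None:
        # Fall back to the documented (or, for opus, assumed) default.
        return DEFAULT_RATES[audio_format]
    if sample_rate not in SUPPORTED_RATES[audio_format]:
        raise ValueError(
            f"{audio_format} does not support {sample_rate} Hz; "
            f"choose one of {SUPPORTED_RATES[audio_format]}"
        )
    return sample_rate
```

For example, `resolve_sample_rate("mp3")` falls back to 44100, while `resolve_sample_rate("mp3", 8000)` raises `ValueError` because mp3 is listed with only 32000 and 44100 Hz.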
Whether to stream the output.
The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.
0.25 <= x <= 4
-10 <= x <= 10
Response
Generates audio from the input text. The endpoint returns binary data, which the caller is responsible for processing. Reference: https://docs.siliconflow.com/capabilities/text-to-speech#5
The response is of type file.
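Because the response is binary, a client typically writes the bytes straight to a file. The sketch below uses only the Python standard library and rests on several assumptions: the endpoint URL is not given in this section, so it is passed in by the caller; sending the body as JSON is assumed; and the request field names are not listed here, so the payload dictionary is passed through unchanged. All three function names are hypothetical.

```python
import json
import urllib.request
from pathlib import Path


def build_speech_request(url, api_key, payload):
    """Build a POST request with Bearer authentication.

    Assumptions: `url` is supplied by the caller (not stated in this
    section) and the body is sent as JSON.
    """
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_audio(audio_bytes, path):
    """Write the raw audio bytes to `path`; return the number of bytes written."""
    return Path(path).write_bytes(audio_bytes)


def synthesize_to_file(url, api_key, payload, path):
    """Send the request and store the binary response body on disk."""
    request = build_speech_request(url, api_key, payload)
    with urllib.request.urlopen(request) as response:
        # The body is binary audio, so read bytes and never decode as text.
        return save_audio(response.read(), path)
```

Keeping `build_speech_request` and `save_audio` separate from the network call makes both pieces easy to check without contacting the service.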