Create a Text-to-Speech Request

Authorizations

Authorization

string

header

required

Authenticate with a bearer token in the following format: Bearer <your API key>
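As a minimal sketch of the header described above, the helper below builds the request headers for a bearer token. The function name `build_headers` and the `api_key` parameter are hypothetical, chosen for illustration; the `Content-Type` value is taken from the Body section of this page.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers for a JSON request authenticated with a bearer token."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        # The Authorization header carries the literal word "Bearer", a space, then the key.
        "Authorization": f"Bearer {api_key}",
        # The request body is documented as application/json.
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code, for example by reading it from an environment variable, is usually the safer choice.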

Body

application/json

model

enum<string>

required

The name of the model to use. To maintain service quality, the models offered by this service change periodically; changes may include bringing models online or taking them offline and adjusting model capabilities. Where feasible, you will be notified of such changes through announcements or push messages.

Available options:

fishaudio/fish-speech-1.5

input

string

required

The text to generate audio for.

Required string length: 1 - 128000

Example:

"The text to generate audio for"

voice

enum<string>

required

Available options:

fishaudio/fish-speech-1.5:alex

fishaudio/fish-speech-1.5:anna

fishaudio/fish-speech-1.5:bella

fishaudio/fish-speech-1.5:benjamin

fishaudio/fish-speech-1.5:charles

fishaudio/fish-speech-1.5:claire

fishaudio/fish-speech-1.5:david

fishaudio/fish-speech-1.5:diana

response_format

enum<string>

default:mp3

The output audio format. Supported formats are mp3, opus, wav, and pcm.

Available options:

mp3

opus

wav

pcm

sample_rate

enum<number>

Controls the output sample rate. The supported values and defaults differ by audio output format, as follows:

opus: supports 48000 Hz.

wav, pcm: support 8000, 16000, 24000, 32000, and 44100 Hz, with a default of 44100 Hz.

mp3: supports 32000 and 44100 Hz, with a default of 44100 Hz.

Available options:

8000

16000

24000

32000

44100

48000

Example:

32000
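Because the valid sample rates depend on response_format, a client may want to check the combination before sending a request. The sketch below transcribes the per-format values from the description above. The names `SUPPORTED_RATES`, `DEFAULT_RATES`, and `resolve_sample_rate` are hypothetical, and treating 48000 as the opus default is an assumption based on it being the only listed value.

```python
# Supported sample rates (Hz) per response_format, transcribed from the field description.
SUPPORTED_RATES = {
    "opus": (48000,),
    "wav": (8000, 16000, 24000, 32000, 44100),
    "pcm": (8000, 16000, 24000, 32000, 44100),
    "mp3": (32000, 44100),
}

# Documented defaults; the opus entry is an assumption (it is the only supported value).
DEFAULT_RATES = {"opus": 48000, "wav": 44100, "pcm": 44100, "mp3": 44100}


def resolve_sample_rate(response_format="mp3", sample_rate=None):
    """Return a sample rate that is valid for the given format, or raise ValueError."""
    if response_format not in SUPPORTED_RATES:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if sample_rate is None:
        # Fall back to the default for this format.
        return DEFAULT_RATES[response_format]
    if sample_rate not in SUPPORTED_RATES[response_format]:
        raise ValueError(
            f"{sample_rate} Hz is not supported for {response_format}; "
            f"choose one of {SUPPORTED_RATES[response_format]}"
        )
    return sample_rate
```

For example, `resolve_sample_rate("wav", 8000)` returns 8000, while `resolve_sample_rate("mp3", 8000)` raises `ValueError` because mp3 is listed with 32000 and 44100 Hz only.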

stream

boolean

default:true

Whether to stream the audio response.

speed

number<float>

default:1

The speed of the generated audio, from 0.25 to 4.0. The default is 1.0.

Required range: 0.25 <= x <= 4

gain

number<float>

default:0

Required range: -10 <= x <= 10
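The body fields and their documented constraints can be gathered into one JSON payload. The following is a rough sketch rather than an official client: `build_payload` and `VOICES` are hypothetical names, the input length is measured with Python's `len()` (an assumption, since the page does not say whether the limit counts characters or bytes), and sample_rate is omitted so the service default applies.

```python
import json

MODEL = "fishaudio/fish-speech-1.5"
# Voice identifiers as listed for the voice field.
VOICES = {
    f"{MODEL}:{name}"
    for name in ("alex", "anna", "bella", "benjamin", "charles", "claire", "david", "diana")
}
FORMATS = {"mp3", "opus", "wav", "pcm"}


def build_payload(text, voice, *, response_format="mp3", stream=True, speed=1.0, gain=0.0):
    """Validate the documented constraints and return the request body as a dict."""
    if not 1 <= len(text) <= 128000:
        raise ValueError("input must be between 1 and 128000 characters long")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must satisfy 0.25 <= speed <= 4")
    if not -10 <= gain <= 10:
        raise ValueError("gain must satisfy -10 <= gain <= 10")
    return {
        "model": MODEL,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "stream": stream,
        "speed": speed,
        "gain": gain,
    }


if __name__ == "__main__":
    # Print the JSON body for a short request that uses the defaults.
    print(json.dumps(build_payload("Hello, world.", f"{MODEL}:alex"), indent=2))
```

Validating locally is optional, since the service presumably enforces the same limits, but it surfaces mistakes before a request is sent.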

Response

Generates audio from the input text. The response body is binary audio data, which the caller must process, for example by writing it to a file. Reference: https://docs.siliconflow.com/capabilities/text-to-speech#5

The response is of type file.
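Since the response is binary, one straightforward way to handle it is to write the bytes to a file as they arrive. The sketch below uses only the Python standard library. Several things in it are assumptions not stated on this page: the endpoint URL (`API_URL` is a placeholder to replace with the real endpoint), the use of the POST method, and the hypothetical helper names `write_chunks` and `synthesize_to_file`.

```python
import json
import urllib.request

# Assumption: placeholder URL; this page does not state the endpoint.
API_URL = "https://api.example.com/v1/audio/speech"


def write_chunks(chunks, path):
    """Write an iterable of bytes objects to path and return the total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


def synthesize_to_file(api_key, payload, path, url=API_URL, chunk_size=8192):
    """Send the JSON payload and save the binary audio response to path."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",  # Assumption: the HTTP method is not stated on this page.
    )
    with urllib.request.urlopen(request) as response:
        # Read fixed-size chunks until read() returns b"" at end of stream.
        chunks = iter(lambda: response.read(chunk_size), b"")
        return write_chunks(chunks, path)
```

Reading in chunks keeps memory use flat for long inputs. The file extension passed in `path` should match the requested response_format.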