Audio

Learn how to turn audio into text or text into audio.

1. Create speech

Generates audio from the input text.

Request example:

Response:
The audio file content.

Request body (parameter details)

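model (string, required)
One of the available TTS models: tts-1 or tts-1-hd.
input (string, required)
The text to generate audio for. The maximum length is 4096 characters.
voice (string, required)
The voice to use when generating the audio. Supported voices are alloy, echo, fable, onyx, nova, and shimmer. Previews of the voices are available in the text-to-speech guide.
response_format (string, optional, defaults to mp3)
The format of the generated audio. Supported formats are mp3, opus, aac, flac, wav, and pcm.
speed (number, optional, defaults to 1)
The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.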

Example 1. python
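The original sample for this slot is missing, so the following is a minimal sketch of a speech request using only the Python standard library. It assumes the standard OpenAI host as `BASE_URL` (a proxy or gateway may use a different one) and an API key in the `OPENAI_API_KEY` environment variable. The helper names `build_speech_payload` and `create_speech` are hypothetical; the limits they check are the ones listed in the request body above.

```python
import json
import os
import urllib.request

# Assumption: the standard OpenAI host; a proxy or gateway may use another base URL.
BASE_URL = "https://api.openai.com/v1"

MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(text, model="tts-1", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Check the documented limits and return the JSON request body."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not text or len(text) > 4096:
        raise ValueError("input must contain 1 to 4096 characters")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


def create_speech(text, out_path, api_key=None, **options):
    """POST the payload to /audio/speech and write the audio bytes to out_path."""
    payload = build_speech_payload(text, **options)
    key = api_key or os.environ["OPENAI_API_KEY"]
    request = urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

A call such as `create_speech("Hello there.", "speech.mp3", voice="nova")` would save the returned audio to `speech.mp3`. Validating locally before sending keeps obviously invalid requests from reaching the network.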

Example 2. node

2. Create transcription

Transcribes audio into the input language.

Request example:

Response:

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}

Request body (parameter details)

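file (file, required)
The audio file object (not the file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
model (string, required)
ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.
language (string, optional)
The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
prompt (string, optional)
Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.
response_format (string, optional, defaults to json)
The format of the transcript output, one of: json, text, srt, verbose_json, or vtt.
temperature (number, optional, defaults to 0)
The sampling temperature, between 0 and 1. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.
timestamp_granularities[] (array, optional, defaults to segment)
The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word or segment. Note: segment timestamps add no extra latency, but generating word timestamps does.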

Example 1. python
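The original sample for this slot is missing. As a stand-in, here is a hedged sketch that assumes the official `openai` Python package (version 1 or later) is installed and that `OPENAI_API_KEY` is set. The helpers `transcription_options` and `transcribe` are hypothetical names; they only enforce the constraints described in the request body above before calling the SDK.

```python
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
ALLOWED_GRANULARITIES = {"word", "segment"}


def transcription_options(language=None, prompt=None, response_format="json",
                          temperature=0, timestamp_granularities=None):
    """Validate the documented constraints and return SDK keyword arguments."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if timestamp_granularities:
        # Timestamp granularities are only populated for verbose_json output.
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities requires verbose_json")
        unknown = set(timestamp_granularities) - ALLOWED_GRANULARITIES
        if unknown:
            raise ValueError(f"unknown granularities: {sorted(unknown)}")
    options = {
        "model": "whisper-1",
        "response_format": response_format,
        "temperature": temperature,
    }
    if language:
        options["language"] = language  # ISO-639-1 code, e.g. "en"
    if prompt:
        options["prompt"] = prompt
    if timestamp_granularities:
        options["timestamp_granularities"] = list(timestamp_granularities)
    return options


def transcribe(path, **kwargs):
    """Upload the audio file at path and return the API response."""
    from openai import OpenAI  # assumption: openai>=1.0 is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            file=audio_file, **transcription_options(**kwargs)
        )
```

With the default `json` format, `transcribe("speech.mp3", language="en").text` would hold the transcribed text. The file is opened in binary mode because the endpoint expects the file object itself, not its name.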

Example 2. node

Example 3. word timestamps

python:
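The original sample is missing here. The sketch below shows one way to request word-level timestamps, assuming the official `openai` Python package (version 1 or later) and an `OPENAI_API_KEY` environment variable. The function names are hypothetical, and `format_words` assumes each word entry carries `word`, `start`, and `end` fields, with times in seconds.

```python
def word_timestamps(path):
    """Request word-level timestamps and return a list of word dicts."""
    from openai import OpenAI  # assumption: openai>=1.0 is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1",
            response_format="verbose_json",  # required for timestamps
            timestamp_granularities=["word"],
        )
    # model_dump() turns the SDK's response object into plain dicts.
    return transcript.model_dump().get("words") or []


def format_words(words):
    """Render each word as 'start-end word', with times in seconds."""
    return [f"{w['start']:.2f}-{w['end']:.2f} {w['word']}" for w in words]
```

For instance, `print("\n".join(format_words(word_timestamps("speech.mp3"))))` would list one word per line with its time range.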

node:

Response:

Example 4. segment timestamps

python:
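The original sample is missing here. As an illustration, the sketch below requests segment-level timestamps and renders them as SRT-style cues. It assumes the official `openai` Python package (version 1 or later) and that each segment has `start`, `end`, and `text` fields, with times in seconds. All function names are hypothetical. If only subtitles are needed, setting response_format to srt is the simpler route; this version shows how to work with the segment data directly.

```python
def segment_timestamps(path):
    """Request segment-level timestamps and return a list of segment dicts."""
    from openai import OpenAI  # assumption: openai>=1.0 is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1",
            response_format="verbose_json",  # required for timestamps
            timestamp_granularities=["segment"],
        )
    return transcript.model_dump().get("segments") or []


def to_srt_time(seconds):
    """Convert seconds to the HH:MM:SS,mmm form used by SRT cues."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Number the segments from 1 and join them as SRT cues."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = to_srt_time(segment["start"])
        end = to_srt_time(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}")
    return "\n\n".join(cues)
```

Calling `segments_to_srt(segment_timestamps("speech.mp3"))` would produce text that can be saved with an `.srt` extension.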

node:

Response:

3. Create translation

Translates audio into English.

Request example:

Response:

{
  "text": "Hello, my name is Wolfgang and I come from Germany. Where are you heading today?"
}

Request body (parameter details)

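file (file, required)
The audio file object (not the file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
model (string, required)
ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.
prompt (string, optional)
Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in English.
response_format (string, optional, defaults to json)
The format of the transcript output, one of: json, text, srt, verbose_json, or vtt.
temperature (number, optional, defaults to 0)
The sampling temperature, between 0 and 1. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.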
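To make the parameters above concrete, here is a minimal sketch of a translation call. It assumes the official `openai` Python package (version 1 or later) and an `OPENAI_API_KEY` environment variable. The helpers `is_supported_audio` and `translate_to_english` are hypothetical names, and the extension check is only a local convenience based on the format list above, not something the API requires.

```python
from pathlib import Path

# File extensions taken from the documented list of accepted formats.
SUPPORTED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}


def is_supported_audio(path):
    """Return True if the file extension is one of the documented formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def translate_to_english(path, prompt=None, response_format="json", temperature=0):
    """Upload the audio file at path and return its English translation."""
    if not is_supported_audio(path):
        raise ValueError(f"unsupported audio format: {path}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    from openai import OpenAI  # assumption: openai>=1.0 is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    options = {
        "model": "whisper-1",
        "response_format": response_format,
        "temperature": temperature,
    }
    if prompt:
        options["prompt"] = prompt  # the prompt should be in English
    with open(path, "rb") as audio_file:
        return client.audio.translations.create(file=audio_file, **options)
```

With the default `json` format, `translate_to_english("german.m4a").text` would contain the English text.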
