Create Transcription

Transcribes an audio file into text, with optional language detection.

Authentication

Authorization: Bearer

Send your API key as a Bearer token in the Authorization request header.
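The following is a minimal sketch of attaching the API key as a Bearer token, using only Python's standard library. This page does not give the base URL or the path of this endpoint, so `BASE_URL` and `TRANSCRIPTIONS_PATH` are hypothetical placeholders. The POST method is also an assumption. The sketch builds the request without sending it.

```python
import urllib.request

# Hypothetical values: this page does not state the base URL or the path
# of the create-transcription endpoint. Replace them with your deployment's.
BASE_URL = "https://api.example.com"
TRANSCRIPTIONS_PATH = "/v1/audio/transcriptions"


def auth_headers(api_key: str) -> dict:
    """Return the Authorization header with the API key as a Bearer token."""
    if not api_key:
        raise ValueError("API key must not be empty")
    return {"Authorization": f"Bearer {api_key}"}


def build_request(api_key: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated request; POST is assumed."""
    headers = {**auth_headers(api_key), "Content-Type": content_type}
    return urllib.request.Request(
        BASE_URL + TRANSCRIPTIONS_PATH, data=body, headers=headers, method="POST"
    )


req = build_request("sk-example", b"{}", "application/json")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

To actually send the request, pass it to `urllib.request.urlopen(req)`. Keeping construction separate from sending makes the header logic easy to check without network access.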

Request Body

model (string, required)

ID of the model to use.

language (string)

Language of the input audio.

prompt (string)

Optional prompt that guides the transcription.

response_format (string)

Format of the response. With verbose_json, the response also includes language, duration, task, and segments.

temperature (number)

Controls decoding randomness for transcription, when supported. Lower values favor stable transcripts; higher values allow more alternatives.

timestamp_granularities (array)

Levels of timestamp granularity to return.

stream (boolean)

Whether to stream the response.

enable_itn (boolean)

Converts spoken or formatted entities, such as numbers, dates, and times, into normalized written forms (inverse text normalization), when supported.

enable_punc (boolean)

Restores punctuation in the transcribed text, when supported.

enable_channel_split (boolean)

Processes each audio channel separately and returns channel-aware transcription results, when supported.

enable_ddc (boolean)

Enables DDC processing.

enable_speaker_info (boolean)

Returns speaker-related metadata or labels, when supported.

corpus (object)

Custom corpus configuration.

hotwords (string)

Hotword hints that bias recognition toward specific terms.

request_id (string)

Client-supplied request ID.

user_id (string)

ID of the end user.

segment_duration (number)

Duration of each audio segment.

audio_format (string)

Format of the input audio.

codec (string)

Codec of the input audio.

rate (integer)

Sample rate of the input audio, in Hz.

bits (integer)

Bit depth of the input audio (bits per sample).

channel (integer)

Number of channels in the input audio.

show_utterances (boolean)

Returns utterance-level transcription details, usually including text and timing boundaries, when supported.

enable_nonstream (boolean)

Enables non-streaming recognition.

uid (string)

User ID for the speech recognition service.

file (string)

Audio file.

file_url (string)

URL of the audio file.

audio_data (string)

Inline audio data.

metadata (object)

Application metadata.

extra_body (object)

Additional request body fields.

provider_options (object)

Configuration passed to the upstream provider.
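For uncompressed WAV input, the audio_format, codec, rate, bits, and channel fields can be read from the file header instead of being set by hand. The sketch below does this with Python's standard `wave` module, which only reads PCM WAV files. The strings `"wav"` and `"pcm"` are assumed field values, so check the model's parameter_schema for the values it accepts.

```python
import io
import wave


def pcm_fields_from_wav(data: bytes) -> dict:
    """Derive audio description fields from a PCM WAV header."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "audio_format": "wav",  # assumed value; check parameter_schema
            "codec": "pcm",  # assumed value; `wave` only reads uncompressed PCM
            "rate": wav.getframerate(),  # sample rate in Hz
            "bits": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "channel": wav.getnchannels(),
        }


# Usage: build one second of 16 kHz, 16-bit mono silence in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

fields = pcm_fields_from_wav(buf.getvalue())
print(fields)
```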

Provide either file or file_url. If both are present, file_url is used. For model-specific ASR fields, see parameter_schema in the response of GET /v1/models/{model_id}.
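Because file_url wins when both sources are present, a client can avoid ambiguity by sending only one of them. The helper below is a hypothetical sketch of that rule; it assumes a JSON body. It passes `file` through unchanged, since this page does not say what the string must contain. The model ID `example-asr-model` is a placeholder.

```python
import json
from typing import Any, Optional


def build_transcription_body(model: str, *, file: Optional[str] = None,
                             file_url: Optional[str] = None, **options: Any) -> dict:
    """Assemble a request body that names exactly one audio source."""
    if not file and not file_url:
        raise ValueError("provide file or file_url")
    body = {"model": model, **options}
    # The service uses file_url when both are present, so send only that one.
    if file_url:
        body["file_url"] = file_url
    else:
        body["file"] = file
    # Drop options explicitly set to None so they are not sent at all.
    return {key: value for key, value in body.items() if value is not None}


payload = build_transcription_body(
    "example-asr-model",  # hypothetical model ID
    file="<audio payload>",  # required contents of `file` are not specified on this page
    file_url="https://example.com/audio.wav",
    response_format="verbose_json",
    temperature=None,
)
print(json.dumps(payload, indent=2))
```

Resolving the precedence on the client keeps the sent body identical to what the server will actually use.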

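Response

text (string)

Transcribed text.

model (string)

ID of the model used.

language (string)

Detected language code. Returned with the verbose_json format.

duration (number)

Audio duration, in seconds. Returned with the verbose_json format.

task (string)

Type of transcription task. Returned with the verbose_json format.

segments (object[])

Timestamped segments. Returned with the verbose_json format.

segments.id (integer)

Segment ID.

segments.start (number)

Segment start time, in seconds.

segments.end (number)

Segment end time, in seconds.

segments.text (string)

Segment text.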

utterances (object[])

Per-utterance details. Returned when the selected model supports them.

utterances.start (number)

Utterance start time, in seconds.

utterances.end (number)

Utterance end time, in seconds.

utterances.text (string)

Utterance text.

utterances.speaker (string)

Speaker label, when returned.
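A sketch of consuming the timed fields above follows. It turns verbose_json segments into SubRip (SRT) subtitle cues, and it turns utterances into a speaker-labeled transcript. The helper names are hypothetical. The SRT cues are numbered from 1 rather than taken from segments.id, because SRT expects sequential indices. Utterances without a speaker field get a placeholder label.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(response: dict) -> str:
    """Convert verbose_json segments into SRT cues numbered from 1."""
    cues = []
    for n, seg in enumerate(response.get("segments") or [], start=1):
        timing = f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}"
        cues.append(f"{n}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def labeled_transcript(response: dict, unknown: str = "?") -> str:
    """One line per utterance, ordered by start time, prefixed by speaker."""
    lines = []
    for utt in sorted(response.get("utterances") or [], key=lambda u: u["start"]):
        speaker = utt.get("speaker") or unknown  # speaker is only present when returned
        lines.append(f"[{utt['start']:.2f}-{utt['end']:.2f}] {speaker}: {utt['text'].strip()}")
    return "\n".join(lines)
```

Both helpers return an empty string when the corresponding array is absent. This covers responses that are not verbose_json and models that do not return utterances.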
