Create transcription

Transcribe audio files to text, with optional language detection.

Authentication

Authorization (Bearer)

Send the API key as a bearer token in the Authorization header.
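As a minimal sketch of the header described above, the helper below builds the Authorization header from an API key. The function name build_auth_headers and the placeholder key are illustrative assumptions, not part of this API.

```python
def build_auth_headers(api_key: str) -> dict:
    """Return HTTP headers that carry the API key as a bearer token."""
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    # Standard bearer scheme: "Authorization: Bearer <token>"
    return {"Authorization": f"Bearer {api_key.strip()}"}


# Placeholder key for illustration only; substitute a real key at runtime.
headers = build_auth_headers("YOUR_API_KEY")
```

The resulting dictionary can be passed to whichever HTTP client sends the request.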

Request Body

file (file, required if file_url is absent)

Audio file upload. Supported formats include mp3, wav, m4a, flac, and webm.

file_url (string, required if file is absent)

URL of an audio file. Use this as an alternative to uploading the audio with file.

model (string, required)

Speech-to-text model ID.

Example: "glm-asr-2512"

language (string)

Language code for recognition.

Example: "en"

prompt (string)

Hint text to guide recognition.

response_format (enum<string>, default: json)

Response format.

Available options: json, text, verbose_json, srt, vtt

temperature (number)

Sampling temperature from 0 to 1.

timestamp_granularities[] (enum<string>[])

Timestamp granularity. Requires response_format=verbose_json.

Available options: word, segment
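The dependency between timestamp_granularities[] and response_format can be checked on the client before a request is sent. The sketch below assumes the option lists given in this section; the function name check_timestamp_options is hypothetical, and the server may apply further rules not listed here.

```python
RESPONSE_FORMATS = {"json", "text", "verbose_json", "srt", "vtt"}
GRANULARITIES = {"word", "segment"}


def check_timestamp_options(response_format="json", timestamp_granularities=None):
    """Validate the two options together and return them in normalized form."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    granularities = list(timestamp_granularities or [])
    unknown = [g for g in granularities if g not in GRANULARITIES]
    if unknown:
        raise ValueError(f"unsupported timestamp granularity: {unknown}")

    # Timestamp granularities are only documented for verbose_json.
    if granularities and response_format != "verbose_json":
        raise ValueError(
            "timestamp_granularities requires response_format=verbose_json"
        )
    return response_format, granularities
```

For example, check_timestamp_options("verbose_json", ["word"]) passes, while the same granularity with the default json format raises ValueError.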

extra_body (object)

Vendor-specific fields such as hotwords, request_id, and user_id.

Note: Provide exactly one of file or file_url.
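The request rules above (model is required, exactly one audio source, temperature between 0 and 1) can be sketched as a small builder. The function name build_transcription_fields and its return shape are assumptions for illustration. This section does not state the endpoint URL or request encoding, so the sketch stops short of sending anything.

```python
def build_transcription_fields(model, *, file_path=None, file_url=None,
                               language=None, prompt=None, temperature=None):
    """Return (fields, upload_path) for a transcription request.

    upload_path is the local file to send as `file`, or None when
    `file_url` is used instead.
    """
    if not model:
        raise ValueError("model is required")
    # Exactly one of file / file_url must be supplied.
    if (file_path is None) == (file_url is None):
        raise ValueError("provide exactly one of file or file_url")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    fields = {"model": model}
    if file_url is not None:
        fields["file_url"] = file_url

    # Include optional parameters only when the caller set them.
    optional = {"language": language, "prompt": prompt, "temperature": temperature}
    fields.update({k: v for k, v in optional.items() if v is not None})
    return fields, file_path


fields, upload_path = build_transcription_fields(
    "glm-asr-2512",
    file_url="https://example.com/audio.mp3",  # placeholder URL
    language="en",
)
```

Keeping the upload path separate from the text fields lets the caller attach the file in whatever way its HTTP client expects.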

Response

text (string)

Transcribed text.

model (string)

Model ID used.

language (string)

Detected language code. Returned with verbose_json.

duration (number)

Audio duration in seconds. Returned with verbose_json.

task (string)

Transcription task type. Returned with verbose_json.

segments (object[])

Timestamped segments. Returned with verbose_json.

segments.id (integer)

Segment ID.

segments.start (number)

Segment start time in seconds.

segments.end (number)

Segment end time in seconds.

segments.text (string)

Segment text.
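As an illustration of the segment fields above, the sketch below turns a parsed verbose_json response into timestamped lines. The sample dictionary is hand-written to match the documented field names, not real API output, and the helper names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_lines(response: dict) -> list:
    """Return one '[start -> end] text' line per segment, if any are present."""
    lines = []
    # segments is only returned with verbose_json, so tolerate its absence.
    for seg in response.get("segments") or []:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines


# Hand-written sample shaped like the fields documented above.
sample = {
    "text": "Hello world. Goodbye.",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.5, "text": " Hello world."},
        {"id": 1, "start": 1.5, "end": 3.25, "text": " Goodbye."},
    ],
}

for line in segment_lines(sample):
    print(line)
```

A response without segments, such as one requested with the default json format, yields an empty list.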

utterances (object[])

Utterance-level details. Returned when supported by the selected model.