Create transcription
POST /v1/audio/transcriptions
Transcribe audio files to text, with optional language detection.
Authentication
Authorization (header, Bearer): Send your API key as a bearer token in the Authorization header, for example: Authorization: Bearer <API_KEY>
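The following is a minimal sketch of preparing an authenticated request with Python's standard library. It assumes the endpoint is called with POST, which the request body implies. The base URL https://api.example.com and the helper name build_transcription_request are hypothetical, so substitute your provider's host. The request is built but not sent.

```python
import urllib.request

# Hypothetical base URL: replace with your provider's API host.
BASE_URL = "https://api.example.com"


def build_transcription_request(api_key: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the transcription endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/transcriptions",
        data=body,
        method="POST",  # assumed method; the endpoint takes a request body
        headers={
            # The API key travels as a bearer token in the Authorization header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )


req = build_transcription_request("sk-test", b"", "multipart/form-data; boundary=x")
print(req.get_method(), req.full_url)
```

In real use, body would be the multipart/form-data payload that carries the audio file or file_url. Passing the request to urllib.request.urlopen would send it.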
Request Body
file (file, required if file_url is absent): Audio file upload. Supported formats include mp3, wav, m4a, flac, and webm.
file_url (string, required if file is absent): URL of an audio file. Use this instead of uploading file.
model (string, required): Speech-to-text model ID. Example: "glm-asr-2512"
language (string): Language code for recognition. Example: "en"
prompt (string): Hint text to guide recognition.
response_format (enum<string>, default: json): Response format. Available options: json, text, verbose_json, srt, vtt.
temperature (number): Sampling temperature, from 0 to 1.
timestamp_granularities[] (enum<string>[]): Timestamp granularity. Requires response_format=verbose_json. Available options: word, segment.
extra_body (object): Vendor-specific fields such as hotwords, request_id, and user_id.
Provide exactly one of file or file_url.
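The documented constraints can be checked on the client before sending, and the fields can then be encoded as multipart/form-data. This is a sketch under stated assumptions. The helpers validation_errors and encode_multipart are hypothetical. The file is passed as a (filename, bytes) tuple. List values are sent as repeated timestamp_granularities[] fields, following the parameter's name. extra_body is left out because this page does not say how it maps onto the wire format.

```python
import uuid

# Allowed values, taken from the request-body table.
RESPONSE_FORMATS = {"json", "text", "verbose_json", "srt", "vtt"}
GRANULARITIES = {"word", "segment"}


def validation_errors(params: dict) -> list[str]:
    """Return the documented constraints that params violates (empty if none)."""
    errors = []
    if ("file" in params) == ("file_url" in params):
        errors.append("provide exactly one of file or file_url")
    if "model" not in params:
        errors.append("model is required")
    fmt = params.get("response_format", "json")  # documented default
    if fmt not in RESPONSE_FORMATS:
        errors.append(f"unsupported response_format: {fmt}")
    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        errors.append("temperature must be between 0 and 1")
    granularities = params.get("timestamp_granularities", [])
    if granularities and fmt != "verbose_json":
        errors.append("timestamp_granularities requires response_format=verbose_json")
    unknown = set(granularities) - GRANULARITIES
    if unknown:
        errors.append(f"unsupported timestamp granularities: {sorted(unknown)}")
    return errors


def encode_multipart(params: dict) -> tuple[bytes, str]:
    """Encode params as multipart/form-data; params["file"] is (filename, bytes)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in params.items():
        if name == "file":
            filename, data = value
            header = (
                f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n"
            )
            parts.append(f"--{boundary}\r\n{header}".encode() + data + b"\r\n")
        else:
            # Assumption: list values are sent as one "name[]" field per item.
            is_list = isinstance(value, list)
            field = f"{name}[]" if is_list else name
            for item in value if is_list else [value]:
                parts.append(
                    (f"--{boundary}\r\nContent-Disposition: form-data; "
                     f'name="{field}"\r\n\r\n{item}\r\n').encode()
                )
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


params = {
    "model": "glm-asr-2512",
    "file": ("clip.wav", b"placeholder audio bytes"),  # not a real WAV file
    "response_format": "verbose_json",
    "timestamp_granularities": ["word", "segment"],
}
print(validation_errors(params))  # prints []
body, content_type = encode_multipart(params)
```

validation_errors returns a list instead of raising on the first problem, so a caller can report every violated constraint at once.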
Response
text (string): Transcribed text.
model (string): Model ID used.
language (string): Detected language code. Returned with response_format=verbose_json.
duration (number): Audio duration in seconds. Returned with response_format=verbose_json.
task (string): Transcription task type. Returned with response_format=verbose_json.
segments (object[]): Timestamped segments. Returned with response_format=verbose_json.
segments.id (integer): Segment ID.
segments.start (number): Segment start time in seconds.
segments.end (number): Segment end time in seconds.
segments.text (string): Segment text.
utterances (object[]): Utterance-level details. Returned when the selected model supports them.
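If you request verbose_json to get language and duration, you can still build subtitles on the client from segments.start, segments.end, and segments.text. The sketch below renders them as SRT cues. The payload is made up for illustration and follows only the fields documented above. The helper names srt_timestamp and segments_to_srt are hypothetical.

```python
import json


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(response: dict) -> str:
    """Render the segments of a verbose_json response as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(response.get("segments", []), start=1):
        cues.append(
            f"{index}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


# Illustrative payload shaped like the documented fields; the values are made up.
raw = json.dumps({
    "text": "Hello there. How are you?",
    "model": "glm-asr-2512",
    "language": "en",
    "duration": 3.5,
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.2, "text": " Hello there."},
        {"id": 1, "start": 1.2, "end": 3.5, "text": " How are you?"},
    ],
})
response = json.loads(raw)
print(segments_to_srt(response))
```

A response without segments, such as one returned with the default json format, yields an empty string.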