POST /api/gateway/v1/audio/transcriptions
Create transcription
curl --request POST \
  --url http://localhost:3000/api/gateway/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form model=openai/gpt-4o-mini \
  --form 'metadata={"waystone": "{\"user\": {\"id\": \"user123\", \"metadata\": {\"email\": \"user@example.com\", \"name\": \"John Doe\"}}}"}' \
  --form store=false \
  --form 'language=<string>' \
  --form 'prompt=<string>' \
  --form response_format=json \
  --form stream=false \
  --form file=@example-file
[
  {
    "delta": "<string>",
    "type": "<string>",
    "logprobs": [
      {
        "bytes": [
          "<any>"
        ],
        "logprob": 123,
        "token": "<string>"
      }
    ]
  }
]
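The curl call above can be mirrored in Python. A minimal sketch, assuming the third-party `requests` library and the local gateway URL from the example; the token, file path, and the `build_form`/`transcribe` helper names are illustrative, not part of the API:

```python
import json


def build_form(token: str):
    """Build headers and multipart form fields mirroring the curl example above.

    The waystone value is itself a JSON-encoded string nested inside the
    metadata object, which is sent as a single form field.
    """
    headers = {"Authorization": f"Bearer {token}"}
    metadata = {
        "waystone": json.dumps(
            {"user": {"id": "user123",
                      "metadata": {"email": "user@example.com", "name": "John Doe"}}}
        )
    }
    data = {
        "model": "openai/gpt-4o-mini",
        "metadata": json.dumps(metadata),
        "store": "false",
        "response_format": "json",
        "stream": "false",
    }
    return headers, data


def transcribe(token: str, audio_path: str):
    """POST the audio file to the transcriptions endpoint (hypothetical helper)."""
    import requests  # third-party; assumed installed

    headers, data = build_form(token)
    with open(audio_path, "rb") as f:
        return requests.post(
            "http://localhost:3000/api/gateway/v1/audio/transcriptions",
            headers=headers,
            data=data,
            files={"file": f},
        )
```

Because `requests` encodes `files` as multipart/form-data automatically, the Content-Type header does not need to be set by hand as it does with curl.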

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

multipart/form-data
model
string
required

<provider>/<model> to use in the query. If <model> is unique across providers, you may supply <model> alone.

Examples:

"openai/gpt-4o-mini"

"gpt-4o-mini"

metadata
object

Key-value pairs of metadata for the request. Use the waystone key to identify the end user associated with the query. All other fields are passed through as metadata on OpenAI requests and are not used by Waystone.

Examples:
{
  "waystone": "{\"user\": {\"id\": \"user123\", \"metadata\": {\"email\": \"user@example.com\", \"name\": \"John Doe\"}}}"
}
{
  "waystone": "{\"user\": \"user123\", \"group\": {\"id\": \"group123\", \"metadata\": {\"name\": \"Group Name\"}}}"
}
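Because the waystone value is a JSON object serialized as a string inside the metadata object, it is easiest to produce with two rounds of encoding. A sketch, using the field names from the examples above:

```python
import json

# Inner object: the end-user identity Waystone should associate with the query.
user_info = {
    "user": {"id": "user123",
             "metadata": {"email": "user@example.com", "name": "John Doe"}}
}

# Encode twice: the inner identity becomes the waystone string, and the
# surrounding metadata object becomes the value of the multipart form field.
metadata_field = json.dumps({"waystone": json.dumps(user_info)})

# Round-trip: decoding the form field, then the waystone string,
# recovers the original identity object.
decoded = json.loads(json.loads(metadata_field)["waystone"])
```

Note that a single `json.dumps` over the whole structure would send waystone as a nested object rather than a string, which does not match the examples above.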
store
boolean | null
default:false

For OpenAI: whether or not to store the output of this request in OpenAI

file
file

The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

language
string

The language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency.

prompt
string

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

response_format
enum<string>
default:json

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

Available options:
json,
text,
srt,
verbose_json,
vtt
stream
boolean | null
default:false

If set to true, the model's response data is streamed to the client as it is generated. Streaming availability depends on the model provider.
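With stream=true, the response arrives as a sequence of transcription events shaped like the array at the top of this page. A hedged sketch of consuming such a stream; the newline-delimited JSON framing and the event type string are assumptions for illustration, not guaranteed by the gateway:

```python
import json
from typing import Iterator, Tuple


def parse_events(lines: Iterator[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (type, delta) pairs from a stream of JSON event lines."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        event = json.loads(line)
        yield event["type"], event["delta"]


# Simulated stream with two events matching the response schema on this page.
sample = [
    b'{"delta": "Hello", "type": "transcript.text.delta", "logprobs": []}',
    b'{"delta": " world", "type": "transcript.text.delta", "logprobs": []}',
]
text = "".join(delta for _, delta in parse_events(iter(sample)))
```

Concatenating the delta fields in order reconstructs the running transcript.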

Response

200 - application/json

Stream of transcription events

delta
string
required

The newly transcribed text delta appended to the transcript.

type
string
required

The type of the event.

logprobs
object[]
required