whisper

Model ID: @cf/openai/whisper

Automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data

Properties

Task Type: Automatic Speech Recognition

Code Examples

Workers - TypeScript

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    const blob = await res.arrayBuffer();

    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await env.AI.run(
      "@cf/openai/whisper",
      input
    );

    return Response.json({ input: { audio: [] }, response });
  },
} satisfies ExportedHandler<Env>;

curl

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/openai/whisper \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --data-binary "@talking-llama.mp3"
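
The same REST call can be sketched in TypeScript. This is a minimal illustration, not an official client: the helper name buildWhisperRequest and its return shape are hypothetical, and only the endpoint URL, the POST method, and the Bearer authorization header are taken from the curl command above.

```typescript
// Hypothetical helper (not part of any Cloudflare SDK): it only assembles
// the URL, method, and headers used by the curl command above.
interface WhisperRequest {
  url: string;
  method: "POST";
  headers: { Authorization: string };
}

function buildWhisperRequest(accountId: string, apiToken: string): WhisperRequest {
  return {
    url:
      "https://api.cloudflare.com/client/v4/accounts/" +
      accountId +
      "/ai/run/@cf/openai/whisper",
    method: "POST",
    headers: { Authorization: "Bearer " + apiToken },
  };
}
```

In a runtime that provides fetch, the result could be used as fetch(req.url, { method: req.method, headers: req.headers, body: audioBytes }), where audioBytes holds the raw contents of the audio file, mirroring --data-binary.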

Response

Automatic speech recognition responses return a single string text property containing the audio transcription and, if the model supports it, an optional words array with start and end timestamps for each word.

Here’s an example of the output from the @cf/openai/whisper model:

{
  "text": "It is a good day",
  "word_count": 5,
  "words": [
    {
      "word": "It",
      "start": 0.5600000023841858,
      "end": 1
    },
    {
      "word": "is",
      "start": 1,
      "end": 1.100000023841858
    },
    {
      "word": "a",
      "start": 1.100000023841858,
      "end": 1.2200000286102295
    },
    {
      "word": "good",
      "start": 1.2200000286102295,
      "end": 1.3200000524520874
    },
    {
      "word": "day",
      "start": 1.3200000524520874,
      "end": 1.4600000381469727
    }
  ]
}
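
The start and end values in the words array are fractional seconds, so they usually need formatting before display. The sketch below shows one way to turn them into HH:MM:SS.mmm timestamps. The names TimedWord, formatTimestamp, and describeWords are hypothetical, and the output layout is an illustrative choice rather than anything the model produces.

```typescript
// Shape of one entry in the "words" array from the example output.
interface TimedWord {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Convert fractional seconds to HH:MM:SS.mmm, rounding to the nearest millisecond.
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "." + pad(ms, 3);
}

// Produce one "start --> end word" line per transcribed word.
function describeWords(words: TimedWord[]): string[] {
  return words.map(
    (w) => formatTimestamp(w.start) + " --> " + formatTimestamp(w.end) + " " + w.word
  );
}
```

Because words is optional in the response, callers would typically pass response.words ?? [] so that a response containing only text yields an empty list.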

API Schema

The following schemas are based on JSON Schema.

Input JSON Schema

{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "audio": {
          "type": "array",
          "items": {
            "type": "number"
          }
        }
      },
      "required": [
        "audio"
      ]
    }
  ]
}
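
The object form of this schema expects audio as an array of numbers, which the Workers example builds by spreading a Uint8Array. As a rough sketch, the helpers below build that object from an ArrayBuffer and check an unknown value against the same shape. The names WhisperAudioInput, toWhisperInput, and isWhisperAudioInput are hypothetical, and the check covers only what the schema states (an audio array whose items are numbers).

```typescript
// Object form of the input schema: { audio: number[] }.
interface WhisperAudioInput {
  audio: number[];
}

// Copy raw audio bytes into a plain number array, one entry per byte.
function toWhisperInput(buffer: ArrayBuffer): WhisperAudioInput {
  const bytes = new Uint8Array(buffer);
  const audio: number[] = [];
  for (let i = 0; i < bytes.length; i++) {
    audio.push(bytes[i]);
  }
  return { audio };
}

// Runtime check: an object with a required "audio" array of numbers.
function isWhisperAudioInput(value: unknown): value is WhisperAudioInput {
  if (typeof value !== "object" || value === null) return false;
  const audio = (value as { audio?: unknown }).audio;
  if (!Array.isArray(audio)) return false;
  for (let i = 0; i < audio.length; i++) {
    if (typeof audio[i] !== "number") return false;
  }
  return true;
}
```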

Output JSON Schema

{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "text": {
      "type": "string"
    },
    "word_count": {
      "type": "number"
    },
    "words": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "word": {
            "type": "string"
          },
          "start": {
            "type": "number"
          },
          "end": {
            "type": "number"
          }
        }
      }
    },
    "vtt": {
      "type": "string"
    }
  },
  "required": [
    "text"
  ]
}
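
Only text is required in the output schema; word_count, words, and vtt may be absent. A minimal sketch of how that could be expressed as a TypeScript type with a runtime guard follows. The names WhisperOutput and isWhisperOutput are hypothetical, and the guard checks only top-level property types, not each entry of words.

```typescript
// Mirrors the output schema: "text" is required, everything else is optional.
interface WhisperOutput {
  text: string;
  word_count?: number;
  words?: Array<{ word?: string; start?: number; end?: number }>;
  vtt?: string;
}

// Shallow runtime check against the output schema's top-level properties.
function isWhisperOutput(value: unknown): value is WhisperOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [key: string]: unknown };
  if (typeof v["text"] !== "string") return false;
  if (v["word_count"] !== undefined && typeof v["word_count"] !== "number") return false;
  if (v["vtt"] !== undefined && typeof v["vtt"] !== "string") return false;
  if (v["words"] !== undefined && !Array.isArray(v["words"])) return false;
  return true;
}
```

A guard like this lets calling code read text unconditionally while treating the remaining fields as optional.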
