Azure Speech

Azure Speech is the OpenClaw text-to-speech provider for Azure AI Speech. It synthesizes outbound reply audio as MP3 by default, as native Ogg/Opus for voice notes, and as 8 kHz mulaw audio for telephony channels such as Voice Call.

OpenClaw calls the Azure Speech REST API directly. It sends the text as SSML and passes the provider-owned output format in the X-Microsoft-OutputFormat request header.
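
To make the request shape concrete, here is a minimal sketch of how an SSML body and the matching headers could be assembled. The helper names (escapeXml, buildSsml, buildSynthesisRequest) and the User-Agent value are hypothetical and are not taken from OpenClaw; only the header names and the SSML envelope follow the Azure Speech REST API.

```typescript
// Illustrative sketch only: helper names are hypothetical, not OpenClaw internals.

// Escape XML special characters so reply text cannot break the SSML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap the text in a minimal SSML document for one voice and one language.
function buildSsml(text: string, voice: string, lang: string): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

interface SynthesisRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assemble the request pieces; the caller decides where to send them.
function buildSynthesisRequest(opts: {
  apiKey: string;
  text: string;
  voice: string;
  lang: string;
  outputFormat: string;
}): SynthesisRequest {
  return {
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": opts.apiKey,
      "Content-Type": "application/ssml+xml",
      // The requested audio format travels in this header, not in the SSML body.
      "X-Microsoft-OutputFormat": opts.outputFormat,
      "User-Agent": "openclaw-docs-example", // placeholder value (assumption)
    },
    body: buildSsml(opts.text, opts.voice, opts.lang),
  };
}
```

A caller would typically POST the result to the /cognitiveservices/v1 path of the regional text-to-speech host and read the response body as audio bytes.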

Detail                   Value
Website                  Azure AI Speech
Docs                     Speech REST text-to-speech
Auth                     AZURE_SPEECH_KEY plus AZURE_SPEECH_REGION
Default voice            en-US-JennyNeural
Default file output      audio-24khz-48kbitrate-mono-mp3
Default voice-note file  ogg-24khz-16bit-mono-opus

Getting started

  • Create an Azure Speech resource

    In the Azure portal, create a Speech resource. Copy KEY 1 from Resource Management > Keys and Endpoint, and note the resource location (region), for example eastus. Then set both as environment variables:

    AZURE_SPEECH_KEY=<speech-resource-key>
    AZURE_SPEECH_REGION=eastus
    
  • Select Azure Speech in messages.tts

    {
      messages: {
        tts: {
          auto: "always",
          provider: "azure-speech",
          providers: {
            "azure-speech": {
              voice: "en-US-JennyNeural",
              lang: "en-US",
            },
          },
        },
      },
    }
    
  • Send a message

    Send a reply through any connected channel. OpenClaw synthesizes the audio with Azure Speech and delivers MP3 for standard audio, or Ogg/Opus when the channel expects a voice note.

  • Configuration options

    Option                 Path                                                       Description
    apiKey                 messages.tts.providers.azure-speech.apiKey                 Azure Speech resource key. Falls back to AZURE_SPEECH_KEY, AZURE_SPEECH_API_KEY, or SPEECH_KEY.
    region                 messages.tts.providers.azure-speech.region                 Azure Speech resource region. Falls back to AZURE_SPEECH_REGION or SPEECH_REGION.
    endpoint               messages.tts.providers.azure-speech.endpoint               Optional Azure Speech endpoint/base URL override.
    baseUrl                messages.tts.providers.azure-speech.baseUrl                Optional Azure Speech base URL override.
    voice                  messages.tts.providers.azure-speech.voice                  Azure voice ShortName (default en-US-JennyNeural).
    lang                   messages.tts.providers.azure-speech.lang                   SSML language code (default en-US).
    outputFormat           messages.tts.providers.azure-speech.outputFormat           Audio-file output format (default audio-24khz-48kbitrate-mono-mp3).
    voiceNoteOutputFormat  messages.tts.providers.azure-speech.voiceNoteOutputFormat  Voice-note output format (default ogg-24khz-16bit-mono-opus).
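
The fallbacks and defaults in the table can be read as a small resolution step. The sketch below is one way to express it, not OpenClaw's actual code: the function and type names are hypothetical, the fallback order is assumed to match the order the variables are listed in, and treating empty strings as unset is also an assumption.

```typescript
// Illustrative sketch only: names are hypothetical, not OpenClaw internals.
type Env = Record<string, string | undefined>;

interface AzureSpeechConfig {
  apiKey?: string;
  region?: string;
  voice?: string;
  lang?: string;
  outputFormat?: string;
  voiceNoteOutputFormat?: string;
}

// Return the first value that is neither undefined nor empty (assumed rule).
function firstDefined(...values: (string | undefined)[]): string | undefined {
  for (const value of values) {
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

// Explicit config wins, then environment variables, then documented defaults.
function resolveAzureSpeechConfig(config: AzureSpeechConfig, env: Env) {
  return {
    apiKey: firstDefined(
      config.apiKey,
      env["AZURE_SPEECH_KEY"],
      env["AZURE_SPEECH_API_KEY"],
      env["SPEECH_KEY"],
    ),
    region: firstDefined(config.region, env["AZURE_SPEECH_REGION"], env["SPEECH_REGION"]),
    voice: config.voice ?? "en-US-JennyNeural",
    lang: config.lang ?? "en-US",
    outputFormat: config.outputFormat ?? "audio-24khz-48kbitrate-mono-mp3",
    voiceNoteOutputFormat: config.voiceNoteOutputFormat ?? "ogg-24khz-16bit-mono-opus",
  };
}
```

Passing the environment in as a plain object keeps the function easy to test; a real caller would hand it the process environment.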

    Notes

    Authentication

    Azure Speech uses a Speech resource key, not an Azure OpenAI key. The key is sent in the Ocp-Apim-Subscription-Key header. OpenClaw derives the base URL https://<region>.tts.speech.microsoft.com from region unless you provide endpoint or baseUrl.
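
As a rough illustration of the derivation described above, the sketch below picks an override when one is set and otherwise builds the regional host. The function names are hypothetical, and the precedence of endpoint over baseUrl when both are present is an assumption, not documented behavior.

```typescript
// Illustrative sketch only: names are hypothetical, not OpenClaw internals.
interface EndpointOptions {
  region?: string;
  endpoint?: string;
  baseUrl?: string;
}

function resolveBaseUrl(opts: EndpointOptions): string {
  // Assumption: endpoint is checked before baseUrl when both are set.
  const override = opts.endpoint || opts.baseUrl;
  if (override) {
    // Strip trailing slashes so later path joins stay predictable.
    return override.replace(/\/+$/, "");
  }
  if (!opts.region) {
    throw new Error("Azure Speech needs a region, endpoint, or baseUrl");
  }
  return `https://${opts.region}.tts.speech.microsoft.com`;
}

// The Speech resource key goes in a dedicated header, not a bearer token.
function authHeaders(apiKey: string): Record<string, string> {
  return { "Ocp-Apim-Subscription-Key": apiKey };
}
```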

    Voice names

    Use the Azure Speech voice ShortName value, for example en-US-JennyNeural. The bundled provider can list voices through the same Speech resource and filters out voices marked deprecated or retired.
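
The filtering step might look like the sketch below. It assumes each entry in the voice list carries ShortName, Locale, and a Status string, and that deprecated or retired voices are flagged through Status; the exact field OpenClaw checks is not stated here, and the function names and the second and third sample voices are hypothetical.

```typescript
// Illustrative sketch only: the Status-based rule is an assumption.
interface AzureVoice {
  ShortName: string;
  Locale: string;
  Status?: string;
}

// Keep a voice unless its status marks it as deprecated or retired.
function isUsableVoice(voice: AzureVoice): boolean {
  const status = (voice.Status ?? "").toLowerCase();
  return status !== "deprecated" && status !== "retired";
}

// Reduce the raw list to the ShortName values a user can put in config.
function listUsableVoiceNames(voices: AzureVoice[]): string[] {
  return voices.filter(isUsableVoice).map((voice) => voice.ShortName);
}

// Sample data; only the first name is a voice mentioned in this page.
const sampleVoices: AzureVoice[] = [
  { ShortName: "en-US-JennyNeural", Locale: "en-US", Status: "GA" },
  { ShortName: "xx-XX-ExampleOldNeural", Locale: "xx-XX", Status: "Deprecated" },
  { ShortName: "xx-XX-ExampleNewNeural", Locale: "xx-XX" },
];

const usableVoiceNames = listUsableVoiceNames(sampleVoices);
```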

    Audio outputs

    Azure accepts output formats such as audio-24khz-48kbitrate-mono-mp3, ogg-24khz-16bit-mono-opus, and riff-24khz-16bit-mono-pcm. OpenClaw requests Ogg/Opus for voice-note targets so channels can send native voice bubbles without an extra MP3 conversion.
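
One way to picture the format choice is a small selector keyed on the delivery target. This is a sketch under stated assumptions: the target names and function name are hypothetical, and raw-8khz-8bit-mono-mulaw is an Azure format name used here only to stand in for the 8 kHz mulaw telephony case, since the exact telephony format OpenClaw requests is not given above.

```typescript
// Illustrative sketch only: names are hypothetical, not OpenClaw internals.
type AudioTarget = "file" | "voice-note" | "telephony";

interface FormatOverrides {
  outputFormat?: string;
  voiceNoteOutputFormat?: string;
}

const DEFAULT_FILE_FORMAT = "audio-24khz-48kbitrate-mono-mp3";
const DEFAULT_VOICE_NOTE_FORMAT = "ogg-24khz-16bit-mono-opus";
// Assumption: stand-in for the telephony format, not confirmed for OpenClaw.
const ASSUMED_TELEPHONY_FORMAT = "raw-8khz-8bit-mono-mulaw";

function pickOutputFormat(target: AudioTarget, overrides: FormatOverrides = {}): string {
  if (target === "voice-note") {
    // Native Ogg/Opus lets the channel send a voice bubble without converting MP3.
    return overrides.voiceNoteOutputFormat ?? DEFAULT_VOICE_NOTE_FORMAT;
  }
  if (target === "telephony") {
    return ASSUMED_TELEPHONY_FORMAT;
  }
  return overrides.outputFormat ?? DEFAULT_FILE_FORMAT;
}
```

The returned string is what would be sent in the X-Microsoft-OutputFormat header for that reply.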

    Alias

    azure is accepted as a provider alias for existing PRs and user config, but new config should use azure-speech to avoid confusion with Azure OpenAI model providers.