Google (Gemini)

The Google plugin provides access to Gemini models through Google AI Studio, plus image generation, media understanding (image/audio/video), text-to-speech, and web search via Gemini Grounding.

  • Provider: google
  • Auth: GEMINI_API_KEY or GOOGLE_API_KEY
  • API: Google Gemini API
  • Runtime option: setting agentRuntime.id: "google-gemini-cli" at the provider or model level reuses Gemini CLI OAuth while keeping model refs canonical as google/*.

Getting started

Choose your preferred auth method and follow the setup steps.

API key

Best for: standard Gemini API access through Google AI Studio.

  • Run onboarding

    openclaw onboard --auth-choice gemini-api-key
    

    Or pass the key directly:

    openclaw onboard --non-interactive \
      --mode local \
      --auth-choice gemini-api-key \
      --gemini-api-key "$GEMINI_API_KEY"
    
  • Set a default model

    {
      agents: {
        defaults: {
          model: { primary: "google/gemini-3.1-pro-preview" },
        },
      },
    }
    
  • Verify the model is available

    openclaw models list --provider google
    
Gemini CLI (OAuth)

Best for: reusing an existing Gemini CLI login via PKCE OAuth instead of a separate API key.

  • Install the Gemini CLI

    The local gemini command must be available on PATH.

    # Homebrew
    brew install gemini-cli
    
    # or npm
    npm install -g @google/gemini-cli
    

    OpenClaw supports both Homebrew installs and global npm installs, including common Windows/npm layouts.

  • Log in via OAuth

    openclaw models auth login --provider google-gemini-cli --set-default
    
  • Verify the model is available

    openclaw models list --provider google
    
    • Default model: google/gemini-3.1-pro-preview
    • Runtime: google-gemini-cli
    • Alias: gemini-cli

    Gemini 3.1 Pro's Gemini API model id is gemini-3.1-pro-preview. OpenClaw accepts the shorter google/gemini-3.1-pro as a convenience alias and normalizes it before provider calls.
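The alias handling described above can be pictured with a small lookup. This is only a sketch: the function names and the alias table are hypothetical, and the only mapping taken from this page is google/gemini-3.1-pro to google/gemini-3.1-pro-preview. OpenClaw's real normalization may cover more cases.

```python
# Hypothetical alias table. Only this one entry is documented on this page.
MODEL_ALIASES = {
    "google/gemini-3.1-pro": "google/gemini-3.1-pro-preview",
}


def normalize_model_ref(ref: str) -> str:
    """Expand a known convenience alias; leave every other ref untouched."""
    return MODEL_ALIASES.get(ref, ref)


def gemini_api_model_id(ref: str) -> str:
    """Return the bare model id that would be sent to the Gemini API."""
    provider, _, model_id = normalize_model_ref(ref).partition("/")
    if provider != "google" or not model_id:
        raise ValueError(f"not a google/* model ref: {ref!r}")
    return model_id
```

With this sketch, gemini_api_model_id("google/gemini-3.1-pro") and gemini_api_model_id("google/gemini-3.1-pro-preview") both yield the same API model id.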

    Environment variables:

    • OPENCLAW_GEMINI_OAUTH_CLIENT_ID
    • OPENCLAW_GEMINI_OAUTH_CLIENT_SECRET

    (Or the GEMINI_CLI_* variants.)

    google-gemini-cli/* model refs are legacy compatibility aliases. New configs should use google/* model refs plus the google-gemini-cli runtime when they want local Gemini CLI execution.

    Capabilities

    Capability              Supported
    Chat completions        Yes
    Image generation        Yes
    Music generation        Yes
    Text-to-speech          Yes
    Realtime voice          Yes (Google Live API)
    Image understanding     Yes
    Audio transcription     Yes
    Video understanding     Yes
    Web search (Grounding)  Yes
    Thinking/reasoning      Yes (Gemini 2.5+ / Gemini 3+)
    Gemma 4 models          Yes

    The bundled gemini web-search provider uses Gemini Google Search grounding. Configure a dedicated search key under plugins.entries.google.config.webSearch, or let it fall back to GEMINI_API_KEY and then models.providers.google.apiKey:

    {
      plugins: {
        entries: {
          google: {
            config: {
              webSearch: {
                apiKey: "AIza...", // optional if GEMINI_API_KEY or models.providers.google.apiKey is set
                baseUrl: "https://generativelanguage.googleapis.com/v1beta", // falls back to models.providers.google.baseUrl
                model: "gemini-2.5-flash",
              },
            },
          },
        },
      },
    }
    

    Credential precedence is dedicated webSearch.apiKey, then GEMINI_API_KEY, then models.providers.google.apiKey. webSearch.baseUrl is optional and exists for operator proxies or compatible Gemini API endpoints; when omitted, Gemini web search reuses models.providers.google.baseUrl. See Gemini search for the provider-specific tool behavior.
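To make the precedence concrete, here is a minimal sketch of the lookup order described above. The function and helper names are hypothetical and the config is modeled as a plain dictionary; it mirrors the documented order but is not OpenClaw's actual resolver.

```python
import os
from typing import Any, Mapping, Optional


def _dig(config: Any, *keys: str) -> Any:
    """Walk nested dictionaries, returning None when any level is missing."""
    node = config
    for key in keys:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def resolve_web_search(config: dict, env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve web-search settings in the documented order (sketch only)."""
    env = os.environ if env is None else env
    web = _dig(config, "plugins", "entries", "google", "config", "webSearch") or {}
    provider = _dig(config, "models", "providers", "google") or {}
    # Key order: dedicated webSearch.apiKey, then GEMINI_API_KEY, then provider key.
    api_key = web.get("apiKey") or env.get("GEMINI_API_KEY") or provider.get("apiKey")
    # baseUrl is optional; reuse the provider baseUrl when it is omitted.
    base_url = web.get("baseUrl") or provider.get("baseUrl")
    return {"apiKey": api_key, "baseUrl": base_url, "model": web.get("model")}
```

Passing env explicitly keeps the sketch easy to test without touching the real process environment.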

    Image generation

    The bundled google image-generation provider defaults to google/gemini-3.1-flash-image-preview.

    • Also supports google/gemini-3-pro-image-preview
    • Generate: up to 4 images per request
    • Edit mode: enabled, up to 5 input images
    • Geometry controls: size, aspectRatio, and resolution

    To use Google as the default image provider:

    {
      agents: {
        defaults: {
          imageGenerationModel: {
            primary: "google/gemini-3.1-flash-image-preview",
          },
        },
      },
    }
    

    Video generation

    The bundled google plugin also registers video generation through the shared video_generate tool.

    • Default video model: google/veo-3.1-fast-generate-preview
    • Modes: text-to-video, image-to-video, and single-video reference flows
    • Supports aspectRatio, resolution, and audio
    • Current duration clamp: 4 to 8 seconds
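As an illustration of the duration clamp listed above, the following sketch pins a requested duration to the 4 to 8 second range. The function name is hypothetical, and this page does not say whether OpenClaw rounds, clamps silently, or reports the adjustment, so treat the behavior as an assumption.

```python
MIN_SECONDS = 4  # lower bound from the list above
MAX_SECONDS = 8  # upper bound from the list above


def clamp_duration(seconds: float) -> float:
    """Pin a requested clip length to the supported range (sketch only)."""
    return max(MIN_SECONDS, min(MAX_SECONDS, seconds))
```

Under this assumption, a request for 2 seconds becomes 4, a request for 30 seconds becomes 8, and values already in range pass through unchanged.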

    To use Google as the default video provider:

    {
      agents: {
        defaults: {
          videoGenerationModel: {
            primary: "google/veo-3.1-fast-generate-preview",
          },
        },
      },
    }
    

    Music generation

    The bundled google plugin also registers music generation through the shared music_generate tool.

    • Default music model: google/lyria-3-clip-preview
    • Also supports google/lyria-3-pro-preview
    • Prompt controls: lyrics and instrumental
    • Output format: mp3 by default, plus wav on google/lyria-3-pro-preview
    • Reference inputs: up to 10 images
    • Session-backed runs detach through the shared task/status flow, including action: "status"

    To use Google as the default music provider:

    {
      agents: {
        defaults: {
          musicGenerationModel: {
            primary: "google/lyria-3-clip-preview",
          },
        },
      },
    }
    

    Text-to-speech

    The bundled google speech provider uses the Gemini API TTS path with gemini-3.1-flash-tts-preview.

    • Default voice: Kore
    • Auth: messages.tts.providers.google.apiKey, models.providers.google.apiKey, GEMINI_API_KEY, or GOOGLE_API_KEY
    • Output: WAV for regular TTS attachments, Opus for voice-note targets, PCM for Talk/telephony
    • Voice-note output: Google PCM is wrapped as WAV and transcoded to 48 kHz Opus with ffmpeg
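The voice-note path above (PCM wrapped as WAV, then transcoded to 48 kHz Opus) can be sketched with the Python standard library. The PCM layout used here (24 kHz, 16-bit, mono) is an assumption that this page does not state, the function names and file paths are hypothetical, and the ffmpeg arguments are one plausible invocation rather than the exact command OpenClaw runs.

```python
import io
import wave
from typing import List


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM in a WAV container. Defaults are assumed, not documented."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def opus_transcode_command(wav_path: str, out_path: str) -> List[str]:
    """Build an ffmpeg argument list that resamples to 48 kHz and encodes Opus."""
    return ["ffmpeg", "-y", "-i", wav_path,
            "-c:a", "libopus", "-ar", "48000", out_path]
```

The command list can be handed to subprocess.run once the WAV bytes have been written to wav_path; building it separately keeps the sketch testable on machines without ffmpeg.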

    Google's batch Gemini TTS path returns generated audio in the completed generateContent response. For lowest-latency spoken conversations, use the Google realtime voice provider backed by the Gemini Live API instead of batch TTS.

    To use Google as the default TTS provider:

    {
      messages: {
        tts: {
          auto: "always",
          provider: "google",
          providers: {
            google: {
              model: "gemini-3.1-flash-tts-preview",
              voiceName: "Kore",
              audioProfile: "Speak professionally with a calm tone.",
            },
          },
        },
      },
    }
    

    Gemini API TTS uses natural-language prompting for style control. Set audioProfile to prepend a reusable style prompt before the spoken text. Set speakerName when your prompt text refers to a named speaker.

    Gemini API TTS also accepts expressive square-bracket audio tags in the text, such as [whispers] or [laughs]. To keep tags out of the visible chat reply while sending them to TTS, put them inside a [[tts:text]]...[[/tts:text]] block:

    Here is the clean reply text.
    
    [[tts:text]][whispers] Here is the spoken version.[[/tts:text]]
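One way to picture how a reply like the one above separates into visible text and spoken text is the sketch below. The function name is hypothetical, and two behaviors are assumptions: that the visible reply is everything outside the block, and that replies without a block are spoken as-is.

```python
import re
from typing import Tuple

# Matches one [[tts:text]]...[[/tts:text]] block, including multi-line bodies.
TTS_BLOCK = re.compile(r"\[\[tts:text\]\](.*?)\[\[/tts:text\]\]", re.DOTALL)


def split_reply(reply: str) -> Tuple[str, str]:
    """Return (visible_text, spoken_text) for a model reply (sketch only)."""
    spoken_parts = [part.strip() for part in TTS_BLOCK.findall(reply)]
    visible = TTS_BLOCK.sub("", reply).strip()
    # Assumption: with no tts block, the visible text is also what gets spoken.
    spoken = " ".join(spoken_parts) if spoken_parts else visible
    return visible, spoken
```

For the example reply above, the visible part is the clean sentence and the spoken part keeps the [whispers] tag.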
    

    Realtime voice

    The bundled google plugin registers a realtime voice provider backed by the Gemini Live API for backend audio bridges such as Voice Call and Google Meet.

    Setting                Config path                                                        Default
    Model                  plugins.entries.voice-call.config.realtime.providers.google.model  gemini-2.5-flash-native-audio-preview-12-2025
    Voice                  ...google.voice                                                    Kore
    Temperature            ...google.temperature                                              (unset)
    VAD start sensitivity  ...google.startSensitivity                                         (unset)
    VAD end sensitivity    ...google.endSensitivity                                           (unset)
    Silence duration       ...google.silenceDurationMs                                        (unset)
    Activity handling      ...google.activityHandling                                         Google default, start-of-activity-interrupts
    Turn coverage          ...google.turnCoverage                                             Google default, only-activity
    Disable auto VAD       ...google.automaticActivityDetectionDisabled                       false
    Session resumption     ...google.sessionResumption                                        true
    Context compression    ...google.contextWindowCompression                                 true
    API key                ...google.apiKey                                                   Falls back to models.providers.google.apiKey, GEMINI_API_KEY, or GOOGLE_API_KEY

    Example Voice Call realtime config:

    {
      plugins: {
        entries: {
          "voice-call": {
            enabled: true,
            config: {
              realtime: {
                enabled: true,
                provider: "google",
                providers: {
                  google: {
                    model: "gemini-2.5-flash-native-audio-preview-12-2025",
                    voice: "Kore",
                    activityHandling: "start-of-activity-interrupts",
                    turnCoverage: "only-activity",
                  },
                },
              },
            },
          },
        },
      },
    }
    

    For maintainer live verification, run OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts. The smoke test also covers the OpenAI backend/WebRTC paths. Its Google leg mints the same constrained Live API token shape used by Control UI Talk, opens the browser WebSocket endpoint, sends the initial setup payload, and waits for setupComplete.

    Advanced configuration

    Direct Gemini cache reuse

    For direct Gemini API runs (api: "google-generative-ai"), OpenClaw passes a configured cachedContent handle through to Gemini requests.

    • Configure per-model or global params with either cachedContent or legacy cached_content
    • If both are present, cachedContent wins
    • Example value: cachedContents/prebuilt-context
    • Gemini cache-hit usage is normalized into OpenClaw cacheRead from upstream cachedContentTokenCount

    {
      agents: {
        defaults: {
          models: {
            "google/gemini-2.5-pro": {
              params: {
                cachedContent: "cachedContents/prebuilt-context",
              },
            },
          },
        },
      },
    }
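The parameter and usage rules above can be summarized in a short sketch. The function names are hypothetical and the inputs are modeled as plain dictionaries; only the key names cachedContent, cached_content, cachedContentTokenCount, and cacheRead come from this page.

```python
from typing import Optional


def resolve_cached_content(params: dict) -> Optional[str]:
    """Pick the cache handle, preferring cachedContent over legacy cached_content."""
    value = params.get("cachedContent")
    if value is None:
        value = params.get("cached_content")
    return value


def cache_read_tokens(usage_metadata: dict) -> int:
    """Map upstream cachedContentTokenCount onto an OpenClaw-style cacheRead count."""
    return int(usage_metadata.get("cachedContentTokenCount") or 0)
```

When both keys are set, the sketch returns the cachedContent value, matching the rule that cachedContent wins.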
    
    Gemini CLI JSON usage notes

    When using the google-gemini-cli OAuth provider, OpenClaw normalizes the CLI JSON output as follows:

    • Reply text comes from the CLI JSON response field.
    • Usage falls back to stats when the CLI leaves usage empty.
    • stats.cached is normalized into OpenClaw cacheRead.
    • If stats.input is missing, OpenClaw derives input tokens from stats.input_tokens - stats.cached.
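The normalization rules above can be sketched as a single function. The function name and the flat layout of usage and stats are assumptions made for illustration; only the field names response, usage, stats, cached, input, and input_tokens come from the list above.

```python
import json


def normalize_cli_output(raw: str) -> dict:
    """Turn Gemini CLI JSON output into text plus usage counters (sketch only)."""
    data = json.loads(raw)
    # Usage falls back to stats when the CLI leaves usage empty.
    usage = data.get("usage") or data.get("stats") or {}
    cached = int(usage.get("cached") or 0)
    input_tokens = usage.get("input")
    if input_tokens is None and usage.get("input_tokens") is not None:
        # Derive uncached input tokens from the total minus the cached share.
        input_tokens = int(usage["input_tokens"]) - cached
    return {
        "text": data.get("response", ""),
        "input": input_tokens,
        "cacheRead": cached,
    }
```

For example, a payload whose stats report input_tokens of 120 and cached of 20 normalizes to an input of 100 and a cacheRead of 20.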

    Environment and daemon setup

    If the Gateway runs as a daemon (launchd/systemd), make sure GEMINI_API_KEY is available to that process (for example, in ~/.openclaw/.env or via env.shellEnv).