Google (Gemini)
The Google plugin provides access to Gemini models through Google AI Studio, plus image generation, media understanding (image/audio/video), text-to-speech, and web search via Gemini Grounding.
- Provider: google
- Auth: GEMINI_API_KEY or GOOGLE_API_KEY
- API: Google Gemini API
- Runtime option: provider/model agentRuntime.id: "google-gemini-cli" reuses Gemini CLI OAuth while keeping model refs canonical as google/*.
Getting started
Choose your preferred auth method and follow the setup steps.
API key
Best for: standard Gemini API access through Google AI Studio.
Run onboarding
openclaw onboard --auth-choice gemini-api-key
Or pass the key directly:
openclaw onboard --non-interactive \
  --mode local \
  --auth-choice gemini-api-key \
  --gemini-api-key "$GEMINI_API_KEY"
Set a default model
{
  agents: {
    defaults: {
      model: { primary: "google/gemini-3.1-pro-preview" },
    },
  },
}
Verify the model is available
openclaw models list --provider google
Gemini CLI (OAuth)
Best for: reusing an existing Gemini CLI login via PKCE OAuth instead of a separate API key.
Install the Gemini CLI
The local gemini command must be available on PATH.
# Homebrew
brew install gemini-cli
# or npm
npm install -g @google/gemini-cli
OpenClaw supports both Homebrew installs and global npm installs, including common Windows/npm layouts.
Log in via OAuth
openclaw models auth login --provider google-gemini-cli --set-default
Verify the model is available
openclaw models list --provider google
- Default model: google/gemini-3.1-pro-preview
- Runtime: google-gemini-cli
- Alias: gemini-cli
Gemini 3.1 Pro's Gemini API model id is gemini-3.1-pro-preview. OpenClaw accepts the shorter google/gemini-3.1-pro as a convenience alias and normalizes it before provider calls.
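The alias handling described above can be pictured with a small sketch. This is not OpenClaw's actual code: the `MODEL_ALIASES` table and both function names are hypothetical, and only the single alias named on this page is listed.

```typescript
// Hypothetical alias table; only the alias mentioned on this page is included.
const MODEL_ALIASES = new Map<string, string>([
  ["google/gemini-3.1-pro", "google/gemini-3.1-pro-preview"],
]);

// Return the canonical model ref, leaving unknown refs untouched.
function normalizeModelRef(ref: string): string {
  return MODEL_ALIASES.get(ref) ?? ref;
}

// Strip the "google/" provider prefix to get the id sent to the Gemini API.
function toGeminiModelId(ref: string): string {
  const canonical = normalizeModelRef(ref);
  const prefix = "google/";
  return canonical.startsWith(prefix) ? canonical.slice(prefix.length) : canonical;
}
```

With this sketch, both `google/gemini-3.1-pro` and `google/gemini-3.1-pro-preview` resolve to the API model id `gemini-3.1-pro-preview`.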
Environment variables:
- OPENCLAW_GEMINI_OAUTH_CLIENT_ID
- OPENCLAW_GEMINI_OAUTH_CLIENT_SECRET
(Or the GEMINI_CLI_* variants.)
google-gemini-cli/* model refs are legacy compatibility aliases. New
configs should use google/* model refs plus the google-gemini-cli
runtime when they want local Gemini CLI execution.
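To make the legacy-to-canonical mapping concrete, here is a minimal sketch of how a `google-gemini-cli/*` ref could be rewritten into a `google/*` ref plus a runtime id. The `ResolvedModelRef` shape and the function name are assumptions for illustration, not OpenClaw's internal types.

```typescript
// Hypothetical result shape: a canonical ref plus an optional runtime id.
interface ResolvedModelRef {
  model: string;      // canonical google/* ref
  runtimeId?: string; // set only when the legacy prefix implied the Gemini CLI runtime
}

const LEGACY_CLI_PREFIX = "google-gemini-cli/";

// Rewrite legacy refs; pass every other ref through unchanged.
function migrateLegacyModelRef(ref: string): ResolvedModelRef {
  if (ref.startsWith(LEGACY_CLI_PREFIX)) {
    return {
      model: "google/" + ref.slice(LEGACY_CLI_PREFIX.length),
      runtimeId: "google-gemini-cli",
    };
  }
  return { model: ref };
}
```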
Capabilities
| Capability | Supported |
|---|---|
| Chat completions | Yes |
| Image generation | Yes |
| Music generation | Yes |
| Text-to-speech | Yes |
| Realtime voice | Yes (Google Live API) |
| Image understanding | Yes |
| Audio transcription | Yes |
| Video understanding | Yes |
| Web search (Grounding) | Yes |
| Thinking/reasoning | Yes (Gemini 2.5+ / Gemini 3+) |
| Gemma 4 models | Yes |
Web search
The bundled gemini web-search provider uses Gemini Google Search grounding.
Configure a dedicated search key under plugins.entries.google.config.webSearch,
or let it fall back to GEMINI_API_KEY and then models.providers.google.apiKey:
{
  plugins: {
    entries: {
      google: {
        config: {
          webSearch: {
            apiKey: "AIza...", // optional if GEMINI_API_KEY or models.providers.google.apiKey is set
            baseUrl: "https://generativelanguage.googleapis.com/v1beta", // falls back to models.providers.google.baseUrl
            model: "gemini-2.5-flash",
          },
        },
      },
    },
  },
}
Credential precedence is dedicated webSearch.apiKey, then GEMINI_API_KEY,
then models.providers.google.apiKey. webSearch.baseUrl is optional and
exists for operator proxies or compatible Gemini API endpoints; when omitted,
Gemini web search reuses models.providers.google.baseUrl. See
Gemini search for the provider-specific tool behavior.
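The precedence rules above amount to a first-match lookup. The sketch below illustrates that order only; the interface and function names are hypothetical, and treating blank strings as unset is an assumption rather than documented behavior.

```typescript
// Hypothetical bundle of the three credential sources named above.
interface WebSearchCredentialSources {
  webSearchApiKey?: string; // plugins.entries.google.config.webSearch.apiKey
  envGeminiApiKey?: string; // GEMINI_API_KEY
  providerApiKey?: string;  // models.providers.google.apiKey
}

// Return the first usable key in documented precedence order.
// Assumption: empty or whitespace-only values count as unset.
function resolveWebSearchApiKey(src: WebSearchCredentialSources): string | undefined {
  const candidates = [src.webSearchApiKey, src.envGeminiApiKey, src.providerApiKey];
  return candidates.find((key) => key !== undefined && key.trim() !== "");
}

// webSearch.baseUrl wins when set; otherwise reuse the provider base URL.
function resolveWebSearchBaseUrl(
  webSearchBaseUrl?: string,
  providerBaseUrl?: string,
): string | undefined {
  return webSearchBaseUrl ?? providerBaseUrl;
}
```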
Image generation
The bundled google image-generation provider defaults to
google/gemini-3.1-flash-image-preview.
- Also supports google/gemini-3-pro-image-preview
- Generate: up to 4 images per request
- Edit mode: enabled, up to 5 input images
- Geometry controls: size, aspectRatio, and resolution
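A caller can check the documented limits before invoking the tool. The following is a sketch under stated assumptions: the `ImageRequestSketch` shape and its field names are hypothetical and do not mirror the real tool parameters; only the two numeric limits come from the list above.

```typescript
// Hypothetical request shape used only for this illustration.
interface ImageRequestSketch {
  count: number;         // number of images to generate
  inputImages: string[]; // reference images for edit mode
}

const MAX_GENERATED_IMAGES = 4;  // "up to 4 images per request"
const MAX_EDIT_INPUT_IMAGES = 5; // "up to 5 input images"

// Return a list of human-readable problems; an empty list means the request fits the limits.
function checkImageRequest(req: ImageRequestSketch): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(req.count) || req.count < 1 || req.count > MAX_GENERATED_IMAGES) {
    problems.push(`count must be an integer from 1 to ${MAX_GENERATED_IMAGES}`);
  }
  if (req.inputImages.length > MAX_EDIT_INPUT_IMAGES) {
    problems.push(`at most ${MAX_EDIT_INPUT_IMAGES} input images are allowed`);
  }
  return problems;
}
```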
To use Google as the default image provider:
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
Video generation
The bundled google plugin also registers video generation through the shared
video_generate tool.
- Default video model: google/veo-3.1-fast-generate-preview
- Modes: text-to-video, image-to-video, and single-video reference flows
- Supports aspectRatio, resolution, and audio
- Current duration clamp: 4 to 8 seconds
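The duration clamp means out-of-range requests are pulled into the 4 to 8 second window rather than passed through. A minimal sketch of that behavior, with a hypothetical function name and no claim about whether OpenClaw also rounds the value:

```typescript
const MIN_VIDEO_SECONDS = 4;
const MAX_VIDEO_SECONDS = 8;

// Pull a requested duration into the documented 4 to 8 second range.
function clampVideoDuration(requestedSeconds: number): number {
  return Math.min(MAX_VIDEO_SECONDS, Math.max(MIN_VIDEO_SECONDS, requestedSeconds));
}
```

Under this sketch a 2-second request becomes 4 seconds, a 30-second request becomes 8 seconds, and anything already in range is unchanged.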
To use Google as the default video provider:
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
      },
    },
  },
}
Music generation
The bundled google plugin also registers music generation through the shared
music_generate tool.
- Default music model: google/lyria-3-clip-preview
- Also supports google/lyria-3-pro-preview
- Prompt controls: lyrics and instrumental
- Output format: mp3 by default, plus wav on google/lyria-3-pro-preview
- Reference inputs: up to 10 images
- Session-backed runs detach through the shared task/status flow, including action: "status"
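The output-format rule (mp3 everywhere, wav only on the pro model) can be sketched as a small selection function. The names here are hypothetical, and rejecting an unsupported wav request with an error is an assumption; the real tool may fall back to mp3 instead.

```typescript
type MusicFormat = "mp3" | "wav";

// Models that offer wav in addition to mp3, per the list above.
const WAV_CAPABLE_MODELS = new Set<string>(["google/lyria-3-pro-preview"]);

// Pick the output format, defaulting to mp3.
// Assumption: an unsupported wav request is rejected rather than silently downgraded.
function pickMusicFormat(model: string, requested?: MusicFormat): MusicFormat {
  if (requested === "wav" && !WAV_CAPABLE_MODELS.has(model)) {
    throw new Error(`${model} does not offer wav output; use mp3`);
  }
  return requested ?? "mp3";
}
```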
To use Google as the default music provider:
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
Text-to-speech
The bundled google speech provider uses the Gemini API TTS path with
gemini-3.1-flash-tts-preview.
- Default voice: Kore
- Auth: messages.tts.providers.google.apiKey, models.providers.google.apiKey, GEMINI_API_KEY, or GOOGLE_API_KEY
- Output: WAV for regular TTS attachments, Opus for voice-note targets, PCM for Talk/telephony
- Voice-note output: Google PCM is wrapped as WAV and transcoded to 48 kHz Opus with ffmpeg
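Wrapping raw PCM as WAV only requires prepending a 44-byte RIFF header. The sketch below shows one way to do that; it is not OpenClaw's implementation, the function name is hypothetical, and the default PCM layout (16-bit little-endian, mono, 24 kHz) is an assumption that this page does not state.

```typescript
// Prepend a canonical 44-byte RIFF/WAVE header to raw linear PCM samples.
// Assumed defaults (not stated on this page): 24 kHz, mono, 16-bit little-endian.
function wrapPcmAsWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size = total size - 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for linear PCM
  view.setUint16(20, 1, true);              // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // data chunk size in bytes
  out.set(pcm, 44);                         // samples follow the header unchanged
  return out;
}
```

The resulting WAV could then be handed to ffmpeg for the Opus step, for example with something like `ffmpeg -i speech.wav -c:a libopus -ar 48000 speech.ogg`. Those flags and file names are illustrative, not the command OpenClaw runs.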
Google's batch Gemini TTS path returns generated audio in the completed
generateContent response. For lowest-latency spoken conversations, use the
Google realtime voice provider backed by the Gemini Live API instead of batch
TTS.
To use Google as the default TTS provider:
{
  messages: {
    tts: {
      auto: "always",
      provider: "google",
      providers: {
        google: {
          model: "gemini-3.1-flash-tts-preview",
          voiceName: "Kore",
          audioProfile: "Speak professionally with a calm tone.",
        },
      },
    },
  },
}
Gemini API TTS uses natural-language prompting for style control. Set
audioProfile to prepend a reusable style prompt before the spoken text. Set
speakerName when your prompt text refers to a named speaker.
Gemini API TTS also accepts expressive square-bracket audio tags in the text,
such as [whispers] or [laughs]. To keep tags out of the visible chat reply
while sending them to TTS, put them inside a [[tts:text]]...[[/tts:text]]
block:
Here is the clean reply text.
[[tts:text]][whispers] Here is the spoken version.[[/tts:text]]
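One way to picture how a reply like the one above separates into a visible part and a spoken part is the sketch below. It is an illustration only: the function and type names are hypothetical, and the fallback (speaking the visible text when no block is present) is an assumption.

```typescript
// Hypothetical result of splitting a reply into chat text and TTS text.
interface SplitReply {
  visibleText: string; // shown in the chat reply
  spokenText: string;  // sent to TTS, audio tags included
}

// Matches [[tts:text]]...[[/tts:text]] blocks, including multi-line content.
const TTS_BLOCK = /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/g;

function splitTtsReply(reply: string): SplitReply {
  const spokenParts: string[] = [];
  // Remove each block from the visible text and collect its content for speech.
  const visibleText = reply
    .replace(TTS_BLOCK, (_match: string, inner: string) => {
      spokenParts.push(inner.trim());
      return "";
    })
    .trim();
  // Assumption: with no block present, the visible text is spoken as-is.
  const spokenText = spokenParts.length > 0 ? spokenParts.join(" ") : visibleText;
  return { visibleText, spokenText };
}
```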
Realtime voice
The bundled google plugin registers a realtime voice provider backed by the
Gemini Live API for backend audio bridges such as Voice Call and Google Meet.
| Setting | Config path | Default |
|---|---|---|
| Model | plugins.entries.voice-call.config.realtime.providers.google.model | gemini-2.5-flash-native-audio-preview-12-2025 |
| Voice | ...google.voice | Kore |
| Temperature | ...google.temperature | (unset) |
| VAD start sensitivity | ...google.startSensitivity | (unset) |
| VAD end sensitivity | ...google.endSensitivity | (unset) |
| Silence duration | ...google.silenceDurationMs | (unset) |
| Activity handling | ...google.activityHandling | Google default, start-of-activity-interrupts |
| Turn coverage | ...google.turnCoverage | Google default, only-activity |
| Disable auto VAD | ...google.automaticActivityDetectionDisabled | false |
| Session resumption | ...google.sessionResumption | true |
| Context compression | ...google.contextWindowCompression | true |
| API key | ...google.apiKey | Falls back to models.providers.google.apiKey, GEMINI_API_KEY, or GOOGLE_API_KEY |
Example Voice Call realtime config:
{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
                activityHandling: "start-of-activity-interrupts",
                turnCoverage: "only-activity",
              },
            },
          },
        },
      },
    },
  },
}
For maintainer live verification, run
OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts.
The smoke also covers OpenAI backend/WebRTC paths; the Google leg mints the same
constrained Live API token shape used by Control UI Talk, opens the browser
WebSocket endpoint, sends the initial setup payload, and waits for
setupComplete.
Advanced configuration
Direct Gemini cache reuse
For direct Gemini API runs (api: "google-generative-ai"), OpenClaw
passes a configured cachedContent handle through to Gemini requests.
- Configure per-model or global params with either cachedContent or legacy cached_content
- If both are present, cachedContent wins
- Example value: cachedContents/prebuilt-context
- Gemini cache-hit usage is normalized into OpenClaw cacheRead from upstream cachedContentTokenCount
{
  agents: {
    defaults: {
      models: {
        "google/gemini-2.5-pro": {
          params: {
            cachedContent: "cachedContents/prebuilt-context",
          },
        },
      },
    },
  },
}
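The key-precedence and usage-normalization rules for cached content can be sketched as follows. The interfaces and function names are hypothetical; only the field names cachedContent, cached_content, cachedContentTokenCount, and cacheRead come from this page, and reporting 0 when the upstream count is absent is an assumption.

```typescript
// Params as they might appear in config; both spellings are accepted.
interface GeminiCacheParams {
  cachedContent?: string;
  cached_content?: string; // legacy spelling
}

// cachedContent wins when both spellings are present.
function resolveCachedContent(params: GeminiCacheParams): string | undefined {
  return params.cachedContent ?? params.cached_content;
}

// Subset of upstream usage metadata relevant to cache hits.
interface GeminiUsageSketch {
  cachedContentTokenCount?: number;
}

// Map the upstream cache-hit count onto cacheRead.
// Assumption: a missing count is reported as 0.
function toCacheRead(usage: GeminiUsageSketch): number {
  return usage.cachedContentTokenCount ?? 0;
}
```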
Gemini CLI JSON usage notes
When using the google-gemini-cli OAuth provider, OpenClaw normalizes
the CLI JSON output as follows:
- Reply text comes from the CLI JSON response field.
- Usage falls back to stats when the CLI leaves usage empty.
- stats.cached is normalized into OpenClaw cacheRead.
- If stats.input is missing, OpenClaw derives input tokens from stats.input_tokens - stats.cached.
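The stats fallback described above reduces to a short calculation. This sketch covers only the stats fields named on this page; the type and function names are hypothetical, and clamping the derived input count at zero is an added assumption.

```typescript
// The stats fields named on this page; the real CLI output may carry more.
interface GeminiCliStatsSketch {
  input?: number;
  input_tokens?: number;
  cached?: number;
}

interface NormalizedCliUsage {
  input: number;
  cacheRead: number;
}

// Derive OpenClaw-style usage from CLI stats.
function normalizeCliStats(stats: GeminiCliStatsSketch): NormalizedCliUsage {
  const cacheRead = stats.cached ?? 0;
  // Prefer stats.input; otherwise subtract cached tokens from input_tokens.
  // Assumption: the derived value is clamped so it never goes negative.
  const input = stats.input ?? Math.max(0, (stats.input_tokens ?? 0) - cacheRead);
  return { input, cacheRead };
}
```

For example, stats of `{ input_tokens: 1200, cached: 200 }` would normalize to 1000 input tokens and a cacheRead of 200.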
Environment and daemon setup
If the Gateway runs as a daemon (launchd/systemd), make sure GEMINI_API_KEY
is available to that process (for example, in ~/.openclaw/.env or via
env.shellEnv).