ds4

ds4 serves DeepSeek V4 Flash from a local Metal backend with an OpenAI-compatible /v1 API. OpenClaw connects to ds4 through the generic openai-completions provider family.

ds4 is not a bundled OpenClaw provider plugin. Configure it under models.providers.ds4, then select ds4/deepseek-v4-flash.

  • Provider id: ds4
  • Plugin: none
  • API: OpenAI-compatible Chat Completions (openai-completions)
  • Suggested base URL: http://127.0.0.1:18000/v1
  • Model id: deepseek-v4-flash
  • Tool calls: supported through OpenAI-style tools and tool_calls
  • Reasoning: DeepSeek-style thinking and reasoning_effort

Requirements

  • macOS with Metal support.
  • A working ds4 checkout with ds4-server and the DeepSeek V4 Flash GGUF file.
  • Enough memory for the context you choose. Larger --ctx values allocate more KV memory when the server starts.

Quickstart

  • Start ds4-server

    Replace <DS4_DIR> with your ds4 checkout path.

    <DS4_DIR>/ds4-server \
      --model <DS4_DIR>/ds4flash.gguf \
      --host 127.0.0.1 \
      --port 18000 \
      --ctx 32768 \
      --tokens 128
    
  • Verify the OpenAI-compatible endpoint

    curl http://127.0.0.1:18000/v1/models
    

    The response should include deepseek-v4-flash.

  • Add the OpenClaw provider config

    Add the config from the Full config section, then run a one-shot model check:

    openclaw infer model run \
      --local \
      --model ds4/deepseek-v4-flash \
      --thinking off \
      --prompt "Reply with exactly: openclaw-ds4-ok" \
      --json
    
Full config

    Use this config when ds4 is already running on 127.0.0.1:18000.

    {
      agents: {
        defaults: {
          model: { primary: "ds4/deepseek-v4-flash" },
          models: {
            "ds4/deepseek-v4-flash": {
              alias: "DS4 local",
            },
          },
        },
      },
      models: {
        mode: "merge",
        providers: {
          ds4: {
            baseUrl: "http://127.0.0.1:18000/v1",
            apiKey: "ds4-local",
            api: "openai-completions",
            timeoutSeconds: 300,
            models: [
              {
                id: "deepseek-v4-flash",
                name: "DeepSeek V4 Flash (ds4)",
                reasoning: true,
                input: ["text"],
                cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
                contextWindow: 32768,
                maxTokens: 128,
                compat: {
                  supportsUsageInStreaming: true,
                  supportsReasoningEffort: true,
                  maxTokensField: "max_tokens",
                  supportsStrictMode: false,
                  thinkingFormat: "deepseek",
                  supportedReasoningEfforts: ["low", "medium", "high", "xhigh"],
                },
              },
            ],
          },
        },
      },
    }
    

    Keep contextWindow aligned with the ds4-server --ctx value. Keep maxTokens aligned with --tokens unless you intentionally want OpenClaw to request less output than the server default.
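
The alignment rule above can be checked mechanically. The following is a minimal shell sketch, not part of OpenClaw or ds4: check_ds4_alignment is a hypothetical helper that takes the --ctx and --tokens values you pass to ds4-server, followed by the contextWindow and maxTokens values from the provider config. Treating a maxTokens larger than --tokens as a mismatch is an assumption drawn from the guidance above, not documented behavior.

```shell
# Hypothetical helper, not an OpenClaw or ds4 command.
# Usage: check_ds4_alignment <ctx> <tokens> <contextWindow> <maxTokens>
check_ds4_alignment() {
  ctx=$1
  tokens=$2
  context_window=$3
  max_tokens=$4
  status=0

  # contextWindow should equal the server's --ctx value.
  if [ "$context_window" -ne "$ctx" ]; then
    echo "contextWindow ($context_window) does not match --ctx ($ctx)"
    status=1
  fi

  # A smaller maxTokens is allowed on purpose; a larger one is flagged (assumption).
  if [ "$max_tokens" -gt "$tokens" ]; then
    echo "maxTokens ($max_tokens) is larger than --tokens ($tokens)"
    status=1
  fi

  if [ "$status" -eq 0 ]; then
    echo "aligned"
  fi
  return "$status"
}
```

With the Quickstart values, `check_ds4_alignment 32768 128 32768 128` prints aligned and exits 0. A mismatched contextWindow prints a message and exits non-zero.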

On-demand startup

    OpenClaw can start ds4 only when a ds4/... model is selected. Add localService to the same provider entry:

    {
      models: {
        providers: {
          ds4: {
            baseUrl: "http://127.0.0.1:18000/v1",
            apiKey: "ds4-local",
            api: "openai-completions",
            timeoutSeconds: 300,
            localService: {
              command: "<DS4_DIR>/ds4-server",
              args: [
                "--model",
                "<DS4_DIR>/ds4flash.gguf",
                "--host",
                "127.0.0.1",
                "--port",
                "18000",
                "--ctx",
                "32768",
                "--tokens",
                "128",
              ],
              cwd: "<DS4_DIR>",
              healthUrl: "http://127.0.0.1:18000/v1/models",
              readyTimeoutMs: 300000,
              idleStopMs: 0,
            },
            models: [
              {
                id: "deepseek-v4-flash",
                name: "DeepSeek V4 Flash (ds4)",
                reasoning: true,
                input: ["text"],
                cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
                contextWindow: 32768,
                maxTokens: 128,
                compat: {
                  supportsUsageInStreaming: true,
                  supportsReasoningEffort: true,
                  maxTokensField: "max_tokens",
                  supportsStrictMode: false,
                  thinkingFormat: "deepseek",
                  supportedReasoningEfforts: ["low", "medium", "high", "xhigh"],
                },
              },
            ],
          },
        },
      },
    }
    

    command must be an absolute executable path. Shell lookup and ~ expansion are not used. See Local model services for every localService field.
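
Because shell lookup and ~ expansion are not used, it can help to validate the path before pasting it into localService.command. This is a minimal sketch assuming a POSIX shell. is_absolute_executable is a hypothetical helper, not an OpenClaw command.

```shell
# Hypothetical helper: succeeds only for an absolute path to an executable file.
is_absolute_executable() {
  case "$1" in
    /*) [ -f "$1" ] && [ -x "$1" ] ;;  # absolute path: must be an executable file
    *)  return 1 ;;                    # bare names and "~/..." are rejected
  esac
}
```

After replacing <DS4_DIR> with the real checkout path, `is_absolute_executable "<DS4_DIR>/ds4-server" && echo ok` prints ok only when the path would be usable as written.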

Think Max

    ds4 applies Think Max only when both conditions are true:

    • ds4-server starts with --ctx 393216 or higher.
    • The request uses reasoning_effort: "max" or the equivalent ds4 effort field.

    If you run a context that large, update both the ds4-server flags and the OpenClaw model metadata:

    {
      contextWindow: 393216,
      maxTokens: 384000,
      compat: {
        supportsUsageInStreaming: true,
        supportsReasoningEffort: true,
        maxTokensField: "max_tokens",
        supportsStrictMode: false,
        thinkingFormat: "deepseek",
        supportedReasoningEfforts: ["low", "medium", "high", "xhigh", "max"],
      },
    }
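
The two Think Max conditions can be read as a small decision rule. The sketch below only restates that rule as described on this page. It does not query ds4, and effective_effort is a hypothetical name: a max request stays max only when the context is at least 393216, and otherwise falls back to high, as noted under Troubleshooting.

```shell
# Hypothetical helper mirroring the rule described above; it does not call ds4.
# Usage: effective_effort <ctx> <requested reasoning_effort>
effective_effort() {
  ctx=$1
  effort=$2
  if [ "$effort" = "max" ] && [ "$ctx" -lt 393216 ]; then
    echo "high"      # context too small for Think Max: falls back to high
  else
    echo "$effort"   # other efforts, or max with a large enough context
  fi
}
```

For example, `effective_effort 32768 max` prints high, while `effective_effort 393216 max` prints max.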
    

Test

    Start with a direct HTTP check:

    curl http://127.0.0.1:18000/v1/chat/completions \
      -H 'content-type: application/json' \
      -d '{"model":"deepseek-v4-flash","messages":[{"role":"user","content":"Reply with exactly: ds4-ok"}],"max_tokens":16,"stream":false,"thinking":{"type":"disabled"}}'
    

    Then test OpenClaw model routing:

    openclaw infer model run \
      --local \
      --model ds4/deepseek-v4-flash \
      --thinking off \
      --prompt "Reply with exactly: openclaw-ds4-ok" \
      --json
    

    For a full agent and tool-call smoke test, use a context of at least 32768:

    openclaw agent \
      --local \
      --session-id ds4-tool-smoke \
      --model ds4/deepseek-v4-flash \
      --thinking off \
      --message "Use the shell command pwd once, then reply exactly: tool-ok <output>" \
      --json \
      --timeout 240
    

    Expected result:

    • executionTrace.winnerProvider is ds4
    • executionTrace.winnerModel is deepseek-v4-flash
    • toolSummary.calls is at least 1
    • finalAssistantVisibleText starts with tool-ok
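
The four expectations above can be checked in one step instead of by eye. This is a minimal sketch that assumes jq is installed and that the field names listed above are top-level paths in the --json output. That layout is an assumption, so adjust the paths if your output differs. check_ds4_smoke is a hypothetical helper name.

```shell
# Hypothetical helper: reads the --json output on stdin and checks the
# four expectations listed above. Requires jq; field paths are assumed.
check_ds4_smoke() {
  jq -e '
    .executionTrace.winnerProvider == "ds4"
    and .executionTrace.winnerModel == "deepseek-v4-flash"
    and (.toolSummary.calls >= 1)
    and ((.finalAssistantVisibleText // "") | startswith("tool-ok"))
  ' >/dev/null
}
```

Pipe the agent command's output into it, for example `openclaw agent ... --json | check_ds4_smoke && echo pass`. The function exits non-zero when any expectation fails or the input is not valid JSON.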

Troubleshooting

    curl /v1/models cannot connect

    ds4 is not running or not bound to the host and port in baseUrl. Start ds4-server, then retry:

    curl http://127.0.0.1:18000/v1/models
    
    500 prompt exceeds context

    The configured --ctx is too small for the OpenClaw turn. Raise ds4-server --ctx, then update models.providers.ds4.models[].contextWindow to match. Full agent turns with tools need substantially more context than a direct one-message curl request.

    Think Max does not activate

    ds4 uses Think Max only when --ctx is at least 393216 and the request sets reasoning_effort: "max". With a smaller context, the request falls back to high reasoning.

    The first request is slow

    On a cold start, ds4 goes through a Metal residency and model warmup phase before it can answer. Set localService.readyTimeoutMs to 300000 (5 minutes) when OpenClaw starts the server on demand.
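
When you start ds4-server yourself instead of through localService, the same wait-for-ready idea can be sketched in shell. wait_until_ready is a hypothetical helper: it re-runs any probe command about once per second until the probe succeeds or the timeout in seconds is reached. It illustrates the idea only and is not how OpenClaw implements readyTimeoutMs.

```shell
# Hypothetical helper: wait_until_ready <timeout_seconds> <probe command...>
wait_until_ready() {
  timeout_s=$1
  shift
  elapsed=0
  # Re-run the probe until it exits 0.
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1   # probe still failing after the timeout
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For example, `wait_until_ready 300 curl -fsS http://127.0.0.1:18000/v1/models` waits roughly as long as a readyTimeoutMs of 300000. The elapsed counter tracks sleep time only, so slow probes make the real wait somewhat longer.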