POST /v1/chat/completions

Generates a model response for a chat conversation.

curl https://sovrgpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $SOVR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-9b",
    "messages": [
      { "role": "system", "content": "Du bist ein hilfreicher Assistent." },
      { "role": "user",   "content": "Erkläre Diffusionsmodelle in zwei Sätzen." }
    ],
    "temperature": 0.5,
    "max_tokens": 400
  }'

Request body

Field	Type	Default	Description
`model`	string	–	Model ID from `GET /v1/models`.
`messages`	array	–	Conversation. Roles: `system`, `user`, `assistant`, `tool`.
`temperature`	number	0.7	0.0 – 2.0. Higher = more creative.
`top_p`	number	1.0	Nucleus sampling.
`max_tokens`	int	model default	Maximum response length in tokens.
`stream`	bool	`false`	`true` → SSE stream.
`stop`	string\|array	–	Stop sequences.
`tools`	array	–	OpenAI tool schema (function calling).
`tool_choice`	string\|object	`"auto"`	`"none"`, `"auto"`, `"required"`, or a specific tool.
`response_format`	object	–	`{ "type": "json_object" }` for JSON mode.

Ignored (accepted without raising an error): seed, logit_bias, user, n (values > 1 are not supported), presence_penalty, frequency_penalty.
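As a rough illustration of how the fields above fit together, the following Python sketch builds the same request as the curl example using only the standard library. The helper name `build_chat_request` and the client-side temperature check are assumptions made for this example, not part of the API; the URL and headers are copied from the curl call.

```python
import json
import urllib.request

# Endpoint URL as used in the curl example.
API_URL = "https://sovrgpt.com/api/v1/chat/completions"


def build_chat_request(api_key, model, messages, **options):
    """Build (but do not send) a POST request for the chat completions endpoint.

    Hypothetical helper: `options` carries optional body fields such as
    temperature, top_p, max_tokens, stream, stop, tools, or response_format.
    """
    temperature = options.get("temperature")
    # Client-side check against the documented range 0.0 to 2.0 (an assumption
    # of this sketch; the server may validate differently).
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request would then be a matter of passing the returned object to `urllib.request.urlopen` and decoding the JSON body of the response.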

Response (non-streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1715520000,
  "model": "qwen3.5-9b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Diffusionsmodelle …"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 87,
    "total_tokens": 129
  }
}

Streaming (SSE)

With "stream": true or the request header Accept: text/event-stream:

data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Diff"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"usions"},"index":0}]}
…
data: {"id":"chatcmpl-abc","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]

The format is identical to OpenAI's, so the OpenAI SDK's stream: true mode works without changes.
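For clients that do not use an SDK, the stream can be reassembled by hand. The sketch below assumes the lines have already been read from the HTTP response and decoded to strings; the function name `collect_stream_text` is made up for this example.

```python
import json


def collect_stream_text(lines):
    """Concatenate the content deltas from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk carries only the role, the last one an empty delta.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

Fed with the sample events shown above (without the elided middle part), this returns the text accumulated so far, "Diffusions".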

Vision input

Only for models with accepts_vision: true (vision tier; default tier from 3.5 onward):

{
  "model": "gemma-4-26b-a4b",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Was ist auf dem Bild?" },
        { "type": "image_url", "image_url": { "url": "https://…/foto.jpg" } }
      ]
    }
  ]
}

image_url.url can be a public URL or a Base64 data URL (data:image/png;base64,…).
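A Base64 data URL can be produced from raw image bytes as in the following sketch. Both helper names (`image_data_url`, `vision_message`) are illustrative assumptions; the message structure mirrors the JSON example above.

```python
import base64


def image_data_url(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a data URL suitable for image_url.url."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def vision_message(text, image_url):
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

In practice the bytes would come from reading a local file in binary mode, and the MIME type should match the actual image format.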

Function calling / tools

Fully OpenAI-compatible. Example:

{
  "model": "qwen3.5-9b",
  "messages": [{ "role": "user", "content": "Was kostet ein Bitcoin?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_price",
      "description": "Holt aktuellen Preis",
      "parameters": {
        "type": "object",
        "properties": { "symbol": { "type": "string" } },
        "required": ["symbol"]
      }
    }
  }]
}

The response contains tool_calls in the same form as OpenAI's. The client executes the function and sends the result back as a role: "tool" message.
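The client-side half of that round trip might look like the sketch below. It assumes the OpenAI shape for tool calls (an `id`, plus `function.name` and `function.arguments` as a JSON string) and a tool message that carries a matching `tool_call_id`. The function `run_tool_calls` and the registry of local functions are hypothetical.

```python
import json


def run_tool_calls(assistant_message, registry):
    """Execute each requested tool and return the role: "tool" messages to send back.

    `registry` maps tool names to local Python callables (an assumption of
    this sketch; the API itself does not define such a registry).
    """
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        arguments = json.loads(function["arguments"])
        output = registry[function["name"]](**arguments)
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(output),
            }
        )
    return results
```

The returned messages are appended to the conversation after the assistant message that requested them, and the extended `messages` array is sent in a follow-up request so the model can produce its final answer.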

Errors & reasoning

For reasoning-tier models, the response additionally contains <think>…</think> blocks before the final answer. UIs can show or hide them; the SovrGPT UI collapses them by default.
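A client that wants to hide or collapse the reasoning could separate it from the answer roughly as follows. This sketch assumes the <think> blocks appear inline in the message content string and are properly closed; the name `split_reasoning` is illustrative.

```python
import re

# Non-greedy match so that several separate blocks are captured individually;
# DOTALL lets a block span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content):
    """Return (list of reasoning blocks, final answer without the blocks)."""
    reasoning = [block.strip() for block in THINK_RE.findall(content)]
    answer = THINK_RE.sub("", content).strip()
    return reasoning, answer
```

In a streaming response a block may be split across several chunks, so this kind of post-processing is easiest to apply once the full content has been assembled.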