Files
mGPUmanager/internal/registry/parse.go
mAi c81c145163 feat: Schritt 2 — mGPUmanager MVP routing + /v1/status
Go daemon listening on :8770 that fronts mvoice (8766), whisper-server
(8178), ollama (11434), comfyui (8188) behind a single /v1 façade.

What this MVP does:
- Loads config/consumers.yaml: routing table, per-consumer URL + health +
  paths + vram_resident_mib + can_coexist_with + load/unload routes.
- Background health probe (5s) on every consumer; refuses fast with a
  structured 503 if the last probe failed (no Felix-Banholzer-style
  silent fallback).
- POST /v1/{tts,stt,llm,image} proxies the request body + Content-Type
  to the routed consumer's path and streams the response back.
- GET /audio/* proxies to audio_proxy consumer (wa.sh fetches its WAV
  this way).
- GET /v1/status exposes live GPU sample (nvidia-smi every 2s),
  per-consumer health/loaded/gpu_resident_mib/active/total_requests,
  scheduler stats.
- GET /healthz, GET / — broker liveness.

The Scheduler interface is in place but the implementation is
'Passthrough' — every job runs immediately, no lock, no queue. Schritt 4
replaces it with a serialising mutex; Schritt 5 adds VRAM-pressure
eviction. The interface boundary means server.go stays unchanged.

Out of scope here:
- Schritt 3: wa.sh migration (parallel work in mAi).
- Schritt 4: queue + global GPU lock.
- Schritt 5: nvidia-smi-driven LRU eviction.

Tests: config validation (good/bad), proxy forwards body, audio proxy
streams bytes, unhealthy consumer returns 503, /v1/status JSON shape.

Refs: m/mGPUmanager#1
2026-05-11 13:30:17 +02:00

36 lines
1.0 KiB
Go

package registry
import (
"encoding/json"
)
// parseGPUFields best-effort parses a consumer's health response body to pull
// out gpu_resident_mib (mvoice exposes this since the Schritt 1 patch) and a
// 'loaded' boolean. Returns the previous loaded value if the field is absent
// (i.e. the consumer doesn't report it — assume always loaded).
func parseGPUFields(body []byte, prevLoaded bool) (mib int, loaded bool) {
var parsed struct {
GPUResidentMiB *int `json:"gpu_resident_mib"`
Loaded *bool `json:"loaded"`
}
if err := json.Unmarshal(body, &parsed); err != nil {
// Non-JSON health response (e.g. whisper-server '/' returns HTML). Treat
// 200 OK as healthy + loaded, no VRAM info.
return 0, true
}
if parsed.GPUResidentMiB != nil {
mib = *parsed.GPUResidentMiB
}
if parsed.Loaded != nil {
loaded = *parsed.Loaded
} else {
// Field absent — preserve prior state, default true on first probe.
if !prevLoaded {
loaded = true
} else {
loaded = prevLoaded
}
}
return mib, loaded
}