feat: Schritt 2 — mGPUmanager MVP routing + /v1/status

Go daemon listening on :8770 that fronts mvoice (8766), whisper-server
(8178), ollama (11434), comfyui (8188) behind a single /v1 façade.

What this MVP does:
- Loads config/consumers.yaml: routing table, per-consumer URL + health +
  paths + vram_resident_mib + can_coexist_with + load/unload routes.
- Background health probe (5s) on every consumer; refuses fast with a
  structured 503 if the last probe failed (no Felix-Banholzer-style
  silent fallback).
- POST /v1/{tts,stt,llm,image} proxies the request body + Content-Type
  to the routed consumer's path and streams the response back.
- GET /audio/* proxies to audio_proxy consumer (wa.sh fetches its WAV
  this way).
- GET /v1/status exposes live GPU sample (nvidia-smi every 2s),
  per-consumer health/loaded/gpu_resident_mib/active/total_requests,
  scheduler stats.
- GET /healthz, GET / — broker liveness.

The Scheduler interface is in place but the implementation is
'Passthrough' — every job runs immediately, no lock, no queue. Schritt 4
replaces it with a serialising mutex; Schritt 5 adds VRAM-pressure
eviction. The interface boundary means server.go stays unchanged.

Out of scope here:
- Schritt 3: wa.sh migration (parallel work in mAi).
- Schritt 4: queue + global GPU lock.
- Schritt 5: nvidia-smi-driven LRU eviction.

Tests: config validation (good/bad), proxy forwards body, audio proxy
streams bytes, unhealthy consumer returns 503, /v1/status JSON shape.

Refs: m/mGPUmanager#1
This commit is contained in:
mAi
2026-05-11 13:30:05 +02:00
parent b31b6f6580
commit c81c145163
16 changed files with 1701 additions and 1 deletions

35
Makefile Normal file
View File

@@ -0,0 +1,35 @@
# mGPUmanager build + deploy targets.
#
# `make build` — compile the Go binary into ./bin/mgpumanager.
# `make test` — go test ./...
# `make run` — run locally against ./config/consumers.yaml.
# `make deploy` — rsync binary + config + systemd unit to mRock,
# reload systemd, restart the service.
BIN := bin/mgpumanager
PKG := ./cmd/mgpumanager
GO ?= go
HOST ?= mrock
REMOTE_DIR ?= /home/m/dev/mGPUmanager
.PHONY: build test run deploy clean
build:
mkdir -p bin
$(GO) build -trimpath -ldflags="-s -w" -o $(BIN) $(PKG)
test:
$(GO) test ./...
run: build
./$(BIN) --config config/consumers.yaml --log-level debug
deploy: build
rsync -a --mkpath $(BIN) $(HOST):$(REMOTE_DIR)/$(BIN)
rsync -a --mkpath config/consumers.yaml $(HOST):$(REMOTE_DIR)/config/consumers.yaml
rsync -a --mkpath systemd/mgpumanager.service $(HOST):$(REMOTE_DIR)/systemd/mgpumanager.service
ssh $(HOST) "sudo cp $(REMOTE_DIR)/systemd/mgpumanager.service /etc/systemd/system/mgpumanager.service && sudo systemctl daemon-reload && sudo systemctl enable mgpumanager.service && sudo systemctl restart mgpumanager.service && sudo systemctl status mgpumanager.service --no-pager -l"
clean:
rm -rf bin