mGPUmanager

Files

mAi 468317e395 fix(scheduler): mark lazy consumers (Unload but no Load) as not-loaded at startup

Live deploy on mRock surfaced a Schritt 5 (step 5) bug: comfyui was
always treated as preloaded at scheduler startup, which made ensureFits()
short-circuit on the very first /v1/image request, exactly the
scenario eviction is supposed to handle. mvoice was never picked as
a victim, so ComfyUI then OOM'd while loading FLUX on top of the
still-resident mvoice.
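
The failure mode can be illustrated with a small sketch. This is not the scheduler's actual code: the consumer struct, the ensureFits signature, the first-fit victim selection, and the 14336 MiB budget are all assumptions made for illustration. Only the 13 GiB FLUX footprint and the 2345 MiB mvoice footprint come from this commit message.

```go
package main

import "fmt"

// consumer is a hypothetical, simplified record; the real scheduler's
// types are not shown in this commit.
type consumer struct {
	name    string
	vramMiB int
	loaded  bool
}

// ensureFits (sketch) returns the names of consumers evicted so that
// target fits into budgetMiB. It returns early when target is already
// marked loaded, which is the short-circuit described above.
func ensureFits(target *consumer, others []*consumer, budgetMiB int) []string {
	if target.loaded {
		return nil // target is believed to be resident, so nothing is evicted
	}
	used := 0
	for _, o := range others {
		if o.loaded {
			used += o.vramMiB
		}
	}
	var victims []string
	for _, o := range others {
		if used+target.vramMiB <= budgetMiB {
			break // target fits now
		}
		if o.loaded {
			o.loaded = false
			used -= o.vramMiB
			victims = append(victims, o.name)
		}
	}
	target.loaded = true
	return victims
}

func main() {
	const budgetMiB = 14336 // hypothetical budget, not taken from the commit

	mvoice := &consumer{name: "mvoice", vramMiB: 2345, loaded: true}
	comfy := &consumer{name: "comfyui", vramMiB: 13 * 1024, loaded: true}

	// Old init: comfyui starts as loaded, so no victim is chosen.
	fmt.Println(ensureFits(comfy, []*consumer{mvoice}, budgetMiB))

	// Fixed init: comfyui starts as not loaded, so mvoice is evicted first.
	comfy.loaded = false
	fmt.Println(ensureFits(comfy, []*consumer{mvoice}, budgetMiB))
}
```

With the old initial state the first call returns no victims even though FLUX is not resident yet; with the corrected state the same request evicts mvoice before the load.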

Fix: replace the blanket 'every consumer starts loaded' init with a
heuristic — initialLoaded(cons):

  - VRAMManaged (ollama): true. We never track/evict it; the consumer
    runs its own LRU.
  - Load+Unload both present (mvoice): true. Designed to be controllable;
    typically preloads in its own lifespan.
  - Unload only, no Load (comfyui): false. Lazy — FLUX isn't resident
    until the first /prompt, so we shouldn't bill its 13 GiB against the
    GPU budget until then.
  - SystemdUnit only (whisper-server): true. Always-on, model loaded at
    process start.
  - Empty: true. Safe fallback.
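
The rules above could be sketched roughly as follows. The Consumer struct and its field types are assumptions (the commit only names VRAMManaged, Load, Unload and SystemdUnit), as is the order in which the cases are checked; the real initialLoaded may differ.

```go
package main

import "fmt"

// Consumer is a hypothetical stand-in for the registry entry. Field
// names follow the commit message; the types are assumed.
type Consumer struct {
	Name        string
	VRAMManaged bool   // consumer runs its own VRAM management (e.g. ollama)
	Load        string // load hook, empty if the consumer has none
	Unload      string // unload hook, empty if the consumer has none
	SystemdUnit string // systemd unit name, empty if not used
}

// initialLoaded reports whether a consumer should be treated as
// resident in VRAM when the scheduler starts.
func initialLoaded(c Consumer) bool {
	switch {
	case c.VRAMManaged:
		return true // never tracked or evicted by the scheduler
	case c.Load != "" && c.Unload != "":
		return true // controllable, typically preloads itself
	case c.Unload != "":
		return false // Unload without Load: lazy, not resident yet
	case c.SystemdUnit != "":
		return true // always-on, model loaded at process start
	default:
		return true // safe fallback
	}
}

func main() {
	// Hook and unit values are placeholders, not the real configuration.
	consumers := []Consumer{
		{Name: "ollama", VRAMManaged: true},
		{Name: "mvoice", Load: "/load", Unload: "/unload"},
		{Name: "comfyui", Unload: "/free"},
		{Name: "whisper-server", SystemdUnit: "whisper-server.service"},
		{Name: "empty"},
	}
	for _, c := range consumers {
		fmt.Printf("%-15s loaded=%v\n", c.Name, initialLoaded(c))
	}
}
```

Only the Unload-without-Load case returns false, which keeps the default conservative: a consumer type the heuristic does not recognise is still billed against the GPU budget.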

Verified live on mRock (2026-05-15):

  Before /v1/image:  nvidia-smi 8963 MiB used; mvoice gpu_resident_mib 2345
  POST /v1/image:    HTTP 400 from upstream (empty workflow), broker did
                     trigger eviction before forwarding
  After:             nvidia-smi 6547 MiB used; mvoice gpu_resident_mib 9
                     (~CUDA context only); scheduler.evictions = 2
  POST /v1/tts:      audio_url returned, tts_ms 670, audio 3.5 s
  After reload:      nvidia-smi 8943 MiB used; mvoice gpu_resident_mib 2917

Test: TestInitialLoadedHeuristic pins the four cases down so this
doesn't regress when someone adds a fifth consumer type.

Refs: m/mGPUmanager#1 (live deploy).

2026-05-15 16:54:11 +02:00

config

feat: Schritt 4 — Locked scheduler (global GPU lock, queue, stats)

2026-05-11 13:33:39 +02:00

gpu

feat: Schritt 5 — VRAM-pressure eviction + coexistence groups

2026-05-11 13:37:03 +02:00

registry

feat: Schritt 2 — mGPUmanager MVP routing + /v1/status

2026-05-11 13:30:17 +02:00

scheduler

fix(scheduler): mark lazy consumers (Unload but no Load) as not-loaded at startup

2026-05-15 16:54:11 +02:00

server

feat: Schritt 4 — Locked scheduler (global GPU lock, queue, stats)

2026-05-11 13:33:39 +02:00