m/ImaGen

mAi a24ac2826f mAi: #2 - phase 1 PoC: ComfyUI on mRock + first FLUX schnell image

Native systemd install (matches Ollama pattern on Arch — Docker on mRock
has no nvidia runtime; native venv via uv is the lighter path). The
Black-Forest-Labs FLUX.1-schnell HF repo is gated, so the download script
points at ungated mirrors (Comfy-Org/flux1-schnell + sirorable/flux-ae-vae)
that ship the same Apache-2.0 weights.

First image — cat in a fishbowl, 1024x1024, 4 steps — generated end-to-end
in 9.79s via curl + workflow JSON; stored at
/home/m/dev/ImaGen/poc/first-image.png on mRiver (not committed; transient
PoC artefact). Go adapter is phase 2.

2026-05-08 16:50:16 +02:00


ComfyUI on mRock — install + ops

ImaGen's flux-schnell-local backend talks to ComfyUI on mRock at http://mrock:8188 (Tailscale-internal). This document is the reproducible install path from a clean mRock state.

mRock runs Arch Linux + systemd with an NVIDIA RTX 4070 Ti SUPER (16 GB VRAM). Ollama is already a native systemd service, so ComfyUI follows the same pattern (native Python venv + systemd unit) instead of Docker — Docker on mRock has no nvidia runtime configured, and adding one is more invasive than another systemd unit.

Prerequisites on mRock

Python via uv (already installed).
NVIDIA driver new enough for CUDA 12.4. nvidia-smi --query-gpu=driver_version --format=csv,noheader should show >= 550. Driver 595 is what mRock has today.
~35 GB free on /home for the model files.
ollama.service running on port 11434 — coexistence notes below.
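
The prerequisites above can be checked in one pass. The following is a minimal sketch, not something that ships with ImaGen: the function names driver_ok, free_gb_ok and preflight are made up for illustration, and the thresholds (driver 550, 35 GB on /home) are simply the ones listed above.

```sh
# driver_ok VERSION [MIN_MAJOR]: succeed if the driver's major version is at
# least MIN_MAJOR (default 550, the threshold stated above).
driver_ok() {
  major=${1%%.*}
  [ "$major" -ge "${2:-550}" ]
}

# free_gb_ok DIR MIN_GB: succeed if DIR's filesystem has at least MIN_GB free.
# df -Pk reports available space in 1 KiB blocks in column 4.
free_gb_ok() {
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# preflight: run all prerequisite checks; only meaningful on mRock itself.
preflight() {
  command -v uv >/dev/null || { echo "uv not found on PATH" >&2; return 1; }
  ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  driver_ok "$ver"    || { echo "driver $ver is older than 550" >&2; return 1; }
  free_gb_ok /home 35 || { echo "less than 35 GB free on /home" >&2; return 1; }
  systemctl is-active --quiet ollama \
    || echo "note: ollama.service is not running" >&2
  return 0
}
```

Running preflight on mRock before step 1 should print nothing and return 0 when everything is in place.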

1. Clone ComfyUI + Python venv

mkdir -p ~/dev && cd ~/dev
git clone --depth 1 https://github.com/comfyanonymous/ComfyUI.git comfyui
cd comfyui
uv venv --python 3.12 .venv
source .venv/bin/activate.fish   # fish shell; in bash/zsh use: source .venv/bin/activate

# PyTorch CUDA 12.4 wheels — match the system driver
uv pip install --no-cache torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu124

uv pip install --no-cache -r requirements.txt

Verify CUDA is wired up:

.venv/bin/python -c \
  "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# expected: 2.6.0+cu124 True NVIDIA GeForce RTX 4070 Ti SUPER

2. Models — FLUX.1 schnell

The Black-Forest-Labs primary repo (black-forest-labs/FLUX.1-schnell) is gated — curl against it without an HF token returns HTTP 401. We pull the weights from ungated mirrors of the same Apache-2.0 release.

File	Where it goes	Source
`flux1-schnell.safetensors` (~23.8 GB, fp16)	`models/unet/`	`Comfy-Org/flux1-schnell`
`ae.safetensors` (~335 MB)	`models/vae/`	`sirorable/flux-ae-vae`
`clip_l.safetensors` (~246 MB)	`models/clip/`	`comfyanonymous/flux_text_encoders`
`t5xxl_fp8_e4m3fn.safetensors` (~4.9 GB)	`models/clip/`	`comfyanonymous/flux_text_encoders`

cd ~/dev/comfyui/models

curl -L -o unet/flux1-schnell.safetensors \
  https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell.safetensors
curl -L -o vae/ae.safetensors \
  https://huggingface.co/sirorable/flux-ae-vae/resolve/main/ae.safetensors
curl -L -o clip/clip_l.safetensors \
  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
curl -L -o clip/t5xxl_fp8_e4m3fn.safetensors \
  https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors

If a new HF token is configured later (~/.cache/huggingface/token), the official black-forest-labs/FLUX.1-schnell repo can be swapped in: it serves byte-identical files, but curl then has to send the token in an Authorization: Bearer header.
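
For files this large it helps to make the download resumable and to fail loudly on HTTP errors. The sketch below wraps the curl calls above under a few assumptions: hf_url and fetch_model are hypothetical helper names, and HF_TOKEN is an assumed environment variable (the document only mentions the token file). Without --fail, curl would save the body of a 401 response as if it were the model file.

```sh
# hf_url REPO FILE: build the Hugging Face "resolve" URL used in the commands above.
hf_url() {
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$1" "$2"
}

# fetch_model REPO FILE DEST: download FILE from REPO to DEST.
#   --fail  exit non-zero on HTTP errors instead of saving the error body
#   -C -    resume a partially downloaded DEST
# If HF_TOKEN is set (an assumption, see above), send it as a bearer token.
fetch_model() {
  mkdir -p "$(dirname "$3")"
  if [ -n "${HF_TOKEN:-}" ]; then
    curl -L --fail -C - -H "Authorization: Bearer $HF_TOKEN" \
         -o "$3" "$(hf_url "$1" "$2")"
  else
    curl -L --fail -C - -o "$3" "$(hf_url "$1" "$2")"
  fi
}

# Example, run from ~/dev/comfyui/models:
#   fetch_model Comfy-Org/flux1-schnell flux1-schnell.safetensors unet/flux1-schnell.safetensors
```

Depending on the curl version, re-running fetch_model against a file that is already complete may report an error from the range request; in that case the file on disk is still intact.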

3. systemd unit

Drop /etc/systemd/system/comfyui.service:

[Unit]
Description=ComfyUI image generation server
Documentation=https://github.com/comfyanonymous/ComfyUI
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=m
Group=m
WorkingDirectory=/home/m/dev/comfyui
ExecStart=/home/m/dev/comfyui/.venv/bin/python /home/m/dev/comfyui/main.py \
    --listen 0.0.0.0 --port 8188 \
    --output-directory /home/m/dev/comfyui/output \
    --temp-directory /home/m/dev/comfyui/temp
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
NoNewPrivileges=true
PrivateTmp=true
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

Then:

sudo systemctl daemon-reload
sudo systemctl enable --now comfyui.service
systemctl status comfyui.service

The service binds 0.0.0.0:8188, i.e. every interface on mRock, not just the Tailscale one. Tailscale's WireGuard fence is the only auth, so do not expose port 8188 to the public internet.

4. Health check

curl -fsS --max-time 5 http://mrock:8188/system_stats | jq '.devices[0]'
# expected: name "cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER ...", vram_total ~16 GB

imagen backends (from a host with the ImaGen CLI installed) should also report flux-schnell-local: ok.
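
The same endpoint can also be used to check free VRAM before a generation. This is a sketch under one assumption: that /system_stats reports vram_free as a byte count. The helper names gib, vram_free_gib and enough_vram are hypothetical, and COMFY_URL is just a convenience variable.

```sh
COMFY_URL=${COMFY_URL:-http://mrock:8188}

# gib BYTES: convert a byte count to GiB with one decimal place.
gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1073741824 }'
}

# vram_free_gib: free VRAM on the first device, in GiB.
# Assumes /system_stats reports vram_free in bytes.
vram_free_gib() {
  gib "$(curl -fsS --max-time 5 "$COMFY_URL/system_stats" \
         | jq -r '.devices[0].vram_free')"
}

# enough_vram MIN_GIB: succeed if at least MIN_GIB of VRAM is currently free.
enough_vram() {
  awk -v free="$(vram_free_gib)" -v min="$1" 'BEGIN { exit !(free >= min) }'
}

# Example:
#   enough_vram 12 || echo "less than 12 GiB VRAM free; is Ollama still loaded?" >&2
```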

5. VRAM coexistence with Ollama

mRock has 16 GB VRAM total. Ollama parks ~8 GB resident for its current model. FLUX schnell, loaded from the fp16 file with weight_dtype=fp8_e4m3fn (the default the adapter requests), needs roughly 10–12 GB peak for a 1024×1024 generation. Together that is 18–20 GB against 16 GB available, so concurrent Ollama + FLUX on mRock will OOM.

Two practical options:

Stop Ollama before generating — sudo systemctl stop ollama frees the GPU, run the generation, sudo systemctl start ollama afterwards. Adequate while we don't have many concurrent users.
Move Ollama off mRock — when ImaGen is in regular use, push Ollama to another host so the GPU is dedicated. Tracked separately.

Both decisions live with whoever operates the box; the adapter does not try to manage Ollama.
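
For the first option, an operator may want a wrapper that always restarts Ollama, even when the generation fails. The following is a minimal sketch rather than anything the adapter provides: with_ollama_stopped is a hypothetical name, and the SYSTEMCTL override exists only so the wrapper can be dry-run. It does not handle being interrupted with Ctrl-C.

```sh
# with_ollama_stopped CMD [ARGS...]: stop Ollama, run CMD, then start Ollama
# again whether or not CMD succeeded. Returns CMD's exit status.
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) for a dry run.
with_ollama_stopped() {
  ${SYSTEMCTL:-sudo systemctl} stop ollama || return 1
  rc=0
  "$@" || rc=$?
  ${SYSTEMCTL:-sudo systemctl} start ollama
  return "$rc"
}

# Example:
#   with_ollama_stopped curl -fsS -X POST -H 'Content-Type: application/json' \
#       -d @flux-schnell-workflow.json http://mrock:8188/prompt
```

Note that /prompt only queues the job, so in practice the wrapped command should be a script that also waits for the prompt to finish before Ollama is started again.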

6. Smoke test (direct, without the imagen CLI)

# 1) Submit a workflow
curl -fsS --max-time 30 -X POST -H 'Content-Type: application/json' \
     -d @flux-schnell-workflow.json \
     http://mrock:8188/prompt
# returns: {"prompt_id": "...", "number": ..., "node_errors": {}}

# 2) Poll history until the prompt completes
PID=...   # from above
until curl -fsS http://mrock:8188/history/$PID | jq -e ".\"$PID\".status.completed == true" >/dev/null; do
  sleep 1
done

# 3) Pull the image
NAME=$(curl -fsS http://mrock:8188/history/$PID \
       | jq -r ".\"$PID\".outputs[\"9\"].images[0].filename")
curl -fsS "http://mrock:8188/view?filename=$NAME&type=output" -o /tmp/cat.png
file /tmp/cat.png       # PNG image data, 1024 x 1024
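
The polling loop in step 2) spins forever if the prompt never completes, for example when the workflow fails on the server. A bounded variant could look like the sketch below; poll_until and prompt_done are hypothetical helper names, and the 120 s budget in the example is an arbitrary choice.

```sh
# poll_until TIMEOUT_S CMD [ARGS...]: re-run CMD once per second until it
# succeeds; give up with status 1 once TIMEOUT_S seconds have passed.
poll_until() {
  deadline=$(( $(date +%s) + $1 ))
  shift
  until "$@"; do
    [ "$(date +%s)" -lt "$deadline" ] || return 1
    sleep 1
  done
}

# prompt_done ID: succeed once /history reports the prompt as completed.
# Passing the id via --arg avoids the nested quote escaping used above.
prompt_done() {
  curl -fsS --max-time 5 "http://mrock:8188/history/$1" \
    | jq -e --arg id "$1" '.[$id].status.completed == true' >/dev/null
}

# Example:
#   poll_until 120 prompt_done "$PID" || echo "prompt $PID not done after 120 s" >&2
```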

The full ImaGen smoke test is in usage.md once the Go adapter ships.
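
The workflow file posted in step 1) is not reproduced in this document. As an illustration only, the snippet below writes a guess at what a minimal flux-schnell-workflow.json could contain, using stock ComfyUI node classes, the filenames from step 2, and the parameters of the first PoC image (1024x1024, 4 steps). Apart from node "9" being the SaveImage node, which the jq filter above relies on, every node id, the seed, the sampler settings and the filename prefix are assumptions; the file actually used by ImaGen may differ.

```sh
# Write an illustrative API-format workflow; /prompt expects the node graph
# under a top-level "prompt" key. Links are written as ["<node id>", <output index>].
cat > flux-schnell-workflow.json <<'JSON'
{
  "prompt": {
    "1": {"class_type": "UNETLoader", "inputs": {
            "unet_name": "flux1-schnell.safetensors", "weight_dtype": "fp8_e4m3fn"}},
    "2": {"class_type": "DualCLIPLoader", "inputs": {
            "clip_name1": "t5xxl_fp8_e4m3fn.safetensors",
            "clip_name2": "clip_l.safetensors", "type": "flux"}},
    "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    "4": {"class_type": "CLIPTextEncode", "inputs": {
            "text": "a cat in a fishbowl", "clip": ["2", 0]}},
    "5": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["2", 0]}},
    "6": {"class_type": "EmptySD3LatentImage", "inputs": {
            "width": 1024, "height": 1024, "batch_size": 1}},
    "7": {"class_type": "KSampler", "inputs": {
            "model": ["1", 0], "positive": ["4", 0], "negative": ["5", 0],
            "latent_image": ["6", 0], "seed": 42, "steps": 4, "cfg": 1.0,
            "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0}},
    "8": {"class_type": "VAEDecode", "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
    "9": {"class_type": "SaveImage", "inputs": {
            "images": ["8", 0], "filename_prefix": "imagen-poc"}}
  }
}
JSON
```

Node "5" is an empty negative prompt; KSampler requires the input even though it has no effect at cfg 1.0.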

Troubleshooting

vram_free < 6 GB in /system_stats: another GPU process is holding memory. Usually Ollama (sudo systemctl stop ollama).
Workflow returns node_errors with Required input is missing for CLIPLoader: text encoder filenames don't match step 2 — check that clip_l.safetensors and t5xxl_fp8_e4m3fn.safetensors are in models/clip/, not models/text_encoders/.
Access to model … is restricted during a model pull: the script is hitting the gated black-forest-labs repo rather than a mirror. Use the ungated URLs from step 2.
Service won't start: check journalctl -u comfyui --since '5 min ago'. A common cause is a stale or half-finished package install in the venv; re-run the uv pip install commands from step 1.
