ImaGen #3: Replicate API backend (FLUX hosted) + cost-tracking #3
Goal
Implement the Replicate API backend — second real adapter for ImaGen. Cloud-hosted FLUX (and other models) via Replicate's REST API, used when local mRock isn't reachable, when m wants higher-quality FLUX dev, or when otto agents need images without GPU latency dependency.
Prerequisite: ImaGen#1 (bootstrap + Backend interface) must be merged.
Why Replicate
POST /predictions + polling).
Scope
1. Go adapter (internal/backend/replicate.go)
Implements the Backend interface:
- Config: api_token_env (env var name for the API token, default REPLICATE_API_TOKEN), model (e.g. black-forest-labs/flux-schnell), default_steps, default_aspect_ratio.
- Submit: POST https://api.replicate.com/v1/predictions with {"version": "<model-version-hash>", "input": {...}}.
- Poll: GET /v1/predictions/{id} every 500ms until status is succeeded or failed (timeout 60s for schnell, 120s for dev).
- Result metadata: {model, model_version, seed_used, predict_time_seconds, cost_usd_estimate}.
2. Cost-tracking
- mai.imagen_usage (Supabase, mai schema) with: created_at, backend, model, seed, prompt_hash (sha256, NOT the prompt itself), latency_ms, cost_usd_estimate, caller (otto/head, mai/, etc., resolved from MAI_FROM_ID or pane @mai-name like the maimcp identity logic).
- internal/backend/replicate_pricing.go with a comment noting the source URL and a TODO to refresh on schedule.
- imagen usage --since 2026-05-01 lists rows with running totals. Useful for m to see weekly spend.
3. Migration for the usage table
Create mai.imagen_usage via supabase migration in this issue:
Migration filename: <timestamp>_imagen_usage.sql in ~/dev/mAi/db/migrations/ (or whichever directory mAi uses for cross-project migrations; check ~/.m/docs/msystem.md if unsure).
4. Smoke test
5. Resilience
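The retry behaviour reported for this adapter elsewhere in the issue (429 with exponential backoff up to 3 retries, honouring Retry-After; a single retry on 5xx; no retry on 401 or other 4xx) could be sketched roughly as follows. This is a minimal sketch under assumptions, not the code in internal/backend/replicate.go: the names maxRetries, retryDelay and shouldRetry and the caller-supplied base delay are hypothetical, and only the whole-seconds form of Retry-After is handled.

```go
package main

import (
	"strconv"
	"time"
)

// maxRetries mirrors the "up to 3 retries" policy for 429 responses.
const maxRetries = 3

// retryDelay returns how long to wait before retry number attempt (0-based).
// A Retry-After value given in whole seconds takes precedence; otherwise the
// delay doubles from base on every attempt (base, 2*base, 4*base, ...).
// Assumption: the HTTP-date form of Retry-After is ignored in this sketch.
func retryDelay(attempt int, retryAfter string, base time.Duration) time.Duration {
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	return base << uint(attempt)
}

// shouldRetry reports whether a response with the given HTTP status deserves
// another attempt, given how many retries have already been made.
func shouldRetry(status, retriesSoFar int) bool {
	switch {
	case status == 429:
		return retriesSoFar < maxRetries
	case status >= 500 && status <= 599:
		return retriesSoFar < 1 // 5xx is retried once
	default:
		return false // 401 and other 4xx errors are surfaced immediately
	}
}
```

A caller would loop on the request, check shouldRetry with the response status and its retry counter, and sleep for retryDelay (passing the Retry-After header value, which may be empty) before trying again.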
Acceptance criteria
- imagen backends shows flux-schnell-replicate: ok when REPLICATE_API_TOKEN is set, not configured otherwise.
- Real PNG plus a row in mai.imagen_usage with non-null cost_usd_estimate.
- imagen usage --since YYYY-MM-DD outputs a clean table with totals.
- internal/backend/replicate.go has unit tests with httptest server (no real Replicate calls in CI).
- mcp__supabase__execute_sql.
Out of scope
Refs
- black-forest-labs/flux-schnell
- replicate_pricing.go with a comment)
Workflow
Coder role. Blocked on #1. When #1 lands, m or otto/head assigns mAi here.
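For reference, the poll step from Scope item 1 (GET /v1/predictions/{id} every 500ms until the prediction succeeds or fails, with a 60s timeout for schnell and 120s for dev) might look roughly like this in Go. It is a sketch under stated assumptions rather than the adapter itself: fetchStatus is a hypothetical hook standing in for the real GET call, the names pollPrediction, timeoutFor, scriptedFetch, runPoll and errPollTimeout are invented for illustration, choosing the timeout by matching "flux-dev" in the model name is an assumption, and treating canceled as a terminal status goes slightly beyond what the spec lists.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
	"time"
)

// fetchStatus is a hypothetical hook standing in for GET /v1/predictions/{id};
// it returns the prediction's current status string.
type fetchStatus func(ctx context.Context) (string, error)

// errPollTimeout is returned when the prediction does not finish in time.
var errPollTimeout = errors.New("prediction timed out")

// timeoutFor picks the poll timeout: 120s for dev models, 60s otherwise.
// Matching on the substring "flux-dev" is an assumption of this sketch.
func timeoutFor(model string) time.Duration {
	if strings.Contains(model, "flux-dev") {
		return 120 * time.Second
	}
	return 60 * time.Second
}

// pollPrediction calls fetch immediately and then once per interval until the
// status is terminal, the timeout elapses, or ctx is cancelled.
func pollPrediction(ctx context.Context, fetch fetchStatus, interval, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		status, err := fetch(ctx)
		if err != nil {
			return "", fmt.Errorf("poll prediction: %w", err)
		}
		switch status {
		case "succeeded":
			return status, nil
		case "failed", "canceled":
			return status, fmt.Errorf("prediction ended with status %q", status)
		}
		select {
		case <-ctx.Done():
			if errors.Is(ctx.Err(), context.DeadlineExceeded) {
				return status, errPollTimeout
			}
			return status, ctx.Err()
		case <-ticker.C:
			// Interval elapsed: go round and poll again.
		}
	}
}

// scriptedFetch replays the given statuses in order, repeating the last one.
// It needs at least one status and exists only to exercise the loop offline.
func scriptedFetch(statuses ...string) fetchStatus {
	i := 0
	return func(context.Context) (string, error) {
		s := statuses[i]
		if i < len(statuses)-1 {
			i++
		}
		return s, nil
	}
}

// runPoll is a convenience wrapper that polls with a background context.
func runPoll(fetch fetchStatus, interval, timeout time.Duration) (string, error) {
	return pollPrediction(context.Background(), fetch, interval, timeout)
}
```

In a real adapter the fetch hook would issue the HTTP request and decode the status field, and the call would be something like pollPrediction(ctx, fetch, 500*time.Millisecond, timeoutFor(model)). Injecting the fetch function keeps the loop testable without any network access.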
Phase 1 status — built, committed, blocked on smoke
Branch: mai/hermes/issue-3-imagen-3
Commit: b282325
Build + tests: clean
Done
- internal/backend/replicate.go. Supports both owner/name (uses /v1/models/{owner}/{name}/predictions) and owner/name:hash (uses /v1/predictions with explicit version). Polls /v1/predictions/{id} every 500 ms, model-aware timeout (60 s schnell / 120 s dev). Resilience: 401 names the env var, 429 with exponential backoff up to 3 retries (honours Retry-After), 5xx retries once, image download retries once on transient failure.
- internal/backend/replicate_pricing.go. Hard-coded per-image USD for known FLUX models, snapshot date 2026-05-08, source URL + refresh-TODO comment.
- internal/usage/usage.go. Supabase REST sink (PostgREST + Accept-Profile: mai). DB write failure is a warning, image still lands.
- mai.imagen_usage (id, created_at, backend, model, seed, prompt_hash, latency_ms, cost_usd_estimate, caller) + indexes on (created_at DESC) and (caller). Grants for mai, service_role. Verified via REST round-trip insert/delete. The raw prompt is never stored, only sha256(prompt).
- imagen usage CLI: cmd/imagen/usage.go. Default groups by week + backend + model + caller with totals; --raw for one-row-per-call view; --since YYYY-MM-DD filter.
- imagen backends: instances of type=replicate now report ok when the token is set, not configured (set REPLICATE_API_TOKEN) otherwise. Verified.
- Config sample adds flux-schnell-replicate (default_steps: 4) and keeps flux-dev-replicate (default_steps: 28); default_backend stays flux-schnell-local.
- internal/backend/replicate_test.go, all green: happy path (model + version-pinned), 401 (names env var), 429 retry policy + max-retry give-up, failed prediction surfacing API error, poll timeout with partial latency for diagnostics, image-download retry-then-fail, ctx cancel, BackendOpts passthrough, default_steps applied, aspect-ratio reduction, parseModelRef, hashPrompt stability, pricing lookup, sink-failure-is-warning. ~3 s total.
Caller identity resolves from MAI_FROM_ID, then the tmux pane's @mai-name option.
Blocked
REPLICATE_API_TOKEN is not present in m's env. Searched $env, ~/.dotfiles/.env.age, fish_variables, ~/.config/fish/conf.d, ~/.config/imagen.yaml. Sent delegation to head: needs either the token (then I run the single FLUX schnell smoke, ~$0.003) or approval to ship without the live smoke. Mocked-HTTP tests cover the API path mechanically; AC #2 (real PNG + non-null cost_usd_estimate row) is the only criterion that requires the real call.
Acceptance criteria status
- imagen backends ok/not-configured switching: verified locally.
- imagen usage --since table: built, will run end-to-end with the smoke row.
- mcp__supabase__execute_sql.
Merged into main (code-complete; AC #2 smoke pending m's Replicate token)
Branch mai/hermes/issue-3-imagen-3 merged via --no-ff. Pushed to origin/main.
b282325
What landed (1,710 lines, 10 files)
- internal/backend/replicate.go (567 lines) - Backend interface impl: POST /v1/predictions with version hash + input, polling GET /v1/predictions/{id} every 500ms, image download, retry on 429 with exponential backoff, ONE retry on transient image-download 5xx, clean errors for 401 (names api_token_env), 4xx (no retry), prediction failed, prediction-timeout (60s schnell, 120s dev). Returns Result with PNG bytes + Metadata carrying model / model_version / seed_used / predict_time_seconds / cost_usd_estimate.
- internal/backend/replicate_test.go (675 lines) - mocked-HTTP unit tests covering all paths above. Zero real API calls.
- internal/backend/replicate_pricing.go (42 lines) - hard-coded current rates per model with source URL + refresh-TODO comment.
- internal/usage/usage.go (160 lines) - Supabase writer for mai.imagen_usage. Best-effort: DB-write failure logs a warning, image still writes / exit code 0. Prompt is stored as sha256(prompt) only - never the raw prompt.
- cmd/imagen/usage.go (189 lines) - new imagen usage [--since DATE] subcommand: groups by backend / model / caller / week, prints a clean table with running totals.
- cmd/imagen/main.go, cmd/imagen/backends.go, cmd/imagen/generate.go - anonymous-import + backends listing + cost-tracking hook in the generate path.
- internal/config/config.go - config-sample additions: flux-schnell-replicate and flux-dev-replicate blocks. default_backend: flux-schnell-local stays unchanged.
- docs/usage.md - documents the new flow.
Supabase migration
Applied to dev Supabase: mai.imagen_usage table per the spec exactly (id / created_at / backend / model / seed / prompt_hash / latency_ms / cost_usd_estimate / caller, with indexes on created_at DESC and caller).
Acceptance criteria
- imagen backends shows flux-schnell-replicate: ok when token set, not configured otherwise
- cost_usd_estimate row in mai.imagen_usage: pending REPLICATE_API_TOKEN. See note below.
- imagen usage --since YYYY-MM-DD outputs a clean table with totals
- mcp__supabase__execute_sql
Note on AC #2
The real-API smoke was the one step that needed REPLICATE_API_TOKEN. Head's earlier briefing said the token was in m's env; on a clean check it wasn't there (not in the live env, not in ~/.dotfiles/.env.age). Per house rule on credentials/spend escalations, head surfaced to m. m said "go on" - merging code-complete with the AC #2 gap explicitly noted.
When m drops the token into ~/.dotfiles/.env.age (or sets it in the active shell), the smoke is a one-liner:
Follow-up: if the smoke surfaces something the mocked tests didn't catch, it's a one-shot fix issue, not a re-do.