ImaGen #3: Replicate API backend (FLUX hosted) + cost-tracking

Open
opened 2026-05-08 12:30:05 +00:00 by mAi · 2 comments

Goal

Implement the Replicate API backend — second real adapter for ImaGen. Cloud-hosted FLUX (and other models) via Replicate's REST API, used when local mRock isn't reachable, when m wants higher-quality FLUX dev, or when otto agents need images without GPU latency dependency.

Prerequisite: ImaGen#1 (bootstrap + Backend interface) must be merged.

Why Replicate

  • Cheapest cloud FLUX surface in 2026: ~$0.003/img for FLUX schnell, ~$0.025/img for FLUX dev.
  • Stable REST API (POST /predictions + polling).
  • Same FLUX model family as the local backend → identical prompting style → easy A/B between local and cloud.
  • Pay-per-second billing — no monthly minimum, fits an experimental setup.

Scope

1. Go adapter (internal/backend/replicate.go)

Implements the Backend interface:

  • Config block accepts: api_token_env (env var name for the API token, default REPLICATE_API_TOKEN), model (e.g. black-forest-labs/flux-schnell), default_steps, default_aspect_ratio.
  • Submits POST https://api.replicate.com/v1/predictions with {"version": "<model-version-hash>", "input": {...}}.
  • Polls GET /v1/predictions/{id} every 500ms until status is succeeded or failed (timeout 60s for schnell, 120s for dev).
  • Downloads the image from the URL Replicate returns.
  • Returns Result with PNG bytes + Metadata: {model, model_version, seed_used, predict_time_seconds, cost_usd_estimate}.

2. Cost-tracking

  • After every successful generation, write a row to mai.imagen_usage (Supabase, mai schema) with: created_at, backend, model, seed, prompt_hash (sha256, NOT the prompt itself), latency_ms, cost_usd_estimate, caller (otto/head, mai/<worker>, etc., resolved from MAI_FROM_ID or the pane's @mai-name option, like the maimcp identity logic).
  • Cost-estimate is best-effort — Replicate's per-second billing depends on model. Hard-code current rates per model in internal/backend/replicate_pricing.go with a comment noting the source URL and a TODO to refresh on schedule.
  • imagen usage --since 2026-05-01 lists rows with running totals. Useful for m to see weekly spend.

3. Migration for the usage table

Create mai.imagen_usage via supabase migration in this issue:

CREATE TABLE mai.imagen_usage (
    id           bigserial PRIMARY KEY,
    created_at   timestamptz NOT NULL DEFAULT now(),
    backend      text NOT NULL,
    model        text NOT NULL,
    seed         bigint,
    prompt_hash  text NOT NULL,
    latency_ms   integer,
    cost_usd_estimate numeric(10,4),
    caller       text
);
CREATE INDEX ON mai.imagen_usage (created_at DESC);
CREATE INDEX ON mai.imagen_usage (caller);

Migration filename: <timestamp>_imagen_usage.sql in ~/dev/mAi/db/migrations/ (or whichever directory mAi uses for cross-project migrations — check ~/.m/docs/msystem.md if unsure).

4. Smoke test

export REPLICATE_API_TOKEN=...   # m provides
imagen generate "a small fishbowl with a cat staring out, photo, soft light" \
  --backend flux-schnell-replicate \
  --size 1024x1024 \
  --output /tmp/cat-replicate.png

# Compare to mRock output from #2 — same prompt, two backends, side-by-side
imagen usage --since 2026-05-08

5. Resilience

  • HTTP 401 → "Replicate API token missing or invalid; export REPLICATE_API_TOKEN".
  • HTTP 429 → exponential backoff with 3 retries.
  • Prediction timeout (60s/120s) → fail with the partial latency for diagnostics.
  • Network blip during image download → ONE retry.

Acceptance criteria

  1. imagen backends shows flux-schnell-replicate: ok when REPLICATE_API_TOKEN is set, not configured otherwise.
  2. Smoke test from §4 produces a real PNG and a row in mai.imagen_usage with non-null cost_usd_estimate.
  3. imagen usage --since YYYY-MM-DD outputs a clean table with totals.
  4. internal/backend/replicate.go has unit tests with httptest server (no real Replicate calls in CI).
  5. Migration applied cleanly to the dev Supabase, table accessible via mcp__supabase__execute_sql.
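For criterion 1, the ok / not configured switch could be as small as the sketch below. The `replicateBackend` type and its `status` method are hypothetical; only the `api_token_env` key, its default, and the two status strings come from this issue. The hint in parentheses follows the wording reported in the status comment further down.

```go
package main

import (
	"fmt"
	"os"
)

// replicateBackend is a hypothetical config view with only the fields used here.
type replicateBackend struct {
	Name        string // e.g. "flux-schnell-replicate"
	APITokenEnv string // config key api_token_env; empty means the default
}

const defaultTokenEnv = "REPLICATE_API_TOKEN"

// status reports "ok" when the token variable is set and non-empty,
// otherwise "not configured" plus a hint naming the variable.
// lookup has the signature of os.LookupEnv so tests can substitute a fake.
func (b replicateBackend) status(lookup func(string) (string, bool)) string {
	env := b.APITokenEnv
	if env == "" {
		env = defaultTokenEnv
	}
	if v, ok := lookup(env); ok && v != "" {
		return "ok"
	}
	return fmt.Sprintf("not configured (set %s)", env)
}

func main() {
	b := replicateBackend{Name: "flux-schnell-replicate"}
	fmt.Printf("%s: %s\n", b.Name, b.status(os.LookupEnv))
}
```

This check only looks at whether the variable is present. It makes no API call, so an invalid token would still show as ok until the first request returns 401.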

Out of scope

  • Other Replicate models (SDXL, Recraft, …) — easy follow-up issues once the adapter pattern is set.
  • Image-to-image / inpainting — v0 is text-to-image only.
  • Webhook delivery (Replicate supports it) — polling is fine for v0 and avoids the public-webhook setup overhead.
  • Spending alerts / hard caps — separate issue once we have usage data to set thresholds.

Refs

  • ImaGen bootstrap: ImaGen#1 (depends-on)
  • Replicate API docs: https://replicate.com/docs/reference/http
  • FLUX schnell on Replicate: black-forest-labs/flux-schnell
  • Pricing: https://replicate.com/pricing (snapshot the current rates in replicate_pricing.go with a comment)

Workflow

Coder role. Blocked on #1. When #1 lands, m or otto/head assigns mAi here.
mAi self-assigned this 2026-05-08 15:16:28 +00:00

Phase 1 status — built, committed, blocked on smoke

Branch: mai/hermes/issue-3-imagen-3
Commit: b282325
Build + tests: clean

Done

  1. Replicate adapter: internal/backend/replicate.go. Supports both owner/name (uses /v1/models/{owner}/{name}/predictions) and owner/name:hash (uses /v1/predictions with explicit version). Polls /v1/predictions/{id} every 500 ms with a model-aware timeout (60 s schnell / 120 s dev). Resilience: 401 names the env var, 429 with exponential backoff up to 3 retries (honours Retry-After), 5xx retries once, image download retries once on transient failure.
  2. Pricing snapshot: internal/backend/replicate_pricing.go. Hard-coded per-image USD for known FLUX models, snapshot date 2026-05-08, source URL + refresh-TODO comment.
  3. Cost-tracking: internal/usage/usage.go. Supabase REST sink (PostgREST + Accept-Profile: mai). DB write failure is a warning, image still lands.
  4. Migration applied: mai.imagen_usage (id, created_at, backend, model, seed, prompt_hash, latency_ms, cost_usd_estimate, caller) + indexes on (created_at DESC) and (caller). Grants for mai, service_role. Verified via REST round-trip insert/delete. The raw prompt is never stored, only sha256(prompt).
  5. imagen usage CLI: cmd/imagen/usage.go. Default groups by week + backend + model + caller with totals; --raw for one-row-per-call view; --since YYYY-MM-DD filter.
  6. imagen backends: instances of type=replicate now report ok when the token is set, not configured (set REPLICATE_API_TOKEN) otherwise. Verified.
  7. Config sample: adds flux-schnell-replicate (default_steps: 4) and keeps flux-dev-replicate (default_steps: 28); default_backend stays flux-schnell-local.
  8. Tests: internal/backend/replicate_test.go, all green: happy path (model + version-pinned), 401 (names env var), 429 retry policy + max-retry give-up, failed prediction surfacing API error, poll timeout with partial latency for diagnostics, image-download retry-then-fail, ctx cancel, BackendOpts passthrough, default_steps applied, aspect-ratio reduction, parseModelRef, hashPrompt stability, pricing lookup, sink-failure-is-warning. ~3 s total.
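To illustrate the two model-reference forms from item 1 and the aspect-ratio reduction named in the test list, here is a hedged sketch. `parseModelRef` is named in the test list; its signature, the `modelRef` type, `endpoint`, and `aspectRatio` are assumptions and may not match replicate.go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// modelRef is a parsed "owner/name" or "owner/name:hash" reference.
type modelRef struct{ Owner, Name, Version string }

// parseModelRef splits a reference; Version is empty when no hash is pinned.
func parseModelRef(s string) (modelRef, error) {
	base, version, pinned := strings.Cut(s, ":")
	owner, name, ok := strings.Cut(base, "/")
	if !ok || owner == "" || name == "" || strings.Contains(name, "/") || (pinned && version == "") {
		return modelRef{}, fmt.Errorf("invalid model reference %q (want owner/name or owner/name:hash)", s)
	}
	return modelRef{Owner: owner, Name: name, Version: version}, nil
}

// endpoint returns the path to POST to and the request body for the input:
// version-pinned refs go to /v1/predictions, others to the per-model route.
func (m modelRef) endpoint(input map[string]any) (string, map[string]any) {
	if m.Version != "" {
		return "/v1/predictions", map[string]any{"version": m.Version, "input": input}
	}
	return "/v1/models/" + m.Owner + "/" + m.Name + "/predictions", map[string]any{"input": input}
}

// aspectRatio reduces a "WIDTHxHEIGHT" size to its lowest-terms ratio
// by dividing both sides by their greatest common divisor.
func aspectRatio(size string) (string, error) {
	ws, hs, ok := strings.Cut(size, "x")
	w, errW := strconv.Atoi(ws)
	h, errH := strconv.Atoi(hs)
	if !ok || errW != nil || errH != nil || w <= 0 || h <= 0 {
		return "", fmt.Errorf("invalid size %q (want WIDTHxHEIGHT)", size)
	}
	a, b := w, h
	for b != 0 {
		a, b = b, a%b
	}
	return fmt.Sprintf("%d:%d", w/a, h/a), nil
}

func main() {
	for _, s := range []string{"black-forest-labs/flux-schnell", "black-forest-labs/flux-schnell:<hash>"} {
		ref, err := parseModelRef(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		path, _ := ref.endpoint(map[string]any{"prompt": "a cat"})
		fmt.Println(s, "->", path)
	}
	ratio, _ := aspectRatio("1024x1024")
	fmt.Println("1024x1024 ->", ratio)
}
```

Whether a given model accepts every reduced ratio is not covered here; a real adapter would likely validate the result against the model's allowed values.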

Caller identity resolves from MAI_FROM_ID, then the tmux pane's @mai-name option.

Blocked

REPLICATE_API_TOKEN is not present in m's env. Searched $env, ~/.dotfiles/.env.age, fish_variables, ~/.config/fish/conf.d, ~/.config/imagen.yaml. Sent delegation to head — needs either the token (then I run the single FLUX schnell smoke ~$0.003) or approval to ship without the live smoke. Mocked-HTTP tests cover the API path mechanically; AC #2 (real PNG + non-null cost_usd_estimate row) is the only criterion that requires the real call.

Acceptance criteria status

  • AC#1 — imagen backends ok/not-configured switching: verified locally.
  • AC#2 — real PNG + non-null cost row: blocked, needs token.
  • AC#3 — imagen usage --since table: built, will run end-to-end with the smoke row.
  • AC#4 — unit tests with httptest: done.
  • AC#5 — migration applied to dev Supabase: done, table accessible via mcp__supabase__execute_sql.

Merged into main (code-complete; AC #2 smoke pending m's Replicate token)

Branch mai/hermes/issue-3-imagen-3 merged via --no-ff. Pushed to origin/main.

  • Implementation commit: b282325

What landed (1,710 lines, 10 files)

  • internal/backend/replicate.go (567 lines) - Backend interface impl: POST /v1/predictions with version hash + input, polling GET /v1/predictions/{id} every 500ms, image download, retry on 429 with exponential backoff, ONE retry on transient image-download 5xx, clean errors for 401 (names api_token_env), 4xx (no retry), prediction failed, prediction-timeout (60s schnell, 120s dev). Returns Result with PNG bytes + Metadata carrying model / model_version / seed_used / predict_time_seconds / cost_usd_estimate.
  • internal/backend/replicate_test.go (675 lines) - mocked-HTTP unit tests covering all paths above. Zero real API calls.
  • internal/backend/replicate_pricing.go (42 lines) - hard-coded current rates per model with source URL + refresh-TODO comment.
  • internal/usage/usage.go (160 lines) - Supabase writer for mai.imagen_usage. Best-effort: DB-write failure logs a warning, image still writes / exit code 0. Prompt is stored as sha256(prompt) only - never the raw prompt.
  • cmd/imagen/usage.go (189 lines) - new imagen usage [--since DATE] subcommand: groups by backend / model / caller / week, prints a clean table with running totals.
  • cmd/imagen/main.go, cmd/imagen/backends.go, cmd/imagen/generate.go - anonymous-import + backends listing + cost-tracking hook in the generate path.
  • internal/config/config.go - config-sample additions: flux-schnell-replicate and flux-dev-replicate blocks. default_backend: flux-schnell-local stays unchanged.
  • docs/usage.md - documents the new flow.

Supabase migration

Applied to dev Supabase: mai.imagen_usage table per the spec exactly (id / created_at / backend / model / seed / prompt_hash / latency_ms / cost_usd_estimate / caller, with indexes on created_at DESC and caller).

Acceptance criteria

| # | Check | Result |
|---|-------|--------|
| 1 | imagen backends shows flux-schnell-replicate: ok when token set, not configured otherwise | ok - covered by adapter + backends listing |
| 2 | Real-API smoke produces a real PNG + non-null cost_usd_estimate row in mai.imagen_usage | NOT VERIFIED - pending REPLICATE_API_TOKEN. See note below. |
| 3 | imagen usage --since YYYY-MM-DD outputs a clean table with totals | ok - subcommand built; will populate as Replicate calls land |
| 4 | Adapter has unit tests with httptest server, no real Replicate in CI | ok - 675-line test suite |
| 5 | Migration applied cleanly to dev Supabase, table accessible via mcp__supabase__execute_sql | ok |

Note on AC #2

The real-API smoke was the one step that needed REPLICATE_API_TOKEN. Head's earlier briefing said the token was in m's env; on a clean check it wasn't there (not in the live env, not in ~/.dotfiles/.env.age). Per house rule on credentials/spend escalations, head surfaced to m. m said "go on" - merging code-complete with the AC #2 gap explicitly noted.

When m drops the token into ~/.dotfiles/.env.age (or sets it in the active shell), the smoke is a one-liner:

imagen generate "a tiny dragon perched on a teacup, photo, soft light" \
  --backend flux-schnell-replicate --output /tmp/replicate-smoke.png
# expected: ~$0.003, one row in mai.imagen_usage with cost_usd_estimate non-null
imagen usage --since 2026-05-08

Follow-up: if the smoke surfaces something the mocked tests didn't catch, it's a one-shot fix issue, not a re-do.

mAi added the done and needs-smoke labels 2026-05-08 15:32:45 +00:00