Image-to-image / restyle: condition generation on an existing image (local FLUX) #11

Open
opened 2026-06-06 12:37:49 +00:00 by mAi · 3 comments
Collaborator

Idea (m, 2026-06-06)

"Can we make our imagen image generation include data from existing images? For example 'restyle' an existing picture? Can we have an inventor scope that out and develop a feature like that using the local AI image model?"

m wants ImaGen to take an existing image as input and generate from it — the canonical case being restyle (keep the subject/composition, change the look: photo to watercolour, sketch to render, etc.). Must work on the local FLUX model (ComfyUI on mRock, the comfyui backend from #2), not only hosted APIs.

Why this is inventor work, not a one-liner

The current contract is text-to-image only:

  • backend.Request (internal/backend/backend.go) carries Prompt, NegativePrompt, Width, Height, Steps, Seed, Style, BackendOpts — there is no input-image source and no denoise/strength field. Adding image conditioning is a cross-backend contract change, not just a ComfyUI tweak.
  • The comfyui adapter builds a workflow by substituting Request values into a JSON template (internal/backend/workflows/, see workflow_template.go). img2img needs a new graph (LoadImage to VAEEncode to KSampler-with-denoise) — the existing text-to-image template can't be reused as-is.
  • It composes with the existing style-preset system (internal/prompt) — "restyle to anime" should be able to reuse the anime preset rather than reinventing style prompts.
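For orientation, here is a minimal sketch of what the img2img fragment of such a graph could look like in ComfyUI's API (prompt) format, rendered with Go's `os.Expand`. Everything in it is an assumption for illustration, not the contents of `workflow_template.go`: the node IDs, the placeholder names, the sampler settings, and the choice of `os.Expand`. The model, conditioning and VAE loader nodes that the references point to are left out, so only the LoadImage to VAEEncode to KSampler wiring shows.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strconv"
)

// img2imgFragment is a hypothetical excerpt of an img2img graph in ComfyUI's
// API format. Node "10" loads the uploaded file, "11" encodes it to a latent
// with the VAE (node "3", not shown), and "12" samples from that latent
// instead of an empty one. Nodes "1", "5" and "6" (model, conditioning) are
// also not shown. The template only becomes valid JSON after substitution,
// because ${seed}, ${steps} and ${denoise} stand in for unquoted numbers.
const img2imgFragment = `{
  "10": {"class_type": "LoadImage", "inputs": {"image": "${init_image}"}},
  "11": {"class_type": "VAEEncode", "inputs": {"pixels": ["10", 0], "vae": ["3", 0]}},
  "12": {"class_type": "KSampler", "inputs": {
    "model": ["1", 0], "positive": ["5", 0], "negative": ["6", 0],
    "latent_image": ["11", 0],
    "seed": ${seed}, "steps": ${steps}, "cfg": 1.0,
    "sampler_name": "euler", "scheduler": "simple",
    "denoise": ${denoise}}}
}`

// renderWorkflow substitutes ${name} placeholders and checks that the result
// parses. Values are inserted verbatim, so string values must already be
// JSON-safe (a hex content-hash filename is).
func renderWorkflow(tmpl string, vars map[string]string) (map[string]any, error) {
	var missing []string
	out := os.Expand(tmpl, func(key string) string {
		v, ok := vars[key]
		if !ok {
			missing = append(missing, key)
		}
		return v
	})
	if len(missing) > 0 {
		return nil, fmt.Errorf("unfilled placeholders: %v", missing)
	}
	var graph map[string]any
	if err := json.Unmarshal([]byte(out), &graph); err != nil {
		return nil, fmt.Errorf("rendered workflow is not valid JSON: %w", err)
	}
	return graph, nil
}

func main() {
	graph, err := renderWorkflow(img2imgFragment, map[string]string{
		"init_image": "3f2a9c.png",
		"seed":       "42",
		"steps":      "8",
		"denoise":    strconv.FormatFloat(0.6, 'f', -1, 64),
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	sampler := graph["12"].(map[string]any)["inputs"].(map[string]any)
	fmt.Println(sampler["denoise"], sampler["latent_image"])
}
```

The difference from text-to-image is concentrated in the KSampler. Its `latent_image` input points at the encoded image rather than an empty latent, and `denoise` is set below 1.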

What the inventor should scope

  1. Mechanism on local FLUX — evaluate and recommend. At least:

    • img2img (latent denoise) — encode the input, partially denoise. Strength/denoise knob trades fidelity vs. transformation. Simplest; works with FLUX.
    • FLUX Redux / image-prompt conditioning — image as a style/content prompt (IPAdapter-style). Good for "in the style of this image".
    • ControlNet (canny/depth) — preserve structure while fully repainting style. Strongest for "same composition, new look".
      Recommend a v0 mechanism and a path to the others. Note FLUX schnell is distilled/low-step — verify which mechanisms behave on it vs. needing flux-dev.
  2. Framework contract change. How does the input image enter Request? Proposal space: an InitImage source (path / bytes / reader) + a Strength float64. Keep it backend-agnostic so Replicate/others can implement img2img later. Define what happens when a backend doesn't support it (clear error, not silent text-to-image).

  3. CLI surface. e.g. imagen generate --from <path> --strength 0.6 "restyle as watercolour", or a dedicated imagen restyle <path> --style watercolour. Decide flag shape, how --style presets compose with the input image, defaults for strength.

  4. ComfyUI workflow template(s). What new template(s) under internal/backend/workflows/ are needed, and how the input image is uploaded to ComfyUI (/upload/image) before the prompt is submitted.

  5. Model availability on mRock. Does img2img/Redux/ControlNet need model files (Redux model, ControlNet weights, VAE) not yet on mRock? List what must be downloaded. (mRock GPU: RTX 4070 Ti SUPER, 16 GB VRAM — flag VRAM headroom for ControlNet + FLUX.)

Open questions for the design

  • One feature or two? (generic img2img vs. an opinionated restyle that auto-picks structure-preserving + a style preset.)
  • Sidecar/metadata: record the source image + strength so a restyle is reproducible and traceable (ties into #6 viewer's "re-run" idea).
  • Does this belong only on comfyui, or define the contract now and let Replicate implement later?

Constraints

  • Local model first. Hosted backends are a follow-up, not the v0.
  • House rules: no TODOs in landed code, tests next to packages, go build ./... + go test ./... clean.

Workflow

  1. Inventor designs mechanism + contract change + CLI surface + ComfyUI template plan + model-download list, writes it up (design doc under docs/ + this issue's comments). Ends with "DESIGN READY FOR REVIEW".
  2. Head gates the design to m (go/no-go) before any coder shift.
  3. On go: same worker implements in stages (contract + comfyui img2img first, then style/ControlNet refinements).

Refs

  • #2 ComfyUI local backend on mRock (FLUX schnell) — the backend this extends
  • internal/backend/backend.go (Request contract), internal/backend/comfyui.go, internal/backend/workflows/, internal/prompt (style presets)
  • #6 viewer surface — "regenerate/fork from a saved image" overlaps the restyle entry point
Author
Collaborator

Shift-1 design complete — DESIGN READY FOR REVIEW pending m's decisions

Design doc committed: docs/design-img2img-restyle.md.

Mechanism evaluation (verified model/VRAM facts against live sources):

| Phase | Mechanism | Downloads | Why |
|-------|-----------|-----------|-----|
| v0 | img2img latent denoise | none | covers restyle now; schnell is Apache-2.0 (commercial-clean); zero ops |
| phase 2 | ControlNet (canny/depth) | ~4 GB + flux-dev UNet (non-commercial), custom node | dramatic restyle with locked composition |
| phase 3 (opt.) | Redux | ~1 GB | "style from a reference image"; a different feature, weak at text-steered restyle |

Recommended v0: img2img latent denoise. Zero model downloads (VAE already on mRock; LoadImage/VAEEncode are core nodes), license-clean on schnell. Caveat flagged: schnell is 4-step distilled, so img2img bumps the step floor when denoise<1.0.
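The step-floor rule is small enough to state as code. This is a sketch only. The issue says the floor applies when steps are unset and denoise < 1, but the constant values below (4 and 8) are placeholders, not the numbers the adapter actually uses.

```go
package main

import "fmt"

// Hypothetical values; the real defaults live in the comfyui adapter.
const (
	defaultSteps    = 4 // assumed text-to-image default for distilled schnell
	img2imgMinSteps = 8 // assumed floor for partial denoise
)

// effectiveSteps returns the sampler step count. An explicitly requested
// count (> 0) is used as-is. Otherwise a partial denoise (< 1, i.e. img2img)
// gets the higher floor, and full denoise keeps the normal default.
func effectiveSteps(requested int, denoise float64) int {
	if requested > 0 {
		return requested
	}
	if denoise < 1.0 {
		return img2imgMinSteps
	}
	return defaultSteps
}

func main() {
	fmt.Println(effectiveSteps(0, 1.0)) // text-to-image, steps unset
	fmt.Println(effectiveSteps(0, 0.6)) // img2img, steps unset: floor applies
	fmt.Println(effectiveSteps(6, 0.6)) // explicit steps are respected
}
```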

Contract change (internal/backend/backend.go):

  • Request.InitImage *ImageInput + Request.Strength float64 ([0,1], = ComfyUI denoise).
  • ImageInputCapable optional interface + ErrImageInputUnsupported sentinel — a backend that can't do img2img errors clearly, never silently falls back to txt2img.

ComfyUI: new bundled flux1-schnell-img2img.json (LoadImage → VAEEncode → KSampler{latent_image=encoded, denoise=${denoise}}), optional workflow_img2img: config key auto-resolving to <workflow>-img2img, image uploaded via /upload/image with a content-hash filename.
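The `workflow_img2img` fallback described above could resolve roughly as follows. Two parts are assumptions: the bundled name `flux1-schnell` for the text-to-image template, and the in-memory `bundled` set. Both stand in for whatever `internal/backend/workflows/` actually ships.

```go
package main

import "fmt"

// bundled is a hypothetical stand-in for the shipped workflow templates.
var bundled = map[string]bool{
	"flux1-schnell":         true,
	"flux1-schnell-img2img": true,
}

// resolveImg2ImgWorkflow picks the img2img template. An explicit
// workflow_img2img value wins. Otherwise, for a bundled workflow, a bundled
// "<workflow>-img2img" sibling is used if one exists. ok=false means img2img
// is unavailable, which is what SupportsImageInput would then report.
func resolveImg2ImgWorkflow(workflow, workflowImg2Img string) (name string, ok bool) {
	if workflowImg2Img != "" {
		return workflowImg2Img, true
	}
	if candidate := workflow + "-img2img"; bundled[workflow] && bundled[candidate] {
		return candidate, true
	}
	return "", false
}

func main() {
	fmt.Println(resolveImg2ImgWorkflow("flux1-schnell", ""))
	fmt.Println(resolveImg2ImgWorkflow("my-flow.json", ""))
}
```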

CLI: imagen generate "…" --from <path> --strength 0.6 (generic) + imagen restyle <path> --style watercolour (opinionated sugar over the same path; empty prompt + --style makes the style suffix the prompt). One mechanism, two entry points.

Sidecar: records mode/strength/source_image_path/source_image_sha256 for reproducibility + #6 'fork from saved image'.

Model downloads on mRock for v0: none. Phase-2 ControlNet needs flux-dev (non-commercial license — flagged for any commercial use).

Walking m through 6 decisions (mechanism, CLI shape, contract scope, strength semantics, dimensions, phase-2 priority) via the chip-picker now; decisions get folded into the doc's §11 and the branch updated before the head gates the coder shift.

(checkpoint cp-20260606-144349; posted by daedalus/inventor)

Author
Collaborator

Implementation complete — #11 img2img / restyle on local FLUX

Got the go from the head and implemented the design in three staged commits on mai/daedalus/design-image-to-image. go build ./..., go vet ./... and go test ./... are all clean.

Stage 1 — cross-backend contract (7ccdfbf)

  • backend.Request gains InitImage *ImageInput + Strength float64.
  • ImageInputCapable optional interface + ErrImageInputUnsupported sentinel.
  • Mock reports unsupported and errors on InitImage (honest reference impl).

Stage 2 — ComfyUI img2img (de41506)

  • New bundled flux1-schnell-img2img.json: empty latent replaced by LoadImage → VAEEncode feeding KSampler.latent_image, denoise = ${denoise}.
  • workflow_img2img config key, auto-resolving to <workflow>-img2img for bundled names; SupportsImageInput() reflects it.
  • /upload/image upload with a content-hash filename; buildImg2ImgWorkflow supplies init_image+denoise; metadata records mode/strength/source_image/source_image_sha256.
  • Distilled-model step floor (img2imgMinSteps): when steps are unset and denoise < 1, steps are raised to img2imgMinSteps.

Stage 3 — CLI (b1efbc0)

  • imagen generate --from <img> --strength 0.6 (generic) and imagen restyle <img> --style <preset> ["extra prompt"] (sugar over the same runGeneration pipeline).
  • img2img output dims read from the input image (png/jpeg/webp via image.DecodeConfig); --size+--from rejected; --strength validated to [0,1]; capability gate gives an actionable error on a backend that can't img2img.
  • docs/usage.md + config sample updated.

Tests added: contract round-trip + sentinel (backend_test), img2img workflow resolution / upload+graph-rewire / unsupported / step-floor (comfyui_test), template shape (workflow_template_test), loadInitImage + img2img validation gates (generate_test), restyle arg handling (restyle_test).

Smoke-tested the built binary: help lists both entry points; all error paths (--size+--from, out-of-range strength, unsupported backend, missing image, missing target look) return clear messages and exit 2.

Model downloads on mRock for this v0: none — the VAE is already present and LoadImage/VAEEncode are core nodes. Real-ComfyUI verification on mRock is the one thing unit tests can't cover (httptest mocks /upload/image + /prompt); worth a live smoke once mRock's ComfyUI is up. Phase-2 ControlNet / phase-3 Redux remain follow-ups per the design.

(posted by daedalus/inventor→coder)

Author
Collaborator

Merged to main: 15c65dd (merge of mai/daedalus/design-image-to-image).

v0 image-to-image / restyle on the local FLUX (comfyui) backend shipped:

  • Contract: Request.InitImage + Strength, ImageInputCapable runtime check, ErrImageInputUnsupported sentinel (backends fail loudly, never silently fall back to text-to-image).
  • ComfyUI: flux1-schnell-img2img.json (LoadImage → VAEEncode → KSampler with denoise=strength), /upload/image with content-hash filenames, workflow_img2img auto-resolution, distilled-model step floor.
  • CLI: imagen generate "…" --from photo.jpg --strength 0.6 and the imagen restyle photo.jpg --style watercolor sugar verb. Output keeps the input's dimensions; --size+--from rejected.
  • Reproducibility: sidecar records source_image_path + sha256 + strength.
  • Zero model downloads for v0. go build/go vet/go test ./... clean.

Deferred (issues, not TODOs): ControlNet phase-2 (locked-composition repaint; pulls flux-dev, non-commercial license), Redux phase-3 ("style from a reference"), Replicate img2img, #6 cloud lineage.

Caveat: img2img is unit-tested against a mocked ComfyUI HTTP API, not yet live-verified against the real ComfyUI on mRock — tracked as a smoke-test follow-up.

Design doc: docs/design-img2img-restyle.md. Implemented by daedalus (inventor → coder, m-approved gate).

Reference: m/ImaGen#11