Image-to-image / restyle: condition generation on an existing image (local FLUX) #11
Idea (m, 2026-06-06)
m wants ImaGen to take an existing image as input and generate from it, the canonical case being restyle (keep the subject/composition, change the look: photo to watercolour, sketch to render, etc.). Must work on the local FLUX model (ComfyUI on mRock, the comfyui backend from #2), not only hosted APIs.
Why this is inventor work, not a one-liner
The current contract is text-to-image only:
backend.Request (internal/backend/backend.go) carries Prompt, NegativePrompt, Width, Height, Steps, Seed, Style, BackendOpts; there is no input-image source and no denoise/strength field. Adding image conditioning is a cross-backend contract change, not just a ComfyUI tweak.
comfyui adapter builds a workflow by substituting Request values into a JSON template (internal/backend/workflows/, see workflow_template.go). img2img needs a new graph (LoadImage to VAEEncode to KSampler-with-denoise); the existing text-to-image template can't be reused as-is.
Style presets (internal/prompt): "restyle to anime" should be able to reuse the anime preset rather than reinventing style prompts.
What the inventor should scope
Mechanism on local FLUX — evaluate and recommend. At least:
Recommend a v0 mechanism and a path to the others. Note FLUX schnell is distilled/low-step — verify which mechanisms behave on it vs. needing flux-dev.
Framework contract change. How does the input image enter Request? Proposal space: an InitImage source (path / bytes / reader) + a Strength float64. Keep it backend-agnostic so Replicate/others can implement img2img later. Define what happens when a backend doesn't support it (clear error, not silent text-to-image).
CLI surface. e.g. imagen generate --from <path> --strength 0.6 "restyle as watercolour", or a dedicated imagen restyle <path> --style watercolour. Decide flag shape, how --style presets compose with the input image, defaults for strength.
ComfyUI workflow template(s). What new template(s) under internal/backend/workflows/ are needed, and how the input image is uploaded to ComfyUI (/upload/image) before the prompt is submitted.
Model availability on mRock. Does img2img/Redux/ControlNet need model files (Redux model, ControlNet weights, VAE) not yet on mRock? List what must be downloaded. (mRock GPU: RTX 4070 Ti SUPER, 16 GB VRAM; flag VRAM headroom for ControlNet + FLUX.)
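To make the template question concrete, here is a minimal sketch of how an img2img graph fragment (LoadImage to VAEEncode to KSampler with denoise) could be filled in by placeholder substitution. The `${name}` placeholder syntax, the placeholder names, the node ids, and the `renderWorkflow` helper are all assumptions for illustration; the model, VAE and conditioning nodes a real graph needs are omitted, and the actual mechanism in workflow_template.go may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strconv"
)

// img2imgFragment is a hypothetical excerpt of an img2img graph in ComfyUI's
// API-format JSON. Node ids are made up; node "3" (the VAE source) and the
// model/conditioning inputs of KSampler are left out to keep the sketch short.
const img2imgFragment = `{
  "10": {"class_type": "LoadImage", "inputs": {"image": "${init_image}"}},
  "11": {"class_type": "VAEEncode", "inputs": {"pixels": ["10", 0], "vae": ["3", 0]}},
  "12": {"class_type": "KSampler", "inputs": {
    "latent_image": ["11", 0],
    "denoise": ${denoise},
    "steps": ${steps},
    "seed": ${seed}
  }}
}`

// renderWorkflow substitutes ${name} placeholders and checks that the result
// parses as JSON. It fails on any placeholder that has no value, so a
// half-filled graph is never submitted.
func renderWorkflow(tmpl string, vars map[string]string) (map[string]any, error) {
	var missing []string
	out := os.Expand(tmpl, func(name string) string {
		v, ok := vars[name]
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	if len(missing) > 0 {
		return nil, fmt.Errorf("workflow template: unset placeholders %v", missing)
	}
	var graph map[string]any
	if err := json.Unmarshal([]byte(out), &graph); err != nil {
		return nil, fmt.Errorf("workflow template: invalid JSON after substitution: %w", err)
	}
	return graph, nil
}

func main() {
	graph, err := renderWorkflow(img2imgFragment, map[string]string{
		"init_image": "input.png", // name of the already-uploaded image
		"denoise":    strconv.FormatFloat(0.6, 'f', -1, 64),
		"steps":      "8",
		"seed":       "42",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	sampler := graph["12"].(map[string]any)["inputs"].(map[string]any)
	fmt.Println("denoise:", sampler["denoise"], "latent from node:", sampler["latent_image"])
}
```

Note that the string placeholder is inserted unescaped, so this only works for filenames that are safe inside a JSON string.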
Open questions for the design
restyle that auto-picks structure-preserving + a style preset.)
comfyui, or define the contract now and let Replicate implement later?
Constraints
go build ./... + go test ./... clean.
Workflow
docs/ + this issue's comments). Ends with "DESIGN READY FOR REVIEW".
Refs
internal/backend/backend.go (Request contract), internal/backend/comfyui.go, internal/backend/workflows/, internal/prompt (style presets)
Shift-1 design complete: DESIGN READY FOR REVIEW pending m's decisions
Design doc committed: docs/design-img2img-restyle.md.
Mechanism evaluation (verified model/VRAM facts against live sources):
Recommended v0: img2img latent denoise. Zero model downloads (VAE already on mRock; LoadImage/VAEEncode are core nodes), license-clean on schnell. Caveat flagged: schnell is 4-step distilled, so img2img bumps the step floor when denoise<1.0.
Contract change (internal/backend/backend.go): Request.InitImage *ImageInput + Request.Strength float64 ([0,1], = ComfyUI denoise). ImageInputCapable optional interface + ErrImageInputUnsupported sentinel; a backend that can't do img2img errors clearly, never silently falls back to txt2img.
ComfyUI: new bundled flux1-schnell-img2img.json (LoadImage → VAEEncode → KSampler{latent_image=encoded, denoise=${denoise}}), optional workflow_img2img: config key auto-resolving to <workflow>-img2img, image uploaded via /upload/image with a content-hash filename.
CLI: imagen generate "…" --from <path> --strength 0.6 (generic) + imagen restyle <path> --style watercolour (opinionated sugar over the same path; empty prompt + --style makes the style suffix the prompt). One mechanism, two entry points.
Sidecar: records mode/strength/source_image_path/source_image_sha256 for reproducibility + #6 'fork from saved image'.
Model downloads on mRock for v0: none. Phase-2 ControlNet needs flux-dev (non-commercial license, flagged for any commercial use).
Walking m through 6 decisions (mechanism, CLI shape, contract scope, strength semantics, dimensions, phase-2 priority) via the chip-picker now; decisions get folded into the doc's §11 and the branch updated before the head gates the coder shift.
(checkpoint cp-20260606-144349; posted by daedalus/inventor)
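The contract change summarised in this comment can be sketched roughly as below. The names Request.InitImage, Strength, ImageInputCapable, ErrImageInputUnsupported and SupportsImageInput come from this thread; the fields of ImageInput, the Backend interface, the checkImageInput helper and the two toy backends are assumptions made up for illustration, and the real definitions in internal/backend/backend.go may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// ImageInput is a hypothetical shape for the input image.
type ImageInput struct {
	Path string // where the image came from, kept for the sidecar
	Data []byte // encoded image bytes
}

// Request shows only the fields relevant to img2img.
type Request struct {
	Prompt    string
	InitImage *ImageInput // nil means plain text-to-image
	Strength  float64     // in [0,1]; maps to ComfyUI denoise
}

// ErrImageInputUnsupported is the sentinel a caller can test with errors.Is.
var ErrImageInputUnsupported = errors.New("backend does not support image input")

// Backend is a stand-in for the real backend interface (assumed).
type Backend interface {
	Name() string
}

// ImageInputCapable is optional: backends that can do img2img implement it.
type ImageInputCapable interface {
	SupportsImageInput() bool
}

// checkImageInput is a hypothetical gate: a request carrying an image must
// never reach a backend that would ignore it and generate from text alone.
func checkImageInput(b Backend, req Request) error {
	if req.InitImage == nil {
		return nil
	}
	if c, ok := b.(ImageInputCapable); ok && c.SupportsImageInput() {
		return nil
	}
	return fmt.Errorf("backend %q: %w", b.Name(), ErrImageInputUnsupported)
}

// textOnly does not implement ImageInputCapable at all.
type textOnly struct{}

func (textOnly) Name() string { return "text-only" }

// workflowBackend implements the interface but answers at runtime, for
// example depending on whether an img2img workflow could be resolved.
type workflowBackend struct{ hasImg2ImgWorkflow bool }

func (workflowBackend) Name() string               { return "workflow" }
func (w workflowBackend) SupportsImageInput() bool { return w.hasImg2ImgWorkflow }

func main() {
	req := Request{
		Prompt:    "restyle as watercolour",
		InitImage: &ImageInput{Path: "photo.jpg"},
		Strength:  0.6,
	}
	err := checkImageInput(textOnly{}, req)
	fmt.Println(errors.Is(err, ErrImageInputUnsupported)) // prints: true
	fmt.Println(checkImageInput(workflowBackend{hasImg2ImgWorkflow: true}, req) == nil) // prints: true
}
```

Making the capability a runtime method rather than a pure type assertion lets one backend type report different answers depending on its configuration.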
Implementation complete — #11 img2img / restyle on local FLUX
Got the go from the head and implemented the design in three staged commits on mai/daedalus/design-image-to-image. go build ./..., go vet ./... and go test ./... are all clean.
Stage 1: cross-backend contract (7ccdfbf)
backend.Request gains InitImage *ImageInput + Strength float64.
ImageInputCapable optional interface + ErrImageInputUnsupported sentinel.
Mock reports unsupported and errors on InitImage (honest reference impl).
Stage 2: ComfyUI img2img (de41506)
flux1-schnell-img2img.json: empty latent replaced by LoadImage → VAEEncode feeding KSampler.latent_image, denoise = ${denoise}.
workflow_img2img config key, auto-resolving to <workflow>-img2img for bundled names; SupportsImageInput() reflects it.
/upload/image upload with a content-hash filename; buildImg2ImgWorkflow supplies init_image + denoise; metadata records mode/strength/source_image/source_image_sha256.
Distilled-model step floor (img2imgMinSteps) lifts steps when unset + denoise<1.
Stage 3: CLI (b1efbc0)
imagen generate --from <img> --strength 0.6 (generic) and imagen restyle <img> --style <preset> ["extra prompt"] (sugar over the same runGeneration pipeline).
Output keeps the input's dimensions (image.DecodeConfig); --size + --from rejected; --strength validated to [0,1]; capability gate gives an actionable error on a backend that can't img2img.
docs/usage.md + config sample updated.
Tests added: contract round-trip + sentinel (backend_test), img2img workflow resolution / upload+graph-rewire / unsupported / step-floor (comfyui_test), template shape (workflow_template_test), loadInitImage + img2img validation gates (generate_test), restyle arg handling (restyle_test).
Smoke-tested the built binary: help lists both entry points; all error paths (--size + --from, out-of-range strength, unsupported backend, missing image, missing target look) return clear messages and exit 2.
Model downloads on mRock for this v0: none; the VAE is already present and LoadImage/VAEEncode are core nodes. Real-ComfyUI verification on mRock is the one thing unit tests can't cover (httptest mocks /upload/image + /prompt); worth a live smoke once mRock's ComfyUI is up. Phase-2 ControlNet / phase-3 Redux remain follow-ups per the design.
(posted by daedalus/inventor→coder)
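The validation gates and the step floor described in stages 2 and 3 could look roughly like this. Only the rules themselves (--size + --from rejected, --strength in [0,1], steps lifted when unset and denoise<1) and the name img2imgMinSteps come from this thread; the options struct, the function names, the error wording and the value 8 are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// img2imgMinSteps is the step floor for the distilled model. The value here
// is a placeholder; the thread does not state the real number.
const img2imgMinSteps = 8

// options is a hypothetical view of the parsed CLI flags.
type options struct {
	From     string  // --from; empty means text-to-image
	Size     string  // --size; empty means unset
	Strength float64 // --strength
	Steps    int     // --steps; 0 means unset
}

// validateImg2Img applies the img2img gates only when --from is given.
func validateImg2Img(o options) error {
	if o.From == "" {
		return nil
	}
	if o.Size != "" {
		return errors.New("--size cannot be combined with --from: output keeps the input image's dimensions")
	}
	// Written as a negated range check so that NaN is rejected too.
	if !(o.Strength >= 0 && o.Strength <= 1) {
		return fmt.Errorf("--strength must be in [0,1], got %g", o.Strength)
	}
	return nil
}

// effectiveSteps lifts the step count to the floor only when the caller left
// steps unset and the run is a partial denoise. An explicit value is kept.
func effectiveSteps(requested int, strength float64) int {
	if requested == 0 && strength < 1 {
		return img2imgMinSteps
	}
	return requested
}

func main() {
	bad := options{From: "photo.jpg", Size: "1024x1024", Strength: 0.6}
	fmt.Println(validateImg2Img(bad))

	ok := options{From: "photo.jpg", Strength: 0.6}
	if err := validateImg2Img(ok); err == nil {
		fmt.Println("steps:", effectiveSteps(ok.Steps, ok.Strength))
	}
}
```

In this sketch an unset step count with strength 1 is passed through as 0, leaving the choice to whatever default the backend already applies.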
Merged to main: 15c65dd (merge of mai/daedalus/design-image-to-image).
v0 image-to-image / restyle on the local FLUX (comfyui) backend shipped:
Request.InitImage + Strength, ImageInputCapable runtime check, ErrImageInputUnsupported sentinel (backends fail loudly, never silently fall back to text-to-image).
flux1-schnell-img2img.json (LoadImage → VAEEncode → KSampler with denoise=strength), /upload/image with content-hash filenames, workflow_img2img auto-resolution, distilled-model step floor.
imagen generate "…" --from photo.jpg --strength 0.6 and the imagen restyle photo.jpg --style watercolor sugar verb. Output keeps the input's dimensions; --size + --from rejected.
Sidecar records source_image_path + sha256 + strength.
go build / go vet / go test ./... clean.
Deferred (issues, not TODOs): ControlNet phase-2 (locked-composition repaint; pulls flux-dev, non-commercial license), Redux phase-3 ("style from a reference"), Replicate img2img, #6 cloud lineage.
Caveat: img2img is unit-tested against a mocked ComfyUI HTTP API, not yet live-verified against the real ComfyUI on mRock — tracked as a smoke-test follow-up.
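For readers unfamiliar with what "a mocked ComfyUI HTTP API" means here, the following sketch uploads an image under a content-hash filename to an httptest server standing in for /upload/image. The multipart field name "image" and the JSON reply with a "name" key reflect my understanding of ComfyUI's upload route and should be checked against a live instance; the helper names and the mock are hypothetical and are not the project's test code.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"path/filepath"
)

// contentHashName derives the upload filename from the image bytes, so the
// same input always maps to the same name regardless of its local path.
func contentHashName(data []byte, origPath string) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) + filepath.Ext(origPath)
}

// uploadImage posts the image as multipart form data and returns the name
// the server reports back.
func uploadImage(client *http.Client, baseURL string, data []byte, name string) (string, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	part, err := mw.CreateFormFile("image", name)
	if err != nil {
		return "", err
	}
	if _, err := part.Write(data); err != nil {
		return "", err
	}
	if err := mw.Close(); err != nil {
		return "", err
	}
	resp, err := client.Post(baseURL+"/upload/image", mw.FormDataContentType(), &body)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("upload failed: %s", resp.Status)
	}
	var out struct {
		Name string `json:"name"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Name, nil
}

// newMockComfyUI is a stand-in server that accepts the upload and echoes
// the filename, which is all a unit test of the client side needs.
func newMockComfyUI() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/upload/image" {
			http.NotFound(w, r)
			return
		}
		f, hdr, err := r.FormFile("image")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		defer f.Close()
		io.Copy(io.Discard, f)
		json.NewEncoder(w).Encode(map[string]string{"name": hdr.Filename, "subfolder": "", "type": "input"})
	}))
}

func main() {
	srv := newMockComfyUI()
	defer srv.Close()

	data := []byte("placeholder bytes, not a real image")
	name := contentHashName(data, "photo.jpg")
	stored, err := uploadImage(srv.Client(), srv.URL, data, name)
	if err != nil {
		panic(err)
	}
	fmt.Println("stored as:", stored)
}
```

A mock like this only exercises the client's request shape, which is why the caveat above still calls for a live smoke test.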
Design doc: docs/design-img2img-restyle.md. Implemented by daedalus (inventor → coder, m-approved gate).