paliad/pkg/docforge/model.go
mAi 8763ab013c feat(docforge): slice 8 — neutral model + Markdown importer + Exporter iface (t-paliad-349)
The final slice: land the format-neutral document model with REAL consumers
and unify the Markdown parser — no duplication, byte-identical output.

Neutral model (pkg/docforge/model.go): Document / Block / InlineSpan.
BlockKind values are the stylemap keys. A hyperlink is a span with Link set
+ Children (the label's spans), preserving link boundaries so adjacent
same-URL links stay distinct — byte-exact with the pre-model walker.

Markdown importer (pkg/docforge/markdown): Import(md) → Document. The SINGLE
Markdown parser for docforge — block split, marker detection, inline
bold/italic/link tokenisation, {{placeholder}} pass-through (the b78a984
fix). Relocated out of the docx walker.

docx renderer (pkg/docforge/docx/markdown.go): now RENDERS a Document →
OOXML (RenderDocumentToOOXML); RenderMarkdownToOOXML[WithStyles] = render(
markdown.Import(md)). The shipped submission walker routes through the model,
so there is one parser, not two. The comprehensive byte-exact render tests
(RenderMarkdownToOOXML_*) all PASS unchanged = output identical.

Exporter interface (pkg/docforge/exporter.go, PRD §4 B4): Exporter{Format,
MIMEType, RenderBody(Document)} with the .docx impl (pkg/docforge/docx/
exporter.go). The seam a future PDF/HTML exporter slots into.

Tests: parser tests relocated to the markdown pkg (parseSpans/detectBlockMarker)
+ new importer Document tests + exporter conformance test.

Verification: go build/vet clean; gofmt clean; full NO-DB test suite GREEN
(authoritative — proves no regression); docforge byte-exact render oracle
PASS; composer live test renders through the rewired walker (PASS); bun build
+ bun test 274/274. The shared-DB live run fails ~85 tests across unrelated
services from a harness pq-42P08 $1-type seeding quirk + a stale
deadline_rules test — systemic/environmental (the no-DB run is clean), not
this change.

docforge train complete: 8 slices, the engine extracted + cleaned + a working
author→generate→export loop on uploaded templates, plus the neutral model +
importer + exporter seam for future formats/consumers.

m/paliad#157
2026-05-29 18:10:16 +02:00


package docforge

// The neutral document model: the format-independent representation an
// importer produces and an exporter consumes (PRD §3.2). A Markdown
// importer parses source into a Document; the .docx exporter renders a
// Document into OOXML; a future PDF/HTML exporter renders the same
// Document differently. The model carries editable content only —
// placeholders ({{key}}) ride through as literal span text and are
// substituted later by the format exporter's merge pass, exactly as in
// the pre-model pipeline.
//
// Slice 8 (t-paliad-349) lands this model with two real consumers: the
// Markdown importer (pkg/docforge/markdown) and the .docx renderer
// (pkg/docforge/docx), which the shipped submission walker now routes
// through, so there is one parser, not two.

// BlockKind is the logical kind of a block. Its string values are the
// stylemap keys a format exporter looks up (paragraph, heading_1, …), so
// the docx exporter maps Kind → Word paragraph style directly.
type BlockKind string

const (
	KindParagraph    BlockKind = "paragraph"
	KindHeading1     BlockKind = "heading_1"
	KindHeading2     BlockKind = "heading_2"
	KindHeading3     BlockKind = "heading_3"
	KindListBullet   BlockKind = "list_bullet"
	KindListNumbered BlockKind = "list_numbered"
	KindBlockquote   BlockKind = "blockquote"
)

// Document is a sequence of blocks: the whole editable content.
type Document struct {
	Blocks []Block
}

// Block is one paragraph-level unit. Spans is its inline content; an empty
// Spans slice is an intentional empty paragraph (vertical spacing).
type Block struct {
	Kind  BlockKind
	Spans []InlineSpan
}

// InlineSpan is one run of inline content. A span is either:
//   - literal text with optional bold/italic (Link == "", Children nil), or
//   - a hyperlink (Link != "") whose label is the Children spans.
//
// Modelling a link as a span with Children (rather than a per-span Link
// flag) preserves link boundaries: two adjacent links to the same URL stay
// two distinct hyperlink spans, so the exporter emits them byte-identically
// to the pre-model walker.
type InlineSpan struct {
	Text     string
	Bold     bool
	Italic   bool
	Link     string       // non-empty → this span is a hyperlink to Link
	Children []InlineSpan // hyperlink label content (only when Link != "")
}
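
A minimal usage sketch of the model above, to make the span rules concrete. It redeclares the types locally so it runs on its own. The helpers plainText, countLinks, and sampleDocument are hypothetical names for illustration and are not part of pkg/docforge; the URL and the {{client_name}} key are made up. It shows two adjacent links to the same URL staying two distinct hyperlink spans, and a placeholder riding through as literal span text.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the model types so the sketch is self-contained.
type BlockKind string

const (
	KindParagraph BlockKind = "paragraph"
	KindHeading1  BlockKind = "heading_1"
)

type InlineSpan struct {
	Text     string
	Bold     bool
	Italic   bool
	Link     string
	Children []InlineSpan
}

type Block struct {
	Kind  BlockKind
	Spans []InlineSpan
}

type Document struct {
	Blocks []Block
}

// plainText (hypothetical helper) flattens spans to their visible text.
// For a hyperlink span the visible text is its label, i.e. its Children.
func plainText(spans []InlineSpan) string {
	var b strings.Builder
	for _, s := range spans {
		if s.Link != "" {
			b.WriteString(plainText(s.Children))
			continue
		}
		b.WriteString(s.Text)
	}
	return b.String()
}

// countLinks (hypothetical helper) counts top-level hyperlink spans.
func countLinks(spans []InlineSpan) int {
	n := 0
	for _, s := range spans {
		if s.Link != "" {
			n++
		}
	}
	return n
}

// sampleDocument builds a heading plus one paragraph containing two
// adjacent links to the same (made-up) URL and a literal placeholder.
func sampleDocument() Document {
	const url = "https://example.com/docs" // illustrative URL only
	return Document{Blocks: []Block{
		{Kind: KindHeading1, Spans: []InlineSpan{{Text: "Title"}}},
		{Kind: KindParagraph, Spans: []InlineSpan{
			{Text: "See "},
			{Link: url, Children: []InlineSpan{{Text: "docs", Bold: true}}},
			{Link: url, Children: []InlineSpan{{Text: " here"}}},
			{Text: " for {{client_name}}."},
		}},
	}}
}

func main() {
	for _, blk := range sampleDocument().Blocks {
		fmt.Printf("%s: %q (%d links)\n",
			blk.Kind, plainText(blk.Spans), countLinks(blk.Spans))
	}
}
```

Had the link been a per-span flag instead, the two label runs in the paragraph would be indistinguishable from one link with mixed formatting; nesting the label under Children is what keeps the boundary.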