The final slice: land the format-neutral document model with REAL consumers
and unify the Markdown parser: no duplication, byte-identical output.
Neutral model (pkg/docforge/model.go): Document / Block / InlineSpan.
BlockKind values are the stylemap keys. A hyperlink is a span with Link set
and Children holding the label's spans. This preserves link boundaries, so
adjacent same-URL links stay distinct and the output stays byte-exact with
the pre-model walker.
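To see why the nesting matters, here is a small sketch (not code from this change) contrasting the nested model with a hypothetical flat layout in which every text run carries its own Link. `flatSpan`, `countLinksFlat` and `countLinksNested` are illustrative names; only `InlineSpan` is copied from the model. In the flat layout, link boundaries can only be rebuilt by grouping consecutive runs that share a URL, so `[a](u)[b](u)` collapses into one link; the nested form keeps two.

```go
package main

import "fmt"

// InlineSpan is copied from pkg/docforge/model.go so this sketch compiles alone.
type InlineSpan struct {
	Text     string
	Bold     bool
	Italic   bool
	Link     string       // non-empty → this span is a hyperlink to Link
	Children []InlineSpan // hyperlink label content (only when Link != "")
}

// flatSpan is a HYPOTHETICAL alternative layout: each text run carries its
// own Link flag instead of nesting the label under a link span.
type flatSpan struct {
	Text string
	Link string
}

// countLinksFlat rebuilds link boundaries in the flat layout by grouping
// consecutive runs with the same non-empty Link. Adjacent links to one URL
// are indistinguishable from a single link and collapse into one.
func countLinksFlat(runs []flatSpan) int {
	n := 0
	prev := ""
	for _, r := range runs {
		if r.Link != "" && r.Link != prev {
			n++
		}
		prev = r.Link
	}
	return n
}

// countLinksNested counts hyperlinks in the nested model: every span with
// Link set is exactly one hyperlink, whatever its neighbours are.
func countLinksNested(spans []InlineSpan) int {
	n := 0
	for _, s := range spans {
		if s.Link != "" {
			n++
		}
	}
	return n
}

func main() {
	// Markdown "[a](u)[b](u)": two links to the same URL, side by side.
	flat := []flatSpan{{Text: "a", Link: "u"}, {Text: "b", Link: "u"}}
	nested := []InlineSpan{
		{Link: "u", Children: []InlineSpan{{Text: "a"}}},
		{Link: "u", Children: []InlineSpan{{Text: "b"}}},
	}
	fmt.Println(countLinksFlat(flat), countLinksNested(nested)) // 1 2
}
```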
Markdown importer (pkg/docforge/markdown): Import(md) → Document. The SINGLE
Markdown parser for docforge: block split, marker detection, inline
bold/italic/link tokenisation and {{placeholder}} pass-through (the b78a984
fix), relocated out of the docx walker.
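The marker-detection step can be sketched roughly as below. This is an illustration under assumptions: `detectMarkerSketch` is a hypothetical stand-in, not the real `detectBlockMarker`, whose signature and edge cases this commit does not show. Only the `BlockKind` strings come from the model. Note that `{{party}}` comes out untouched, which is the placeholder pass-through the importer has to preserve.

```go
package main

import (
	"fmt"
	"strings"
)

// BlockKind mirrors the stylemap-key strings from pkg/docforge/model.go.
type BlockKind string

const (
	KindParagraph    BlockKind = "paragraph"
	KindHeading1     BlockKind = "heading_1"
	KindHeading2     BlockKind = "heading_2"
	KindHeading3     BlockKind = "heading_3"
	KindListBullet   BlockKind = "list_bullet"
	KindListNumbered BlockKind = "list_numbered"
	KindBlockquote   BlockKind = "blockquote"
)

// detectMarkerSketch is a HYPOTHETICAL marker detector: it looks at one
// line's leading Markdown marker and returns the block kind plus the text
// with the marker stripped. Anything unrecognised is a plain paragraph.
func detectMarkerSketch(line string) (BlockKind, string) {
	switch {
	case strings.HasPrefix(line, "### "):
		return KindHeading3, line[4:]
	case strings.HasPrefix(line, "## "):
		return KindHeading2, line[3:]
	case strings.HasPrefix(line, "# "):
		return KindHeading1, line[2:]
	case strings.HasPrefix(line, "- "), strings.HasPrefix(line, "* "):
		return KindListBullet, line[2:]
	case strings.HasPrefix(line, "> "):
		return KindBlockquote, line[2:]
	}
	// Numbered items: one or more digits followed by ". ".
	i := 0
	for i < len(line) && line[i] >= '0' && line[i] <= '9' {
		i++
	}
	if i > 0 && strings.HasPrefix(line[i:], ". ") {
		return KindListNumbered, line[i+2:]
	}
	return KindParagraph, line
}

func main() {
	for _, l := range []string{"## Claims", "- {{party}}", "3. Deadline", "**bold** lead"} {
		k, rest := detectMarkerSketch(l)
		fmt.Printf("%s %q\n", k, rest)
	}
}
```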
docx renderer (pkg/docforge/docx/markdown.go): now RENDERS a Document →
OOXML (RenderDocumentToOOXML), and RenderMarkdownToOOXML[WithStyles] =
render(markdown.Import(md)). The shipped submission walker routes through the
model, so there is one parser, not two. The comprehensive byte-exact render
tests (RenderMarkdownToOOXML_*) all PASS unchanged, i.e. the output is identical.
Exporter interface (pkg/docforge/exporter.go, PRD §4 B4): Exporter{Format,
MIMEType, RenderBody(Document)} with the .docx impl
(pkg/docforge/docx/exporter.go): the seam a future PDF/HTML exporter slots into.
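A sketch of how a second format could slot into that seam. The commit names only `Format`, `MIMEType` and `RenderBody(Document)`; the return types below (`string`, `string`, `([]byte, error)`) are assumptions, and `textExporter` / `export` are hypothetical. For comparison, the registered MIME type for .docx is `application/vnd.openxmlformats-officedocument.wordprocessingml.document`.

```go
package main

import (
	"fmt"
	"strings"
)

type InlineSpan struct{ Text string }
type Block struct{ Spans []InlineSpan }
type Document struct{ Blocks []Block }

// Exporter sketches the seam. Method names follow the commit message; the
// return types are ASSUMPTIONS.
type Exporter interface {
	Format() string
	MIMEType() string
	RenderBody(doc Document) ([]byte, error)
}

// textExporter is a HYPOTHETICAL plain-text exporter: one line per block.
type textExporter struct{}

// Compile-time check that textExporter satisfies Exporter.
var _ Exporter = textExporter{}

func (textExporter) Format() string   { return "txt" }
func (textExporter) MIMEType() string { return "text/plain; charset=utf-8" }

func (textExporter) RenderBody(doc Document) ([]byte, error) {
	var b strings.Builder
	for _, blk := range doc.Blocks {
		for _, s := range blk.Spans {
			b.WriteString(s.Text)
		}
		b.WriteByte('\n')
	}
	return []byte(b.String()), nil
}

// export renders doc through whichever Exporter it is handed; callers do not
// care which format sits behind the interface.
func export(e Exporter, doc Document) (string, error) {
	body, err := e.RenderBody(doc)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s (%s): %d bytes", e.Format(), e.MIMEType(), len(body)), nil
}

func main() {
	doc := Document{Blocks: []Block{{Spans: []InlineSpan{{Text: "Hello"}}}}}
	summary, err := export(textExporter{}, doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(summary) // txt (text/plain; charset=utf-8): 6 bytes
}
```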
Tests: parser tests relocated to the markdown pkg (parseSpans/detectBlockMarker)
+ new importer Document tests + exporter conformance test.
Verification: go build/vet clean; gofmt clean; full NO-DB test suite GREEN
(authoritative: proves no regression); docforge byte-exact render oracle
PASS; composer live test renders through the rewired walker (PASS); bun build
+ bun test 274/274. The shared-DB live run fails ~85 tests across unrelated
services, from a pq 42P08 $1-type seeding quirk in the test harness plus a
stale deadline_rules test. Those failures are systemic/environmental (the
no-DB run is clean) and not caused by this change.
docforge train complete: 8 slices, the engine extracted + cleaned + a working
author→generate→export loop on uploaded templates, plus the neutral model +
importer + exporter seam for future formats/consumers.
m/paliad#157
59 lines
2.3 KiB
Go
package docforge

// The neutral document model — the format-independent representation an
// importer produces and an exporter consumes (PRD §3.2). A Markdown
// importer parses source into a Document; the .docx exporter renders a
// Document into OOXML; a future PDF/HTML exporter renders the same
// Document differently. The model carries editable content only —
// placeholders ({{key}}) ride through as literal span text and are
// substituted later by the format exporter's merge pass, exactly as in
// the pre-model pipeline.
//
// Slice 8 (t-paliad-349) lands this model with two real consumers: the
// Markdown importer (pkg/docforge/markdown) and the .docx renderer
// (pkg/docforge/docx), which the shipped submission walker now routes
// through — so there is one parser, not two.

// BlockKind is the logical kind of a block. Its string values are the
// stylemap keys a format exporter looks up (paragraph, heading_1, …), so
// the docx exporter maps Kind → Word paragraph style directly.
type BlockKind string

const (
	KindParagraph    BlockKind = "paragraph"
	KindHeading1     BlockKind = "heading_1"
	KindHeading2     BlockKind = "heading_2"
	KindHeading3     BlockKind = "heading_3"
	KindListBullet   BlockKind = "list_bullet"
	KindListNumbered BlockKind = "list_numbered"
	KindBlockquote   BlockKind = "blockquote"
)

// Document is a sequence of blocks — the whole editable content.
type Document struct {
	Blocks []Block
}

// Block is one paragraph-level unit. Spans is its inline content; an empty
// Spans slice is an intentional empty paragraph (vertical spacing).
type Block struct {
	Kind  BlockKind
	Spans []InlineSpan
}

// InlineSpan is one run of inline content. A span is either:
// - literal text with optional bold/italic (Link == "", Children nil), or
// - a hyperlink (Link != "") whose label is the Children spans.
//
// Modelling a link as a span with Children (rather than a per-span Link
// flag) preserves link boundaries: two adjacent links to the same URL stay
// two distinct hyperlink spans, so the exporter emits them byte-identically
// to the pre-model walker.
type InlineSpan struct {
	Text     string
	Bold     bool
	Italic   bool
	Link     string       // non-empty → this span is a hyperlink to Link
	Children []InlineSpan // hyperlink label content (only when Link != "")
}
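To illustrate walking this model, here is a toy Markdown "exporter" over the types above (redeclared so the sketch compiles alone). `toMarkdown`, `writeSpans` and the `prefix` table are hypothetical; they exercise two rules from the comments: an empty `Spans` slice becomes an empty line, and a link's label lives in `Children`, so adjacent same-URL links are emitted separately.

```go
package main

import (
	"fmt"
	"strings"
)

// Model types redeclared from pkg/docforge/model.go for a standalone sketch.
type BlockKind string

const (
	KindParagraph  BlockKind = "paragraph"
	KindHeading1   BlockKind = "heading_1"
	KindListBullet BlockKind = "list_bullet"
)

type Document struct{ Blocks []Block }

type Block struct {
	Kind  BlockKind
	Spans []InlineSpan
}

type InlineSpan struct {
	Text     string
	Bold     bool
	Italic   bool
	Link     string
	Children []InlineSpan
}

// prefix is a HYPOTHETICAL Kind → Markdown line-marker table.
var prefix = map[BlockKind]string{
	KindParagraph:  "",
	KindHeading1:   "# ",
	KindListBullet: "- ",
}

// writeSpans writes spans back as Markdown, recursing into a hyperlink's
// Children for its label. Placeholders in Text pass through untouched.
func writeSpans(b *strings.Builder, spans []InlineSpan) {
	for _, s := range spans {
		if s.Link != "" {
			b.WriteString("[")
			writeSpans(b, s.Children)
			b.WriteString("](" + s.Link + ")")
			continue
		}
		text := s.Text
		if s.Italic {
			text = "*" + text + "*"
		}
		if s.Bold {
			text = "**" + text + "**"
		}
		b.WriteString(text)
	}
}

// toMarkdown emits one line per block; a block with no spans yields an
// empty line, matching the "intentional empty paragraph" rule.
func toMarkdown(doc Document) string {
	var b strings.Builder
	for _, blk := range doc.Blocks {
		b.WriteString(prefix[blk.Kind])
		writeSpans(&b, blk.Spans)
		b.WriteString("\n")
	}
	return b.String()
}

func main() {
	doc := Document{Blocks: []Block{
		{Kind: KindHeading1, Spans: []InlineSpan{{Text: "Notice"}}},
		{Kind: KindParagraph}, // intentional empty paragraph
		{Kind: KindParagraph, Spans: []InlineSpan{
			{Text: "Dear "},
			{Text: "{{client_name}}", Bold: true},
			{Text: ", see "},
			{Link: "https://example.com", Children: []InlineSpan{{Text: "terms", Italic: true}}},
		}},
	}}
	fmt.Print(toMarkdown(doc))
}
```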