paliad/docs/design-paliadin-2026-05-07.md
m d24f73358c design(t-paliad-146): re-scope to PoC track — m-only + monitoring
m's reframing 2026-05-07 20:56: Paliadin is "mostly for myself now
but can be expanded — monitoring use." Two-stage shape replaces the
single-PR production-v1:

- Phase 0 (PoC): tmux-Claude pattern lifted from goldi/mVoice
  (mVoice/server.py:250-380). Claude Code window in a long-lived
  tmux session, prompts via tmux send-keys -l, response via
  /tmp/paliadin/{turn_id}.txt tail-f → SSE relay. Single user (m),
  m's laptop only (PALIADIN_ENABLED=false on prod). ~600-900 LoC,
  ~1 day. Migration 057 (PoC variant) stores full prompt + response
  for monitoring — no redaction at this scope.
- Phase 1 (production v1): the original §2-§6 Anthropic API design,
  GATED on PoC success per §0.5.7 expansion criteria (≥3 turns/wd,
  ≥50% tool-use rate, 4 weeks).

§0.5 (new) inserted as the load-bearing PoC spec. §7 leads with the
two-stage frame. §8.5 questions split into PoC-relevant (Q-PoC-1..6)
and production-v1-deferred. youpc case-law lookup promoted to
Q-PoC-6: m himself does case-law research, so include it from day
one (cross-schema SELECT into data.judgments is technically trivial
since paliad and youpc share the same Postgres).

What we drop for PoC: Anthropic API client, BYO-AI, rate limit,
token caps, multi-user RLS edge cases, /admin cost dashboard,
compliance disclosure, most i18n keys.

What we keep: system prompt voice, citation discipline (best-effort),
visibility gate (Claude is required to use paliad.can_see_project()
in queries), /paliadin surface, SSE shape, audit table.

The two-stage shape protects against the t-145 pattern: ship cheap,
observe, decide. No 4500-LoC investment based on m's gut feel about
adoption.
2026-05-07 20:59:46 +02:00


Design: Paliadin — in-app AI buddy / pet (t-paliad-146)

Status: READY FOR REVIEW (revised 2026-05-07 20:56; PoC track inserted)
Author: noether (inventor)
Issue: m/paliad#9
Date: 2026-05-07
Branch: mai/noether/inventor-paliadin-in-app

Revision note (2026-05-07 20:56): m re-scoped this from "ship to HLC users" → "PoC for m, monitor usage, expand only if it earns it". The original Anthropic-API design in §2-§6 is preserved as the production-v1 spec, but §0.5 (new) supersedes it for what gets built first: a tmux-Claude PoC lifted from goldi/mVoice, m-only on his laptop, with monitoring instrumentation as the load-bearing instrument for the expand/kill decision. §7 (Phasing) and §8.5 (Open questions) are revised to reflect the two-stage shape.


§0 TL;DR

A new conversational surface inside paliad: Paliadin, a Claude-backed assistant that answers questions grounded in the user's own paliad data and paliad's domain knowledge. Paliadin is a long-lived in-process Go service, not a per-session worker spawn: it talks to the Anthropic Messages API directly with tool use, where every tool is a thin shim over an existing paliad service (DashboardService, ProjectService, DeadlineService, CourtService, GlossaryService, DeadlineRuleService, AgendaService). RLS / visibility is enforced at the service layer, exactly as it is for the rest of the app, so Paliadin literally cannot see what the caller cannot see.

Phase 1 surface: dedicated /paliadin page + a sidebar entry under "Übersicht", server-side SSE stream of Anthropic's response (same shape paliad's parked t-145 chat design specced), session-only conversation (no DB persistence in v1), 8 read-only tools (the §2.5 catalog), ~30 turns/hour rate limit per user, hard token caps (4 k input + 2 k output per turn), per-request audit row (no full transcript in v1; store a redacted hash + token counts + tool-call list).

No avatar, no mascot SVG, no proactive onboarding popup in v1. Just a clean chat panel with the name "Paliadin" in the header. Mascot, drawer mode, persistent threads, write-tools, and youpc.org case-law lookup are all deferred to Phase 2/3.

mlex / /lex-* reuse: pattern, not code. mLex turns out to be a workspace (extractions/, analysis/, docs/); there is no Go/TS code to fork. The /lex-* skills are Claude Code instruction docs that drive Claude itself against youpc's MCP tools; they cannot be embedded in a paliad Go service. What carries over is the shape: tool catalog (search → fetch → cite), system-prompt voice (precise, citation-backed, flag uncertainty honestly), and the "every legal claim needs a citation" guardrail. §2.4 maps the carryover precisely.

Trade-off flagged upfront (read §9.1 before approving): the same adoption-risk concern that just parked the local-chat design (t-paliad-145, today 17:03) applies here. Paliadin's edge over "open ChatGPT in another tab" is only that it sees the user's own data, and that edge collapses if v1 doesn't make the data-grounding visible (citation chips, tool-call evidence) and explicit ("Paliadin sees only YOUR projects"). Without those, Paliadin is just a worse Claude. With them, it's the only Claude that can answer "welche Frist ist als nächstes auf dem Müller-Verfahren?" ("which deadline is next on the Müller case?").


§0.5 PoC track — m-only, monitored, expandable (REVISED 2026-05-07 20:56)

This section supersedes §2-§7 for what actually gets built first. §2-§6 stay valid as the production-v1 spec; they're picked up only if the PoC earns expansion.

0.5.1 Why the re-scope

m's reframing: "Paliadin is mostly for myself now but can be expanded — monitoring use." Two consequences:

  1. Single user (m) on m's laptop, not 38 HLC PAs on paliad.de. Multi-tenant concerns drop. RLS still matters because m's global_role=global_admin shouldn't let Paliadin sweep data across projects sloppily, but the cross-user PII surface goes to zero.
  2. The build is for m to feel the UX and decide whether to expand. That makes monitoring instrumentation load-bearing: it's the artefact that drives the next decision, not a compliance afterthought. PoC architecture: cheap to ship, expensive to not observe.

0.5.2 Architecture: lift goldi/mVoice tmux-Claude

Verified pattern in ~/dev/mVoice/server.py:250-380 (and ~/dev/goldi/goldi/brain.py for the soul/prompt assembly). Working production code today on m's voice stack.

┌──────────────────────┐   POST /api/paliadin/turn         ┌────────────────────────────┐
│ Browser              │ ────────────────────────────────▶ │ paliad Go server (laptop)  │
│ /paliadin chat panel │                                    │                            │
│                      │ ◀──────── SSE stream ──────────── │  PaliadinService            │
└──────────────────────┘    (file-tail of response)       │   ├─ ensure tmux session   │
                                                            │   ├─ tmux send-keys -l …   │
                                                            │   ├─ poll/tail            │
                                                            │   │  /tmp/paliadin/{tid} │
                                                            │   └─ audit row write      │
                                                            └──────────────┬─────────────┘
                                                                           │ tmux send-keys
                                                                           ▼
                                                            ┌────────────────────────────┐
                                                            │ tmux: paliad-paliadin     │
                                                            │   window: claude-paliad   │
                                                            │   $ claude  (interactive)  │
                                                            │     w/ system prompt +     │
                                                            │     mcp__supabase__*       │
                                                            │     scoped to paliad.*     │
                                                            └────────────────────────────┘

Lift verbatim from mVoice:

  • _ensure_voice_session() → _ensure_paliadin_session(). Same tmux has-session / new-session / new-window / "wait for prompt" dance.
  • tmux_generate(prompt) → response → same shape, just reads via tail -f instead of one-shot poll so we can stream deltas to the SSE consumer (see §0.5.5).
  • _reset_paliadin_session() for /clear — surfaced in the chat panel's "New conversation" button.
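A minimal Go sketch of the session and send-keys steps listed above, assuming the tmux binary is on PATH. The session and window names follow the diagram; every helper name here is hypothetical, and the "wait for prompt" step from mVoice is left out for brevity.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Names taken from the architecture diagram; adjust via env in a real build.
const (
	session = "paliad-paliadin"
	window  = "claude-paliad"
)

// hasSessionArgs builds `tmux has-session -t <session>`.
func hasSessionArgs(session string) []string {
	return []string{"has-session", "-t", session}
}

// newSessionArgs builds a detached session whose first window runs `claude`.
func newSessionArgs(session, window string) []string {
	return []string{"new-session", "-d", "-s", session, "-n", window, "claude"}
}

// sendLiteralArgs builds `tmux send-keys -l`, which types the text literally,
// so a word such as "Enter" inside a prompt is not interpreted as a key name.
func sendLiteralArgs(target, prompt string) []string {
	return []string{"send-keys", "-t", target, "-l", prompt}
}

// submitArgs sends the Enter key on its own to submit the typed prompt.
func submitArgs(target string) []string {
	return []string{"send-keys", "-t", target, "Enter"}
}

// ensureSession creates the session only when has-session reports it missing.
func ensureSession() error {
	if exec.Command("tmux", hasSessionArgs(session)...).Run() == nil {
		return nil // session already exists
	}
	return exec.Command("tmux", newSessionArgs(session, window)...).Run()
}

// sendPrompt types the prompt into the Claude window and presses Enter.
func sendPrompt(prompt string) error {
	target := session + ":" + window
	if err := exec.Command("tmux", sendLiteralArgs(target, prompt)...).Run(); err != nil {
		return fmt.Errorf("send-keys: %w", err)
	}
	return exec.Command("tmux", submitArgs(target)...).Run()
}

func main() {
	// Print the argument vector instead of touching a real tmux server.
	fmt.Println(sendLiteralArgs(session+":"+window, "Welche Frist ist als nächstes fällig?"))
}
```

Keeping the argument builders separate from the exec calls makes the tmux wiring checkable without a running tmux server.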

0.5.3 What we keep from §2-§6 (it's still right)

| Section | Carryover | Why it survives the re-scope |
| --- | --- | --- |
| §2.2.1 system prompt template | ported as the first message sent into Claude after /clear | The voice + guardrails (no fabrication, cite specifically, can't mutate) are exactly what we want. Just delivered via tmux send-keys instead of API system: field. |
| §2.5 tool catalog | but as instructions, not as wrappers | Claude already has mcp__supabase__execute_sql. The system prompt teaches it the read patterns ("to find m's pending deadlines: SELECT … FROM paliad.deadlines WHERE status='pending' AND paliad.can_see_project(project_id)"). Zero Go shim code; ~15 SQL recipes in the prompt. |
| §3.2 visibility gate | yes | The system prompt requires paliad.can_see_project(project_id) in every project-scoped query. Defence in depth: the supabase MCP runs with a service role, so RLS doesn't auto-gate; the prompt rule is the gate, and we cross-check via audit (§0.5.6). |
| §4 surface placement (/paliadin full page + sidebar entry) | yes | Same UI shell. |
| §4.5 streaming + interruption | adapted | SSE stream still happens; backing source is tail -f /tmp/paliadin/{turn_id}.txt instead of Anthropic's stream events. Choppier but works. |
| §4.4 action chips | ⚠ best-effort | System prompt asks Claude to emit [#deadline-OPEN:c47bd2] markers; whether it does so reliably is an observation the PoC will surface. |
| §5.4 audit table (paliad.paliadin_turns) | yes | Reused for monitoring (§0.5.6). Added: pane_lines_captured so we can debug stream issues. Dropped: input_tokens/output_tokens (Claude Code doesn't expose these via the tmux interface; derive a coarse cost estimate from elapsed time × Claude Code's published rates if we want it later). |

0.5.4 What we drop for the PoC

| Drop | Reason |
| --- | --- |
| Anthropic Messages API client (anthropic.go) | Replaced by tmux/Claude. Saves ~400 LoC. |
| Per-user rate limit (paliadin_rate_limit table) | Single user. m's own restraint is the rate limit. Re-add at expansion. |
| Token caps + history truncation | Claude Code manages its own context window. |
| BYO-AI / OpenAI adapter | Out of scope (m's prior message); punted. |
| Multi-user RLS edge cases (cross-user PII) | Single-user; not exercised. |
| Compliance disclosure on first use | m → m's own Claude subscription. m has already accepted Anthropic's TOS. |
| /admin/paliadin cost dashboard | One user; cost is m's monthly Claude bill. |
| Most i18n keys | m switches DE/EN naturally; ~6 keys instead of ~25. |

0.5.5 SSE shape adapted to tmux backing

Same event vocabulary as §4.5.1, fed by a goroutine that tails /tmp/paliadin/{turn_id}.txt and emits content_delta events as new bytes arrive. Tradeoffs:

  • Latency to first token: ~3-8 s (Claude Code "thinking" before first write). Worse than native API streaming. Mitigation: surface a "Paliadin denkt nach …" placeholder bubble until the first byte arrives.
  • No native tool-call events. Claude Code does its tool use internally; we see only the final text written to the response file. To still surface "ran search_my_deadlines (3 results)" evidence, the system prompt instructs Claude to write a structured trailer block at the end of its response: \n\n---\n[paliadin-meta]\nused_tools: search_my_deadlines, lookup_court\nrows_seen: 3, 1\n[/paliadin-meta]\n. Frontend strips that block and renders it as the citation evidence row. Brittle but observable; this is the kind of thing the PoC's monitoring is for.
  • Heartbeat: still emit event: ping every 25 s so the SSE connection survives any reverse proxy. (Not strictly needed on localhost but keeps the production migration cheap.)

0.5.6 Monitoring instrumentation: the load-bearing artefact

Because the whole point of the PoC is "watch m use it", the audit shape is the most important thing in the PoC ship.

Migration 057 (PoC variant):

CREATE TABLE paliad.paliadin_turns (
    turn_id            uuid PRIMARY KEY,
    user_id            uuid NOT NULL REFERENCES paliad.users(id),
    started_at         timestamptz NOT NULL DEFAULT now(),
    finished_at        timestamptz,
    duration_ms        int,                   -- end - start
    user_message       text,                  -- FULL prompt (m-only PoC; redact at expansion)
    response           text,                  -- FULL response (same)
    response_tokens    int,                   -- approx via word count × 1.3
    used_tools         text[],                -- parsed from [paliadin-meta] trailer
    rows_seen          int[],                 -- parallel to used_tools
    chip_count         int NOT NULL DEFAULT 0,
    abandoned          boolean NOT NULL DEFAULT false,  -- user closed mid-stream
    page_origin        text,                  -- which paliad page m was on when he asked
    error_code         text,                  -- 'tmux_unresponsive', 'pane_died', 'user_aborted', NULL on ok
    classifier_tag     text                   -- coarse self-classification: 'data', 'concept', 'navigation', 'meta', 'other'
);

CREATE INDEX paliadin_turns_started_idx
    ON paliad.paliadin_turns(started_at DESC);

Critical departure from the production design: at PoC scope we DO store the full prompt + response. m is the only user, m is m's own compliance officer, and the whole point is to read what was asked later. Redaction returns at expansion.

/admin/paliadin page (PoC variant) renders:

  • 7-day rolling turn count + median/p90 duration.
  • Histogram by classifier_tag (so m sees: "60 % of my queries were 'data', 25 % 'concept', 10 % 'navigation', 5 % 'meta'"; that's the use-case shape).
  • Top 10 prompts by frequency (textually similar grouping via simple normalised string match; fancy clustering is Phase 1 expansion).
  • Tool-use rate (turns where used_tools is non-empty / total turns). Load-bearing for the expansion decision; see §0.5.7.
  • Abandonment rate (abandoned=true / total).
  • Daily usage sparkline.

The classifier_tag is set by Claude itself in the [paliadin-meta] trailer, instructed by the system prompt; same brittleness caveat as the tool-use evidence.
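A hedged Go sketch of how the server might split the [paliadin-meta] trailer from the visible answer before writing the audit row. The used_tools and rows_seen keys come from §0.5.5; the classifier_tag key name inside the trailer is an assumption, as are all type and function names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Meta holds the parsed trailer fields (hypothetical type).
type Meta struct {
	UsedTools     []string
	RowsSeen      []int
	ClassifierTag string
}

const (
	openTag  = "[paliadin-meta]"
	closeTag = "[/paliadin-meta]"
)

// splitList splits a comma-separated value and drops empty items.
func splitList(s string) []string {
	var out []string
	for _, item := range strings.Split(s, ",") {
		if item = strings.TrimSpace(item); item != "" {
			out = append(out, item)
		}
	}
	return out
}

// splitTrailer separates the visible answer from the trailer block. ok is
// false when the trailer is missing or malformed; the response is then
// returned unchanged so the user still sees the full text.
func splitTrailer(resp string) (body string, meta Meta, ok bool) {
	start := strings.LastIndex(resp, openTag)
	if start < 0 {
		return resp, Meta{}, false
	}
	end := strings.Index(resp[start:], closeTag)
	if end < 0 {
		return resp, Meta{}, false
	}
	block := resp[start+len(openTag) : start+end]
	for _, line := range strings.Split(block, "\n") {
		key, val, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		val = strings.TrimSpace(val)
		switch strings.TrimSpace(key) {
		case "used_tools":
			meta.UsedTools = splitList(val)
		case "rows_seen":
			for _, s := range splitList(val) {
				n, err := strconv.Atoi(s)
				if err != nil {
					return resp, Meta{}, false
				}
				meta.RowsSeen = append(meta.RowsSeen, n)
			}
		case "classifier_tag": // assumed key name
			meta.ClassifierTag = val
		}
	}
	// Drop the "---" separator line that precedes the trailer.
	body = strings.TrimSpace(strings.TrimSuffix(strings.TrimSpace(resp[:start]), "---"))
	return body, meta, true
}

func main() {
	resp := "Nächste Frist: 14.05.\n\n---\n[paliadin-meta]\nused_tools: search_my_deadlines\nrows_seen: 3\n[/paliadin-meta]\n"
	body, meta, ok := splitTrailer(resp)
	fmt.Println(ok, body, meta.UsedTools, meta.RowsSeen)
}
```

Returning ok=false instead of an error keeps a missing trailer from failing the turn; the audit row can simply record empty used_tools, which is itself a signal for the monitoring.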

0.5.7 The expansion gate — what triggers production v1?

m decides; this section gives m the metric set he asked for. Suggested green-light criteria after 4 weeks:

  1. Sustained use: ≥ 3 turns/working-day average over weeks 3-4.
  2. Data-grounded use: tool-use rate ≥ 50 % (otherwise Paliadin is being used like ChatGPT and there's no differentiation argument for the production build).
  3. Useful by m's own gut. No metric beats this; the dashboard helps m frame it but doesn't decide for him.

Yellow flag criteria (interesting but not green):

  • < 1 turn/day → m isn't using it; either kill or rebuild the affordance to be more discoverable.
  • Tool-use rate < 30 % → the value isn't in the data grounding; reconsider the whole premise.
  • High abandonment rate → UX issue (latency? wrong answers? broken streaming?). Investigate before expansion.

Kill criteria:

  • m looks at the dashboard 4 weeks in and shrugs.
  • Frequent tmux session deaths or /clear-too-often patterns suggest the architecture is fighting m. PoC failure ≠ Paliadin failure; might be the tmux pattern's failure.
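The numeric part of the gate can be written down as a tiny Go function. This is only a sketch: it encodes the two quantified green-light criteria and the two quantified yellow flags, treats the "< 1 turn/day" flag as turns per working day, and uses a hypothetical "undecided" label for everything in between. Criterion 3, the abandonment flag and the kill criteria stay human judgement.

```go
package main

import "fmt"

// Verdict is the coarse outcome of the numeric gate (hypothetical type).
type Verdict string

const (
	Green     Verdict = "green"
	Yellow    Verdict = "yellow"
	Undecided Verdict = "undecided" // hypothetical label for the band in between
)

// evaluate applies the thresholds from this section: yellow below 1 turn per
// working day or below 30 % tool use, green at 3 or more turns per working
// day together with at least 50 % tool use.
func evaluate(turnsPerWorkingDay, toolUseRate float64) Verdict {
	switch {
	case turnsPerWorkingDay < 1 || toolUseRate < 0.30:
		return Yellow
	case turnsPerWorkingDay >= 3 && toolUseRate >= 0.50:
		return Green
	default:
		return Undecided
	}
}

func main() {
	fmt.Println(evaluate(3.4, 0.62)) // green
	fmt.Println(evaluate(0.5, 0.70)) // yellow
	fmt.Println(evaluate(2.0, 0.45)) // undecided
}
```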

0.5.8 PoC scope — what gets built

Item In PoC
internal/services/paliadin/tmux.go (lifted + adapted from mVoice/server.py:250-380)
internal/services/paliadin/prompt.go (system prompt template + [paliadin-meta] trailer rule)
internal/services/paliadin/sse.go (file-tail → SSE relay)
internal/handlers/paliadin.go (POST /turn, GET /stream/{id}, /paliadin shell page, /admin/paliadin dashboard)
Migration 057 — PoC paliadin_turns (full prompt + response stored)
frontend/src/paliadin.tsx + client/paliadin.ts (chat panel, EventSource, chip parser, "Stop"/"New" buttons)
frontend/src/admin-paliadin.tsx + .ts (the monitoring dashboard)
Sidebar entry under Übersicht with ICON_SPARKLE
~6 i18n keys (DE+EN)
PALIADIN_TMUX_SESSION env var (default paliad-paliadin), PALIADIN_RESPONSE_DIR (default /tmp/paliadin), PALIADIN_ENABLED (default false on prod, true on m's laptop)
Hard guard: if PALIADIN_ENABLED=false (paliad.de prod default) the routes are not even registered. PoC stays on m's laptop, full stop.
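The hard guard might be wired as in the Go sketch below, assuming a plain net/http ServeMux and that the flag is the literal string "true"; the handler stubs and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// paliadinEnabled reads the flag once; anything other than "true" is off,
// which keeps the prod default (unset or "false") safe.
func paliadinEnabled() bool {
	return os.Getenv("PALIADIN_ENABLED") == "true"
}

// stub stands in for the real Paliadin handlers.
func stub(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "paliadin")
}

// newMux registers the Paliadin routes only when enabled, so a disabled
// deployment answers 404 on them like on any unknown path.
func newMux(enabled bool) *http.ServeMux {
	mux := http.NewServeMux()
	if !enabled {
		return mux
	}
	mux.HandleFunc("/paliadin", stub)
	mux.HandleFunc("/api/paliadin/turn", stub)
	mux.HandleFunc("/admin/paliadin", stub)
	return mux
}

// registered reports whether the mux has a handler pattern for the path.
func registered(mux *http.ServeMux, path string) bool {
	req, err := http.NewRequest(http.MethodGet, path, nil)
	if err != nil {
		return false
	}
	_, pattern := mux.Handler(req)
	return pattern != ""
}

func main() {
	mux := newMux(paliadinEnabled())
	fmt.Println("paliadin routes registered:", registered(mux, "/paliadin"))
}
```

Skipping registration, rather than checking the flag inside each handler, means there is no code path on prod that could reach the tmux layer by accident.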

Estimated scope: ~600-900 LoC. ~1 day of coder work. Same single-PR pattern as t-144 / t-145.

0.5.9 What stays unbuilt (production v1, see §2-§6)

The Anthropic API client, the 8 Go tool shims, the per-user rate limit, the encrypted-key BYO-AI surface, the redacted audit, the multi-replica SSE bus: all of it. Picked up only if §0.5.7's expansion gate fires.

The two-stage shape protects against the t-145 pattern: ship cheap, observe, decide. No 4500-LoC investment based on m's gut feel about adoption.


§1 Premises verified live (2026-05-07)

Before designing on top, I checked each load-bearing claim against the running system rather than CLAUDE.md / memory.

| Claim | Source | Verification |
| --- | --- | --- |
| mLex is a workspace, not a code repo | issue framing "mlex project we could partially reuse" | ~/dev/mLex/ contains only extractions/, analysis/, docs/, plus CLAUDE.md + AGENTS.md. No *.go, no package.json, no tools that aren't Claude skills. The "code" is the /lex-* skill family in ~/.claude/skills/, which is instruction docs driving Claude against mcp__youpc__* MCP tools. Carryover is shape (system prompt, tool catalog, citation style), not adapters. |
| /lex-* skill family | brief reference | ~/.claude/skills/{lex-research,lex-extract,lex-classify,lex-classify-patent,mai-lexy}/SKILL.md. All five inventoried in §2.4. |
| Paliad has no anthropic / claude code | CLAUDE.md ANTHROPIC_API_KEY "do not set" row | grep -ri anthropic ~/dev/paliad/internal ~/dev/paliad/cmd → only internal/branding/firm.go comment unrelated to AI. go.mod has no anthropic-sdk-go dep. This task undefers the env var; CLAUDE.md row needs updating in the same PR. |
| Paliad has no SSE pattern shipped | substrate scan | grep -rn 'http.Flusher\|text/event-stream' internal/ returns only references inside the parked t-145 chat design doc; no live code. We bring our own. |
| Paliad and youpc share the same physical Postgres | infra | Both run on 100.99.98.201:11833 (port 11833 = ydb). Paliad's schema is paliad; youpc's is data. A future "search UPC case law" tool would be a same-DB cross-schema SELECT, not an HTTP hop, but Phase 1 still excludes case-law lookup (see §3). |
| Visibility is enforced at service layer (not via SET LOCAL auth.uid) | code | internal/services/visibility.go defines visibilityPredicate(alias) + visibilityPredicatePositional(alias, idx); every project-scoped query inlines it. Paliadin's tools call existing services, inheriting the predicate. |
| paliad.can_see_project() is the canonical visibility function in DB (RLS, t-139) | t-139 migration 055 | internal/db/migrations/055_hierarchy_aggregation.up.sql:144 CREATE OR REPLACE FUNCTION paliad.can_see_project(_project_id uuid). Same predicate echoed in services/visibility.go. |
| Migration tracker is at 56 (056_user_views) | t-144 A1 | paliad_schema_migrations row. Next migration is 057. (t-145 was parked before its 057_chat shipped, so 057 is open.) |
| t-paliad-145 (local chat) was parked today 2026-05-07 17:03 | memory + commit log | Commit 99f08e3 "Merge: t-paliad-145 design doc only — local chat feature PARKED per m's call". The chat SSE substrate that would have been shared is not built; Paliadin builds its own minimal stream. |
| Sidebar bell pattern (sidebar-inbox-badge) is reusable for a chat-style entry | t-138 | frontend/src/components/Sidebar.tsx → navItem(href, icon, i18nKey, label, currentPath, badgeID?) already takes an optional badge id. The same plumbing fits a Paliadin entry. |
| Sidebar ICON_SPARKLE already exists | UI scan | frontend/src/components/Sidebar.tsx defines ICON_SPARKLE (a star/sparkle SVG). Free icon for the Paliadin nav item. |
| auth.UserIDFromContext(r.Context()) is the standard handler-side user lookup | code | internal/handlers/dashboard.go:31 is the canonical pattern. Paliadin handlers will use it. |
| branding.Name (default "HLC") is the firm-name source | t-paliad-065 | internal/branding/firm.go reads FIRM_NAME once at boot. Paliadin's system prompt + greeting must use branding.Name, never hardcode "HLC". |
| Single web replica on Dokploy today | docker-compose.yml | One web service. SSE state in-process is fine for v1; multi-replica migration deferred along with chat. |

Doc-vs-live conflicts encountered (must be fixed in the implementation PR):

  1. CLAUDE.md still says ANTHROPIC_API_KEY is "Reserved for Phase H (AI Frist-Extraktion) which is deferred per m's 2026-04-16 decision. Do not set." Paliadin undefers it. The CLAUDE.md row needs to flip to "Required for Paliadin (read-only Claude assistant); set on Dokploy."
  2. The earlier "do not want anthropic API" decision (memory b6a11b55…, 2026-04-16) was specifically about Frist extraction from documents. Paliadin is a different surface (interactive read-only Q&A over already-structured data). It does not silently revive the parked extraction feature; t-paliad-011 stays blocked unless m explicitly unparks it too.

§2 Sub-design A — LLM architecture, prompt, tool use, mlex/lex reuse

Answers Q1, Q2, Q3, Q4, Q17, Q18.

2.1 LLM provider (Q1)

Recommendation: Anthropic Claude, single provider, accessed directly via the Messages API. Lock to Claude in v1; abstract behind a one-function interface so future portability is cheap.

| Provider | v1? | Why |
| --- | --- | --- |
| Anthropic Claude (Messages API + tool use) | yes | Matches m's "wire into my claude" framing. Tool-use shape is mature. Streaming via SSE is native. Paliad already has ANTHROPIC_API_KEY reserved. |
| Mixed (Claude reasoning + smaller routing model) | no | Premature optimisation; for ~30 turns/hour/user we don't need the routing layer. Single-model latency is fine. |
| OpenAI / open weight | no | No HLC compliance review for those vendors; m's Anthropic key is on file. |

Model selection within Anthropic: default to Claude Sonnet 4.6 (fast, tool-use-capable, cheap enough for chat use). Allow override via PALIADIN_MODEL env var so we can drop down to Haiku for cost or up to Opus for tricky onboarding sessions without redeploying.

Wire shape: one Go HTTP client (internal/services/paliadin/anthropic.go) that POSTs /v1/messages with stream: true. We do not adopt github.com/anthropics/anthropic-sdk-go in v1: the API surface we use (one streaming POST + tool-use loop) is small enough that a hand-rolled client is shorter than wiring the SDK and safer than depending on a Go SDK that has historically broken on minor version bumps in mAi's experience. Keep the option open for Phase 2 if the token-accounting / structured tool-use helpers in the SDK become attractive.

// internal/services/paliadin/anthropic.go
type AnthropicClient interface {
    Stream(ctx context.Context, req MessagesRequest, w StreamWriter) (Usage, error)
}

The interface is the only swap-point. Switching providers later means a new implementation, not a rewrite.

2.2 System prompt + message shape (Q2)

Recommendation: single system prompt with paliad context + tool definitions; one persistent prompt across pages (no per-route system prompts in v1).

2.2.1 System prompt (locked, v1)

The system prompt is rendered per request (not once per process) from a constant template, filled with branding.Name, the user's locale (DE/EN), the user's display_name and office, the current date, and the visible-project count (a single count, not the project list, which keeps the prompt small).

You are Paliadin, an AI assistant inside {{firm}}'s patent practice
platform "Paliad". You help {{display_name}} ({{office}}) answer
questions about their own work in Paliad and about UPC / EPO / DPMA
patent practice.

Today is {{today}}. The user's display language is {{language}}; reply
in {{language}} unless the user switches mid-conversation.

You have read-only access to the following tools:
- whats_on_my_plate     — the user's dashboard (deadline / appointment / matter buckets)
- list_my_projects      — every project the user can see
- get_project_detail    — full detail of one project (deadlines, appointments, parties, partner units)
- search_my_deadlines   — filter the user's deadlines by status / date / project
- list_my_appointments  — the user's upcoming appointments (next 30 days by default)
- lookup_court          — Paliad's catalog of patent courts (UPC LDs, German LGs/OLGs/BGH, EPO, DPMA, ...)
- lookup_glossary_term  — Paliad's bilingual patent glossary
- lookup_deadline_rule  — Paliad's Fristenrechner concept tree (named deadline rules + their triggers)

Hard rules:
1. Never invent facts. If a tool returns nothing, say so. Do not guess
   case numbers, deadline dates, court names, or party names.
2. Every concrete factual claim about the user's work MUST come from a
   tool call in the current conversation. Cite using "[#deadline-XXXX]",
   "[#projekt-XXXX]", "[court: Munich LD]", "[glossary: Klageerwiderung]"
   so the UI can render citation chips.
3. You cannot mutate any data. If the user asks you to change something,
   explain that v1 is read-only and point them to the right page in
   Paliad.
4. Visibility is enforced before tools return — if your tool call comes
   back empty, the data either doesn't exist OR the user can't see it.
   Never disclose the latter; just answer "I couldn't find anything
   matching that".
5. You cannot answer questions about other users' projects, even if the
   user names them.
6. Respect the user's role. If the user has global_role=standard, do not
   speculate about admin-only functions.

Style:
- Direct, professional, slightly warm. Lawyer-adjacent.
- Reply in Markdown. Use lists, code blocks, blockquotes.
- Cite specifically (case numbers, dates, court names) — never "around
  the 14th".
- When uncertain, flag it. ("I don't see a deadline matching that
  description on the projects you can access.")
- No emojis unless the user uses one first.

You are NOT:
- A code-writing assistant
- A replacement for legal advice
- A web search

This is ~250 input tokens — well under the budget.

2.2.2 Per-message envelope

The browser POSTs to /api/paliadin/turn with { session_id, user_message, history }, where history is the prior turns in the current session only (session = browser tab; localStorage backs it). The server prepends the system prompt and runs the tool-use loop.

2.2.3 Tool use vs RAG-only (Q2 secondary)

Tool use, not RAG. RAG (vector search over chunks of paliad content) is the wrong shape for this surface — paliad data is highly structured, the most useful answers come from filtered SQL queries (e.g. "all deadlines on my projects with status='pending' and due_date<=now()+7d"), and a vector store would just paraphrase what an SQL query returns more accurately. Tools give the model the same query power the user has, with hard visibility gates. Phase 2 may add RAG over a small static corpus (HL Patents Style guide, Paliadin docs) if onboarding queries don't get good answers from glossary lookups alone.

2.3 Long-lived service vs lexy-style worker spawn (Q4)

Recommendation: long-lived Go service (in-process), not a per-session Claude Code worker.

| Option | Latency to first token | Cost / turn | Operational shape |
| --- | --- | --- | --- |
| In-process Go service calling Anthropic API directly | < 1 s (just network + queueing) | Pay only for the model tokens we use | Single binary, single Postgres conn, scales with paliad |
| mai hire paliadin per session (Claude Code worker) | 5-15 s | Worker startup overhead × N concurrent sessions × Claude Code's own context overhead | Operational footprint of running a worker per active user: dozens of tmux panes, tasks, reports |

The lexy / cassandra worker pattern works because it's batch: classify N judgments, emit JSON, exit. A chat surface needs subsecond response times across dozens of HLC users in parallel. A Claude-Code-per-session pattern would give each user their own Claude in the loop, with all the tooling and message-bus scaffolding that implies: the wrong scale of abstraction.

That said, two things from the worker pattern do carry over:

  1. System-prompt voice. The lexy / mai-lexy SKILL.md persona ("Sharp, analytical, direct. Cites provisions and case law naturally. Flags uncertainty honestly.") is the right voice for Paliadin. We borrow it; see §2.2.1.
  2. Tool catalog shape. The lex-research SKILL.md tool list (search → fetch full text → enrich → analyse → cite) maps cleanly onto Paliadin's read tools; see §2.5.

2.4 mlex / /lex-* carryover map (Q3, Q18)

Inventory result, with the shape-vs-code split called out for each:

| Skill / asset | What it does | Carryover to Paliadin |
| --- | --- | --- |
| ~/dev/mLex/ (workspace) | extractions/ (per-case JSON), analysis/ (markdown reports), docs/ (legal references), extractions/queue.json | None as code. Workspace artifacts are the output of the skills; they don't give us anything embeddable. |
| lex-research skill | UPC case law search → analysis report. Tool catalog: mcp__supabase__execute_sql, mcp__youpc__*, mcp__youpc-memory__*. Output format: structured markdown with citation tables. | Voice + tool-catalog shape. "Search → enrich → analyse → cite" is the Paliadin flow. The skill's output-format conventions (case number on first mention, division comparison tables) seed the system prompt's style guidance. |
| lex-extract skill | Read full judgment text → structured holdings / principles / interpretations JSON. | Not v1. Phase 2 candidate iff Paliadin gets an extract_judgment(node_id) write tool; orthogonal to read-only v1. |
| lex-classify skill | Classify judgments against a 47-leaf taxonomy. | Not v1. Same as above: write-surface, batch-shaped, irrelevant to interactive Q&A. |
| lex-classify-patent skill | Classify patents into IPC technology sectors via Anthropic. | Pattern reference only. It's already an Anthropic-backed pipeline, so its prompt structure is a working example we can crib from for the system-prompt template, but the actual classification target is paliad-irrelevant. |
| mai-lexy skill | Lawyer persona that orchestrates the above. "Citation-backed, flags uncertainty." | Voice template. The persona text is the closest thing to a working Paliadin system prompt; §2.2.1 borrows directly from it. |
| claude-api skill | Anthropic SDK / Messages API patterns + prompt caching guidance. | Implementation reference for the Go client + caching strategy. §6.4 picks up its prompt caching guidance. |

Anti-reuse: the mcp__youpc__* MCP tools that lex-research uses are designed for an interactive Claude Code session. Paliadin's tools must instead be Go service calls: same data shape, different transport. Don't try to embed an MCP client in a paliad Go process; rebuild the same SQL queries against the same Postgres directly.

2.5 Tool catalog v1 (Q17)

Eight read-only tools. Each is a thin Go shim around an existing service; each enforces visibility through that service's existing visibilityPredicate.

| Tool name | Backing service / method | Inputs | Output (truncated to fit budget) |
| --- | --- | --- | --- |
| whats_on_my_plate | DashboardService.Get(userID) | none | {deadline_summary, appointment_summary, matter_summary, upcoming_deadlines[≤10], upcoming_appointments[≤10], recent_activity[≤10]} |
| list_my_projects | ProjectService.ListVisible(userID, filter) | optional {status, kind} | [{id, kind, label, status, parent_id, path}] paged 25 |
| get_project_detail | ProjectService.Get(userID, id) + DeadlineService.ListByProject + AppointmentService.ListByProject + PartyService.ListByProject + DerivationService.AttachedUnits | {project_id} | {project, deadlines[≤25], appointments[≤25], parties[≤10], partner_units[≤5]}; 404 if user can't see it (LLM gets a clean "not found", same response as truly missing) |
| search_my_deadlines | new helper on DeadlineService (reuses visibilityPredicate) | {q?, status?, project_id?, due_after?, due_before?, limit≤25} | [{id, title, due_date, status, project_label, court}] |
| list_my_appointments | new helper on AppointmentService | {from, to, project_id?} | [{id, title, start_at, end_at, location, project_label}] |
| lookup_court | CourtService.Search(q) (firm-wide; no visibility filter, since courts are reference data) | {q} | [{slug, name, country, kind, address, vacation_periods[≤4]}] truncated 10 |
| lookup_glossary_term | static JSON loader (internal/handlers/glossary.go data) | {q, lang?} | [{de, en, definition, category}] top 5 |
| lookup_deadline_rule | DeadlineRuleService.SearchConcept(q) | {q} | [{rule_code, concept_label, trigger_event, deadline_text, legal_source}] top 5 |

Bumped out of v1 (Phase 2 candidates):

  • list_my_pending_approvals (the inbox bell payload) — useful but adds RLS surface; let v1 stabilise first.
  • search_youpc_case_law — m's framing example, but crossschema → bigger blast radius. Phase 2 once Paliadin proves its weight on paliadinternal data.
  • search_my_audit_log — high signal but PII heavy.
  • compute_frist — would invoke the existing DeadlineCalculator. Useful but the user can already do this on /tools/fristenrechner; defer until we see queries that actually want it.
  • All write tools (create_deadline, attach_partner_unit, etc.) — Phase 3 minimum, with hard confirmation gate (see §6).

2.6 The tooluse loop (Q2 tertiary)

Standard Anthropic tooluse loop:

1. Build messages = [system, ...history, user_message]
2. POST /v1/messages with tools=[...catalog]
3. Stream assistant reply chunks → relay to client SSE
4. If stop_reason == "tool_use":
     for each tool_use block:
        execute tool(input) on the matching Go service
        emit tool_result block back into messages
     goto 2 (with the same stream/SSE connection)
5. If stop_reason == "end_turn": close stream

Hard cap on the loop: ≤ 5 toolcall rounds per turn. After 5 rounds without end_turn, forceclose with "Sorry, I got stuck — try rephrasing." Hitting the cap is a UI red flag we want to see in audit (see §6.3).


§3 Sub-design B — Data access, RLS, PII

Answers Q5, Q6, Q7.

3.1 Knowledge sources for v1 (Q5)

Recommendation: paliadinternal data + paliad's static reference data ONLY. youpc.org case law deferred to Phase 2.

| Source | v1 | Reason |
|---|---|---|
| Per‑user paliad data (deadlines, appointments, projects, parties, partner units, attached units) | Yes | The whole point of Paliadin. Visibility enforced via visibilityPredicate (every backing service already does this; tool inherits it). |
| Static reference data in paliad (court catalog t‑122, glossary, deadline rules, Fristenrechner concept tree) | Yes | Firm‑wide, no per‑user gating, low blast radius. |
| UPC case law (youpc Postgres data.judgments, data.judgment_markdown_content) | Phase 2 | Cross‑schema SELECT is technically trivial (same Postgres) but: (a) inflates the v1 surface; (b) brings in 1700+ judgments → scaling RAG/full‑text question; (c) m's framing called out research as a use case, not a v1 must‑have. Ship paliad‑internal Q&A first; layer case‑law on once the substrate is proven. |
| HL Patents Style guide / Paliad onboarding docs | Phase 2 | No internal corpus exists yet; would need docs‑authoring + indexing. The lookup_glossary_term tool already covers the most common onboarding question shape ("was bedeutet X?"). |
| External web search | No | Out of scope; Paliadin is a grounded assistant, not a web surfer. m can use the regular Claude for that. |

Ranking inside the v1 set (when Paliadin has to choose):

  1. Userdata tools first when the question references "my", "the case", "the deadline", or names a project / case number that resolves.
  2. Static reference next when the question is conceptual ("what's a Klageerwiderung?", "which court is the Munich LD?").
  3. Combine when both apply ("when is my Klageerwiderung due?" → lookup_deadline_rule for the rule + search_my_deadlines for the user's instance).

The system prompt names tools in this priority order; the model's toolselection follows.

3.2 Auth / visibility boundary (Q6)

The gate: every backing service already runs visibilityPredicate(alias) against the caller's UUID. The Paliadin tool shim is a 5line wrapper that calls the service with userID derived from auth.UserIDFromContext(r.Context()) at the SSE handler boundary. There is no servicerole escape — the shim simply has no other UUID to pass in.

Belt‑and‑braces: every tool result is inspected for project_id columns; for each distinct project_id, the shim asserts paliad.can_see_project(_project_id) returns true. (Defence‑in‑depth: catches any future service‑layer regression where someone forgets the predicate. The cost is one extra inexpensive function call per tool turn.)
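
A minimal Go sketch of that post-check, assuming tool results are JSON and that the id sits under a "project_id" key as a string. AssertVisible and collectProjectIDs are hypothetical names, and the canSee callback stands in for a call to paliad.can_see_project().

```go
package main

import (
	"encoding/json"
	"fmt"
)

// collectProjectIDs walks a decoded JSON value and records every string
// stored under a "project_id" key, at any nesting depth.
func collectProjectIDs(v any, out map[string]struct{}) {
	switch t := v.(type) {
	case map[string]any:
		for k, val := range t {
			if s, ok := val.(string); ok && k == "project_id" {
				out[s] = struct{}{}
			}
			collectProjectIDs(val, out)
		}
	case []any:
		for _, val := range t {
			collectProjectIDs(val, out)
		}
	}
}

// AssertVisible fails if the tool result mentions any project the caller
// cannot see. canSee is an assumption standing in for the SQL function.
func AssertVisible(result []byte, canSee func(projectID string) bool) error {
	var v any
	if err := json.Unmarshal(result, &v); err != nil {
		return err
	}
	ids := map[string]struct{}{}
	collectProjectIDs(v, ids)
	for id := range ids {
		if !canSee(id) {
			return fmt.Errorf("visibility regression: project %s in tool result", id)
		}
	}
	return nil
}

func main() {
	result := []byte(`{"deadlines":[{"id":"d1","project_id":"p-1"},{"id":"d2","project_id":"p-2"}]}`)
	visible := map[string]bool{"p-1": true}
	err := AssertVisible(result, func(id string) bool { return visible[id] })
	fmt.Println(err != nil) // p-2 is not visible, so the check trips
}
```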

The "tell, don't disclose" rule (§2.2.1 hardrule 4): if the user names a project they cannot see, the tool returns {error: "not found"} — same response as a project that doesn't exist. The system prompt instructs the model to say "I couldn't find anything matching that" without distinguishing the two cases. This is the same rule the t144 ViewService already applies.

Crossuser PII in tool outputs: tool outputs may legitimately contain other users' display names (e.g. project teams, deadline assignees). These are visible to the caller through the regular UI already, so disclosing them through Paliadin is no worse. We do NOT redact them.

Approval / partnerunit derivation: get_project_detail returns the derived team (per t139 DerivationService.AttachedUnits). Same predicate as the rest of the app.

3.3 PII handling, retention, encryption (Q7)

v1 stance: minimum viable persistence, maximum auditability of the access pattern.

| Data | Stored where | Retention | Encryption | Notes |
|---|---|---|---|---|
| Conversation history (the actual messages) | Browser localStorage only. Cleared on browser data wipe / reload‑with‑fresh‑session. | Session only | n/a | Phase 2: opt‑in DB persistence with retention controls. |
| Per‑request audit row | New paliad.paliadin_turns table | Forever (matches audit‑log pattern; soft‑delete only) | At‑rest by Postgres / Supabase volume encryption | Stores: turn_id, user_id, started_at, finished_at, model, input_tokens, output_tokens, tool_calls (jsonb of tool names + arg hashes, NOT arg values), prompt_hash (sha256 of redacted user message), error_code. No prompt body, no completion body. |
| Tool‑call inputs (e.g. project_id arguments) | Hashed (sha256) into the audit row's tool_calls jsonb | Forever | n/a | The hash is enough to detect "this user kept asking about project X" patterns without storing the readable id. |
| Anthropic API request/response bodies | Not stored. Streamed through the Go service straight to the SSE writer. | n/a | TLS in flight | Anthropic's own retention is governed by the org's API contract; pulling Paliad onto an existing HLC enterprise key would inherit that. |

Why this shape:

  • Compliancelite v1. HLC's compliance team has not yet weighed in on AImediated PII (memory says the Phase H decision was "we don't want anthropic API… for a while"). Storing the full transcript opens a retention/disclosure question we don't need to answer to ship Paliadin's MVP. The auditmetadata row is enough to demonstrate: (a) who used it, (b) how often, (c) what tools they triggered, (d) cost.
  • Phase 2 transcript persistence would add a paliadin_messages table (turn_id FK, role, content, redact_marks jsonb) and a peruser setting "keep my history". Default off.
  • Why no PII redaction in the user prompt? v1 is optin (the user typed the prompt). Redacting client names / case numbers in the audit hash would defeat the point; we redact by not storing the prompt, only its hash.
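
The args_hash stored in tool_calls could be computed as in the sketch below. HashArgs is a hypothetical helper; it relies on the documented behaviour of Go's encoding/json, which emits map keys in sorted order, so the same arguments always hash to the same value regardless of how the map was built.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// HashArgs returns the hex sha256 of the JSON encoding of a tool's arguments.
// json.Marshal sorts map keys, which makes the encoding deterministic.
func HashArgs(args map[string]any) (string, error) {
	b, err := json.Marshal(args)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	h, err := HashArgs(map[string]any{"project_id": "c47bd2", "status": "pending"})
	if err != nil {
		panic(err)
	}
	// Only the hash goes into the audit row; the readable id does not.
	fmt.Println(len(h), h[:8])
}
```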

The Anthropic side: if HLC's enterprise contract forbids vendorside retention, the Go client must set metadata: {user_id: "<hash>"} and ensure the API call is on an org with zeroretention guarantees. Open question for m: which Anthropic key are we using — m's personal key (existing ANTHROPIC_API_KEY precedent in mAi/youpcms) or a new HLC enterprise key? This is the single biggest compliance question; see §9.2.


§4 Sub-design C — UX

Answers Q8, Q9, Q10, Q11, Q12.

4.1 Surface placement (Q8)

Recommendation (counter to brief): start with a dedicated /paliadin fullpage route + a sidebar entry under the "Übersicht" group. Defer the rightdrawer to Phase 2.

| Option | v1? | Why |
|---|---|---|
| /paliadin full page + sidebar entry | Yes | Lowest CSS risk; mobile‑responsive for free (paliad's existing breakpoints work); easy to test via Playwright; matches paliad's "every feature is a top‑level page" pattern; no z‑index / overlay debugging. |
| Right‑drawer slide‑out from any page | Phase 2 | Pretty, matches m's "panel docked into UI" framing, but adds: drawer toggle wiring on all 30 pages, scroll‑lock interaction, focus management, mobile small‑screen fallback. Not worth the v1 surface area. Phase 2 wraps the same /paliadin UI in a slide‑out container. |
| Floating bottom‑right bubble | No | Clippy comparison is visual, not positional. A floating overlay on every page collides with the BottomNav on mobile (already 5/5 slots) and the inbox bell on desktop. |
| Page‑embedded panel on /paliadin only | Yes | This is the v1 recommendation, just framed differently. |

Sidebar entry:

Übersicht
  Start
  Agenda
  Inbox 🛎
  Paliadin ✨   ← new, ICON_SPARKLE

Group placement under Übersicht (not under Tools or Wissen) because Paliadin is conversation about the user's work, not a knowledge tool.

Mobile: Paliadin is reachable via the sidebar drawer (existing mobile pattern). No BottomNav slot — those are full and the ranking (Start / Projekte / + / Agenda / Menü) is more important than a chat shortcut for v1.

4.2 Avatar / personality (Q9)

Recommendation: no avatar SVG in v1. Just a chat panel with the name "Paliadin" in the header. Mascot is Phase 2.

Why:

  • Mascot design is a real design exercise (34 iterations to get something that doesn't read as kitsch in a law firm). Not inventor's call to bash one out in a v1 ship.
  • The brand cue (limegreen #c6f41c accent) is enough to make Paliadin feel like part of paliad without a character.
  • Paliadin's personality lives in the system prompt (§2.2.1), not in pixels. Voice carries the buddy framing; mascot makes it visual but isn't loadbearing.

What we ship in v1 instead:

  • Header: "✨ Paliadin" (sparkle icon + name) above the chat panel.
  • Emptystate prompt: "Was kann ich für dich tun?" (DE) / "How can I help?" (EN).
  • Oneline tagline under the header: "Ich kenne deine Akten und Paliads Wissensbasis." (DE) / "I know your matters and Paliad's knowledge base." (EN). This is the only v1 affordance that explicitly tells the user "I see your data" — loadbearing for the differentiation argument in §0/§9.1.

Phase 2 mascot brief (for when m greenlights it): small SVG, friendly, limegreen primary, no eyesdarting / animatedonidle (creepy), modular pose set so it can react to "thinking" / "found it" / "stuck" without being an MMORPG pet.

4.3 Onboarding hint (Q10)

Recommendation: silentuntilinvoked. No proactive popup, no firstrun modal, no toast.

Why:

  • Paliad already has a polished onboarding flow (tpaliad034). Adding a Paliadin popup on top would be the kind of "surprise the user" affordance that erodes trust the first time it misfires.
  • The emptystate inside /paliadin itself is the right onboarding surface: 3 starterprompt buttons rendered when the chat is empty.

Three starter prompts (DE primary):

  1. "Was steht heute an?" → triggers whats_on_my_plate
  2. "Welche Fristen sind diese Woche fällig?" → triggers search_my_deadlines with due_before=now()+7d
  3. "Erkläre mir Klageerwiderung." → triggers lookup_glossary_term + lookup_deadline_rule

EN equivalents: "What's on my plate?" / "Which deadlines are due this week?" / "Explain Klageerwiderung."

Picking one from the row sends it as if the user typed it. Keeps the surface zeroweight when ignored.

Phase 2 candidate: postonboarding email / inbox card "Paliadin ist live, frag ihn was deine Daten dir sagen." Driven by the existing reminder/email substrate. Out of v1 scope.

4.4 Action chips in responses (Q11)

Recommendation: action chips parsed from a simple inline syntax in the model's reply, rendered clientside, NOT a tool the model invokes.

Why simple syntax over a tool: tool invocations cost a roundtrip; we want the model to "suggest" an action without paying for an extra tool turn. The model emits a structured marker in its prose; the frontend client parses it and renders a chip below the bubble.

Marker format:

[#deadline-OPEN:c47bd2]
[#projekt-OPEN:slug-x]
[#frist-OPEN:c47bd2]
[#termin-OPEN:abc123]
[chip:nav:/projects/abc-123]   (for arbitrary navigation)
[chip:filter:status=pending&due=this_week]   (for parameterised inbox links)

The system prompt teaches the model to emit chips when navigation or filtering would help the user act on the answer. Each marker resolves to one chip, rendered as:

┌──────────────────────────────────────┐
│ Frist 16.05.2026 fällt morgen.       │
│ [Frist öffnen] [Akte ansehen]        │
└──────────────────────────────────────┘

Client parser (frontend/src/client/paliadin.ts): regex over the streamed text, replaces marker with a button. Buttons are real <a> elements (Cmdclick works, keyboard works), styled like the existing .entity-table row chips.

Why not let the model embed full URLs? Two reasons:

  1. URLs change (we renamed /akten to /projekte mid‑project). Markers are stable; we resolve them at render time.
  2. Hallucinated URLs are real risk. If the model can only emit a marker tied to an id we know it just retrieved, the chip can't navigate to a fake page.

4.5 Streaming + interruption (Q12)

Recommendation: SSE stream from /api/paliadin/stream, client EventSource, userinitiated abort via "Stop" button.

4.5.1 Stream shape

Mirrors Anthropic's native streaming events, adapted for our SSE consumer:

event: meta
data: {"turn_id":"01H…","model":"claude-sonnet-4-6"}

event: content_delta
data: {"text":"Auf der Akte Müller…"}

event: tool_call
data: {"name":"search_my_deadlines","args_hash":"…","status":"running"}

event: tool_result
data: {"name":"search_my_deadlines","status":"ok","summary":"3 results"}

event: content_delta
data: {"text":"… ist die Klageerwiderung am 16.05. fällig."}

event: chip
data: {"kind":"deadline","action":"open","id":"c47bd2"}

event: end
data: {"input_tokens":342,"output_tokens":88,"tool_calls":1}

# heartbeat every 25 s to keep Traefik from reaping
event: ping
data: {}

The tool_call / tool_result events are visible in the UI as small dim "ran search_my_deadlines (3 results)" lines under the bubble — the citation evidence that distinguishes Paliadin from a generic chatbot. (Direct quote from the §0 framing: "the differentiation collapses if v1 doesn't make the datagrounding visible.")
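
On the wire, each event in the listing above is a standard SSE frame: an event: line, a data: line, and a blank line. A minimal Go sketch of the server side follows; FormatEvent and WriteEvent are hypothetical helper names, and in a real handler the writer would be the http.ResponseWriter, flushed after every frame.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// FormatEvent renders one SSE frame. json.Marshal never emits raw newlines,
// so the payload always fits on a single data: line.
func FormatEvent(event string, payload any) (string, error) {
	data, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("event: %s\ndata: %s\n\n", event, data), nil
}

// WriteEvent writes one frame to w (the HTTP response in a real handler).
func WriteEvent(w io.Writer, event string, payload any) error {
	frame, err := FormatEvent(event, payload)
	if err != nil {
		return err
	}
	_, err = io.WriteString(w, frame)
	return err
}

func main() {
	_ = WriteEvent(os.Stdout, "tool_result", map[string]any{
		"name": "search_my_deadlines", "status": "ok", "summary": "3 results",
	})
	_ = WriteEvent(os.Stdout, "ping", map[string]any{}) // heartbeat frame
}
```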

4.5.2 Interruption

  • "Stop" button next to the input. Click → EventSource.close() + fetch('/api/paliadin/stream/{turn_id}/abort', {method:'POST'}).
  • Server abort closes the upstream Anthropic request via context cancellation.
  • Stopped turns still write an audit row with error_code='user_aborted' so we see how often users hit it.

4.5.3 Reconnect

Same LastEventID resume pattern the t145 chat design specced. Server keeps the inflight stream buffered for 30 s after disconnect; reconnect within that window replays missed events. After 30 s, the turn is considered done — reconnect arrives at the start of a fresh session.


§5 Sub-design D — Token budget, cost, audit

Answers Q13, Q14, Q15, Q16.

5.1 Perrequest token cap (Q13)

Recommendation: max_input_tokens=4000 (model's view of input including system + history + tool defs + user msg) and max_tokens=2000 (model's max output), same as brief. The 4 k input figure is a soft cap: above it the history is truncated. A turn hard‑fails only above the 6 k cap listed under Rationale.

Rationale:

  • A typical paliad data tool result is < 500 tokens (truncated lists, capped at 25 rows). Even with system prompt (~250) + tool defs (~600) + 5 prior turns (~600 each on average) the input stays well under 4 k.
  • If the conversation runs long (~8+ turns), the client/server softtruncates history (drops oldest user/assistant pairs first) before sending. The user sees a "Earlier in this conversation, we discussed X (truncated)" pseudosystem message. Cleaner than failing the turn.
  • Hard cap at 6 k input tokens — over that, refuse the turn with "Conversation too long, start a new one." Defends against jailbreak attempts that try to balloon the prompt.
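
The soft-truncation rule can be sketched as below. This is an illustration under assumptions: Message and TruncateHistory are hypothetical names, token counts are taken as given estimates, and the history is assumed to alternate user/assistant starting with a user message.

```go
package main

import "fmt"

// Message carries a role and a token estimate; how tokens are counted is
// outside the scope of this sketch.
type Message struct {
	Role   string
	Tokens int
}

// TruncateHistory drops the oldest user/assistant pair until the fixed part
// (system prompt, tool defs, new user message) plus history fits softCap.
// It returns the kept history, the resulting total, and whether it dropped anything.
func TruncateHistory(history []Message, fixedTokens, softCap int) ([]Message, int, bool) {
	total := fixedTokens
	for _, m := range history {
		total += m.Tokens
	}
	dropped := false
	for total > softCap && len(history) >= 2 {
		total -= history[0].Tokens + history[1].Tokens
		history = history[2:]
		dropped = true
	}
	return history, total, dropped
}

func main() {
	var history []Message
	for i := 0; i < 3; i++ {
		history = append(history, Message{"user", 600}, Message{"assistant", 600})
	}
	kept, total, dropped := TruncateHistory(history, 1000, 4000)
	fmt.Println(len(kept), total, dropped)

	const hardCap = 6000
	if total > hardCap {
		fmt.Println("refuse turn: conversation too long")
	}
}
```

If the total still exceeds the hard cap after all history is gone, the caller refuses the turn rather than sending it upstream.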

Cost math at Sonnet 4.6 per‑turn typical (3 k input, 1 k output): ~$0.012/turn. At 30 turns/hour/user × 38 onboarded HLC users × 5 working hours/day = 5 700 turns/day ≈ $70/day worst case. Realistic load is probably 10× lower. Phase 2: prompt caching (§5.4) drops it further.
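
The worst-case arithmetic, spelled out as a small Go sketch. DailyWorstCase is a hypothetical helper and the $0.012 per-turn figure is taken from the estimate above, not derived from a price list.

```go
package main

import "fmt"

// DailyWorstCase returns the number of turns and the cost for one day in
// which every user exhausts the hourly cap during every working hour.
func DailyWorstCase(turnsPerHour, users, hoursPerDay int, costPerTurn float64) (int, float64) {
	turns := turnsPerHour * users * hoursPerDay
	return turns, float64(turns) * costPerTurn
}

func main() {
	turns, cost := DailyWorstCase(30, 38, 5, 0.012)
	fmt.Printf("%d turns/day, $%.2f/day\n", turns, cost) // 5700 turns/day, $68.40/day
}
```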

5.2 Conversation history persistence (Q14)

Recommendation: sessiononly in v1. Persistent threads in Phase 2.

| Option | v1? | Why |
|---|---|---|
| Session‑only (browser localStorage, cleared on Sign Out) | Yes | Zero schema. Zero retention question. Aligns with §3.3 "minimum viable persistence." Lets us ship paliadin without compliance review of stored transcripts. |
| Persistent threads (DB‑stored, named) | Phase 2 | Real schema (paliadin_threads, paliadin_messages), retention policy, cross‑device sync, "delete my history" UX, possibly opt‑in toggle. None of which is needed to validate "is Paliadin actually useful". |

Edge case: page reload during a conversation. localStorage persists the history in that browser (it is scoped to the origin, not to a single tab), so a reload, closing and reopening the tab, and restarting the browser all restore it. Sign‑out clears. Multi‑device = different histories. We're explicit about this in the panel header: "Conversation lives in this browser only" tooltip.

Why opt for slightly worse UX over the easy schema work: the tpaliad145 chat just got parked over an adoptionrisk concern, not a schema concern. Paliadin should ship the smallest possible footprint that proves usefulness. Persistent threads can be a "you asked for this" Phase 2.

5.3 Rate limit per user (Q15)

Recommendation: 30 turns/hour/user (slightly tighter than the brief's 50). Plus a global ceiling of 1 000 turns/hour across the firm. Both configurable.

Peruser 30/hour because:

  • 30/hour ≈ one turn every two minutes during sustained use. That's heavy use. A reasonable user asks 35 questions in a session.
  • Soft hint at 25 ("you've used 25 of 30 messages this hour"), hard block at 30 with retryafter.
  • Lower than 50 to give us a safety margin for runaway cost in week 1; we can raise it once we see real usage.

Global 1 000/hour ceiling because:

  • Global cap = circuit breaker against the long tail (a script that sends 1000 turns/hour from one user we missed in the peruser cap, or a developer bug).
  • 1 000 turns × ~$0.012 = $12/hour worst case = $288/day. We tolerate that for a day; we'd notice and tune.

Storage: simple Postgres paliad.paliadin_rate_limit table with (user_id, hour_bucket, turn_count) upserted on every turn start. No Redis, no extra dependency. Fast at this scale.

Admin override: global_admin can lift their own cap (they typically test things). Surface this in the audit row, not in a CLI.

5.4 Audit + logging (Q16)

Recommendation: every turn writes a metadataonly row to paliad.paliadin_turns. Full transcripts are NOT stored in v1. Toolcall args are hashed. Anthropic vendor side is governed by orglevel retention.

5.4.1 Schema (migration 057)

CREATE TABLE paliad.paliadin_turns (
    turn_id           uuid PRIMARY KEY,
    user_id           uuid NOT NULL REFERENCES paliad.users(id),
    session_id        text NOT NULL,                  -- browser session, opaque
    started_at        timestamptz NOT NULL DEFAULT now(),
    finished_at       timestamptz,                    -- NULL until endofturn
    model             text NOT NULL,                  -- e.g. 'claude-sonnet-4-6'
    input_tokens      int,                            -- from Anthropic usage block
    output_tokens     int,
    tool_calls        jsonb NOT NULL DEFAULT '[]',    -- [{name, args_hash, status, latency_ms}]
    prompt_hash       text,                           -- sha256 of user_message after PII redaction (best effort)
    response_hash     text,                           -- sha256 of full response (citation only, not stored)
    chip_count        int NOT NULL DEFAULT 0,
    error_code        text,                           -- NULL on success; 'user_aborted', 'rate_limited', 'token_cap', 'tool_loop_cap', 'upstream_error'
    estimated_cost_usd numeric(10, 6)                 -- for ops dashboards
);

CREATE INDEX paliadin_turns_user_started_idx
    ON paliad.paliadin_turns(user_id, started_at DESC);
CREATE INDEX paliadin_turns_started_idx
    ON paliad.paliadin_turns(started_at DESC);

ALTER TABLE paliad.paliadin_turns ENABLE ROW LEVEL SECURITY;

-- User sees their own; global_admin sees all.
CREATE POLICY paliadin_turns_select
    ON paliad.paliadin_turns FOR SELECT
    USING (
      user_id = auth.uid()
      OR EXISTS (SELECT 1 FROM paliad.users u
                  WHERE u.id = auth.uid() AND u.global_role = 'global_admin')
    );

-- Service-role (paliad backend) writes; no userdirect INSERT.
-- (Paliad uses service-role conn, so policies on writes are inert,
-- but we still ENABLE RLS so future directauth callers are gated.)

Ratelimit table also lives in this migration:

CREATE TABLE paliad.paliadin_rate_limit (
    user_id     uuid NOT NULL REFERENCES paliad.users(id),
    hour_bucket timestamptz NOT NULL,
    turn_count  int NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, hour_bucket)
);

5.4.2 What we DON'T store (v1)

  • The user's actual prompt text. Only prompt_hash.
  • The model's actual response text. Only response_hash.
  • The tool inputs. Only tool_calls[].args_hash.

Phase 2 transcript persistence unlocks all three — deliberately separate migration so the compliance review sits at that boundary.

5.4.3 Vendor retention

The Anthropic side is governed by the orglevel contract. Open question for m (§9.2): does HLC have an enterprise / zeroretention agreement, or are we using m's personal key (matches existing ANTHROPIC_API_KEY precedent in mAi/youpcms)? The answer changes whether v1 needs a "data sent to Anthropic" disclosure on first use.

5.4.4 Prompt caching (Phase 2)

The Anthropic API supports prompt caching for repeated system prompts + tool definitions. Our system prompt + 7 tool defs is ~850 tokens — perfect cache target. Phase 2: enable cache_control on the system block; cuts input cost by ~90% on repeat turns within the 5minute cache window. Skip in v1 to keep the client minimal; pick up after the API surface stabilises.


§6 Schema, endpoints, files

6.1 New endpoints

| Method | Path | Purpose | Auth |
|---|---|---|---|
| POST | /api/paliadin/turn | Initiate a turn: assigns turn_id, opens SSE | logged‑in (302 to /login otherwise) |
| GET | /api/paliadin/stream/{turn_id} | SSE stream of the turn's response (mostly invoked from the same POST to keep the connection live; separate GET supports reconnect) | logged‑in |
| POST | /api/paliadin/stream/{turn_id}/abort | User cancels mid‑turn | logged‑in, must own the turn |
| GET | /api/paliadin/limits | Returns {used_this_hour, hourly_cap, global_cap, global_used} | logged‑in |
| GET | /paliadin | The page shell (server‑renders the panel + initial empty state) | logged‑in |
| GET | /admin/paliadin | Per‑user usage / cost dashboard | global_admin |

The POST /api/paliadin/turn returns {turn_id, sse_url}; the client opens an EventSource on sse_url. Twostep keeps the POST cheap for telemetry / audit row creation, while the longlived stream lives on a GET that's safe to retry / resume.

6.2 New / extended services

| File | Status | Purpose |
|---|---|---|
| internal/services/paliadin/service.go | NEW | The orchestrator: run loop, history truncation, rate‑limit check, audit‑row writer |
| internal/services/paliadin/anthropic.go | NEW | Hand‑rolled Messages API client (POST /v1/messages, stream parser) |
| internal/services/paliadin/tools.go | NEW | Tool catalog declaration + dispatch into existing services |
| internal/services/paliadin/prompt.go | NEW | System prompt template + per‑turn assembly |
| internal/handlers/paliadin.go | NEW | HTTP / SSE handlers |
| internal/services/deadline_service.go | extend | Add SearchVisible(userID, q, status, projectID, dueAfter, dueBefore, limit) (currently search is only on the global Fristenrechner matview) |
| internal/services/appointment_service.go | extend | Add ListVisibleInWindow(userID, from, to, projectID) |
| internal/services/glossary_service.go | NEW (or refactor of glossary handler data load) | A real service so the tool can call it; today it lives inline in the handler |

6.3 Frontend

| File | Status | Purpose |
|---|---|---|
| frontend/src/paliadin.tsx | NEW | Page shell |
| frontend/src/client/paliadin.ts | NEW | Chat panel, EventSource, history serialise to localStorage, chip parser, "Stop" button |
| frontend/src/styles/global.css | extend | New CSS section: .paliadin-panel, .paliadin-bubble, .paliadin-bubble--user/--assistant/--tool, .paliadin-chip, .paliadin-input, .paliadin-meta |
| frontend/src/components/Sidebar.tsx | extend | Add Paliadin navItem to the Übersicht group with ICON_SPARKLE |
| frontend/src/i18n-keys.ts | extend | ~25 new keys: paliadin.title, paliadin.tagline, paliadin.starter.*, paliadin.empty, paliadin.input.placeholder, paliadin.stop, paliadin.rate_limited, paliadin.error.* |

6.4 Migration 057

057_paliadin.up.sql:
  - paliad.paliadin_turns (audit row, RLS, indexes)
  - paliad.paliadin_rate_limit (counter table, PK on user+hour)
  - GRANTs: service-role full, anon read disallowed by RLS
057_paliadin.down.sql: drop both tables.

6.5 Env vars (add to CLAUDE.md table)

| Variable | Required | Purpose |
|---|---|---|
| ANTHROPIC_API_KEY | for Paliadin | Anthropic Messages API key. Replaces the "do not set" row that referred to the parked Phase H. Without it, /paliadin returns 503 (server still boots; the rest of paliad keeps working). |
| PALIADIN_MODEL | optional (default claude-sonnet-4-6) | Override model for tuning / fallback to Haiku for cost or Opus for accuracy without redeploying. |
| PALIADIN_HOURLY_CAP | optional (default 30) | Per‑user turn cap per hour. |
| PALIADIN_GLOBAL_HOURLY_CAP | optional (default 1000) | Firm‑wide turn cap per hour. |
| PALIADIN_MAX_INPUT_TOKENS | optional (default 4000) | Soft cap; over this we truncate history. |
| PALIADIN_MAX_OUTPUT_TOKENS | optional (default 2000) | Hard cap; passed straight to Anthropic. |

The Service must boot without ANTHROPIC_API_KEY (return 503 on /paliadin* routes; rest of paliad keeps working). Same pattern as DATABASE_URL and CALDAV_ENCRYPTION_KEY.


§7 Sub-design E — Phasing (REVISED 2026-05-07 20:56)

Answers Q19, Q20. Twostage shape after m's rescope:

  • Phase 0 (PoC, monly): §0.5 is the spec. ~600900 LoC, ~1 day. Ships first.
  • Phase 1 (production v1, multiuser): §7.1 below. Picked up only if §0.5.7's expansion gate fires.
  • Phase 2 / 3: unchanged.

7.1 Phase 1 (production v1) — confirmed scope, GATED on PoC success

Single coherent slice that proves the value proposition endtoend.

| Item | In v1 |
|---|---|
| /paliadin page + sidebar entry under Übersicht | Yes |
| Migration 057 (paliadin_turns + paliadin_rate_limit) | Yes |
| Anthropic client (hand‑rolled, streaming) | Yes |
| 7 read‑only tools | Yes |
| System prompt with branding.Name + visibility rules | Yes |
| SSE stream with meta/content_delta/tool_call/tool_result/chip/end/ping events | Yes |
| Citation chips (parsed from inline markers) | Yes |
| Rate limiting (per‑user + global) | Yes |
| Audit row per turn (metadata only, no transcript) | Yes |
| Session‑only history (browser localStorage) | Yes |
| 3 starter prompts in DE+EN | Yes |
| Token caps + soft history truncation | Yes |
| /admin/paliadin cost dashboard (global_admin only) | Yes |
| ~25 i18n keys (DE+EN) | Yes |
| Mobile responsiveness (uses sidebar drawer like every other page) | Yes |
| CLAUDE.md update flipping the ANTHROPIC_API_KEY row | Yes |

Estimated scope: ~3 5004 500 LoC for the bundled v1 ship. Comparable to t144 (Custom Views) and t145's wouldhavebeen chat slice.

Single PR or split? Recommend single PR for v1. The Anthropic client + tool dispatch + handler + frontend panel are too tightly coupled to ship one without the others — every component is on the critical path of "demonstrate Paliadin actually works". Splitting buys nothing reviewwise (no reviewer can validate "Anthropic client works" without "the tool dispatch that exercises it"). Use the same singlePR pattern as t144 A1+A2 in retrospect.

7.2 Phase 2 candidates (postv1, prioritised)

In rough order of value:

  1. Persistent threads + peruser "keep my history" toggle. Adds paliadin_threads + paliadin_messages tables, retention policy, crossdevice sync. Compliance review attaches here, not to v1.
  2. Prompt caching for system prompt + tool defs. ~90 % inputcost reduction on repeat turns. Pure serverside change.
  3. search_youpc_case_law tool. Crossschema SELECT into data.judgments + data.judgment_markdown_content. Returns case number, division, date, headnote, top 3 holdings. The "research assistant" use case from m's framing.
  4. Rightdrawer mode. Wrap the /paliadin panel in a slideout container; toggle on every page from a header button.
  5. Mascot SVG + idle / thinking / foundit pose set. Real visual design pass.
  6. Onboarding tip — postonboarding inbox card or onetime toast on first dashboard visit after Paliadin lands.
  7. list_my_pending_approvals tool. Wraps inbox bell payload.
  8. Voice input / output. Web Speech API (paliad already has the substrate from the noVoicev1 tpaliad042 PWA).

7.3 Phase 3 candidates (validate first)

  • Write tools. create_deadline, create_appointment, attach_partner_unit, add_party. Each behind a hard confirmation gate ("Paliadin will create a deadline 16.05. on project X — confirm? [Yes / No]"). Auditrow marks these as mutating turns. Heavy compliance question; not Phase 2.
  • Perdeadline / pertermin microthreads. Longlived perentity Q&A. Plumbing collision with the (parked) chat design — reevaluate when chat unparks.
  • Proactive Paliadin. Push tips when the user hits a known confused state ("You've been on /tools/fristenrechner for 8 minutes — want me to walk you through it?"). Powerful, but creepy if poorly tuned.
  • Complianceaware redaction layer. Strip client names from the prompt before it leaves the building, swap stable hashes back in clientside. Big project; only sensible if HLC compliance forbids vendorside PII.

§8 Risks, mitigations, open questions

8.1 Adoption risk (the §0 callout, expanded)

The risk: Paliadin competes with three things HLC already has:

  1. The user's own Claude / ChatGPT in another tab (for general patentpractice questions).
  2. "Ask a colleague on Teams" (for paliadspecific questions about how to use the app).
  3. Just clicking around the UI (for "what's on my plate today").

Paliadin's edge over (1) is data grounding. Edge over (2) is 24/7 + privacy. Edge over (3) is conversational discovery and answering oneshot naturallanguage queries that the structured UI doesn't expose.

The risk realised: if v1 doesn't make the datagrounding visible (citation chips, toolcall evidence under each bubble, the tagline "I see your data"), users default to ChatGPT for everything, and Paliadin becomes a ghost feature that ate 3 weeks of build. Same pattern that just parked tpaliad145.

Mitigations baked into v1:

  • Toolcall evidence visible in every bubble. The user sees "ran search_my_deadlines (3 results)" — instant differentiation from a generic chatbot.
  • Citation chips make answers actionable, not just informative.
  • Tagline + empty state explicitly say "I see your projects."
  • Three starter prompts demonstrate the datagrounding immediately on first use.

Mitigations m should consider before approving:

  • Sanitycheck with two PA colleagues before locking v1 scope. Same recommendation t145 got. If two PAs say "I'd just open Claude in another tab", the scope shifts toward making the datagrounding more prominent (e.g. ship "Paliadin sees only your data" as a persistent banner above the input, not a tooltip) before shipping at all.
  • Soft launch + telemetry. v1's audit row gives us cheap measurement of: (a) total turns/day, (b) turns per user, (c) toolcall frequency (low = Paliadin is being used like ChatGPT, defeating the differentiation). Watch for two weeks; if toolcalls/turn < 1.5 average, the feature isn't doing what we shipped it for and Phase 2 priorities change.

8.2 Compliance / vendordata risk

The risk: sending client names + case content to Anthropic's API may not be sanctioned by HLC IT/compliance. The 20260416 "we don't want anthropic API… for a while" decision (memory b6a11b55…) was about Frist extraction from documents; Paliadin is conversational, but the data envelope sent to Anthropic still contains PII whenever a tool returns a project name.

Mitigations:

  • HLC enterprise key (vs m's personal key) if available — gives orglevel retention + DPA coverage.
  • Zeroretention configuration on the Anthropic call (metadata: {user_id: "<hash>"}, cache_control only on the system block, no eval enrolment).
  • Firstuse disclosure in the panel: "Your messages and the data Paliadin retrieves on your behalf are sent to Anthropic. [Learn more]" — loadbearing and required if the legal answer to §9.2 is "personal key, not enterprise".
  • Phase 2 hardening: serverside redaction layer that swaps client names → stable hashes before the API call, restores them clientside after. Big project; only sensible if compliance forbids vendorside PII.
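To make the Phase 2 redaction idea concrete, here is a minimal sketch of a name‑to‑stable‑hash swap and its inverse. Everything in it is an assumption for illustration: the `Redactor` type, the `[[client:…]]` token format, the use of HMAC‑SHA256 for the stable hash, and having the list of client names available up front. The design places the restore step client‑side; it appears here in Go only to show the round trip.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"
)

// Redactor swaps known client names for stable opaque tokens and back.
// Hypothetical type; the token format is an assumption.
type Redactor struct {
	redact  *strings.Replacer
	restore *strings.Replacer
}

// NewRedactor derives one token per client name with a keyed hash, so the
// same name always maps to the same token for a given key.
func NewRedactor(key []byte, clientNames []string) *Redactor {
	names := make([]string, 0, len(clientNames))
	for _, n := range clientNames {
		if n != "" { // an empty pattern would match at every position
			names = append(names, n)
		}
	}
	// Longest names first: strings.Replacer prefers earlier arguments, so
	// "Acme Corp" is replaced as a whole instead of leaving " Corp" behind.
	sort.SliceStable(names, func(i, j int) bool { return len(names[i]) > len(names[j]) })

	var fwd, back []string
	for _, name := range names {
		mac := hmac.New(sha256.New, key)
		mac.Write([]byte(name))
		token := "[[client:" + hex.EncodeToString(mac.Sum(nil))[:12] + "]]"
		fwd = append(fwd, name, token)
		back = append(back, token, name)
	}
	return &Redactor{
		redact:  strings.NewReplacer(fwd...),
		restore: strings.NewReplacer(back...),
	}
}

// Redact replaces every known client name before text leaves the server.
func (r *Redactor) Redact(text string) string { return r.redact.Replace(text) }

// Restore maps tokens in the model's answer back to the real names.
func (r *Redactor) Restore(text string) string { return r.restore.Replace(text) }
```

A plain string match like this only catches exact spellings of names it already knows. Misspellings, abbreviations, and names embedded in free text would pass through, which is part of why the bullet above calls this a big project.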

8.3 Ratelimit / runawaycost risk

The risk: a user (or a bug) loops fast enough to drain budget before alarms fire.

Mitigations:

  • Peruser 30/hour + global 1 000/hour caps (§5.3). Both surfaced on /admin/paliadin.
  • Perturn token cap (§5.1).
  • Perturn toolloop cap (≤ 5 rounds, §2.6).
  • Audit row written before the upstream call so a ratelimitevading bug still leaves traces.
  • PALIADIN_HOURLY_CAP / PALIADIN_GLOBAL_HOURLY_CAP are envvar configurable so we can tighten without a deploy.
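The per‑user and global hourly caps could be enforced with a small in‑memory sliding window, sketched below. The env‑var names and the 30 / 1 000 defaults come from the list above; `HourlyLimiter`, `envInt`, and the choice of a sliding window over fixed buckets are assumptions. The sketch reads the env vars once at construction, so it assumes a process restart picks up new values.

```go
package main

import (
	"os"
	"strconv"
	"sync"
	"time"
)

// envInt reads a positive integer env var, falling back to def when the
// variable is unset, not a number, or not positive.
func envInt(name string, def int) int {
	v, err := strconv.Atoi(os.Getenv(name))
	if err != nil || v <= 0 {
		return def
	}
	return v
}

// HourlyLimiter enforces a per-user and a global cap over a sliding hour.
// Hypothetical type; state lives in memory and is lost on restart.
type HourlyLimiter struct {
	mu       sync.Mutex
	perUser  int
	global   int
	userHits map[string][]time.Time
	allHits  []time.Time
}

func NewHourlyLimiter(perUser, global int) *HourlyLimiter {
	return &HourlyLimiter{perUser: perUser, global: global, userHits: map[string][]time.Time{}}
}

// NewHourlyLimiterFromEnv applies the caps named in the design as defaults.
func NewHourlyLimiterFromEnv() *HourlyLimiter {
	return NewHourlyLimiter(
		envInt("PALIADIN_HOURLY_CAP", 30),
		envInt("PALIADIN_GLOBAL_HOURLY_CAP", 1000),
	)
}

// prune drops hits that are at least one hour old (hits are chronological).
func prune(hits []time.Time, cutoff time.Time) []time.Time {
	i := 0
	for i < len(hits) && !hits[i].After(cutoff) {
		i++
	}
	return hits[i:]
}

// AllowAt records a turn at time now and reports whether it is within both
// caps. Rejected turns are not recorded against either cap.
func (l *HourlyLimiter) AllowAt(userID string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	cutoff := now.Add(-time.Hour)
	l.allHits = prune(l.allHits, cutoff)
	l.userHits[userID] = prune(l.userHits[userID], cutoff)
	if len(l.allHits) >= l.global || len(l.userHits[userID]) >= l.perUser {
		return false
	}
	l.allHits = append(l.allHits, now)
	l.userHits[userID] = append(l.userHits[userID], now)
	return true
}

// Allow is AllowAt with the current wall-clock time.
func (l *HourlyLimiter) Allow(userID string) bool { return l.AllowAt(userID, time.Now()) }
```

A handler would call `Allow` after writing the audit row and before the upstream call, so a rejected turn still leaves a trace.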

8.4 Hallucination risk (model invents a deadline)

The risk: the model fabricates a deadline date / case number that doesn't exist in the user's data.

Mitigations:

  • Hard rule in system prompt: "Every concrete factual claim about the user's work MUST come from a tool call in the current conversation."
  • Citation markers tied to toolresult IDs only. Marker #deadline-OPEN:c47bd2 resolves only if the id was returned by a real tool call this turn (frontend validates).
  • Toolcallevidence visibility: the user can see that a tool ran and what it returned. Hallucination becomes obvious because the chip says "0 results" but the bubble claims a deadline.
  • Phase 2: serverside posthoc validation that checks every cited id against the toolresult set; reject the message and retry if the model invented one.
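The Phase 2 post‑hoc check amounts to a set‑membership test over the citation markers in the model's message. The sketch below assumes markers always look like the `#deadline-OPEN:c47bd2` example, i.e. `#<kind>-<STATUS>:<id>`; that grammar, the regular expression, and the `InventedCitations` name are illustrative assumptions, not the specified format.

```go
package main

import "regexp"

// citationRe matches markers shaped like "#deadline-OPEN:c47bd2".
// The grammar is inferred from that single example and may be incomplete.
var citationRe = regexp.MustCompile(`#([a-z_]+)-([A-Z_]+):([0-9a-z]+)`)

// InventedCitations returns every cited id in message that is not in the
// set of ids actually returned by tool calls. An empty result means all
// citations are backed by a tool result.
func InventedCitations(message string, returned map[string]bool) []string {
	var invented []string
	for _, m := range citationRe.FindAllStringSubmatch(message, -1) {
		if id := m[3]; !returned[id] {
			invented = append(invented, id)
		}
	}
	return invented
}
```

A non‑empty result is the signal to reject the message and retry. The check only covers claims that carry a marker; a fabricated date written without any citation still depends on the system‑prompt rule and the visible tool‑call evidence.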

8.5 Open questions for m (REVISED 2026-05-07 20:56 for the PoC scope)

The re‑scope mooted most of the original questions. The lists below track which are still active and which are deferred:

PoCrelevant (decide before coder shift):

  1. QPoC1: What goes in the system prompt's readrecipe set? §0.5.3 says ~15 SQL recipes; the actual list is designlevel. Recommendation: start with whats_on_my_plate, list_my_projects, get_project_detail, search_my_deadlines_by_status, lookup_court_by_name, lookup_glossary_term, lookup_deadline_rule_by_concept. Same shape as §2.5, just expressed as SQL recipes Claude follows.
  2. QPoC2: Does m want the response file (/tmp/paliadin/{turn_id}.txt) cleaned up after each turn (mVoice does), or kept around for offline review? Recommendation: keep them in ~/.paliad-poc/turns/{date}/ with a 30day janitor — m said "monitoring use", and raw response artefacts are great for posthoc analysis.
  3. QPoC3: Should /admin/paliadin be reachable from the sidebar, or hidden behind a direct URL? Recommendation: sidebar entry (/admin/paliadin) since m is the only user and the only audience for the dashboard.
  4. QPoC4: classifier_tag — let Claude selftag in the trailer block, or postprocess serverside from the prompt text? Recommendation: Claude selftags (cheap and richer); we add a serverside fallback if Claude's tag is missing.
  5. QPoC5: Expansion gate threshold — §0.5.7 suggests "≥3 turns/workingday, ≥50 % tooluse rate, 4 weeks." Tighten? Loosen? Pure feel.
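For Q‑PoC‑5, it may help to see the expansion gate as plain arithmetic over the audit data. The sketch below makes several assumptions: "4 weeks" is read as 20 working days, "tool‑use rate" as the share of turns with at least one tool call, and both are averaged over the whole window. The type and field names are hypothetical, and the thresholds are parameters precisely because they are still open.

```go
package main

// PoCStats summarises the observation window. Hypothetical shape; the
// numbers would come from the audit table.
type PoCStats struct {
	WorkingDays   int // working days observed so far
	Turns         int // total turns in the window
	TurnsWithTool int // turns with at least one tool call
}

// GateThresholds holds the tunable expansion criteria.
type GateThresholds struct {
	MinWorkingDays    int
	MinTurnsPerDay    int
	MinToolUsePercent int
}

// DefaultGate encodes ">=3 turns/working day, >=50% tool-use rate, 4 weeks",
// with 4 weeks assumed to mean 20 working days.
func DefaultGate() GateThresholds {
	return GateThresholds{MinWorkingDays: 20, MinTurnsPerDay: 3, MinToolUsePercent: 50}
}

// Fires reports whether all three criteria hold. Integer arithmetic avoids
// float rounding at the exact threshold.
func (g GateThresholds) Fires(s PoCStats) bool {
	if s.WorkingDays < g.MinWorkingDays || s.Turns == 0 {
		return false
	}
	enoughTurns := s.Turns >= g.MinTurnsPerDay*s.WorkingDays
	enoughTools := 100*s.TurnsWithTool >= g.MinToolUsePercent*s.Turns
	return enoughTurns && enoughTools
}
```

Averaging over the window means one heavy week can carry three quiet ones. If the intent is sustained use, a per‑week check would be stricter.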

Productionv1deferred (only relevant if §0.5.7 expansion gate fires):

  • QA (Anthropic key) — moot for PoC; Claude Code handles it.
  • QB (firstuse disclosure) — moot; monly.
  • QC (default model) — moot; Claude Code defaults.
  • QD (sanitycheck with 2 PAs before locking scope) — becomes the expansiongate question. Don't ask the PAs about Paliadin until the PoC has earned the conversation.
  • QE (surface confirmation) — kept; PoC ships the same /paliadin page so the question is already answered.
  • QF (mascot) — Phase 2 still.
  • QG (starter prompts) — relevant for the PoC empty state; recommendation unchanged.
  • QH (branding.Name in prompt) — relevant for PoC; recommendation: yes, but the firmagnostic prompt can read "Paliad" instead of branding.Name since m's PoC is on his laptop and the firmname distinction adds no value for a single user.
  • QI (rate limit) — moot for PoC.
  • QJ (youpc caselaw tool) — interesting at PoC since m himself does caselaw research; promoted to QPoC6: include lookup_youpc_case as one of the systemprompt SQL recipes from day one? Crossschema SELECT into data.judgments is technically trivial, and m is exactly the user who'd benefit. Recommendation: yes, include it.
  • QK (audit retention) — PoC stores everything forever (one user, no compliance pressure).
  • QL (default language) — moot; m's locale is set, Claude reads it.

§9 What this design does NOT cover (deliberately)

  • The implementation. This is a design pass; coder shift writes the code. No commits beyond this doc on the inventor branch.
  • Mascot visual design. Phase 2; deserves its own design pass (and probably a designer's eye, not an inventor's).
  • HL Patents Style guide ingestion. Out of v1; Phase 2 RAG candidate.
  • Voice input / TTS output. Phase 2.
  • Multiuser collaboration (e.g. share a paliadin chat). Out of scope; users have their own visibility, and joint chat is a chatfeature shape (parked).
  • Offline mode. Paliadin is onlineonly by definition (it calls Anthropic). The PWA service worker should NOT cache /paliadin responses.
  • The renaming question. "Paliadin" is m's name. Locked.

§10 Recommended implementer

Same recommendation as t‑145: noether, or a fresh coder Sonnet that has noether's substrate context. NOT cronus per the standing memory directive on paliad.

Why:

  • Substrate touchpoints are the same set the chat design covered: visibilityPredicate, auth.UserIDFromContext, sidebar entry pattern, migration tracker discipline, Dashboard/Agenda/Project/Deadline service interfaces. noether built half of these; the other half noether mapped during the chat design pass.
  • Anthropic Go client is novel in paliad but is small and wellspecified by §6.2 + the claude-api skill.
  • Frontend SSE consumer + chip parser is a onepage TS file.

§11 End of design — STOP

This is the inventor deliverable. Per the role brief: STOP after design. Do not begin implementation. Do not load /mai-coder. Wait for m's explicit go/nogo on the questions in §8.5 before any coder shift starts.

The completion signal sent to head will use the literal phrase "DESIGN READY FOR REVIEW" so the head's gate fires.