m d24f73358c design(t-paliad-146): re-scope to PoC track — m-only + monitoring

m's reframing 2026-05-07 20:56: Paliadin is "mostly for myself now
but can be expanded — monitoring use." Two-stage shape replaces the
single-PR production-v1:

- Phase 0 (PoC): tmux-Claude pattern lifted from goldi/mVoice
  (mVoice/server.py:250-380). Claude Code window in a long-lived
  tmux session, prompts via tmux send-keys -l, response via
  /tmp/paliadin/{turn_id}.txt tail-f → SSE relay. Single user (m),
  m's laptop only (PALIADIN_ENABLED=false on prod). ~600-900 LoC,
  ~1 day. Migration 057 (PoC variant) stores full prompt + response
  for monitoring — no redaction at this scope.
- Phase 1 (production v1): the original §2-§6 Anthropic API design,
  GATED on PoC success per §0.5.7 expansion criteria (≥3 turns/wd,
  ≥50% tool-use rate, 4 weeks).

§0.5 (new) inserted as the load-bearing PoC spec. §7 leads with the
two-stage frame. §8.5 questions split into PoC-relevant (Q-PoC-1..6)
and production-v1-deferred. youpc case-law lookup promoted to
Q-PoC-6: m himself does case-law research, so include it from day
one (cross-schema SELECT into data.judgments is technically trivial
since paliad and youpc share the same Postgres).

What we drop for PoC: Anthropic API client, BYO-AI, rate limit,
token caps, multi-user RLS edge cases, /admin cost dashboard,
compliance disclosure, most i18n keys.

What we keep: system prompt voice, citation discipline (best-effort),
visibility gate (Claude is required to use paliad.can_see_project()
in queries), /paliadin surface, SSE shape, audit table.

The two-stage shape protects against the t-145 pattern: ship cheap,
observe, decide. No 4500-LoC investment based on m's gut feel about
adoption.

2026-05-07 20:59:46 +02:00


Design: Paliadin — in-app AI buddy / pet (t-paliad-146)

Status: READY FOR REVIEW (revised 2026-05-07 20:56, PoC track inserted)
Author: noether (inventor)
Issue: m/paliad#9
Date: 2026-05-07
Branch: mai/noether/inventor-paliadin-in-app

Revision note (2026-05-07 20:56): m re-scoped this from "ship to HLC users" → "PoC for m, monitor usage, expand only if it earns it". The original Anthropic-API design in §2–§6 is preserved as the production-v1 spec, but §0.5 (new) supersedes it for what gets built first: a tmux-Claude PoC lifted from goldi/mVoice, m-only on his laptop, with monitoring instrumentation as the load-bearing instrument for the expand/kill decision. §7 (Phasing) and §8.5 (Open questions) are revised to reflect the two-stage shape.

§0 TL;DR

A new conversational surface inside paliad: Paliadin, a Claude‑backed assistant that answers questions grounded in the user's own paliad data and paliad's domain knowledge. The Paliadin is a long‑lived in‑process Go service, not a per‑session worker spawn — it talks to the Anthropic Messages API directly with tool use, where every tool is a thin shim over an existing paliad service (DashboardService, ProjectService, DeadlineService, CourtService, GlossaryService, DeadlineRuleService, AgendaService). RLS / visibility is enforced at the service layer, exactly as it is for the rest of the app, so Paliadin literally cannot see what the caller cannot see.

Phase 1 surface: dedicated /paliadin page + a sidebar entry under "Übersicht", server-side SSE stream of Anthropic's response (same shape paliad's parked t-145 chat design specced), session-only conversation (no DB persistence in v1), 8 read-only tools (listed in §2.5), ~30 turns/hour rate limit per user, hard token caps (4 k input + 2 k output per turn), per-request audit row (no full transcript in v1; store a redacted hash + token counts + tool-call list).

No avatar, no mascot SVG, no proactive onboarding pop‑up in v1. Just a clean chat panel with the name "Paliadin" in the header. Mascot, drawer mode, persistent threads, write‑tools, and youpc.org case‑law lookup all deferred to Phase 2/3.

mlex / /lex-* reuse: pattern, not code. mLex turns out to be a workspace (extractions/, analysis/, docs/) — there is no Go/TS code to fork. The /lex-* skills are Claude Code instruction docs that drive Claude itself against youpc's MCP tools; they cannot be embedded in a paliad Go service. What carries over is the shape: tool catalog (search → fetch → cite), system‑prompt voice (precise, citation‑backed, flag uncertainty honestly), and the "every legal claim needs a citation" guardrail. §2.4 maps the carry‑over precisely.

Trade‑off flagged up‑front (read §9.1 before approving): the same adoption‑risk concern that just parked the local‑chat design (t‑paliad‑145, today 17:03) applies here. Paliadin's edge over "open ChatGPT in another tab" is only that it sees the user's own data — and that edge collapses if v1 doesn't make the data‑grounding visible (citation chips, tool‑call evidence) and explicit ("Paliadin sees only YOUR projects"). Without those, Paliadin is just a worse Claude. With them, it's the only Claude that can answer "welche Frist ist als nächstes auf dem Müller‑Verfahren?".

§0.5 PoC track — m-only, monitored, expandable (REVISED 2026-05-07 20:56)

This section supersedes §2–§7 for what actually gets built first. §2–§6 stay valid as the production‑v1 spec; they're picked up only if the PoC earns expansion.

0.5.1 Why the re-scope

m's reframing: "Paliadin is mostly for myself now but can be expanded — monitoring use." Two consequences:

Single user (m) on m's laptop, not 38 HLC PAs on paliad.de. Multi‑tenant concerns drop. RLS still matters because m's global_role=global_admin shouldn't let Paliadin sweep data across projects sloppily, but the cross‑user PII surface goes to zero.
The build is for m to feel the UX and decide whether to expand. That makes monitoring instrumentation load‑bearing — it's the artefact that drives the next decision, not a compliance afterthought. PoC architecture: cheap to ship, expensive to not observe.

0.5.2 Architecture: lift goldi/mVoice tmux‑Claude

Verified pattern in ~/dev/mVoice/server.py:250–380 (and ~/dev/goldi/goldi/brain.py for the soul/prompt assembly). Working production code today on m's voice stack.

┌──────────────────────┐   POST /api/paliadin/turn         ┌────────────────────────────┐
│ Browser              │ ────────────────────────────────▶ │ paliad Go server (laptop)  │
│ /paliadin chat panel │                                    │                            │
│                      │ ◀──────── SSE stream ──────────── │  PaliadinService            │
└──────────────────────┘    (file‑tail of response)        │   ├─ ensure tmux session   │
                                                            │   ├─ tmux send-keys -l …   │
                                                            │   ├─ poll/tail            │
                                                            │   │  /tmp/paliadin/{tid} │
                                                            │   └─ audit row write      │
                                                            └──────────────┬─────────────┘
                                                                           │ tmux send-keys
                                                                           ▼
                                                            ┌────────────────────────────┐
                                                            │ tmux: paliad-paliadin     │
                                                            │   window: claude-paliad   │
                                                            │   $ claude  (interactive)  │
                                                            │     w/ system prompt +     │
                                                            │     mcp__supabase__*       │
                                                            │     scoped to paliad.*     │
                                                            └────────────────────────────┘

Lift verbatim from mVoice:

_ensure_voice_session() → _ensure_paliadin_session(). Same tmux has-session / new-session / new-window / "wait for ❯ prompt" dance.
tmux_generate(prompt) → response: same shape, but it reads via tail -f instead of a one-shot poll so we can stream deltas to the SSE consumer (see §0.5.5).
_reset_paliadin_session() for /clear — surfaced in the chat panel's "New conversation" button.
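
A minimal Go sketch of the tmux calls behind the session dance above. The helper names are hypothetical (they are not taken from mVoice), the session and window names are the defaults from this document, and the "wait for ❯ prompt" step is left out. Only standard tmux subcommands (`has-session`, `new-session`, `send-keys -l`) are used.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Defaults from this design; the helper names below are illustrative only.
const (
	tmuxSession = "paliad-paliadin"
	tmuxWindow  = "claude-paliad"
)

// hasSessionArgs builds `tmux has-session -t <session>`.
func hasSessionArgs(session string) []string {
	return []string{"has-session", "-t", session}
}

// newSessionArgs builds a detached session whose first window runs `claude`.
func newSessionArgs(session, window string) []string {
	return []string{"new-session", "-d", "-s", session, "-n", window, "claude"}
}

// sendPromptArgs returns the two send-keys invocations for one prompt:
// the text typed literally (-l), then a separate Enter key press.
// Prompts that begin with "-" would need extra care and are not handled here.
func sendPromptArgs(session, window, prompt string) [][]string {
	target := fmt.Sprintf("%s:%s", session, window)
	return [][]string{
		{"send-keys", "-t", target, "-l", prompt},
		{"send-keys", "-t", target, "Enter"},
	}
}

// ensureSession creates the session if `tmux has-session` reports it missing.
func ensureSession(session, window string) error {
	if exec.Command("tmux", hasSessionArgs(session)...).Run() == nil {
		return nil // session already exists
	}
	return exec.Command("tmux", newSessionArgs(session, window)...).Run()
}
```

Keeping the argument builders separate from the `exec.Command` calls makes the command lines checkable without a running tmux server.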

0.5.3 What we keep from §2–§6 (it's still right)

Section	Carry‑over	Why it survives the re‑scope
§2.2.1 system prompt template	✅ ported as the first message sent into Claude after `/clear`	The voice + guardrails (no fabrication, cite specifically, can't mutate) are exactly what we want. Just delivered via tmux send-keys instead of API `system:` field.
§2.5 tool catalog	✅ but as instructions, not as wrappers	Claude already has `mcp__supabase__execute_sql`. The system prompt teaches it the read patterns ("to find m's pending deadlines: `SELECT … FROM paliad.deadlines WHERE status='pending' AND paliad.can_see_project(project_id)`"). Zero Go shim code; ~15 SQL recipes in the prompt.
§3.2 visibility gate	✅	The system prompt requires `paliad.can_see_project(project_id)` in every project‑scoped query. Defence in depth: the supabase MCP runs with a service role, so RLS doesn't auto‑gate — the prompt rule is the gate, and we cross‑check via audit (§0.5.6).
§4 surface placement (`/paliadin` full page + sidebar entry)	✅	Same UI shell.
§4.5 streaming + interruption	✅ adapted	SSE stream still happens; backing source is `tail -f /tmp/paliadin/{turn_id}.txt` instead of Anthropic's stream events. Choppier but works.
§4.4 action chips	⚠ best‑effort	System prompt asks Claude to emit `[#deadline-OPEN:c47bd2]` markers; whether it does so reliably is an observation the PoC will surface.
§5.4 audit table (`paliad.paliadin_turns`)	✅	Reused for monitoring (§0.5.6). Added: `pane_lines_captured` so we can debug stream issues. Dropped: `input_tokens`/`output_tokens` (Claude Code doesn't expose these via the tmux interface — derive coarse cost estimate from elapsed time × Claude Code's published rates if we want it later).

0.5.4 What we drop for the PoC

Drop	Reason
Anthropic Messages API client (`anthropic.go`)	Replaced by tmux/Claude. Saves ~400 LoC.
Per‑user rate limit (`paliadin_rate_limit` table)	Single user. m's own restraint is the rate limit. Re-add at expansion.
Token caps + history truncation	Claude Code manages its own context window.
BYO‑AI / OpenAI adapter	Out of scope — m's prior message; punted.
Multi‑user RLS edge cases (cross‑user PII)	Single‑user; not exercised.
Compliance disclosure on first use	m → m's own Claude subscription. m has already accepted Anthropic's TOS.
`/admin/paliadin` cost dashboard	One user; cost is m's monthly Claude bill.
Most i18n keys	m switches DE/EN naturally; ~6 keys instead of ~25.

0.5.5 SSE shape adapted to tmux backing

Same event vocabulary as §4.5.1, fed by a goroutine that tails /tmp/paliadin/{turn_id}.txt and emits content_delta events as new bytes arrive. Trade‑offs:

Latency to first token: ~3–8 s (Claude Code "thinking" before first write). Worse than native API streaming. Mitigation: surface a "Paliadin denkt nach …" placeholder bubble until the first byte arrives.
No native tool‑call events. Claude Code does its tool‑use internally; we see only the final text written to the response file. To still surface "ran search_my_deadlines (3 results)" evidence, the system prompt instructs Claude to write a structured trailer block at the end of its response: \n\n---\n[paliadin-meta]\nused_tools: search_my_deadlines, lookup_court\nrows_seen: 3, 1\n[/paliadin-meta]\n. Frontend strips that block and renders it as the citation evidence row. Brittle but observable; this is the kind of thing the PoC's monitoring is for.
Heartbeat: still emit event: ping every 25 s so the SSE connection survives any reverse proxy. (Not strictly needed on localhost but keeps the production migration cheap.)
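
The file-tail to SSE relay described above could be sketched in Go as follows. This is an illustration under assumptions, not the planned `sse.go`: `readFrom` and `writeSSE` are hypothetical helper names, and polling cadence, the 25 s ping timer and handling of multi-byte characters split across reads are left out.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// readFrom returns the bytes appended to path since offset, plus the new offset.
// A relay goroutine would call this in a loop and emit each chunk as a delta.
func readFrom(path string, offset int64) ([]byte, int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, offset, err
	}
	defer f.Close()
	if _, err := f.Seek(offset, io.SeekStart); err != nil {
		return nil, offset, err
	}
	buf, err := io.ReadAll(f)
	return buf, offset + int64(len(buf)), err
}

// writeSSE writes one Server-Sent Events frame. Multi-line payloads become
// one "data:" line each; a blank line terminates the frame.
func writeSSE(w io.Writer, event, data string) error {
	var b strings.Builder
	fmt.Fprintf(&b, "event: %s\n", event)
	for _, line := range strings.Split(data, "\n") {
		fmt.Fprintf(&b, "data: %s\n", line)
	}
	b.WriteString("\n")
	_, err := io.WriteString(w, b.String())
	return err
}

// frame is a convenience wrapper that returns the frame as a string.
func frame(event, data string) string {
	var b strings.Builder
	_ = writeSSE(&b, event, data)
	return b.String()
}
```

In a handler, each chunk from `readFrom` would be passed to `writeSSE(w, "content_delta", chunk)` followed by a flush, and `writeSSE(w, "ping", "")` would serve as the heartbeat.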

0.5.6 Monitoring instrumentation — the load‑bearing artefact

Because the whole point of the PoC is "watch m use it", the audit shape is the most important thing in the PoC ship.

Migration 057 (PoC variant):

CREATE TABLE paliad.paliadin_turns (
    turn_id            uuid PRIMARY KEY,
    user_id            uuid NOT NULL REFERENCES paliad.users(id),
    started_at         timestamptz NOT NULL DEFAULT now(),
    finished_at        timestamptz,
    duration_ms        int,                   -- end - start
    user_message       text,                  -- FULL prompt (m‑only PoC; redact at expansion)
    response           text,                  -- FULL response (same)
    response_tokens    int,                   -- approx via word count × 1.3
    used_tools         text[],                -- parsed from [paliadin-meta] trailer
    rows_seen          int[],                 -- parallel to used_tools
    chip_count         int NOT NULL DEFAULT 0,
    abandoned          boolean NOT NULL DEFAULT false,  -- user closed mid-stream
    page_origin        text,                  -- which paliad page m was on when he asked
    error_code         text,                  -- 'tmux_unresponsive', 'pane_died', 'user_aborted', NULL on ok
    classifier_tag     text                   -- coarse self-classification: 'data', 'concept', 'navigation', 'meta', 'other'
);

CREATE INDEX paliadin_turns_started_idx
    ON paliad.paliadin_turns(started_at DESC);

Critical departure from the production design: at PoC scope we DO store the full prompt + response. m is the only user, m is m's own compliance officer, and the whole point is to read what was asked later. Redaction returns at expansion.

/admin/paliadin page (PoC variant) renders:

7‑day rolling turn count + median/p90 duration.
Histogram by classifier_tag (so m sees: "60 % of my queries were 'data', 25 % 'concept', 10 % 'navigation', 5 % 'meta'" — that's the use‑case shape).
Top 10 prompts by frequency (textually similar grouping via simple normalised string match — fancy clustering is Phase 1 expansion).
Tool‑use rate (turns where used_tools is non-empty / total turns). Load‑bearing for the expansion decision — see §0.5.7.
Abandonment rate (abandoned=true / total).
Daily usage sparkline.

The classifier_tag is set by Claude itself in the [paliadin-meta] trailer, instructed by the system prompt — same brittleness caveat as the tool‑use evidence.
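
A hedged sketch of how the `[paliadin-meta]` trailer could be split off and parsed in Go. The `used_tools` and `rows_seen` keys come from §0.5.5; the `classifier_tag` key name and the `Meta` / `splitTrailer` names are assumptions made for illustration.

```go
package main

import (
	"strconv"
	"strings"
)

// Meta holds the fields parsed from the [paliadin-meta] trailer.
type Meta struct {
	UsedTools     []string
	RowsSeen      []int
	ClassifierTag string // key name "classifier_tag" is an assumption
}

const (
	metaOpen  = "[paliadin-meta]"
	metaClose = "[/paliadin-meta]"
)

// splitTrailer separates the visible answer from the trailer. If no complete
// trailer is present, the response is returned unchanged with ok=false.
func splitTrailer(resp string) (body string, meta Meta, ok bool) {
	start := strings.LastIndex(resp, metaOpen)
	if start < 0 {
		return resp, meta, false
	}
	end := strings.Index(resp[start:], metaClose)
	if end < 0 {
		return resp, meta, false
	}
	block := resp[start+len(metaOpen) : start+end]
	for _, line := range strings.Split(block, "\n") {
		key, val, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		val = strings.TrimSpace(val)
		switch strings.TrimSpace(key) {
		case "used_tools":
			meta.UsedTools = splitList(val)
		case "rows_seen":
			// Non-numeric entries are skipped, which can misalign the
			// parallel arrays; a real parser might reject the trailer instead.
			for _, s := range splitList(val) {
				if n, err := strconv.Atoi(s); err == nil {
					meta.RowsSeen = append(meta.RowsSeen, n)
				}
			}
		case "classifier_tag":
			meta.ClassifierTag = val
		}
	}
	// Drop the trailer and the "---" separator line that precedes it.
	body = strings.TrimRight(resp[:start], " \n")
	body = strings.TrimRight(strings.TrimSuffix(body, "---"), " \n")
	return body, meta, true
}

// splitList splits "a, b" into ["a" "b"], skipping empty items.
func splitList(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}
```

Returning `ok=false` for a missing or truncated trailer gives the audit writer a cheap signal for how often Claude follows the trailer rule.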

0.5.7 The expansion gate — what triggers production v1?

m decides; this section gives m the metric set he asked for. Suggested green‑light criteria after 4 weeks:

Sustained use: ≥ 3 turns/working‑day average over weeks 3–4.
Data‑grounded use: tool‑use rate ≥ 50 % (otherwise Paliadin is being used like ChatGPT and there's no differentiation argument for the production build).
Useful by m's own gut. No metric beats this; the dashboard helps m frame it but doesn't decide for him.

Yellow flag criteria (interesting but not green):

< 1 turn/day → m isn't using it; either kill or rebuild the affordance to be more discoverable.
Tool‑use rate < 30 % → the value isn't in the data grounding; reconsider the whole premise.
High abandonment rate → UX issue (latency? wrong answers? broken streaming?). Investigate before expansion.

Kill criteria:

m looks at the dashboard 4 weeks in and shrugs.
Frequent tmux session deaths or /clear-too-often patterns suggest the architecture is fighting m. PoC failure ≠ Paliadin failure; might be the tmux pattern's failure.
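
The numeric parts of the green and yellow criteria above can be sketched as a small Go function. The `Metrics` struct and the 25 % abandonment threshold are assumptions (the design only says "high"); the gut-feel and kill criteria are deliberately not encoded, since m decides those.

```go
package main

// Metrics is a hypothetical summary computed from paliad.paliadin_turns.
type Metrics struct {
	TurnsPerWorkingDay float64 // average over weeks 3-4
	TurnsPerDay        float64
	ToolUseRate        float64 // turns with non-empty used_tools / total turns
	AbandonmentRate    float64 // abandoned turns / total turns
}

// highAbandonment is an assumed threshold, not a number from the design.
const highAbandonment = 0.25

// Signal is what the numbers alone suggest; it does not decide for m.
type Signal string

const (
	Green   Signal = "green"
	Yellow  Signal = "yellow"
	Neutral Signal = "neutral"
)

// evaluate applies the yellow-flag checks first, then the green-light checks.
func evaluate(m Metrics) (Signal, []string) {
	var flags []string
	if m.TurnsPerDay < 1 {
		flags = append(flags, "under 1 turn/day")
	}
	if m.ToolUseRate < 0.30 {
		flags = append(flags, "tool-use rate under 30%")
	}
	if m.AbandonmentRate > highAbandonment {
		flags = append(flags, "high abandonment rate")
	}
	if len(flags) > 0 {
		return Yellow, flags
	}
	if m.TurnsPerWorkingDay >= 3 && m.ToolUseRate >= 0.50 {
		return Green, nil
	}
	return Neutral, nil
}
```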

0.5.8 PoC scope — what gets built

Item	In PoC
`internal/services/paliadin/tmux.go` (lifted + adapted from `mVoice/server.py:250–380`)	✅
`internal/services/paliadin/prompt.go` (system prompt template + `[paliadin-meta]` trailer rule)	✅
`internal/services/paliadin/sse.go` (file‑tail → SSE relay)	✅
`internal/handlers/paliadin.go` (POST /turn, GET /stream/{id}, /paliadin shell page, /admin/paliadin dashboard)	✅
Migration 057 — PoC `paliadin_turns` (full prompt + response stored)	✅
`frontend/src/paliadin.tsx` + `client/paliadin.ts` (chat panel, EventSource, chip parser, "Stop"/"New" buttons)	✅
`frontend/src/admin-paliadin.tsx` + `.ts` (the monitoring dashboard)	✅
Sidebar entry under Übersicht with `ICON_SPARKLE`	✅
~6 i18n keys (DE+EN)	✅
`PALIADIN_TMUX_SESSION` env var (default `paliad-paliadin`), `PALIADIN_RESPONSE_DIR` (default `/tmp/paliadin`), `PALIADIN_ENABLED` (default false on prod, true on m's laptop)	✅
Hard guard: if `PALIADIN_ENABLED=false` (paliad.de prod default) the routes are not even registered. PoC stays on m's laptop, full stop.	✅

Estimated scope: ~600–900 LoC. ~1 day of coder work. Same single‑PR pattern as t‑144 / t-145.

0.5.9 What stays unbuilt (production v1, see §2–§6)

The Anthropic API client, the 8 Go tool shims (§2.5), the per-user rate limit, the encrypted-key BYO-AI surface, the redacted audit, the multi-replica SSE bus: all of it. Picked up only if §0.5.7's expansion gate fires.

The two‑stage shape protects against the t‑145 pattern: ship cheap, observe, decide. No 4500‑LoC investment based on m's gut feel about adoption.

§1 Premises verified live (2026-05-07)

Before designing on top, I checked each load‑bearing claim against the running system rather than CLAUDE.md / memory.

Claim	Source	Verification
mLex is a workspace, not a code repo	issue framing "mlex project we could partially reuse"	`~/dev/mLex/` contains only `extractions/`, `analysis/`, `docs/`, plus `CLAUDE.md` + `AGENTS.md`. No `.go`, no `package.json`, no tools that aren't Claude skills. The "code" is the `/lex-*` skill family in `~/.claude/skills/`, which are instruction docs driving Claude against `mcp__youpc__*` MCP tools. Carry-over is shape (system prompt, tool catalog, citation style), not adapters.
`/lex-*` skill family	brief reference	`~/.claude/skills/{lex-research,lex-extract,lex-classify,lex-classify-patent,mai-lexy}/SKILL.md`. All five inventoried in §2.4.
Paliad has no anthropic / claude code	CLAUDE.md `ANTHROPIC_API_KEY` "do not set" row	`grep -ri anthropic ~/dev/paliad/internal ~/dev/paliad/cmd` → only `internal/branding/firm.go` comment unrelated to AI. `go.mod` has no `anthropic-sdk-go` dep. This task un‑defers the env var; CLAUDE.md row needs updating in the same PR.
Paliad has no SSE pattern shipped	substrate scan	`grep -rn 'http.Flusher\|text/event-stream' internal/` returns only references inside the parked t‑145 chat design doc — no live code. We bring our own.
Paliad and youpc share the same physical Postgres	infra	Both run on `100.99.98.201:11833` (port 11833 = ydb). Paliad's schema is `paliad`; youpc's is `data`. A future "search UPC case law" tool would be a same‑DB cross‑schema SELECT, not an HTTP hop — but Phase 1 still excludes case‑law lookup (see §3).
Visibility is enforced at service layer (not via SET LOCAL auth.uid)	code	`internal/services/visibility.go` defines `visibilityPredicate(alias)` + `visibilityPredicatePositional(alias, idx)`; every project‑scoped query inlines it. Paliadin's tools call existing services, inheriting the predicate.
`paliad.can_see_project()` is the canonical visibility function in DB (RLS, t‑139)	t‑139 migration 055	`internal/db/migrations/055_hierarchy_aggregation.up.sql:144` `CREATE OR REPLACE FUNCTION paliad.can_see_project(_project_id uuid)`. Same predicate echoed in `services/visibility.go`.
Migration tracker is at 56 (`056_user_views`)	t‑144 A1	`paliad_schema_migrations` row. Next migration is 057. (t‑145 was parked before its `057_chat` shipped, so 057 is open.)
t‑paliad‑145 (local chat) was parked today 2026-05-07 17:03	memory + commit log	Commit `99f08e3` "Merge: t-paliad-145 design doc only — local chat feature PARKED per m's call". The chat SSE substrate that would have been shared is not built — Paliadin builds its own minimal stream.
Sidebar bell pattern (`sidebar-inbox-badge`) is reusable for a chat‑style entry	t‑138	`frontend/src/components/Sidebar.tsx` — `navItem(href, icon, i18nKey, label, currentPath, badgeID?)` already takes an optional badge id. The same plumbing fits a Paliadin entry.
Sidebar `ICON_SPARKLE` already exists	UI scan	`frontend/src/components/Sidebar.tsx` defines `ICON_SPARKLE` (a star/sparkle SVG). Free icon for the Paliadin nav item.
`auth.UserIDFromContext(r.Context())` is the standard handler‑side user lookup	code	`internal/handlers/dashboard.go:31` is the canonical pattern. Paliadin handlers will use it.
`branding.Name` (default "HLC") is the firm‑name source	t‑paliad‑065	`internal/branding/firm.go` reads `FIRM_NAME` once at boot. Paliadin's system prompt + greeting must use `branding.Name`, never hardcode "HLC".
Single web replica on Dokploy today	`docker-compose.yml`	One `web` service. SSE state in‑process is fine v1; multi‑replica migration deferred along with chat.

Doc‑vs‑live conflicts encountered (must be fixed in the implementation PR):

CLAUDE.md still says ANTHROPIC_API_KEY is "Reserved for Phase H (AI Frist‑Extraktion) which is deferred per m's 2026-04-16 decision. Do not set." Paliadin un‑defers it. The CLAUDE.md row needs to flip to "Required for Paliadin (read‑only Claude assistant) — set on Dokploy."
The earlier "do not want anthropic API" decision (memory b6a11b55…, 2026-04-16) was specifically about Frist extraction from documents. Paliadin is a different surface (interactive read‑only Q&A over already‑structured data). It does not silently revive the parked extraction feature — t‑paliad‑011 stays blocked unless m explicitly un‑parks it too.

§2 Sub-design A — LLM architecture, prompt, tool use, mlex/lex reuse

Answers Q1, Q2, Q3, Q4, Q17, Q18.

2.1 LLM provider (Q1)

Recommendation: Anthropic Claude, single provider, accessed directly via the Messages API. Lock to Claude in v1; abstract behind a one‑function interface so future portability is cheap.

Provider	v1?	Why
Anthropic Claude (Messages API + tool use)	✅	Matches m's "wire into my claude" framing. Tool‑use shape is mature. Streaming via SSE is native. Paliad already has `ANTHROPIC_API_KEY` reserved.
Mixed (Claude reasoning + smaller routing model)	❌	Premature optimisation; for ~30 turns/hour/user we don't need the routing layer. Single‑model latency is fine.
OpenAI / open weight	❌	No HLC compliance review for those vendors; m's Anthropic key is on file.

Model selection within Anthropic: default to Claude Sonnet 4.6 (fast, tool‑use‑capable, cheap enough for chat use). Allow override via PALIADIN_MODEL env var so we can drop down to Haiku for cost or up to Opus for tricky onboarding sessions without redeploying.

Wire shape: one Go HTTP client (internal/services/paliadin/anthropic.go) that POSTs /v1/messages with stream: true. We do not adopt github.com/anthropics/anthropic-sdk-go in v1 — the API surface we use (one streaming POST + tool‑use loop) is small enough that a hand‑rolled client is shorter than wiring the SDK and safer than depending on a Go SDK that has historically broken on minor version bumps in mAi's experience. Keep the option open for Phase 2 if the token‑accounting / structured tool‑use helpers in the SDK become attractive.

// internal/services/paliadin/anthropic.go
type AnthropicClient interface {
    Stream(ctx context.Context, req MessagesRequest, w StreamWriter) (Usage, error)
}

The interface is the only swap‑point. Switching providers later means a new implementation, not a rewrite.

2.2 System prompt + message shape (Q2)

Recommendation: single system prompt with paliad context + tool definitions; one persistent prompt across pages (no per‑route system prompts in v1).

2.2.1 System prompt (locked, v1)

The system prompt is rendered per request from branding.Name, the user's locale (DE/EN), the user's display_name, the current date, and the visible-project count (a single count, not the project list, which keeps the prompt small). Only the template is a process-wide constant.

You are Paliadin, an AI assistant inside {{firm}}'s patent practice
platform "Paliad". You help {{display_name}} ({{office}}) answer
questions about their own work in Paliad and about UPC / EPO / DPMA
patent practice.

Today is {{today}}. The user's display language is {{language}}; reply
in {{language}} unless the user switches mid‑conversation.

You have read‑only access to the following tools:
- whats_on_my_plate     — the user's dashboard (deadline / appointment / matter buckets)
- list_my_projects      — every project the user can see
- get_project_detail    — full detail of one project (deadlines, appointments, parties, partner units)
- search_my_deadlines   — filter the user's deadlines by status / date / project
- list_my_appointments  — the user's upcoming appointments (next 30 days by default)
- lookup_court          — Paliad's catalog of patent courts (UPC LDs, German LGs/OLGs/BGH, EPO, DPMA, ...)
- lookup_glossary_term  — Paliad's bilingual patent glossary
- lookup_deadline_rule  — Paliad's Fristenrechner concept tree (named deadline rules + their triggers)

Hard rules:
1. Never invent facts. If a tool returns nothing, say so. Do not guess
   case numbers, deadline dates, court names, or party names.
2. Every concrete factual claim about the user's work MUST come from a
   tool call in the current conversation. Cite using "[#deadline-XXXX]",
   "[#projekt-XXXX]", "[court: Munich LD]", "[glossary: Klageerwiderung]"
   so the UI can render citation chips.
3. You cannot mutate any data. If the user asks you to change something,
   explain that v1 is read‑only and point them to the right page in
   Paliad.
4. Visibility is enforced before tools return — if your tool call comes
   back empty, the data either doesn't exist OR the user can't see it.
   Never disclose the latter; just answer "I couldn't find anything
   matching that".
5. You cannot answer questions about other users' projects, even if the
   user names them.
6. Respect the user's role. If the user has global_role=standard, do not
   speculate about admin‑only functions.

Style:
- Direct, professional, slightly warm. Lawyer‑adjacent.
- Reply in Markdown. Use lists, code blocks, blockquotes.
- Cite specifically (case numbers, dates, court names) — never "around
  the 14th".
- When uncertain, flag it. ("I don't see a deadline matching that
  description on the projects you can access.")
- No emojis unless the user uses one first.

You are NOT:
- A code‑writing assistant
- A replacement for legal advice
- A web search

This is ~250 input tokens — well under the budget.
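
Per-request rendering of the template can be sketched with a plain string replacer over the placeholders used above. `PromptData` and `renderPrompt` are hypothetical names, and only an excerpt of the template is shown.

```go
package main

import "strings"

// PromptData carries the per-request values substituted into the template.
type PromptData struct {
	Firm, DisplayName, Office, Today, Language string
}

// promptExcerpt is the opening of the locked v1 template, shortened here.
const promptExcerpt = `You are Paliadin, an AI assistant inside {{firm}}'s patent practice
platform "Paliad". You help {{display_name}} ({{office}}) answer
questions about their own work in Paliad.

Today is {{today}}. The user's display language is {{language}}.`

// renderPrompt fills the placeholders in a single pass. Replaced text is not
// re-scanned, so a display name containing "{{today}}" stays literal.
func renderPrompt(tmpl string, d PromptData) string {
	return strings.NewReplacer(
		"{{firm}}", d.Firm,
		"{{display_name}}", d.DisplayName,
		"{{office}}", d.Office,
		"{{today}}", d.Today,
		"{{language}}", d.Language,
	).Replace(tmpl)
}
```

Passing `branding.Name` as `Firm` keeps the firm name out of the template itself, in line with the "never hardcode HLC" rule in §1.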

2.2.2 Per‑message envelope

The browser POSTs to /api/paliadin/turn with { session_id, user_message, history }, where history is the prior turns in the current session only (session = browser tab; sessionStorage backs it, since localStorage would be shared across tabs). The server prepends the system prompt and runs the tool-use loop.
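
A sketch of the server-side decoding of this envelope. The three top-level field names come from the text above; the shape of each history entry (`role` + `content`) and the validation rule are assumptions.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Turn is one prior exchange in the session; its fields are assumed.
type Turn struct {
	Role    string `json:"role"` // e.g. "user" or "assistant"
	Content string `json:"content"`
}

// TurnRequest mirrors the { session_id, user_message, history } envelope.
type TurnRequest struct {
	SessionID   string `json:"session_id"`
	UserMessage string `json:"user_message"`
	History     []Turn `json:"history"`
}

// decodeTurn parses the body and rejects envelopes missing required fields.
func decodeTurn(raw []byte) (TurnRequest, error) {
	var req TurnRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return req, err
	}
	if req.SessionID == "" || req.UserMessage == "" {
		return req, errors.New("session_id and user_message are required")
	}
	return req, nil
}
```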

2.2.3 Tool use vs RAG‑only (Q2 secondary)

Tool use, not RAG. RAG (vector search over chunks of paliad content) is the wrong shape for this surface: paliad data is highly structured, the most useful answers come from filtered SQL queries (e.g. "all deadlines on my projects with status='pending' and due_date<=now()+7d"), and a vector store would only approximate what a SQL query returns exactly. Tools give the model the same query power the user has, with hard visibility gates. Phase 2 may add RAG over a small static corpus (HL Patents Style guide, Paliadin docs) if onboarding queries don't get good answers from glossary lookups alone.

2.3 Long‑lived service vs lexy‑style worker spawn (Q4)

Recommendation: long‑lived Go service (in‑process) — not a per‑session Claude Code worker.

Option	Latency to first token	Cost / turn	Operational shape
In‑process Go service calling Anthropic API directly	< 1 s (just network + queueing)	Pay only for the model tokens we use	Single binary, single Postgres conn, scales with paliad
`mai hire paliadin` per session (Claude Code worker)	5–15 s	Worker startup overhead × N concurrent sessions × Claude Code's own context overhead	Operational footprint of running a worker per active user — dozens of tmux panes, tasks, reports

The lexy / cassandra worker pattern works because it's batch: classify N judgments, emit JSON, exit. A chat surface needs sub‑second response times across dozens of HLC users in parallel. A Claude‑Code‑per‑session pattern would give each user their own Claude in the loop, with all the tooling and message‑bus scaffolding that implies — wrong scale of abstraction.

That said, two things from the worker pattern do carry over:

System‑prompt voice. The lexy / mai-lexy SKILL.md persona ("Sharp, analytical, direct. Cites provisions and case law naturally. Flags uncertainty honestly.") is the right voice for Paliadin. We borrow it — see §2.2.1.
Tool catalog shape. The lex-research SKILL.md tool list (search → fetch full text → enrich → analyse → cite) maps cleanly onto Paliadin's read tools — see §3.

2.4 mlex / `/lex-*` carry‑over map (Q3, Q18)

Inventory result, with the shape‑vs‑code split called out for each:

Skill / asset	What it does	Carry‑over to Paliadin
`~/dev/mLex/` (workspace)	`extractions/` (per‑case JSON), `analysis/` (markdown reports), `docs/` (legal references), `extractions/queue.json`	None as code. Workspace artifacts are the output of the skills — they don't give us anything embeddable.
`lex-research` skill	UPC case law search → analysis report. Tool catalog: `mcp__supabase__execute_sql`, `mcp__youpc__`, `mcp__youpc-memory__`. Output format: structured markdown with citation tables.	Voice + tool‑catalog shape. "Search → enrich → analyse → cite" is the Paliadin flow. The skill's output‑format conventions (case number on first mention, division comparison tables) seed the system prompt's style guidance.
`lex-extract` skill	Read full judgment text → structured holdings / principles / interpretations JSON.	Not v1. Phase 2 candidate iff Paliadin gets a `extract_judgment(node_id)` write tool — orthogonal to read‑only v1.
`lex-classify` skill	Classify judgments against a 47‑leaf taxonomy.	Not v1. Same as above — write‑surface, batch‑shaped, irrelevant to interactive Q&A.
`lex-classify-patent` skill	Classify patents into IPC technology sectors via Anthropic.	Pattern reference only. It's already an Anthropic‑backed pipeline, so its prompt structure is a working example we can crib from for the system‑prompt template — but the actual classification target is paliad‑irrelevant.
`mai-lexy` skill	Lawyer persona that orchestrates the above. "Citation‑backed, flags uncertainty."	Voice template. The persona text is the closest thing to a working Paliadin system prompt; §2.2.1 borrows directly from it.
`claude-api` skill	Anthropic SDK / Messages API patterns + prompt caching guidance.	Implementation reference for the Go client + caching strategy. §6.4 picks up its prompt caching guidance.

Anti‑reuse: the mcp__youpc__* MCP tools that lex-research uses are designed for an interactive Claude Code session. Paliadin's tools must instead be Go service calls — same data shape, different transport. Don't try to embed an MCP client in a paliad Go process; rebuild the same SQL queries against the same Postgres directly.

2.5 Tool catalog v1 (Q17)

Eight read-only tools. Each is a thin Go shim around an existing service (or, for the glossary, a static loader); each project-scoped tool enforces visibility through that service's existing visibilityPredicate.

Tool name	Backing service / method	Inputs	Output (truncated to fit budget)
`whats_on_my_plate`	`DashboardService.Get(userID)`	none	`{deadline_summary, appointment_summary, matter_summary, upcoming_deadlines[≤10], upcoming_appointments[≤10], recent_activity[≤10]}`
`list_my_projects`	`ProjectService.ListVisible(userID, filter)`	optional `{status, kind}`	`[{id, kind, label, status, parent_id, path}]` paged 25
`get_project_detail`	`ProjectService.Get(userID, id) + DeadlineService.ListByProject + AppointmentService.ListByProject + PartyService.ListByProject + DerivationService.AttachedUnits`	`{project_id}`	`{project, deadlines[≤25], appointments[≤25], parties[≤10], partner_units[≤5]}`; 404 if the user can't see it (the LLM gets a clean "not found", the same response as for a truly missing project)
`search_my_deadlines`	new helper on `DeadlineService` (reuses `visibilityPredicate`)	`{q?, status?, project_id?, due_after?, due_before?, limit≤25}`	`[{id, title, due_date, status, project_label, court}]`
`list_my_appointments`	new helper on `AppointmentService`	`{from, to, project_id?}`	`[{id, title, start_at, end_at, location, project_label}]`
`lookup_court`	`CourtService.Search(q)` (firm‑wide; no visibility filter — courts are reference data)	`{q}`	`[{slug, name, country, kind, address, vacation_periods[≤4]}]` truncated 10
`lookup_glossary_term`	static JSON loader (`internal/handlers/glossary.go` data)	`{q, lang?}`	`[{de, en, definition, category}]` top 5
`lookup_deadline_rule`	`DeadlineRuleService.SearchConcept(q)`	`{q}`	`[{rule_code, concept_label, trigger_event, deadline_text, legal_source}]` top 5

Bumped out of v1 (Phase 2 candidates):

list_my_pending_approvals (the inbox bell payload) — useful but adds RLS surface; let v1 stabilise first.
search_youpc_case_law — m's framing example, but cross‑schema → bigger blast radius. Phase 2 once Paliadin proves its weight on paliad‑internal data.
search_my_audit_log — high signal but PII heavy.
compute_frist — would invoke the existing DeadlineCalculator. Useful but the user can already do this on /tools/fristenrechner; defer until we see queries that actually want it.
All write tools (create_deadline, attach_partner_unit, etc.): Phase 3 at the earliest, with a hard confirmation gate (see §7.3).

2.6 The tool‑use loop (Q2 tertiary)

Standard Anthropic tool‑use loop:

1. Build messages = [system, ...history, user_message]
2. POST /v1/messages with tools=[...catalog]
3. Stream assistant reply chunks → relay to client SSE
4. If stop_reason == "tool_use":
     for each tool_use block:
        execute tool(input) on the matching Go service
        emit tool_result block back into messages
     goto 2 (with the same stream/SSE connection)
5. If stop_reason == "end_turn": close stream

Hard cap on the loop: ≤ 5 tool‑call rounds per turn. After 5 rounds without end_turn, force‑close with "Sorry, I got stuck — try rephrasing." Hitting the cap is a red flag we want to see in the audit row as error_code 'tool_loop_cap' (see §5.4).
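A minimal Go sketch of the loop with its round cap may make the control flow concrete. The `Model`, `Tool`, `Reply` and `ToolUse` types are hypothetical stand-ins for the streaming Messages API client and the tool shims, and messages are reduced to plain strings; only the cap logic is the point here.

```go
package main

import (
	"errors"
	"fmt"
)

// maxToolRounds mirrors the cap of 5 tool-call rounds per turn.
const maxToolRounds = 5

// ErrToolLoopCap corresponds to error_code 'tool_loop_cap' in the audit row.
var ErrToolLoopCap = errors.New("tool_loop_cap")

// ToolUse is one tool_use block requested by the model (simplified, hypothetical shape).
type ToolUse struct {
	Name  string
	Input map[string]any
}

// Reply is a simplified model response.
type Reply struct {
	StopReason string // "tool_use" or "end_turn"
	Text       string
	ToolUses   []ToolUse
}

// Model stands in for one Messages API call (hypothetical).
type Model func(messages []string) Reply

// Tool runs one backing-service call and returns a serialised result.
type Tool func(input map[string]any) string

// runTurn loops until the model stops asking for tools or the cap is hit.
// It returns the final text, the number of tool rounds executed, and an error.
func runTurn(model Model, tools map[string]Tool, messages []string) (string, int, error) {
	rounds := 0
	for {
		reply := model(messages)
		if reply.StopReason != "tool_use" {
			return reply.Text, rounds, nil
		}
		if rounds == maxToolRounds {
			// The caller shows the "got stuck" message and records the error code.
			return "", rounds, ErrToolLoopCap
		}
		rounds++
		for _, tu := range reply.ToolUses {
			result := `{"error":"unknown tool"}`
			if tool, ok := tools[tu.Name]; ok {
				result = tool(tu.Input)
			}
			messages = append(messages, "tool_result:"+tu.Name+":"+result)
		}
	}
}

func main() {
	calls := 0
	// Fake model: asks for one tool once, then answers.
	model := func(messages []string) Reply {
		calls++
		if calls == 1 {
			return Reply{StopReason: "tool_use", ToolUses: []ToolUse{{Name: "whats_on_my_plate"}}}
		}
		return Reply{StopReason: "end_turn", Text: "done"}
	}
	tools := map[string]Tool{
		"whats_on_my_plate": func(map[string]any) string { return `{"upcoming_deadlines":[]}` },
	}
	text, rounds, err := runTurn(model, tools, []string{"user: Was steht heute an?"})
	fmt.Println(text, rounds, err)
}
```

The cap is checked before a sixth round would start, so a model that keeps requesting tools gets exactly five rounds executed before the turn is closed.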

§3 Sub-design B — Data access, RLS, PII

Answers Q5, Q6, Q7.

3.1 Knowledge sources for v1 (Q5)

Recommendation: paliad‑internal data + paliad's static reference data ONLY. youpc.org case law deferred to Phase 2.

Source	v1	Reason
Per‑user paliad data (deadlines, appointments, projects, parties, partner units, attached units)	✅	The whole point of Paliadin. Visibility enforced via `visibilityPredicate` (every backing service already does this; tool inherits it).
Static reference data in paliad (court catalog t‑122, glossary, deadline rules, Fristenrechner concept tree)	✅	Firm‑wide, no per‑user gating, low blast radius.
UPC case law (youpc Postgres `data.judgments`, `data.judgment_markdown_content`)	❌ Phase 2	Cross‑schema SELECT is technically trivial (same Postgres) but: (a) inflates the v1 surface; (b) brings in 1700+ judgments → scaling RAG/full‑text question; (c) m's framing called out research as a use case, not a v1 must‑have. Ship paliad‑internal Q&A first; layer case‑law on once the substrate is proven.
HL Patents Style guide / Paliad onboarding docs	❌ Phase 2	No internal corpus exists yet; would need docs‑authoring + indexing. The `lookup_glossary_term` tool already covers the most common onboarding question shape ("was bedeutet X?").
External web search	❌	Out of scope; Paliadin is a grounded assistant, not a web surfer. m can use the regular Claude for that.

Ranking inside the v1 set (when Paliadin has to choose):

User‑data tools first when the question references "my", "the case", "the deadline", or names a project / case number that resolves.
Static reference next when the question is conceptual ("what's a Klageerwiderung?", "which court is the Munich LD?").
Combine when both apply ("when is my Klageerwiderung due?" → lookup_deadline_rule for the rule + search_my_deadlines for the user's instance).

The system prompt names tools in this priority order; the model's tool‑selection follows.

3.2 Auth / visibility boundary (Q6)

The gate: every backing service already runs visibilityPredicate(alias) against the caller's UUID. The Paliadin tool shim is a 5‑line wrapper that calls the service with userID derived from auth.UserIDFromContext(r.Context()) at the SSE handler boundary. There is no service‑role escape — the shim simply has no other UUID to pass in.

Belt‑and‑braces: every tool result is inspected for project_id columns; for each distinct project_id, the shim asserts paliad.can_see_project(_project_id) returns true. (Defence‑in‑depth: catches any future service‑layer regression where someone forgets the predicate. Costs one extra function call per distinct project_id in a tool result, which is cheap.)
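The belt-and-braces check could look roughly like the following Go sketch. `Row`, `CanSee` and `assertVisible` are illustrative names, and `CanSee` is assumed to wrap a call to paliad.can_see_project for the calling user; the real shim would operate on typed service results rather than generic maps.

```go
package main

import (
	"fmt"
	"sort"
)

// CanSee stands in for paliad.can_see_project(project_id), evaluated for the caller.
type CanSee func(projectID string) bool

// Row is a simplified tool-result row; only the project_id column matters here.
type Row map[string]any

// assertVisible checks every distinct project_id exactly once and
// returns an error for the first id the caller is not allowed to see.
func assertVisible(rows []Row, canSee CanSee) error {
	distinct := map[string]struct{}{}
	for _, r := range rows {
		if id, ok := r["project_id"].(string); ok && id != "" {
			distinct[id] = struct{}{}
		}
	}
	ids := make([]string, 0, len(distinct))
	for id := range distinct {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic order for logs and tests
	for _, id := range ids {
		if !canSee(id) {
			return fmt.Errorf("visibility regression: project %s in tool result", id)
		}
	}
	return nil
}

func main() {
	rows := []Row{{"project_id": "p1"}, {"project_id": "p1"}, {"project_id": "p2"}}
	allowed := map[string]bool{"p1": true}
	err := assertVisible(rows, func(id string) bool { return allowed[id] })
	fmt.Println(err)
}
```

On a failed assertion the shim would presumably drop the whole tool result and return the same "not found" error the model sees for a missing project, rather than filtering rows silently.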

The "tell, don't disclose" rule (§2.2.1 hard‑rule 4): if the user names a project they cannot see, the tool returns {error: "not found"} — same response as a project that doesn't exist. The system prompt instructs the model to say "I couldn't find anything matching that" without distinguishing the two cases. This is the same rule the t‑144 ViewService already applies.

Cross‑user PII in tool outputs: tool outputs may legitimately contain other users' display names (e.g. project teams, deadline assignees). These are visible to the caller through the regular UI already, so disclosing them through Paliadin is no worse. We do NOT redact them.

Approval / partner‑unit derivation: get_project_detail returns the derived team (per t‑139 DerivationService.AttachedUnits). Same predicate as the rest of the app.

3.3 PII handling, retention, encryption (Q7)

v1 stance: minimum viable persistence, maximum auditability of the access pattern.

Data	Stored where	Retention	Encryption	Notes
Conversation history (the actual messages)	Browser localStorage only. Cleared on browser data wipe / reload‑with‑fresh‑session.	Session only	n/a	Phase 2: opt‑in DB persistence with retention controls.
Per‑request audit row	New `paliad.paliadin_turns` table	Forever (matches audit‑log pattern; soft‑delete only)	At‑rest by Postgres / Supabase volume encryption	Stores: `turn_id, user_id, started_at, finished_at, model, input_tokens, output_tokens, tool_calls (jsonb of tool names + arg hashes — NOT arg values), prompt_hash (sha256 of redacted user message), error_code`. No prompt body, no completion body.
Tool‑call inputs (e.g. project_id arguments)	Hashed (sha256) into the audit row's `tool_calls` jsonb	Forever	n/a	The hash is enough to detect "this user kept asking about project X" patterns without storing the readable id.
Anthropic API request/response bodies	Not stored. Streamed through the Go service straight to the SSE writer.	n/a	TLS in flight	Anthropic's own retention is governed by the org's API contract — pulling Paliad onto an existing HLC enterprise key would inherit that.
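As an illustration of the hashed `tool_calls` entries, here is a small Go sketch. The `ToolCallAudit` struct and `argsHash` helper are hypothetical names; the field names follow the `[{name, args_hash, status, latency_ms}]` shape used for the audit row, and canonicalisation via `encoding/json` is an assumption about how equal arguments are made to hash equally.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// ToolCallAudit is one element of the tool_calls jsonb array.
type ToolCallAudit struct {
	Name      string `json:"name"`
	ArgsHash  string `json:"args_hash"`
	Status    string `json:"status"`
	LatencyMS int64  `json:"latency_ms"`
}

// argsHash returns the hex sha256 of the JSON-encoded tool arguments.
func argsHash(args map[string]any) (string, error) {
	// encoding/json writes map keys in sorted order, so equal maps hash equally.
	canonical, err := json.Marshal(args)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	h, err := argsHash(map[string]any{"project_id": "c47bd2", "limit": 25})
	if err != nil {
		panic(err)
	}
	row, _ := json.Marshal([]ToolCallAudit{
		{Name: "search_my_deadlines", ArgsHash: h, Status: "ok", LatencyMS: 42},
	})
	fmt.Println(string(row))
}
```

Two calls with the same arguments produce the same hash regardless of the order in which the map was built, which is what makes the "kept asking about project X" pattern detectable.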

Why this shape:

Compliance‑lite v1. HLC's compliance team has not yet weighed in on AI‑mediated PII (memory says the Phase H decision was "we don't want anthropic API… for a while"). Storing the full transcript opens a retention/disclosure question we don't need to answer to ship Paliadin's MVP. The audit‑metadata row is enough to demonstrate: (a) who used it, (b) how often, (c) what tools they triggered, (d) cost.
Phase 2 transcript persistence would add a paliadin_messages table (turn_id FK, role, content, redact_marks jsonb) and a per‑user setting "keep my history". Default off.
Why no PII redaction in the user prompt? v1 is opt‑in (the user typed the prompt). Redacting client names / case numbers in the audit hash would defeat the point; we redact by not storing the prompt, only its hash.

The Anthropic side: if HLC's enterprise contract forbids vendor‑side retention, the Go client must set metadata: {user_id: "<hash>"} and ensure the API call is on an org with zero‑retention guarantees. Open question for m: which Anthropic key are we using — m's personal key (existing ANTHROPIC_API_KEY precedent in mAi/youpcms) or a new HLC enterprise key? This is the single biggest compliance question; see §9.2.

§4 Sub-design C — UX

Answers Q8, Q9, Q10, Q11, Q12.

4.1 Surface placement (Q8)

Recommendation (counter to brief): start with a dedicated /paliadin full‑page route + a sidebar entry under the "Übersicht" group. Defer the right‑drawer to Phase 2.

Option	v1?	Why
`/paliadin` full page + sidebar entry	✅	Lowest CSS risk; mobile‑responsive for free (paliad's existing breakpoints work); easy to test via Playwright; matches paliad's "every feature is a top‑level page" pattern; no z‑index / overlay debugging.
Right‑drawer slide‑out from any page	❌ Phase 2	Pretty, matches m's "panel docked into UI" framing — but adds: drawer toggle wiring on all 30 pages, scroll‑lock interaction, focus management, mobile small‑screen fallback. Not worth the v1 surface area. Phase 2 wraps the same `/paliadin` UI in a slide‑out container.
Floating bottom‑right bubble	❌	Clippy comparison is visual, not positional. A floating overlay on every page collides with the BottomNav on mobile (already 5/5 slots) and the inbox bell on desktop.
Page‑embedded panel on `/paliadin` only	—	This is the v1 recommendation, just framed differently.

Sidebar entry:

Übersicht
  Start
  Agenda
  Inbox 🛎
  Paliadin ✨   ← new, ICON_SPARKLE

Group placement under Übersicht (not under Tools or Wissen) because Paliadin is conversation about the user's work, not a knowledge tool.

Mobile: Paliadin is reachable via the sidebar drawer (existing mobile pattern). No BottomNav slot — those are full and the ranking (Start / Projekte / + / Agenda / Menü) is more important than a chat shortcut for v1.

4.2 Avatar / personality (Q9)

Recommendation: no avatar SVG in v1. Just a chat panel with the name "Paliadin" in the header. Mascot is Phase 2.

Why:

Mascot design is a real design exercise (3–4 iterations to get something that doesn't read as kitsch in a law firm). Not inventor's call to bash one out in a v1 ship.
The brand cue (lime‑green #c6f41c accent) is enough to make Paliadin feel like part of paliad without a character.
Paliadin's personality lives in the system prompt (§2.2.1), not in pixels. Voice carries the buddy framing; mascot makes it visual but isn't load‑bearing.

What we ship in v1 instead:

Header: "✨ Paliadin" (sparkle icon + name) above the chat panel.
Empty‑state prompt: "Was kann ich für dich tun?" (DE) / "How can I help?" (EN).
One‑line tagline under the header: "Ich kenne deine Akten und Paliads Wissensbasis." (DE) / "I know your matters and Paliad's knowledge base." (EN). This is the only v1 affordance that explicitly tells the user "I see your data" — load‑bearing for the differentiation argument in §0/§9.1.

Phase 2 mascot brief (for when m greenlights it): small SVG, friendly, lime‑green primary, no eyes‑darting / animated‑on‑idle (creepy), modular pose set so it can react to "thinking" / "found it" / "stuck" without being an MMORPG pet.

4.3 Onboarding hint (Q10)

Recommendation: silent‑until‑invoked. No proactive pop‑up, no first‑run modal, no toast.

Why:

Paliad already has a polished onboarding flow (t‑paliad‑034). Adding a Paliadin pop‑up on top would be the kind of "surprise the user" affordance that erodes trust the first time it misfires.
The empty‑state inside /paliadin itself is the right onboarding surface: 3 starter‑prompt buttons rendered when the chat is empty.

Three starter prompts (DE primary):

"Was steht heute an?" → triggers whats_on_my_plate
"Welche Fristen sind diese Woche fällig?" → triggers search_my_deadlines with due_before=now()+7d
"Erkläre mir Klageerwiderung." → triggers lookup_glossary_term + lookup_deadline_rule

EN equivalents: "What's on my plate?" / "Which deadlines are due this week?" / "Explain Klageerwiderung."

Picking one from the row sends it as if the user typed it. Keeps the surface zero‑weight when ignored.

Phase 2 candidate: post‑onboarding email / inbox card "Paliadin ist live, frag ihn was deine Daten dir sagen." Driven by the existing reminder/email substrate. Out of v1 scope.

4.4 Action chips in responses (Q11)

Recommendation: action chips parsed from a simple inline syntax in the model's reply, rendered client‑side, NOT a tool the model invokes.

Why simple syntax over a tool: tool invocations cost a round‑trip; we want the model to "suggest" an action without paying for an extra tool turn. The model emits a structured marker in its prose; the frontend client parses it and renders a chip below the bubble.

Marker format:

[#deadline-OPEN:c47bd2]
[#projekt-OPEN:slug-x]
[#frist-OPEN:c47bd2]
[#termin-OPEN:abc123]
[chip:nav:/projects/abc-123]   (for arbitrary navigation)
[chip:filter:status=pending&due=this_week]   (for parameterised inbox links)

The system prompt teaches the model to emit chips when navigation or filtering would help the user act on the answer. Each marker resolves to one chip, rendered as:

┌──────────────────────────────────────┐
│ Frist 16.05.2026 fällt morgen.       │
│ [Frist öffnen] [Akte ansehen]        │
└──────────────────────────────────────┘

Client parser (frontend/src/client/paliadin.ts): regex over the streamed text, replaces marker with a button. Buttons are real <a> elements (Cmd‑click works, keyboard works), styled like the existing .entity-table row chips.

Why not let the model embed full URLs? Two reasons:

URLs change (we renamed /akten → /projekte mid‑project). Markers are stable; we resolve them at render time.
Hallucinated URLs are real risk. If the model can only emit a marker tied to an id we know it just retrieved, the chip can't navigate to a fake page.
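The marker matching itself is simple enough to sketch. The design places the parser in frontend/src/client/paliadin.ts; the Go version below shows the same logic as it might run server-side when producing the `chip` SSE event, which is an assumption. The regex, the `Chip` struct and `extractChips` are illustrative, and only the `[#kind-ACTION:id]` marker family is covered.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// entityMarker matches markers such as [#deadline-OPEN:c47bd2].
var entityMarker = regexp.MustCompile(`\[#([a-z]+)-([A-Z]+):([A-Za-z0-9_-]+)\]`)

// Chip mirrors the payload of the chip SSE event.
type Chip struct {
	Kind   string `json:"kind"`
	Action string `json:"action"`
	ID     string `json:"id"`
}

// extractChips strips all markers from text and returns a chip for each
// marker whose id was returned by a tool call in this turn (knownIDs).
// Markers with unknown ids are removed without producing a chip.
func extractChips(text string, knownIDs map[string]bool) (string, []Chip) {
	var chips []Chip
	clean := entityMarker.ReplaceAllStringFunc(text, func(m string) string {
		parts := entityMarker.FindStringSubmatch(m)
		if knownIDs[parts[3]] {
			chips = append(chips, Chip{
				Kind:   parts[1],
				Action: strings.ToLower(parts[2]),
				ID:     parts[3],
			})
		}
		return "" // raw markers are never shown to the user
	})
	return strings.TrimSpace(clean), chips
}

func main() {
	known := map[string]bool{"c47bd2": true}
	text, chips := extractChips(
		"Frist 16.05.2026 fällt morgen. [#deadline-OPEN:c47bd2] [#projekt-OPEN:made-up]", known)
	fmt.Println(text)
	fmt.Println(chips)
}
```

Checking ids against the set returned by real tool calls is what keeps a hallucinated marker from turning into a clickable chip.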

4.5 Streaming + interruption (Q12)

Recommendation: SSE stream from /api/paliadin/stream, client EventSource, user‑initiated abort via "Stop" button.

4.5.1 Stream shape

Mirrors Anthropic's native streaming events, adapted for our SSE consumer:

event: meta
data: {"turn_id":"01H…","model":"claude-sonnet-4-6"}

event: content_delta
data: {"text":"Auf der Akte Müller…"}

event: tool_call
data: {"name":"search_my_deadlines","args_hash":"…","status":"running"}

event: tool_result
data: {"name":"search_my_deadlines","status":"ok","summary":"3 results"}

event: content_delta
data: {"text":"… ist die Klageerwiderung am 16.05. fällig."}

event: chip
data: {"kind":"deadline","action":"open","id":"c47bd2"}

event: end
data: {"input_tokens":342,"output_tokens":88,"tool_calls":1}

# heartbeat every 25 s to keep Traefik from reaping
event: ping
data: {}

The tool_call / tool_result events are visible in the UI as small dim "ran search_my_deadlines (3 results)" lines under the bubble — the citation evidence that distinguishes Paliadin from a generic chatbot. (Direct quote from the §0 framing: "the differentiation collapses if v1 doesn't make the data‑grounding visible.")
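For reference, framing one of these events in Go takes only a few lines. This is a sketch under the assumption that the handler builds each frame as a string and flushes it to the response writer; `formatEvent`, `toolResult` and `heartbeatInterval` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// heartbeatInterval matches the 25 s ping that keeps the proxy from reaping the stream.
const heartbeatInterval = 25 * time.Second

// formatEvent renders one SSE frame: an event line, a data line, and a blank line.
func formatEvent(event string, payload any) (string, error) {
	data, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return "event: " + event + "\ndata: " + string(data) + "\n\n", nil
}

// toolResult mirrors the tool_result event payload.
type toolResult struct {
	Name    string `json:"name"`
	Status  string `json:"status"`
	Summary string `json:"summary"`
}

func main() {
	frame, err := formatEvent("tool_result",
		toolResult{Name: "search_my_deadlines", Status: "ok", Summary: "3 results"})
	if err != nil {
		panic(err)
	}
	fmt.Print(frame)

	ping, _ := formatEvent("ping", struct{}{})
	fmt.Print(ping)
	fmt.Println("heartbeat interval:", heartbeatInterval)
}
```

In the real handler each frame would be written to the `http.ResponseWriter` and followed by a flush, with a ticker on `heartbeatInterval` emitting the `ping` frame whenever the stream is otherwise idle.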

4.5.2 Interruption

"Stop" button next to the input. Click → EventSource.close() + fetch('/api/paliadin/stream/{turn_id}/abort', {method:'POST'}).
Server abort closes the upstream Anthropic request via context cancellation.
Stopped turns still write an audit row with error_code='user_aborted' so we see how often users hit it.

4.5.3 Reconnect

Same Last‑Event‑ID resume pattern the t‑145 chat design specced. Server keeps the in‑flight stream buffered for 30 s after disconnect; reconnect within that window replays missed events. After 30 s, the turn is considered done — reconnect arrives at the start of a fresh session.

§5 Sub-design D — Token budget, cost, audit

Answers Q13, Q14, Q15, Q16.

5.1 Per‑request token cap (Q13)

Recommendation: max_input_tokens=4000 (the model's view of input, including system + history + tool defs + user msg) and max_tokens=2000 (the model's max output), same as the brief. The 4 k input figure is a soft cap: above it, history is truncated. The turn hard‑fails only above the 6 k ceiling described below.

Rationale:

A typical paliad data tool result is < 500 tokens (truncated lists, capped at 25 rows). Even with system prompt (~250) + tool defs (~600) + 5 prior turns (~600 each on average) the input stays well under 4 k.
If the conversation runs long (~8+ turns), the client/server soft‑truncates history (drops oldest user/assistant pairs first) before sending. The user sees an "Earlier in this conversation, we discussed X (truncated)" pseudo‑system message. Cleaner than failing the turn.
Hard cap at 6 k input tokens — over that, refuse the turn with "Conversation too long, start a new one." Defends against jailbreak attempts that try to balloon the prompt.
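The two thresholds can be sketched in Go as follows. This is one reading of the rules above, not a settled implementation: the incoming total is checked against the 6 k hard cap first, and only totals between the soft and hard cap are truncated. `Message`, `fitHistory` and the token counts are illustrative; real counts would come from a tokenizer or the API's usage data.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	softInputCap = 4000 // above this, drop oldest history
	hardInputCap = 6000 // above this, refuse the turn
)

// ErrTokenCap corresponds to error_code 'token_cap' in the audit row.
var ErrTokenCap = errors.New("token_cap")

// Message is one history entry with a precomputed token count.
type Message struct {
	Role   string
	Tokens int
}

// fitHistory returns the history to send, whether it was truncated, and an
// error if the incoming total exceeds the hard cap. fixedTokens covers the
// system prompt and tool definitions; userTokens is the new user message.
func fitHistory(fixedTokens int, history []Message, userTokens int) ([]Message, bool, error) {
	total := fixedTokens + userTokens
	for _, m := range history {
		total += m.Tokens
	}
	if total > hardInputCap {
		return nil, false, ErrTokenCap
	}
	truncated := false
	// Drop the oldest user/assistant pair until the soft cap is met.
	for total > softInputCap && len(history) >= 2 {
		total -= history[0].Tokens + history[1].Tokens
		history = history[2:]
		truncated = true
	}
	return history, truncated, nil
}

func main() {
	var history []Message
	for i := 0; i < 5; i++ {
		history = append(history, Message{"user", 400}, Message{"assistant", 400})
	}
	kept, truncated, err := fitHistory(850, history, 100)
	fmt.Println(len(kept), truncated, err)
}
```

When `truncated` is true, the caller would prepend the "Earlier in this conversation" pseudo-system message before sending.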

Cost math at Sonnet 4.6 for a typical turn (3 k input, 1 k output): ~$0.012/turn. At 30 turns/hour/user × 38 onboarded HLC users × 5 working hours/day = 5 700 turns/day ≈ $70/day worst case. Realistic load is probably 10× lower. Phase 2: prompt caching (§5.4.4) drops it further.

5.2 Conversation history persistence (Q14)

Recommendation: session‑only in v1. Persistent threads in Phase 2.

Option	v1?	Why
Session‑only (browser localStorage, cleared on Sign Out)	✅	Zero schema. Zero retention question. Aligns with §3.3 "minimum viable persistence." Lets us ship paliadin without compliance review of stored transcripts.
Persistent threads (DB‑stored, named)	❌ Phase 2	Real schema (`paliadin_threads`, `paliadin_messages`), retention policy, cross‑device sync, "delete my history" UX, possibly opt‑in toggle. None of which is needed to validate "is Paliadin actually useful".

Edge case: page reload during a conversation. localStorage persists the history for that browser (it is scoped per origin, so all tabs in the same browser share it). Closing and reopening the tab restores it. Closing and reopening the browser also restores it. Sign‑out clears it. Multi‑device = different histories. We're explicit about this in the panel header: "Conversation lives in this browser only" tooltip.

Why opt for slightly worse UX over the easy schema work: the t‑paliad‑145 chat just got parked over an adoption‑risk concern, not a schema concern. Paliadin should ship the smallest possible footprint that proves usefulness. Persistent threads can be a "you asked for this" Phase 2.

5.3 Rate limit per user (Q15)

Recommendation: 30 turns/hour/user (slightly tighter than the brief's 50). Plus a global ceiling of 1 000 turns/hour across the firm. Both configurable.

Per‑user 30/hour because:

30/hour ≈ one turn every two minutes during sustained use. That's heavy use. A reasonable user asks 3–5 questions in a session.
Soft hint at 25 ("you've used 25 of 30 messages this hour"), hard block at 30 with retry‑after.
Lower than 50 to give us a safety margin for runaway cost in week 1; we can raise it once we see real usage.

Global 1 000/hour ceiling because:

Global cap = circuit breaker against the long tail (a script that sends 1000 turns/hour from one user we missed in the per‑user cap, or a developer bug).
1 000 turns × ~$0.012 = $12/hour worst case = $288/day. We tolerate that for a day; we'd notice and tune.

Storage: simple Postgres paliad.paliadin_rate_limit table with (user_id, hour_bucket, turn_count) upserted on every turn start. No Redis, no extra dependency. Fast at this scale.
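The counting rules can be sketched in Go with in-memory maps standing in for the paliad.paliadin_rate_limit table. Everything here is illustrative: `Limiter`, `Decision` and `StartTurn` are hypothetical names, time is passed as Unix seconds to keep the example small, and the sketch is not safe for concurrent use (the Postgres upsert would provide the atomic increment in the real service).

```go
package main

import "fmt"

// Decision is the outcome of a rate-limit check at turn start.
type Decision int

const (
	Allow Decision = iota
	AllowWithHint // allowed, but show "you've used N of M messages this hour"
	BlockUser     // per-user hourly cap reached
	BlockGlobal   // firm-wide hourly cap reached
)

type bucketKey struct {
	userID string
	hour   int64
}

// Limiter counts turns per (user, hour) and per hour overall.
type Limiter struct {
	userCap, hintAt, globalCap int
	perUser                    map[bucketKey]int
	global                     map[int64]int
}

func NewLimiter(userCap, hintAt, globalCap int) *Limiter {
	return &Limiter{
		userCap: userCap, hintAt: hintAt, globalCap: globalCap,
		perUser: map[bucketKey]int{}, global: map[int64]int{},
	}
}

// StartTurn records a turn for userID at the given Unix time if both caps allow it.
func (l *Limiter) StartTurn(userID string, unixSeconds int64) Decision {
	hour := unixSeconds / 3600 // hour bucket
	key := bucketKey{userID, hour}
	if l.perUser[key] >= l.userCap {
		return BlockUser
	}
	if l.global[hour] >= l.globalCap {
		return BlockGlobal
	}
	l.perUser[key]++
	l.global[hour]++
	if l.perUser[key] >= l.hintAt {
		return AllowWithHint
	}
	return Allow
}

func main() {
	l := NewLimiter(30, 25, 1000)
	var last Decision
	for i := 0; i < 31; i++ {
		last = l.StartTurn("user-1", 1_000_000)
	}
	fmt.Println(last == BlockUser)
}
```

A blocked turn would still get its audit row with error_code 'rate_limited', and the retry-after value follows from the start of the next hour bucket.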

Admin override: global_admin can lift their own cap (they typically test things). Surface this in the audit row, not in a CLI.

5.4 Audit + logging (Q16)

Recommendation: every turn writes a metadata‑only row to paliad.paliadin_turns. Full transcripts are NOT stored in v1. Tool‑call args are hashed. Anthropic vendor side is governed by org‑level retention.

5.4.1 Schema (migration 057)

CREATE TABLE paliad.paliadin_turns (
    turn_id           uuid PRIMARY KEY,
    user_id           uuid NOT NULL REFERENCES paliad.users(id),
    session_id        text NOT NULL,                  -- browser session, opaque
    started_at        timestamptz NOT NULL DEFAULT now(),
    finished_at       timestamptz,                    -- NULL until end‑of‑turn
    model             text NOT NULL,                  -- e.g. 'claude-sonnet-4-6'
    input_tokens      int,                            -- from Anthropic usage block
    output_tokens     int,
    tool_calls        jsonb NOT NULL DEFAULT '[]',    -- [{name, args_hash, status, latency_ms}]
    prompt_hash       text,                           -- sha256 of user_message after PII redaction (best effort)
    response_hash     text,                           -- sha256 of full response (citation only, not stored)
    chip_count        int NOT NULL DEFAULT 0,
    error_code        text,                           -- NULL on success; 'user_aborted', 'rate_limited', 'token_cap', 'tool_loop_cap', 'upstream_error'
    estimated_cost_usd numeric(10, 6)                 -- for ops dashboards
);

CREATE INDEX paliadin_turns_user_started_idx
    ON paliad.paliadin_turns(user_id, started_at DESC);
CREATE INDEX paliadin_turns_started_idx
    ON paliad.paliadin_turns(started_at DESC);

ALTER TABLE paliad.paliadin_turns ENABLE ROW LEVEL SECURITY;

-- User sees their own; global_admin sees all.
CREATE POLICY paliadin_turns_select
    ON paliad.paliadin_turns FOR SELECT
    USING (
      user_id = auth.uid()
      OR EXISTS (SELECT 1 FROM paliad.users u
                  WHERE u.id = auth.uid() AND u.global_role = 'global_admin')
    );

-- Service-role (paliad backend) writes; no user‑direct INSERT.
-- (Paliad uses service-role conn, so policies on writes are inert,
-- but we still ENABLE RLS so future direct‑auth callers are gated.)

Rate‑limit table also lives in this migration:

CREATE TABLE paliad.paliadin_rate_limit (
    user_id     uuid NOT NULL REFERENCES paliad.users(id),
    hour_bucket timestamptz NOT NULL,
    turn_count  int NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, hour_bucket)
);

5.4.2 What we DON'T store (v1)

The user's actual prompt text. Only prompt_hash.
The model's actual response text. Only response_hash.
The tool inputs. Only tool_calls[].args_hash.

Phase 2 transcript persistence unlocks all three — deliberately separate migration so the compliance review sits at that boundary.

5.4.3 Vendor retention

The Anthropic side is governed by the org‑level contract. Open question for m (§9.2): does HLC have an enterprise / zero‑retention agreement, or are we using m's personal key (matches existing ANTHROPIC_API_KEY precedent in mAi/youpcms)? The answer changes whether v1 needs a "data sent to Anthropic" disclosure on first use.

5.4.4 Prompt caching (Phase 2)

The Anthropic API supports prompt caching for repeated system prompts + tool definitions. Our system prompt + 8 tool defs is ~850 tokens, a natural cache target. Phase 2: enable cache_control on the system block; this cuts input cost for the cached prefix by ~90% on repeat turns within the 5‑minute cache window. Skip in v1 to keep the client minimal; pick up after the API surface stabilises.

§6 Schema, endpoints, files

6.1 New endpoints

Method	Path	Purpose	Auth
`POST`	`/api/paliadin/turn`	Initiate a turn: assigns `turn_id`, creates the audit row, returns `{turn_id, sse_url}`	logged‑in (302 to /login otherwise)
`GET`	`/api/paliadin/stream/{turn_id}`	SSE stream of the turn's response (opened by the client right after the `POST`; the same GET supports reconnect)	logged‑in
`POST`	`/api/paliadin/stream/{turn_id}/abort`	User cancels mid‑turn	logged‑in, must own the turn
`GET`	`/api/paliadin/limits`	Returns `{used_this_hour, hourly_cap, global_cap, global_used}`	logged‑in
`GET`	`/paliadin`	The page shell (server‑renders the panel + initial empty state)	logged‑in
`GET`	`/admin/paliadin`	Per‑user usage / cost dashboard	global_admin

The POST /api/paliadin/turn returns {turn_id, sse_url}; the client opens an EventSource on sse_url. Two‑step keeps the POST cheap for telemetry / audit row creation, while the long‑lived stream lives on a GET that's safe to retry / resume.

6.2 New / extended services

File	Status	Purpose
`internal/services/paliadin/service.go`	NEW	The orchestrator: run loop, history truncation, rate‑limit check, audit‑row writer
`internal/services/paliadin/anthropic.go`	NEW	Hand‑rolled Messages API client (POST `/v1/messages`, stream parser)
`internal/services/paliadin/tools.go`	NEW	Tool catalog declaration + dispatch into existing services
`internal/services/paliadin/prompt.go`	NEW	System prompt template + per‑turn assembly
`internal/handlers/paliadin.go`	NEW	HTTP / SSE handlers
`internal/services/deadline_service.go`	extend	Add `SearchVisible(userID, q, status, projectID, dueAfter, dueBefore, limit)` (currently search is only on the global Fristenrechner matview)
`internal/services/appointment_service.go`	extend	Add `ListVisibleInWindow(userID, from, to, projectID)`
`internal/services/glossary_service.go`	NEW (or refactor of glossary handler data load)	A real service so the tool can call it; today it lives inline in the handler

6.3 Frontend

File	Status	Purpose
`frontend/src/paliadin.tsx`	NEW	Page shell
`frontend/src/client/paliadin.ts`	NEW	Chat panel, EventSource, history serialise to localStorage, chip parser, "Stop" button
`frontend/src/styles/global.css`	extend	New CSS section: `.paliadin-panel`, `.paliadin-bubble`, `.paliadin-bubble--user/--assistant/--tool`, `.paliadin-chip`, `.paliadin-input`, `.paliadin-meta`
`frontend/src/components/Sidebar.tsx`	extend	Add Paliadin navItem to the Übersicht group with `ICON_SPARKLE`
`frontend/src/i18n-keys.ts`	extend	~25 new keys: `paliadin.title`, `paliadin.tagline`, `paliadin.starter.`, `paliadin.empty`, `paliadin.input.placeholder`, `paliadin.stop`, `paliadin.rate_limited`, `paliadin.error.`

6.4 Migration 057

057_paliadin.up.sql:
  - paliad.paliadin_turns (audit row, RLS, indexes)
  - paliad.paliadin_rate_limit (counter table, PK on user+hour)
  - GRANTs: service-role full, anon read disallowed by RLS
057_paliadin.down.sql: drop both tables.

6.5 Env vars (add to CLAUDE.md table)

Variable	Required	Purpose
`ANTHROPIC_API_KEY`	for Paliadin	Anthropic Messages API key. Replaces the "do not set" row that referred to the parked Phase H. Without it, `/paliadin` returns 503 (server still boots; the rest of paliad keeps working).
`PALIADIN_MODEL`	optional (default `claude-sonnet-4-6`)	Override model for tuning / fallback to Haiku for cost or Opus for accuracy without redeploying.
`PALIADIN_HOURLY_CAP`	optional (default `30`)	Per‑user turn cap per hour.
`PALIADIN_GLOBAL_HOURLY_CAP`	optional (default `1000`)	Firm‑wide turn cap per hour.
`PALIADIN_MAX_INPUT_TOKENS`	optional (default `4000`)	Soft cap; over this we truncate history.
`PALIADIN_MAX_OUTPUT_TOKENS`	optional (default `2000`)	Hard cap; passed straight to Anthropic.

The Service must boot without ANTHROPIC_API_KEY (return 503 on /paliadin* routes; rest of paliad keeps working). Same pattern as DATABASE_URL and CALDAV_ENCRYPTION_KEY.
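A sketch of that boot behaviour in Go, using only the standard library. The `paliadinGate` middleware, the `newMux` wiring and the placeholder handlers are illustrative assumptions, not the existing router setup; the point is that the gate is per route, so a missing key never prevents startup.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
)

// paliadinGate returns 503 for the wrapped handler when no API key is configured.
func paliadinGate(apiKey string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if apiKey == "" {
			http.Error(w, "Paliadin is not configured", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newMux wires gated Paliadin routes next to an ungated placeholder route.
func newMux(apiKey string) *http.ServeMux {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	mux := http.NewServeMux()
	mux.Handle("/paliadin", paliadinGate(apiKey, ok))
	mux.Handle("/api/paliadin/", paliadinGate(apiKey, ok))
	mux.Handle("/projekte", ok) // the rest of paliad keeps working
	return mux
}

// probe issues an in-process GET and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux(os.Getenv("ANTHROPIC_API_KEY"))
	fmt.Println("/paliadin ->", probe(mux, "/paliadin"))
	fmt.Println("/projekte ->", probe(mux, "/projekte"))
}
```

Reading the key once at startup and closing over it keeps the check cheap; picking up a newly set key then requires a restart.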

§7 Sub-design E — Phasing (REVISED 2026-05-07 20:56)

Answers Q19, Q20. Two‑stage shape after m's re‑scope:

Phase 0 (PoC, m‑only): §0.5 is the spec. ~600–900 LoC, ~1 day. Ships first.
Phase 1 (production v1, multi‑user): §7.1 below. Picked up only if §0.5.7's expansion gate fires.
Phase 2 / 3: unchanged.

7.1 Phase 1 (production v1) — confirmed scope, GATED on PoC success

Single coherent slice that proves the value proposition end‑to‑end.

Item	In v1
`/paliadin` page + sidebar entry under Übersicht	✅
Migration 057 (`paliadin_turns` + `paliadin_rate_limit`)	✅
Anthropic client (hand‑rolled, streaming)	✅
8 read‑only tools	✅
System prompt with `branding.Name` + visibility rules	✅
SSE stream with `meta`/`content_delta`/`tool_call`/`tool_result`/`chip`/`end`/`ping` events	✅
Citation chips (parsed from inline markers)	✅
Rate limiting (per‑user + global)	✅
Audit row per turn (metadata only, no transcript)	✅
Session‑only history (browser localStorage)	✅
3 starter prompts in DE+EN	✅
Token caps + soft history truncation	✅
`/admin/paliadin` cost dashboard (global_admin only)	✅
~25 i18n keys (DE+EN)	✅
Mobile responsiveness (uses sidebar drawer like every other page)	✅
CLAUDE.md update flipping the `ANTHROPIC_API_KEY` row	✅

Estimated scope: ~3 500–4 500 LoC for the bundled v1 ship. Comparable to t‑144 (Custom Views) and t‑145's would‑have‑been chat slice.

Single PR or split? Recommend single PR for v1. The Anthropic client + tool dispatch + handler + frontend panel are too tightly coupled to ship one without the others — every component is on the critical path of "demonstrate Paliadin actually works". Splitting buys nothing review‑wise (no reviewer can validate "Anthropic client works" without "the tool dispatch that exercises it"). Use the same single‑PR pattern as t‑144 A1+A2 in retrospect.

7.2 Phase 2 candidates (post‑v1, prioritised)

In rough order of value:

Persistent threads + per‑user "keep my history" toggle. Adds paliadin_threads + paliadin_messages tables, retention policy, cross‑device sync. Compliance review attaches here, not to v1.
Prompt caching for system prompt + tool defs. ~90 % input‑cost reduction on repeat turns. Pure server‑side change.
search_youpc_case_law tool. Cross‑schema SELECT into data.judgments + data.judgment_markdown_content. Returns case number, division, date, headnote, top 3 holdings. The "research assistant" use case from m's framing.
Right‑drawer mode. Wrap the /paliadin panel in a slide‑out container; toggle on every page from a header button.
Mascot SVG + idle / thinking / found‑it pose set. Real visual design pass.
Onboarding tip — post‑onboarding inbox card or one‑time toast on first dashboard visit after Paliadin lands.
list_my_pending_approvals tool. Wraps inbox bell payload.
Voice input / output. Web Speech API (paliad already has the substrate from the no‑Voice‑v1 t‑paliad‑042 PWA).

7.3 Phase 3 candidates (validate first)

Write tools. create_deadline, create_appointment, attach_partner_unit, add_party. Each behind a hard confirmation gate ("Paliadin will create a deadline 16.05. on project X — confirm? [Yes / No]"). Audit‑row marks these as mutating turns. Heavy compliance question; not Phase 2.
Per‑deadline / per‑termin micro‑threads. Long‑lived per‑entity Q&A. Plumbing collision with the (parked) chat design — re‑evaluate when chat un‑parks.
Proactive Paliadin. Push tips when the user hits a known confused state ("You've been on /tools/fristenrechner for 8 minutes — want me to walk you through it?"). Powerful, but creepy if poorly tuned.
Compliance‑aware redaction layer. Strip client names from the prompt before it leaves the building, swap stable hashes back in client‑side. Big project; only sensible if HLC compliance forbids vendor‑side PII.

§8 Risks, mitigations, open questions

8.1 Adoption risk (the §0 callout, expanded)

The risk: Paliadin competes with three things HLC already has:

The user's own Claude / ChatGPT in another tab (for general patent‑practice questions).
"Ask a colleague on Teams" (for paliad‑specific questions about how to use the app).
Just clicking around the UI (for "what's on my plate today").

Paliadin's edge over (1) is data grounding. Edge over (2) is 24/7 + privacy. Edge over (3) is conversational discovery and answering one‑shot natural‑language queries that the structured UI doesn't expose.

The risk realised: if v1 doesn't make the data‑grounding visible (citation chips, tool‑call evidence under each bubble, the tagline "I see your data"), users default to ChatGPT for everything, and Paliadin becomes a ghost feature that ate 3 weeks of build. Same pattern that just parked t‑paliad‑145.

Mitigations baked into v1:

Tool‑call evidence visible in every bubble. The user sees "ran search_my_deadlines (3 results)" — instant differentiation from a generic chatbot.
Citation chips make answers actionable, not just informative.
Tagline + empty state explicitly say "I see your projects."
Three starter prompts demonstrate the data‑grounding immediately on first use.

Mitigations m should consider before approving:

Sanity‑check with two PA colleagues before locking v1 scope. Same recommendation t‑145 got. If two PAs say "I'd just open Claude in another tab", the scope shifts toward making the data‑grounding more prominent (e.g. ship "Paliadin sees only your data" as a persistent banner above the input, not a tooltip) before shipping at all.
Soft launch + telemetry. v1's audit row gives us cheap measurement of: (a) total turns/day, (b) turns per user, (c) tool‑call frequency (low = Paliadin is being used like ChatGPT, defeating the differentiation). Watch for two weeks; if tool‑calls/turn < 1.5 average, the feature isn't doing what we shipped it for and Phase 2 priorities change.

8.2 Compliance / vendor‑data risk

The risk: sending client names + case content to Anthropic's API may not be sanctioned by HLC IT/compliance. The 2026‑04‑16 "we don't want anthropic API… for a while" decision (memory b6a11b55…) was about Frist extraction from documents; Paliadin is conversational, but the data envelope sent to Anthropic still contains PII whenever a tool returns a project name.

Mitigations:

HLC enterprise key (vs m's personal key) if available — gives org‑level retention + DPA coverage.
Zero‑retention configuration on the Anthropic call (metadata: {user_id: "<hash>"}, cache_control only on the system block, no eval enrolment).
First‑use disclosure in the panel: "Your messages and the data Paliadin retrieves on your behalf are sent to Anthropic. [Learn more]". This is load‑bearing, and required if the answer to Q‑A (§8.5) is "personal key, not enterprise".
Phase 2 hardening: server‑side redaction layer that swaps client names → stable hashes before the API call, restores them client‑side after. Big project; only sensible if compliance forbids vendor‑side PII.
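To make the name-to-hash swap concrete, here is a minimal Go sketch of the round trip. Everything in it is an assumption for illustration: the `Redactor` type, the `[client:…]` token format, the HMAC key, and the idea that the list of client names is known up front (for example, collected from tool results). The design restores names client-side; the sketch keeps both directions in one type purely for brevity.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Redactor swaps known client names for stable tokens and back.
type Redactor struct {
	key     []byte            // hypothetical secret; same key gives same tokens
	byToken map[string]string // token -> original name, used by Restore
}

func NewRedactor(key []byte) *Redactor {
	return &Redactor{key: key, byToken: map[string]string{}}
}

// token derives a stable placeholder from the name via HMAC-SHA256.
func (r *Redactor) token(name string) string {
	mac := hmac.New(sha256.New, r.key)
	mac.Write([]byte(name))
	return "[client:" + hex.EncodeToString(mac.Sum(nil))[:12] + "]"
}

// Redact replaces every exact occurrence of each name in a single pass.
// Longer names go first so "Acme Holdings" wins over "Acme".
func (r *Redactor) Redact(text string, names []string) string {
	sorted := append([]string(nil), names...)
	sort.Slice(sorted, func(i, j int) bool { return len(sorted[i]) > len(sorted[j]) })
	pairs := make([]string, 0, 2*len(sorted))
	for _, n := range sorted {
		if n == "" {
			continue
		}
		tok := r.token(n)
		r.byToken[tok] = n
		pairs = append(pairs, n, tok)
	}
	return strings.NewReplacer(pairs...).Replace(text)
}

// Restore maps every token seen so far back to its original name.
func (r *Redactor) Restore(text string) string {
	pairs := make([]string, 0, 2*len(r.byToken))
	for tok, n := range r.byToken {
		pairs = append(pairs, tok, n)
	}
	return strings.NewReplacer(pairs...).Replace(text)
}

func main() {
	r := NewRedactor([]byte("demo-key-not-a-real-secret"))
	out := r.Redact("Acme Holdings v. Acme: reply due Friday",
		[]string{"Acme", "Acme Holdings"})
	fmt.Println(out)            // names replaced by [client:...] tokens
	fmt.Println(r.Restore(out)) // original text again
}
```

Exact-string matching misses spelling variants and abbreviations of a client name, which is part of why the text calls this a big project.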

8.3 Rate‑limit / runaway‑cost risk

The risk: a user (or a bug) loops fast enough to drain budget before alarms fire.

Mitigations:

Per‑user 30/hour + global 1 000/hour caps (§5.3). Both surfaced on /admin/paliadin.
Per‑turn token cap (§5.1).
Per‑turn tool‑loop cap (≤ 5 rounds, §2.6).
Audit row written before the upstream call so a rate‑limit‑evading bug still leaves traces.
PALIADIN_HOURLY_CAP / PALIADIN_GLOBAL_HOURLY_CAP are env‑var configurable so we can tighten without a deploy.
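A rough sketch of how the two caps could be enforced together, assuming fixed one-hour windows (the design does not say whether the window is fixed or sliding) and in-memory counters. `Limiter`, `Allow` and `capFromEnv` are hypothetical names; only the two env-var names and the 30 / 1 000 defaults come from the text above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
	"time"
)

// capFromEnv reads a positive integer cap from the environment,
// falling back to def when the variable is unset or invalid.
func capFromEnv(name string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(name)); err == nil && v > 0 {
		return v
	}
	return def
}

// Limiter counts turns per user and globally in fixed one-hour windows.
type Limiter struct {
	mu        sync.Mutex
	userCap   int
	globalCap int
	window    int64 // index of the current hour (unix seconds / 3600)
	perUser   map[string]int
	global    int
}

func NewLimiter(userCap, globalCap int) *Limiter {
	return &Limiter{userCap: userCap, globalCap: globalCap, perUser: map[string]int{}}
}

// Allow reports whether userID may start a turn at the given unix time,
// and counts the turn if so. Counters reset when the hour changes.
func (l *Limiter) Allow(userID string, unixSeconds int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if w := unixSeconds / 3600; w != l.window {
		l.window, l.perUser, l.global = w, map[string]int{}, 0
	}
	if l.global >= l.globalCap || l.perUser[userID] >= l.userCap {
		return false
	}
	l.global++
	l.perUser[userID]++
	return true
}

func main() {
	l := NewLimiter(
		capFromEnv("PALIADIN_HOURLY_CAP", 30),
		capFromEnv("PALIADIN_GLOBAL_HOURLY_CAP", 1000),
	)
	fmt.Println(l.Allow("m", time.Now().Unix()))
}
```

Reading the caps once at startup means tightening them needs a restart with new env values rather than a new build. Counters held in memory are lost on restart, so a real implementation would probably count audit rows instead.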

8.4 Hallucination risk (model invents a deadline)

The risk: the model fabricates a deadline date / case number that doesn't exist in the user's data.

Mitigations:

Hard rule in system prompt: "Every concrete factual claim about the user's work MUST come from a tool call in the current conversation."
Citation markers tied to tool‑result IDs only. Marker #deadline-OPEN:c47bd2 resolves only if the id was returned by a real tool call this turn (frontend validates).
Tool‑call‑evidence visibility: the user can see that a tool ran and what it returned. Hallucination becomes obvious because the chip says "0 results" but the bubble claims a deadline.
Phase 2: server‑side post‑hoc validation that checks every cited id against the tool‑result set; reject the message and retry if the model invented one.
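The Phase 2 post-hoc check could look roughly like the following. The marker grammar is inferred from the single example `#deadline-OPEN:c47bd2` and is an assumption, as are the names `markerRe` and `InventedIDs`; the real marker format is defined elsewhere in this design.

```go
package main

import (
	"fmt"
	"regexp"
)

// markerRe matches markers shaped like "#deadline-OPEN:c47bd2".
// Assumed grammar: #<kind>-<STATUS>:<lowercase hex id>.
var markerRe = regexp.MustCompile(`#([a-z_]+)-([A-Z_]+):([0-9a-f]+)`)

// InventedIDs returns every cited id that is not in the set of ids
// returned by tool calls during the current turn.
func InventedIDs(message string, returned map[string]bool) []string {
	var bad []string
	for _, m := range markerRe.FindAllStringSubmatch(message, -1) {
		if id := m[3]; !returned[id] {
			bad = append(bad, id)
		}
	}
	return bad
}

func main() {
	returned := map[string]bool{"c47bd2": true}
	msg := "Reply is due Friday #deadline-OPEN:c47bd2, " +
		"and another on Monday #deadline-OPEN:0badf0."
	fmt.Println(InventedIDs(msg, returned)) // prints [0badf0]
}
```

A non-empty result is the signal to reject the message and retry. The same check run in the frontend covers the v1 rule that a marker only resolves if its id came from a real tool call.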

8.5 Open questions for m (REVISED 2026-05-07 20:56 for the PoC scope)

The re‑scope mooted most of the original questions. Tracking which are still active vs deferred:

PoC‑relevant (decide before coder shift):

Q‑PoC‑1: What goes in the system prompt's read‑recipe set? §0.5.3 says ~15 SQL recipes, but the actual list is still an open design‑level decision. Recommendation: start with whats_on_my_plate, list_my_projects, get_project_detail, search_my_deadlines_by_status, lookup_court_by_name, lookup_glossary_term, lookup_deadline_rule_by_concept. Same shape as §2.5, just expressed as SQL recipes Claude follows.
Q‑PoC‑2: Does m want the response file (/tmp/paliadin/{turn_id}.txt) cleaned up after each turn (mVoice does), or kept around for offline review? Recommendation: keep them in ~/.paliad-poc/turns/{date}/ with a 30‑day janitor — m said "monitoring use", and raw response artefacts are great for post‑hoc analysis.
Q‑PoC‑3: Should /admin/paliadin be reachable from the sidebar, or hidden behind a direct URL? Recommendation: sidebar entry (/admin/paliadin) since m is the only user and the only audience for the dashboard.
Q‑PoC‑4: classifier_tag — let Claude self‑tag in the trailer block, or post‑process server‑side from the prompt text? Recommendation: Claude self‑tags (cheap and richer); we add a server‑side fallback if Claude's tag is missing.
Q‑PoC‑5: Expansion gate threshold — §0.5.7 suggests "≥3 turns/working‑day, ≥50 % tool‑use rate, 4 weeks." Tighten? Loosen? Pure feel.
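For Q-PoC-5, writing the gate as three explicit thresholds makes "tighten or loosen" a matter of changing numbers. The sketch below is one possible reading, with assumptions labelled: "tool-use rate" is taken to mean the share of turns with at least one tool call, and "4 weeks" is approximated as 20 working days. `Gate` and `Observed` are hypothetical names.

```go
package main

import "fmt"

// Gate holds the expansion thresholds suggested in §0.5.7.
type Gate struct {
	MinTurnsPerWorkingDay float64
	MinToolUseRate        float64 // assumed: turns with >= 1 tool call / all turns
	MinWorkingDays        int     // assumed: 4 weeks ~ 20 working days
}

// Observed is what the monitoring table would yield for the PoC window.
type Observed struct {
	WorkingDays   int
	Turns         int
	TurnsWithTool int
}

// Passes reports whether the observation window clears all three thresholds.
func (g Gate) Passes(o Observed) bool {
	if o.WorkingDays == 0 || o.Turns == 0 || o.WorkingDays < g.MinWorkingDays {
		return false
	}
	perDay := float64(o.Turns) / float64(o.WorkingDays)
	rate := float64(o.TurnsWithTool) / float64(o.Turns)
	return perDay >= g.MinTurnsPerWorkingDay && rate >= g.MinToolUseRate
}

func main() {
	gate := Gate{MinTurnsPerWorkingDay: 3, MinToolUseRate: 0.5, MinWorkingDays: 20}
	// 72 turns over 20 working days is 3.6 per day; 40 of 72 used a tool (about 56%).
	fmt.Println(gate.Passes(Observed{WorkingDays: 20, Turns: 72, TurnsWithTool: 40})) // prints true
}
```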

Production‑v1‑deferred (only relevant if §0.5.7 expansion gate fires):

Q‑A (Anthropic key) — moot for PoC; Claude Code handles it.
Q‑B (first‑use disclosure) — moot; m‑only.
Q‑C (default model) — moot; Claude Code defaults.
Q‑D (sanity‑check with 2 PAs before locking scope) — becomes the expansion‑gate question. Don't ask the PAs about Paliadin until the PoC has earned the conversation.
Q‑E (surface confirmation) — kept; PoC ships the same /paliadin page so the question is already answered.
Q‑F (mascot) — Phase 2 still.
Q‑G (starter prompts) — relevant for the PoC empty state; recommendation unchanged.
Q‑H (branding.Name in prompt) — relevant for PoC; recommendation: yes, but the firm‑agnostic prompt can read "Paliad" instead of branding.Name since m's PoC is on his laptop and the firm‑name distinction adds no value for a single user.
Q‑I (rate limit) — moot for PoC.
Q‑J (youpc case‑law tool) — interesting at PoC since m himself does case‑law research; promoted to Q‑PoC‑6: include lookup_youpc_case as one of the system‑prompt SQL recipes from day one? Cross‑schema SELECT into data.judgments is technically trivial, and m is exactly the user who'd benefit. Recommendation: yes, include it.
Q‑K (audit retention) — PoC stores everything forever (one user, no compliance pressure).
Q‑L (default language) — moot; m's locale is set, Claude reads it.

§9 What this design does NOT cover (deliberately)

The implementation. This is a design pass; coder shift writes the code. No commits beyond this doc on the inventor branch.
Mascot visual design. Phase 2; deserves its own design pass (and probably a designer's eye, not an inventor's).
HL Patents Style guide ingestion. Out of v1; Phase 2 RAG candidate.
Voice input / TTS output. Phase 2.
Multi‑user collaboration (e.g. sharing a Paliadin chat). Out of scope; users have their own visibility, and joint chat is a chat‑feature shape (parked).
Offline mode. Paliadin is online‑only by definition (it calls Anthropic). The PWA service worker should NOT cache /paliadin responses.
The renaming question. "Paliadin" is m's name. Locked.

§10 Recommended implementer

Same recommendation as t‑145: noether, or a fresh coder Sonnet that has noether's substrate context. NOT cronus per the standing memory directive on paliad.

Why:

Substrate touchpoints are the same set the chat design covered: visibilityPredicate, auth.UserIDFromContext, sidebar entry pattern, migration tracker discipline, Dashboard/Agenda/Project/Deadline service interfaces. noether built half of these; the other half noether mapped during the chat design pass.
Anthropic Go client is novel in paliad but is small and well‑specified by §6.2 + the claude-api skill.
Front‑end SSE consumer + chip parser is a one‑page TS file.

§11 End of design — STOP

This is the inventor deliverable. Per the role brief: STOP after design. Do not begin implementation. Do not load /mai-coder. Wait for m's explicit go/no‑go on the questions in §8.5 before any coder shift starts.

The completion signal sent to head will use the literal phrase "DESIGN READY FOR REVIEW" so the head's gate fires.

