From 2aa532e71736377c44a98743c5b879c36946a314 Mon Sep 17 00:00:00 2001
From: m
Date: Fri, 15 May 2026 17:31:20 +0200
Subject: [PATCH] =?UTF-8?q?chore:=20initial=20commit=20=E2=80=94=20spinout?=
 =?UTF-8?q?=20from=20m/otto?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Spun out mDMS strategy + tooling from m/otto into its own repo on 2026-05-15.

Migrated:
- docs/strategy.md (was: m/otto:docs/mdms-strategy.md)
- infra/paperless/ (config + audit + migrate scripts)
- infra/samba-canon/ (Canon MB5100 SMB1 bridge container)

History in m/otto: issues #429–#438. Going forward, all mDMS issues file here.

Sibling m/paperless (separate repo) remains the bare Docker Compose for Paperless-ngx itself.
---
 CLAUDE.md                                    |   36 +
 README.md                                    |   67 +
 docs/strategy.md                             |  288 +++
 infra/paperless/Dockerfile                   |   14 +
 infra/paperless/README.md                    |   24 +
 infra/paperless/SYSTEM_PROMPT.txt            |   24 +
 infra/paperless/docker-compose.yml           |   52 +
 infra/paperless/migrate-apply-2026-05-13.log |  368 ++++
 infra/paperless/migrate_types.py             |  279 +++
 .../migrate_types_audit_20260513T085119.json | 1715 +++++++++++++++++
 infra/samba-canon/Dockerfile                 |   18 +
 infra/samba-canon/README.md                  |  120 ++
 infra/samba-canon/docker-compose.yml         |   36 +
 infra/samba-canon/entrypoint.sh              |   41 +
 infra/samba-canon/smb.conf                   |   49 +
 15 files changed, 3131 insertions(+)
 create mode 100644 CLAUDE.md
 create mode 100644 README.md
 create mode 100644 docs/strategy.md
 create mode 100644 infra/paperless/Dockerfile
 create mode 100644 infra/paperless/README.md
 create mode 100644 infra/paperless/SYSTEM_PROMPT.txt
 create mode 100644 infra/paperless/docker-compose.yml
 create mode 100644 infra/paperless/migrate-apply-2026-05-13.log
 create mode 100644 infra/paperless/migrate_types.py
 create mode 100644 infra/paperless/migrate_types_audit_20260513T085119.json
 create mode 100644 infra/samba-canon/Dockerfile
 create mode 100644 infra/samba-canon/README.md
 create mode 100644 infra/samba-canon/docker-compose.yml
 create mode 100644 infra/samba-canon/entrypoint.sh
 create mode 100644 infra/samba-canon/smb.conf

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..6a2d99c
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,36 @@
+# mDMS
+
+Document-management strategy + tooling: Paperless-ngx + Paperless-AI + Canon SMB bridge.
+
+**Memory group_id:** `mdms` (new — formerly `otto` for these issues)
+
+**Project type:** infrastructure + AI-classification pipeline. No web frontend, no application server. Deploys live on mDock; data on mTrueNAS.
+
+## Spinout context
+
+Migrated out of `m/otto` on 2026-05-15. Strategy doc + paperless-AI tooling + samba-canon bridge moved here. The original implementation history is in `m/otto` issues #429–#438. Going forward, file all mDMS issues here.
+
+## Layout
+
+- `docs/strategy.md` — the bible. Taxonomy (10 types, 13 tags), filename conventions, OCR-pipeline decisions. Read first.
+- `infra/paperless/` — AI-classification layer config: `SYSTEM_PROMPT.txt`, audit log, `migrate_types.py`.
+- `infra/samba-canon/` — host-network Samba 4.10 SMB1 bridge for Canon MB5100.
+
+## Sibling repo
+
+`m/paperless` — separate, bare Docker Compose for Paperless-ngx itself. `~/paperless/` on mDock is its checkout. Keep that for deployment; this repo is for *strategy* + *AI/classification* + *Canon bridge*.
+
+## Live deployment touchpoints
+
+- `mdock:8777` — Paperless-ngx (managed via `~/paperless/`, i.e. `m/paperless` repo)
+- `mdock:3077` — Paperless-AI (config in this repo: `infra/paperless/`)
+- mDock `~/samba-canon/` — Canon SMB bridge (source in this repo: `infra/samba-canon/`)
+- mDock `~/mdms-mover/` — Age-gated inbox mover (source still in `m/otto` per issue #438, to be migrated in)
+
+When code in this repo and the live deployment drift, fix in the repo first, then deploy.
+
+## Conventions
+
+- Audit JSON: `infra/paperless/_.json` — keep them in-repo as historical record (migrate_types_audit_*.json etc.)
+- Issues filed here, not in `m/otto`.
+- Per global CLAUDE.md: Always `--netrc-file ~/.netrc-mai` for Gitea API as mAi.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..8f4cb92
--- /dev/null
+++ b/README.md
@@ -0,0 +1,67 @@
+# mDMS
+
+m's document management — Paperless-ngx + AI-classification pipeline, Canon scanner SMB bridge, strategy + tooling.
+
+Spun out from `m/otto` on 2026-05-15 — issues #429–#438 in `m/otto` are the
+provenance trail. Going forward, all mDMS work lives here.
+
+## Layout
+
+```
+mDMS/
+├── docs/
+│   └── strategy.md         # Taxonomy, layout, conventions (the bible)
+├── infra/
+│   ├── paperless/          # Paperless-AI config: SYSTEM_PROMPT, audit scripts,
+│   │                       # migrate_types.py, deploy docker-compose
+│   └── samba-canon/        # SMB1 bridge container for Canon MB5100 scanner
+│                           # (host-network + nmbd, SMB1+NTLMv1 for old printer)
+└── README.md
+```
+
+## Components
+
+### Paperless-ngx (deployment)
+
+Compose lives in **`m/paperless`** (separate repo). That repo is the
+deployment artifact — `~/paperless/` on mDock is its checkout. This repo
+(`m/mDMS`) tracks the *AI classification* layer that sits on top of
+Paperless-ngx (`infra/paperless/SYSTEM_PROMPT.txt`, the type/tag/
+correspondent migration scripts, the audit pipeline).
+
+### Paperless-AI
+
+Runs on `mdock:3077` in front of Paperless-ngx (`mdock:8777`). Classifies
+each ingested document into one of the 10 canonical types and ≤2 of the
+13 canonical tags. The system prompt + the migration scripts in
+`infra/paperless/` are the source of truth — keep this repo and the
+live Paperless-AI `aidata/.env` in sync.
+
+### Canon SMB bridge
+
+`infra/samba-canon/` is the host-network Samba 4.10 container on mDock
+that the Canon MB5100 scans to. Files land in `/mnt/mdms/inbox/` (NFS
+from mTrueNAS) and Paperless polls every 60s. The two-stage inbox
+(staging dir + age-gated mover) lives separately under `~/mdms-mover/`
+on mDock — see `m/otto` issue #438.
+
+## Data
+
+NFS-mounted from mTrueNAS: `/mnt/mPool/mdms/` → `/mnt/mdms/` on all
+consumers. Layout:
+
+```
+/mnt/mPool/mdms/
+├── inbox/       # SMB scanner target (Canon writes here)
+├── toprocess/   # Age-gated staging → Paperless consumes here
+├── paperless/   # Paperless storage (post-ingest)
+├── archive/     # Long-term archive
+├── templates/   # Document templates
+└── export/      # Manual exports
+```
+
+## Reference
+
+- `docs/strategy.md` — full strategy, taxonomy decisions, type/tag rationale
+- `m/otto` issues #429–#438 — original implementation history
+- `m/paperless` — the bare Paperless-ngx Docker Compose setup
diff --git a/docs/strategy.md b/docs/strategy.md
new file mode 100644
index 0000000..6c471c9
--- /dev/null
+++ b/docs/strategy.md
@@ -0,0 +1,288 @@
+# mDMS: Dokumentenmanagement-Strategie
+
+## Aktueller Stand (nach Cleanup 2026-04-06)
+
+### Paperless-ngx (mDock)
+- **129 Dokumente** (PDFs), Storage Path aktiv
+- **41 Correspondents** — bereinigt (OCR-Duplikate gemergt, Müll entfernt)
+- **13 Document Types** — Rechnung, Vertrag, Bescheid, Bescheinigung, Brief, Mitteilung, Abrechnung, Protokoll, Urkunde, Vollmacht, Gutachten, Angebot, Medizinisch
+- **16 Tags** — hierarchisch: Kategorie (Steuer, Versicherung, Gesundheit, Wohnung, Arbeit, Finanzen, Erbschaft, Gewährleistung, Anleitung) + Status (offen, wichtig, Frist) + Kontext (Windscheid33, Paul)
+- **1 Storage Path**: `{created_year}/{document_type}/{created} - {correspondent} - {title}`
+- Dateien strukturiert: `2024/Rechnung/2024-03-15 - DAK - Beitragsrechnung.pdf`
+- API-User: `mAi`
+- Docker Compose: `~/paperless/` auf mDock, NFS-Mount `/mnt/paperless` von TrueNAS (`mPool/paperless`)
+
+### Was bereinigt wurde
+- 68 Webp-Preview-Dokumente gelöscht (keine Originale, nur schlechte Vorschaubilder)
+- 51 → 41 Correspondents (OCR-Duplikate gemergt: Hogan Lovells, Matthias Siebels, Ammerländer, Schubert, eprimo, Helios, Paul Siebels, Versorgungswerk etc.)
+- 39 → 13 Document Types (Merge-Mapping umgesetzt)
+- 172 → 16 Tags (Noise gelöscht, Kategorie-Mapping vor Löschung durchgeführt)
+- 13 kaputte SynoResource-Dateien aus Consume gelöscht
+- 126 orphaned flat-PDFs aus Originals gelöscht
+- 43 Dokumententitel bereinigt (Nummern → beschreibende Titel)
+- 5 Birthday-Datumsfehler korrigiert (1987-02-22 → korrekte Dokumentdaten)
+
+### mDocs (Gitea-Repo m/mDocs) — MIGRATION PENDING
+- **72 Dateien**, 60 MB (Steuer, Versicherungen, Windscheid33)
+- Wird in Paperless inbox migriert, Repo danach löschen
+
+### TrueNAS (mtruenas)
+- Dataset `mPool/paperless` existiert bereits
+- NFS-Export nach mDock (192.168.178.0/24)
+- SMB-Share `mStash` als Referenz für mdms-Share
+
+---
+
+## Entscheidungen
+
+### 1. Storage Path Format
+
+**Format: `{created_year}/{document_type}/{created} - {correspondent} - {title}.pdf`** ✓ Bestätigt
+
+Beispiele:
+```
+2024/Rechnung/2024-03-15 - DAK - Beitragsrechnung Q1.pdf
+2024/Bescheid/2024-01-20 - Finanzamt - Grundsteuerbescheid.pdf
+2023/Vertrag/2023-06-01 - Vodafone - GigaTV Vertragsverlängerung.pdf
+2025/Abrechnung/2025-01-31 - Hogan Lovells - Gehaltsabrechnung Januar.pdf
+```
+
+**Warum dieses Format:**
+- **Jahr als Top-Level**: Chronologisches Browsen, ganzen Jahrgang für Steuerberater kopierbar
+- **Typ als zweite Ebene**: "Zeig mir alle Rechnungen 2024" = `2024/Rechnung/`
+- **Datum + Correspondent + Titel im Dateinamen**: Sortierbar, durchsuchbar, kontextreich
+- **Max 2 Ordnerebenen**: Nicht zu tief, Finder/Explorer-freundlich
+- **Navigierbar ohne Paperless**: Reiner Dateibrowser funktioniert
+
+**Verworfene Alternativen:**
+- `{correspondent}/{year}/...` — zu viele sparse Ordner, schlecht für zeitliche Navigation
+- `{year}-{month}/...` — zu granular, monatliche Ordner für oft nur 1-2 Dokumente
+- Flach: `{created}-{correspondent}-{title}.pdf` — bei 500+ Dokumenten unbrauchbar
+
+### 2. Dataset-Struktur: mPool/mdms
+
+```
+/mnt/mPool/mdms/
+├── paperless/           # Paperless storage (originals, archive, thumbnails)
+│   ├── documents/
+│   │   ├── originals/   # Originaldateien
+│   │   └── archive/     # OCR-Versionen
+│   └── ...
+├── inbox/               # Paperless consume — Auto-Import
+│                        # mScan-App, Drag-and-Drop, SFTP
+├── templates/           # Vertragsvorlagen, Formulare, Muster
+│                        # Nicht in Paperless — statische Referenzdokumente
+├── archive/             # Dokumente die nicht in Paperless passen:
+│                        # Große Dateien (CAD, Pläne), Sammlungen, Binaries
+└── export/              # Paperless-Exporte, Backups, Snapshots
+```
+
+### 3. Dokumenten-Routing
+
+| Dokument | Ziel | Begründung |
+|----------|------|------------|
+| Rechnungen, Bescheide, Briefe | Paperless (inbox/) | OCR + AI-Klassifikation + Suche |
+| Verträge, Urkunden | Paperless | Langzeitarchiv mit Volltextsuche |
+| Steuerunterlagen | Paperless + Tag "Steuer" | Filterbar für Steuerberater-Export |
+| Gehaltsabrechnungen | Paperless + Tag "Arbeit" | Chronologisch abrufbar |
+| Arztbriefe, Befunde | Paperless + Tag "Gesundheit" | Suchbar, datiert |
+| Phone-Scans (mScan) | inbox/ → Paperless auto-import | Scannen → fertig |
+| Vertragsvorlagen, Formulare | templates/ | Keine OCR nötig, statische Referenz |
+| Baupläne, CAD, große Dateien | archive/ | Zu groß/speziell für Paperless |
+| Fotos von Dokumenten | Paperless | OCR funktioniert auch auf Fotos |
+
+**Nicht in mDMS:**
+- Fotos generell → Immich
+- Bücher, eBooks → Calibre (mCalibre)
+- Arbeitsrechtliche Dokumente (HL) → mWork-Vault (Obsidian, nicht mDMS)
+
+### 4. Paperless Taxonomy — Aufräumen
+
+#### Document Types (39 → 15)
+
+Reduziert auf sinnvolle, stabile Kategorien:
+
+| Behalten | Zusammenführen aus |
+|----------|-------------------|
+| **Rechnung** | Rechnung, Invoice, Beitragsrechnung |
+| **Vertrag** | (neu — für Verträge, Verlängerungen) |
+| **Bescheid** | Bescheid, Beitragsbescheid, Versicherungsbescheid |
+| **Bescheinigung** | Bescheinigung, Lohnsteuerbescheinigung, Spendenbescheinigung |
+| **Brief** | Brief, Anschreiben, Korrespondenz |
+| **Mitteilung** | Mitteilung, Benachrichtigung, Information, Erinnerung |
+| **Abrechnung** | Abrechnung, Entgeltabrechnung |
+| **Protokoll** | Protokoll |
+| **Urkunde** | Urkunde |
+| **Vollmacht** | Vollmacht |
+| **Gutachten** | Gutachten, Befund |
+| **Angebot** | Angebot |
+| **Energieausweis** | Energieausweis |
+| **Schadenmeldung** | Schadenmeldung, Schadenanzeige |
+| **Medizinisch** | Arbeitsunfähigkeitsbescheinigung, Aufklärungsbogen |
+
+Entfernen: Empfehlung, Preisanpassungsschreiben, Kündigungsbestätigung, Eintragungsbekanntmachung, Auftragsbestätigung, Testament (→ Tag), Einladung (→ Brief)
+
+#### Tags (172 → ~25 manuell kuratierte)
+
+Die meisten Auto-Tags sind Noise. Ziel: wenige, stabile Kategorie-Tags + manuelle Pflege.
+
+**Kategorie-Tags (Pflicht, einer pro Dokument):**
+| Tag | Für |
+|-----|-----|
+| `Steuer` | Alles steuerrelevante |
+| `Versicherung` | Policen, Schäden, Beiträge |
+| `Gesundheit` | Arzt, Krankenhaus, Krankenkasse |
+| `Wohnung` | Miete, Eigentum, Nebenkosten, WEG |
+| `Arbeit` | Gehalt, Arbeitgeber, Kammer |
+| `Finanzen` | Bank, Kredit, Altersvorsorge |
+| `Erbschaft` | Testament, Nachlassangelegenheiten |
+| `Gewährleistung` | Kaufbelege mit Garantie, Reklamationen |
+| `Anleitung` | Bedienungsanleitungen, Handbücher, Datenblätter |
+
+**Aktions-Tags (optional):**
+| Tag | Bedeutung |
+|-----|-----------|
+| `wichtig` | Aufbewahrungspflichtig, Schlüsseldokument |
+| `Frist` | Hat eine Frist — regelmäßig prüfen |
+| `offen` | Noch Handlung erforderlich |
+
+**Kontext-Tags (sparsam, bei Bedarf):**
+| Tag | Für |
+|-----|-----|
+| `Windscheid33` | Immobilie Windscheidstr. 33 |
+| `Paul` | Dokumente bzgl. Paul Siebels |
+
+**Löschen:**
+- Jahres-Tags ("2022", "2025") — redundant mit created-Datum
+- Personen-Tags ("Matthias Siebels") — gehört als Correspondent
+- Ultra-granulare Tags ("Finger", "Hand", "Shimano", "Oral-B") — kein Nutzen
+- Duplikate ("Rechtsanwalt" + "Rechtsanwälte" + "Rechtsanwaltschaft")
+
+#### Correspondents (51 → ~30)
+
+OCR-Duplikate zusammenführen:
+- "Hogan Lovells International LLP" + "Hogan Lovells lnternational LLP" → **Hogan Lovells**
+- "HELIÜS Klinikurn Duisburg" + "Helios Klinikum Duisburg" → **Helios Klinikum Duisburg**
+- "Herr Matthias Siebels" + "Herrn Matthias Siebels" + "Matthias Siebels" + "Herrn Rechtsanwalt Matthias Siebels" → **Matthias Siebels** (eigene Dokumente)
+- "Ammerländer Versicherung VVaG" + "Ammerländer Versicherung WaG" → **Ammerländer Versicherung**
+- "SCHUBERT GmbH" + "Schubert GmbH Haus- und Grundbesitzverwaltung" → **Schubert Hausverwaltung**
+- "Dr. figegeberH lcankenkas*" → identifizieren oder löschen (OCR-Müll)
+- "Dr/Heikö Gemmel" → **Dr. Heiko Gemmel**
+- "eprimo CmbH" + "eprimo GmbH" → **eprimo**
+- "lndula Shopsystem GmbH" → **Indula Shopsystem**
+
+### 5. SMB-Share
+
+**Ja — mdms als SMB-Share wie mStash.**
+
+Konfiguration auf TrueNAS:
+- Share-Name: `mdms`
+- Dataset: `mPool/mdms`
+- User: `m` (wie mStash)
+- Mount auf mBreeze/mPebble: `~/mDMS` (LaunchAgent, analog zu mStash)
+
+Nutzen:
+- `~/mDMS/inbox/` für Drag-and-Drop-Import (Paperless consumed automatisch)
+- `~/mDMS/templates/` für schnellen Zugriff auf Vorlagen
+- `~/mDMS/paperless/documents/originals/` für Dateibrowser-Navigation (via Storage Path)
+- `~/mDMS/archive/` für große Dateien
+
+### 6. Vertrauliche Dokumente
+
+**Kein separates Verschlüsselungssystem nötig.** ✓ Bestätigt
+- Alles läuft auf HomeServer (mforge/mtruenas), nur via Tailscale erreichbar
+- SMB mit User-Auth + Paperless-Login reichen aus
+
+### 7. Obsidian-Integration
+
+Der Storage Path soll als Teil eines Obsidian-Vaults nutzbar sein. Das bedeutet:
+- `mdms/paperless/documents/originals/` (oder `archive/`) via SMB als Vault-Ordner einbinden
+- Obsidian zeigt die Ordnerstruktur (`2024/Rechnung/...`) direkt im Dateibrowser
+- PDFs sind in Obsidian inline-viewbar und verlinkbar (`![[2024-03-15 - DAK - Beitragsrechnung.pdf]]`)
+- Keine Sonderzeichen in Dateinamen die Obsidian-Links brechen (Spaces sind ok)
+
+**Umsetzung:**
+- Option A: Symlink `~/m2/mDMS/` → `~/mDMS/paperless/documents/originals/` im Obsidian-Vault
+- Option B: Separater Mini-Vault nur für Dokumente
+- Option C: mdms als Unterordner im Hauptvault (m2)
+
+Empfehlung: **Option A (Symlink)** — kein Daten-Overhead, Vault bleibt schlank, Dokumente sind trotzdem verlinkbar. Braucht nur einen Symlink pro Maschine.
+
+### 8. E-Mail-Inbox
+
+**docs@msbls.de** — Alias auf mail@msbls.de (Hostinger).
+
+Paperless pollt mail@msbls.de via IMAP und konsumiert Anhänge aus Emails an docs@msbls.de:
+- IMAP: `imap.hostinger.com:993` (SSL/TLS), User: `mail@msbls.de`
+- Mail-Regel: Filter `To: docs@msbls.de`, nur Attachments, Action: als gelesen markieren
+- Correspondent wird automatisch vom Absender übernommen
+- Titel vom Dateinamen
+
+**Workflow:** Dokument als PDF an docs@msbls.de weiterleiten → Paperless importiert automatisch.
+
+---
+
+## Paperless-AI Konfiguration
+
+Paperless-AI (Port 3077) soll die Klassifikation übernehmen. Konfigurieren mit:
+- **Auto-assign correspondent** basierend auf OCR-Text (Absender-Erkennung)
+- **Auto-assign document type** aus den 15 reduzierten Typen
+- **Auto-assign 1-2 Kategorie-Tags** aus der Kurzliste
+- **Nicht**: Auto-generierte Freitext-Tags (das erzeugt das aktuelle Chaos)
+
+---
+
+## Migration: Schritt für Schritt
+
+### Phase 1: TrueNAS Setup ✓ DONE
+1. ~~Dataset `mPool/mdms` erstellt~~ (LZ4, 1.24 TiB frei)
+2. ~~Unterordner angelegt~~ (paperless, inbox, templates, archive, export)
+3. ~~NFS-Export~~ (id:8, 192.168.178.0/24), Mount auf mDock als `/mnt/mdms`
+4. ~~SMB-Share `mDMS`~~ (id:3, User `m`)
+
+### Phase 2: Paperless Migration ✓ DONE
+5. ~~Paperless auf mDock gestoppt~~
+6. ~~Media, data, ai kopiert~~; pgdata als lokales Docker Volume (NFS-Ownership inkompatibel mit Postgres uid 999)
+7. ~~consume → inbox kopiert~~
+8. ~~SynoResource-Dateien gelöscht~~
+9. ~~NFS-Mount `/mnt/mdms` auf mDock~~ (fstab via Proxmox agent)
+10. ~~Docker Compose aktualisiert~~ (`~/paperless/docker-compose.yml`)
+11. ~~Storage Path konfiguriert~~
+12. ~~Paperless verifiziert~~ — 129 Docs, alle Metadaten intakt
+
+**Hinweis:** pgdata lebt als Docker Volume `paperless_pgdata` auf mDock (nicht auf NFS). DB-Backup über `pg_dump` in `mdms/export/` planen.
+
+### Phase 3: Cleanup ✓ DONE
+13. ~~Paperless Correspondents zusammenführen~~ → 51 → 41
+14. ~~Document Types reduzieren~~ → 39 → 13
+15. ~~Tags aufräumen~~ → 172 → 16 (hierarchisch: Kategorie + Status + Kontext)
+16. Paperless-AI mit neuer Taxonomy konfigurieren (TODO)
+
+### Phase 4: mDocs Migration
+17. mDocs-Repo klonen, alle PDFs nach mdms/inbox/ kopieren
+18. Paperless konsumiert und klassifiziert automatisch
+19. Manuell verifizieren: Correspondents, Types, Tags korrekt?
+20. mDocs-Repo auf Gitea löschen
+
+### Phase 5: Client-Setup
+21. SMB-Mount auf mBreeze: `~/mDMS` (LaunchAgent wie mStash)
+22. SMB-Mount auf mPebble: `~/mDMS`
+23. mScan-App auf mdms/inbox/ konfigurieren (falls SFTP/SMB-Upload möglich)
+
+---
+
+## Offene Punkte
+
+- [x] Paperless Admin-Credentials — mAi-User auf Paperless angelegt
+- [x] Paperless Cleanup — Correspondents, Types, Tags bereinigt
+- [x] Storage Path konfiguriert und Dateien umbenannt
+- [x] Webp-Previews gelöscht, SynoResource-Müll bereinigt
+- [x] TrueNAS Dataset `mPool/mdms` erstellt (NFS id:8, SMB `mDMS` id:3)
+- [x] Paperless media auf mdms/paperless umgestellt (Docker Compose aktualisiert)
+- [x] SMB-Share `mDMS` eingerichtet auf TrueNAS
+- [x] mDocs-Migration: 69 PDFs in Paperless inbox, consumption läuft
+- [x] **docs@msbls.de** — E-Mail-Inbox für Paperless (IMAP-Polling, Alias auf mail@msbls.de, Regel filtert auf To: docs@msbls.de)
+- [ ] Paperless-AI mit neuer Taxonomy konfigurieren
+- [ ] Regelmäßiger Export/Backup-Job (Paperless → mdms/export/)
+- [ ] `m doc` CLI-Subcommand für Paperless-Zugriff? (search, list, tag)
+- [ ] Obsidian-Vault Symlink-Setup auf mBreeze/mPebble
diff --git a/infra/paperless/Dockerfile b/infra/paperless/Dockerfile
new file mode 100644
index 0000000..6ee1ade
--- /dev/null
+++ b/infra/paperless/Dockerfile
@@ -0,0 +1,14 @@
+# Thin overlay on clusterzx/paperless-ai:3.0.9 — same digest as
+# the :latest tag pulled on 2026-04-06, but pinned so future image
+# refreshes do not silently wipe the type-restriction patches.
+#
+# Patch 1: routes/setup.js — restrict-existing-document-types on
+#          the manual processing route (already applied previously
+#          by docker cp, but volatile across container recreation).
+# Patch 2: server.js — same restriction on the scheduled-scan
+#          loop. Without this, new document types kept appearing
+#          even with RESTRICT_TO_EXISTING_DOCUMENT_TYPES=yes.
+FROM clusterzx/paperless-ai:3.0.9
+
+COPY setup.js.patched /app/routes/setup.js
+COPY server.js.patched /app/server.js
diff --git a/infra/paperless/README.md b/infra/paperless/README.md
new file mode 100644
index 0000000..599e0e3
--- /dev/null
+++ b/infra/paperless/README.md
@@ -0,0 +1,24 @@
+# paperless infra (snapshot)
+
+These files are a **traceable copy** of what lives in `~/paperless/` on
+mDock. The live source of truth is on mDock — this directory exists so
+the configuration is git-readable for the next shift and for audits.
+
+If you change the live config on mDock, sync the change here in the same
+commit. If you change the files here, deploy by:
+
+```bash
+scp Dockerfile docker-compose.yml mdock:/home/m/paperless/build/Dockerfile # and so on
+ssh mdock 'cd /home/m/paperless && docker compose up -d --build'
+```
+
+The two patched JS files (`setup.js.patched`, `server.js.patched`) live
+only on mDock in `~/paperless/build/` — they're large and don't belong
+in the repo. Hashes:
+
+| File | mDock path | md5 |
+|---|---|---|
+| setup.js.patched | ~/paperless/build/setup.js.patched | `04cb5fbfaed13a5f25612af0b79dd90c` |
+| server.js.patched | ~/paperless/build/server.js.patched | `eadcbb86048127f2c80632ae77bbc2a0` |
+
+See `docs/research/issue-429-paperless-pipeline.md` for the why.
diff --git a/infra/paperless/SYSTEM_PROMPT.txt b/infra/paperless/SYSTEM_PROMPT.txt
new file mode 100644
index 0000000..b5cf537
--- /dev/null
+++ b/infra/paperless/SYSTEM_PROMPT.txt
@@ -0,0 +1,24 @@
+Du klassifizierst deutsche Dokumente fuer ein persoenliches Dokumentenmanagementsystem.
+
+Erlaubte Document Types (NUR diese verwenden, keine neuen erfinden):
+- Invoice — Rechnungen, Abrechnungen, Mahnschreiben, Kontoauszuege, Lohnsteuerbescheinigung, Umsatzsteuer-Voranmeldung, Steuererklaerung, Kostenrechnungen
+- Contract — Vertraege, Versicherungsscheine, Kauf-/Kreditvertraege, unterschriebene Angebote, AGB
+- Information — Behoerden- und Versicherer-Anschreiben, Bescheinigungen, Mitteilungen, Verwaltungsakte, medizinische Befunde, Berichte, Berechnungen, einseitige Informationen
+- Personal Correspondence — Briefe von identifizierbaren Privatpersonen. Stammt der Brief von einer Institution, waehle stattdessen Information.
+- Vollmacht — Vollmachten
+- Urkunde — notarielle Urkunden
+- Steuerbescheid — Steuerbescheide vom Finanzamt
+- Anleitung — Bedienungsanleitungen, Datenblaetter, Manuals
+- Protokoll — Sitzungs- und WEG-Protokolle
+- Formular — Blanko-Formulare und Antraege
+
+Im Zweifel waehle Information. Erfinde NIEMALS neue Document Types.
+
+Erlaubte Tags (NUR diese verwenden, keine neuen erfinden):
+Anleitung, Arbeit, Erbschaft, Finanzen, Frist, Gesundheit, Gewaehrleistung, Paul, Steuer, Versicherung, Windscheid33, Wohnung, offen, wichtig
+
+Bei medizinischen Dokumenten Tag Gesundheit setzen.
+Bei steuerrelevanten Dokumenten Tag Steuer setzen.
+Bei Dokumenten mit Frist Tag Frist setzen.
+
+Correspondents: Verwende den vollen offiziellen Namen der Organisation oder Person (z.B. "DAK-Gesundheit" nicht "DAK-Gesundheit Postzentrum, 22778 Hamburg"). Keine Adressen im Namen. Pruefe ob der Correspondent schon existiert bevor du einen neuen anlegst.
\ No newline at end of file
diff --git a/infra/paperless/docker-compose.yml b/infra/paperless/docker-compose.yml
new file mode 100644
index 0000000..7559463
--- /dev/null
+++ b/infra/paperless/docker-compose.yml
@@ -0,0 +1,52 @@
+services:
+  broker:
+    image: docker.io/library/redis:8
+    restart: unless-stopped
+    volumes:
+      - redisdata:/data
+
+  db:
+    image: docker.io/library/postgres:16
+    restart: unless-stopped
+    volumes:
+      - pgdata:/var/lib/postgresql/data
+    environment:
+      POSTGRES_DB: paperless
+      POSTGRES_USER: paperless
+      POSTGRES_PASSWORD: paperless
+
+  webserver:
+    image: ghcr.io/paperless-ngx/paperless-ngx:2.20.6
+    restart: unless-stopped
+    depends_on:
+      - db
+      - broker
+    ports:
+      - 8777:8000
+    volumes:
+      - /mnt/mdms/paperless/data:/usr/src/paperless/data
+      - /mnt/mdms/paperless/media:/usr/src/paperless/media
+      - /mnt/mdms/export:/usr/src/paperless/export
+      - /mnt/mdms/inbox:/usr/src/paperless/consume
+    environment:
+      PAPERLESS_REDIS: redis://broker:6379
+      PAPERLESS_DBHOST: db
+      PAPERLESS_TIME_ZONE: Europe/Berlin
+      PAPERLESS_OCR_LANGUAGE: deu+eng
+      PAPERLESS_CONSUMER_POLLING: 60
+      PAPERLESS_CONSUMER_RECURSIVE: "true"
+
+  paperless-ai:
+    build: ./build
+    image: mdock/paperless-ai:3.0.9-restrict-patch
+    container_name: paperless-ai
+    restart: unless-stopped
+    ports:
+      - 3077:3000
+    volumes:
+      - aidata:/app/data
+
+volumes:
+  redisdata:
+  pgdata:
+  aidata:
diff --git a/infra/paperless/migrate-apply-2026-05-13.log b/infra/paperless/migrate-apply-2026-05-13.log
new file mode 100644
index 0000000..cc99c66
--- /dev/null
+++ b/infra/paperless/migrate-apply-2026-05-13.log
@@ -0,0 +1,368 @@
+/tmp/migrate_types.py:240: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
+  audit_path = f"/tmp/migrate_types_audit_{datetime.datetime.utcnow().strftime('%Y%m%dT%H%M%S')}.json"
+/tmp/migrate_types.py:242: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
+  "ts_utc": datetime.datetime.utcnow().isoformat() + "Z",
+loaded 73 types, 195 docs
+all 10 target types verified
+
+=== PLAN ===
+document moves: 171
+types to delete (after moves): 63
+types NOT mapped + nonzero docs (need manual call): 0
+
+=== MOVES SUMMARY (per target type) ===
+  -> Contract (+23 docs)
+       7 from Vertrag
+       6 from Versicherungsschein
+       1 from agreement
+       1 from contract
+       1 from Finanzierungsangebot
+       1 from Kreditvertrag
+       1 from Loan Application and Agreement
+       1 from Notarial Deed
+       1 from Notarized agreement with amendments
+       1 from Rechtsgeschäft
+       1 from Versicherungsbedingungen
+       1 from Vertragsdokument
+  -> Information (+96 docs)
+      21 from Bescheinigung
+      21 from Brief
+      17 from Bescheid
+       7 from Mitteilung
+       3 from Wohnflaechenberechnung
+       2 from Einladung zur Eigentümerversammlung
+       2 from Leistungsnachweis
+       2 from Medizinisch
+       2 from Steuererklärung
+       1 from Angebot
+       1 from Antrag
+       1 from Behandlungsplan und Risikoaufklärung
+       1 from Beratungsprotokoll
+       1 from Berechnung
+       1 from Bericht
+       1 from Bestätigungsbrief
+       1 from Energy Performance Certificate
+       1 from Erklarung
+       1 from Guidelines
+       1 from Gutachten
+       1 from informational document
+       1 from Informationsschreiben
+       1 from Medical Consent Form
+       1 from medical documentation
+       1 from Rechnungs- und Vertragsinformation
+       1 from Schreiben des Finanzamts
+       1 from Verwaltungsakt
+       1 from Werbung
+  -> Invoice (+52 docs)
+      26 from Rechnung
+      11 from Abrechnung
+       6 from Umsatzsteuer-Voranmeldung
+       4 from Lohnsteuerbescheinigung
+       1 from Kontoauszug
+       1 from Kontoübersicht
+       1 from Kostenabrechnung
+       1 from Kostenvoranmeldung
+       1 from Mahnschreiben
+
+=== TYPES TO DELETE (after moves) ===
+  id=  4 count= 11 name='Abrechnung'
+  id=160 count=  1 name='agreement'
+  id= 13 count=  1 name='Angebot'
+  id=134 count=  1 name='Antrag'
+  id=141 count=  1 name='Behandlungsplan und Risikoaufklärung'
+  id=129 count=  1 name='Beratungsprotokoll'
+  id=143 count=  1 name='Berechnung'
+  id=148 count=  1 name='Bericht'
+  id= 11 count= 17 name='Bescheid'
+  id= 15 count= 21 name='Bescheinigung'
+  id=151 count=  1 name='Bestätigungsbrief'
+  id= 30 count= 21 name='Brief'
+  id=127 count=  0 name='Consent Form'
+  id=144 count=  1 name='contract'
+  id=120 count=  0 name='Einladung / Vollmacht / Wirtschaftsplan'
+  id=113 count=  2 name='Einladung zur Eigentümerversammlung'
+  id=132 count=  0 name='Einspruchsschreiben'
+  id=158 count=  1 name='Energy Performance Certificate'
+  id=128 count=  1 name='Erklarung'
+  id=156 count=  1 name='Finanzierungsangebot'
+  id=122 count=  0 name='Geldzuwendungsbestätigung'
+  id=157 count=  1 name='Guidelines'
+  id= 27 count=  1 name='Gutachten'
+  id=136 count=  1 name='informational document'
+  id=139 count=  1 name='Informationsschreiben'
+  id=137 count=  0 name='Kaufvertrag'
+  id=118 count=  1 name='Kontoauszug'
+  id=117 count=  1 name='Kontoübersicht'
+  id=145 count=  1 name='Kostenabrechnung'
+  id=121 count=  1 name='Kostenvoranmeldung'
+  id=142 count=  1 name='Kreditvertrag'
+  id=114 count=  0 name='Kundeninformation'
+  id= 83 count=  2 name='Leistungsnachweis'
+  id=135 count=  1 name='Loan Application and Agreement'
+  id= 66 count=  4 name='Lohnsteuerbescheinigung'
+  id=147 count=  1 name='Mahnschreiben'
+  id=140 count=  1 name='Medical Consent Form'
+  id=150 count=  1 name='medical documentation'
+  id= 41 count=  2 name='Medizinisch'
+  id= 12 count=  7 name='Mitteilung'
+  id=161 count=  1 name='Notarial Deed'
+  id=159 count=  1 name='Notarized agreement with amendments'
+  id=133 count=  0 name='Plan'
+  id=131 count=  0 name='policy'
+  id=116 count=  0 name='Questionnaire/Declaration Form'
+  id=  2 count= 26 name='Rechnung'
+  id=149 count=  1 name='Rechnungs- und Vertragsinformation'
+  id=125 count=  0 name='Rechtlicher Vertrag'
+  id=155 count=  1 name='Rechtsgeschäft'
+  id=126 count=  0 name='recommendation'
+  id=152 count=  1 name='Schreiben des Finanzamts'
+  id=119 count=  0 name='Steuerdokument'
+  id=115 count=  2 name='Steuererklärung'
+  id=124 count=  0 name='Tilgungsplan'
+  id= 88 count=  6 name='Umsatzsteuer-Voranmeldung'
+  id=130 count=  1 name='Versicherungsbedingungen'
+  id= 67 count=  6 name='Versicherungsschein'
+  id= 40 count=  7 name='Vertrag'
+  id=153 count=  1 name='Vertragsdokument'
+  id=154 count=  1 name='Verwaltungsakt'
+  id=146 count=  1 name='Werbung'
+  id= 73 count=  0 name='Wohnflächenberechnung'
+  id=123 count=  3 name='Wohnflaechenberechnung'
+audit trail written: /tmp/migrate_types_audit_20260513T085119.json
+
+=== APPLY ===
+  [OK ] doc 104: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 124: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 88: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 134: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 122: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 71: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 220: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 223: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 224: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 255: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 248: 'Abrechnung' -> 'Invoice'
+  [OK ] doc 200: 'agreement' -> 'Contract'
+  [OK ] doc 222: 'Angebot' -> 'Information'
+  [OK ] doc 98: 'Antrag' -> 'Information'
+  [OK ] doc 91: 'Behandlungsplan und Risikoaufklärung' -> 'Information'
+  [OK ] doc 228: 'Beratungsprotokoll' -> 'Information'
+  [OK ] doc 202: 'Berechnung' -> 'Information'
+  [OK ] doc 96: 'Bericht' -> 'Information'
+  [OK ] doc 160: 'Bescheid' -> 'Information'
+  [OK ] doc 95: 'Bescheid' -> 'Information'
+  [OK ] doc 86: 'Bescheid' -> 'Information'
+  [OK ] doc 159: 'Bescheid' -> 'Information'
+  [OK ] doc 183: 'Bescheid' -> 'Information'
+  [OK ] doc 101: 'Bescheid' -> 'Information'
+  [OK ] doc 81: 'Bescheid' -> 'Information'
+  [OK ] doc 69: 'Bescheid' -> 'Information'
+  [OK ] doc 70: 'Bescheid' -> 'Information'
+  [OK ] doc 85: 'Bescheid' -> 'Information'
+  [OK ] doc 236: 'Bescheid' -> 'Information'
+  [OK ] doc 253: 'Bescheid' -> 'Information'
+  [OK ] doc 250: 'Bescheid' -> 'Information'
+  [OK ] doc 233: 'Bescheid' -> 'Information'
+  [OK ] doc 234: 'Bescheid' -> 'Information'
+  [OK ] doc 235: 'Bescheid' -> 'Information'
+  [OK ] doc 76: 'Bescheid' -> 'Information'
+  [OK ] doc 260: 'Bescheinigung' -> 'Information'
+  [OK ] doc 182: 'Bescheinigung' -> 'Information'
+  [OK ] doc 100: 'Bescheinigung' -> 'Information'
+  [OK ] doc 178: 'Bescheinigung' -> 'Information'
+  [OK ] doc 166: 'Bescheinigung' -> 'Information'
+  [OK ] doc 192: 'Bescheinigung' -> 'Information'
+  [OK ] doc 75: 'Bescheinigung' -> 'Information'
+  [OK ] doc 179: 'Bescheinigung' -> 'Information'
+  [OK ] doc 186: 'Bescheinigung' -> 'Information'
+  [OK ] doc 168: 'Bescheinigung' -> 'Information'
+  [OK ] doc 262: 'Bescheinigung' -> 'Information'
+  [OK ] doc 261: 'Bescheinigung' -> 'Information'
+  [OK ] doc 259: 'Bescheinigung' -> 'Information'
+  [OK ] doc 242: 'Bescheinigung' -> 'Information'
+  [OK ] doc 239: 'Bescheinigung' -> 'Information'
+  [OK ] doc 245: 'Bescheinigung' -> 'Information'
+  [OK ] doc 252: 'Bescheinigung' -> 'Information'
+  [OK ] doc 219: 'Bescheinigung' -> 'Information'
+  [OK ] doc 205: 'Bescheinigung' -> 'Information'
+  [OK ] doc 247: 'Bescheinigung' -> 'Information'
+  [OK ] doc 230: 'Bescheinigung' -> 'Information'
+  [OK ] doc 152: 'Bestätigungsbrief' -> 'Information'
+  [OK ] doc 244: 'Brief' -> 'Information'
+  [OK ] doc 164: 'Brief' -> 'Information'
+  [OK ] doc 146: 'Brief' -> 'Information'
+  [OK ] doc 169: 'Brief' -> 'Information'
+  [OK ] doc 191: 'Brief' -> 'Information'
+  [OK ] doc 105: 'Brief' -> 'Information'
+  [OK ] doc 188: 'Brief' -> 'Information'
+  [OK ] doc 115: 'Brief' -> 'Information'
+  [OK ] doc 97: 'Brief' -> 'Information'
+  [OK ] doc 196: 'Brief' -> 'Information'
+  [OK ] doc 74: 'Brief' -> 'Information'
+  [OK ] doc 113: 'Brief' -> 'Information'
+  [OK ] doc 102: 'Brief' -> 'Information'
+  [OK ] doc 126: 'Brief' -> 'Information'
+  [OK ] doc 195: 'Brief' -> 'Information'
+  [OK ] doc 110: 'Brief' -> 'Information'
+  [OK ] doc 170: 'Brief' -> 'Information'
+  [OK ] doc 180: 'Brief' -> 'Information'
+  [OK ] doc 116: 'Brief' -> 'Information'
+  [OK ] doc 127: 'Brief' -> 'Information'
+  [OK ] doc 149: 'Brief' -> 'Information'
+  [OK ] doc 227: 'contract' -> 'Contract'
+  [OK ] doc 156: 'Einladung zur Eigentümerversammlung' -> 'Information'
+  [OK ] doc 119: 'Einladung zur Eigentümerversammlung' -> 'Information'
+  [OK ] doc 163: 'Energy Performance Certificate' -> 'Information'
+  [OK ] doc 251: 'Erklarung' -> 'Information'
+  [OK ] doc 217: 'Finanzierungsangebot' -> 'Contract'
+  [OK ] doc 154: 'Guidelines' -> 'Information'
+  [OK ] doc 158: 'Gutachten' -> 'Information'
+  [OK ] doc 218: 'informational document' -> 'Information'
+  [OK ] doc 185: 'Informationsschreiben' -> 'Information'
+  [OK ] doc 189: 'Kontoauszug' -> 'Invoice'
+  [OK ] doc 187: 'Kontoübersicht' -> 'Invoice'
+  [OK ] doc 121: 'Kostenabrechnung' -> 'Invoice'
+  [OK ] doc 107: 'Kostenvoranmeldung' -> 'Invoice'
+  [OK ] doc 212: 'Kreditvertrag' -> 'Contract'
+  [OK ] doc 256: 'Leistungsnachweis' -> 'Information'
+  [OK ] doc 241: 'Leistungsnachweis' -> 'Information'
+  [OK ] doc 214: 'Loan Application and Agreement' -> 'Contract'
+  [OK ] doc 167: 'Lohnsteuerbescheinigung' -> 'Invoice'
+  [OK ] doc 254: 'Lohnsteuerbescheinigung' -> 'Invoice'
+  [OK ] doc 258: 'Lohnsteuerbescheinigung' -> 'Invoice'
+  [OK ] doc 249: 'Lohnsteuerbescheinigung' -> 'Invoice'
+  [OK ] doc 80: 'Mahnschreiben' -> 'Invoice'
+  [OK ] doc 138: 'Medical Consent Form' -> 'Information'
+  [OK ] doc 136: 'medical documentation' -> 'Information'
+  [OK ] doc 135: 'Medizinisch' -> 'Information'
+  [OK ] doc 197: 'Medizinisch' -> 'Information'
+  [OK ] doc 109: 'Mitteilung' -> 'Information'
+  [OK ] doc 144: 'Mitteilung' -> 'Information'
+  [OK ] doc 181: 'Mitteilung' -> 'Information'
+  [OK ] doc 111: 'Mitteilung'
-> 'Information' + [OK ] doc 150: 'Mitteilung' -> 'Information' + [OK ] doc 184: 'Mitteilung' -> 'Information' + [OK ] doc 108: 'Mitteilung' -> 'Information' + [OK ] doc 206: 'Notarial Deed' -> 'Contract' + [OK ] doc 203: 'Notarized agreement with amendments' -> 'Contract' + [OK ] doc 151: 'Rechnung' -> 'Invoice' + [OK ] doc 90: 'Rechnung' -> 'Invoice' + [OK ] doc 93: 'Rechnung' -> 'Invoice' + [OK ] doc 92: 'Rechnung' -> 'Invoice' + [OK ] doc 161: 'Rechnung' -> 'Invoice' + [OK ] doc 140: 'Rechnung' -> 'Invoice' + [OK ] doc 132: 'Rechnung' -> 'Invoice' + [OK ] doc 155: 'Rechnung' -> 'Invoice' + [OK ] doc 73: 'Rechnung' -> 'Invoice' + [OK ] doc 162: 'Rechnung' -> 'Invoice' + [OK ] doc 94: 'Rechnung' -> 'Invoice' + [OK ] doc 78: 'Rechnung' -> 'Invoice' + [OK ] doc 143: 'Rechnung' -> 'Invoice' + [OK ] doc 106: 'Rechnung' -> 'Invoice' + [OK ] doc 72: 'Rechnung' -> 'Invoice' + [OK ] doc 193: 'Rechnung' -> 'Invoice' + [OK ] doc 194: 'Rechnung' -> 'Invoice' + [OK ] doc 139: 'Rechnung' -> 'Invoice' + [OK ] doc 165: 'Rechnung' -> 'Invoice' + [OK ] doc 133: 'Rechnung' -> 'Invoice' + [OK ] doc 173: 'Rechnung' -> 'Invoice' + [OK ] doc 148: 'Rechnung' -> 'Invoice' + [OK ] doc 147: 'Rechnung' -> 'Invoice' + [OK ] doc 141: 'Rechnung' -> 'Invoice' + [OK ] doc 142: 'Rechnung' -> 'Invoice' + [OK ] doc 231: 'Rechnung' -> 'Invoice' + [OK ] doc 175: 'Rechnungs- und Vertragsinformation' -> 'Information' + [OK ] doc 213: 'Rechtsgeschäft' -> 'Contract' + [OK ] doc 79: 'Schreiben des Finanzamts' -> 'Information' + [OK ] doc 246: 'Steuererklärung' -> 'Information' + [OK ] doc 77: 'Steuererklärung' -> 'Information' + [OK ] doc 257: 'Umsatzsteuer-Voranmeldung' -> 'Invoice' + [OK ] doc 237: 'Umsatzsteuer-Voranmeldung' -> 'Invoice' + [OK ] doc 238: 'Umsatzsteuer-Voranmeldung' -> 'Invoice' + [OK ] doc 240: 'Umsatzsteuer-Voranmeldung' -> 'Invoice' + [OK ] doc 243: 'Umsatzsteuer-Voranmeldung' -> 'Invoice' + [OK ] doc 204: 'Umsatzsteuer-Voranmeldung' -> 'Invoice' + [OK ] doc 229: 
'Versicherungsbedingungen' -> 'Contract' + [OK ] doc 129: 'Versicherungsschein' -> 'Contract' + [OK ] doc 112: 'Versicherungsschein' -> 'Contract' + [OK ] doc 130: 'Versicherungsschein' -> 'Contract' + [OK ] doc 128: 'Versicherungsschein' -> 'Contract' + [OK ] doc 226: 'Versicherungsschein' -> 'Contract' + [OK ] doc 131: 'Versicherungsschein' -> 'Contract' + [OK ] doc 118: 'Vertrag' -> 'Contract' + [OK ] doc 199: 'Vertrag' -> 'Contract' + [OK ] doc 87: 'Vertrag' -> 'Contract' + [OK ] doc 89: 'Vertrag' -> 'Contract' + [OK ] doc 232: 'Vertrag' -> 'Contract' + [OK ] doc 123: 'Vertrag' -> 'Contract' + [OK ] doc 190: 'Vertrag' -> 'Contract' + [OK ] doc 177: 'Vertragsdokument' -> 'Contract' + [OK ] doc 82: 'Verwaltungsakt' -> 'Information' + [OK ] doc 176: 'Werbung' -> 'Information' + [OK ] doc 216: 'Wohnflaechenberechnung' -> 'Information' + [OK ] doc 201: 'Wohnflaechenberechnung' -> 'Information' + [OK ] doc 207: 'Wohnflaechenberechnung' -> 'Information' + [DEL] type 4 'Abrechnung' resp='' + [DEL] type 160 'agreement' resp='' + [DEL] type 13 'Angebot' resp='' + [DEL] type 134 'Antrag' resp='' + [DEL] type 141 'Behandlungsplan und Risikoaufklärung' resp='' + [DEL] type 129 'Beratungsprotokoll' resp='' + [DEL] type 143 'Berechnung' resp='' + [DEL] type 148 'Bericht' resp='' + [DEL] type 11 'Bescheid' resp='' + [DEL] type 15 'Bescheinigung' resp='' + [DEL] type 151 'Bestätigungsbrief' resp='' + [DEL] type 30 'Brief' resp='' + [DEL] type 127 'Consent Form' resp='' + [DEL] type 144 'contract' resp='' + [DEL] type 120 'Einladung / Vollmacht / Wirtschaftsplan' resp='' + [DEL] type 113 'Einladung zur Eigentümerversammlung' resp='' + [DEL] type 132 'Einspruchsschreiben' resp='' + [DEL] type 158 'Energy Performance Certificate' resp='' + [DEL] type 128 'Erklarung' resp='' + [DEL] type 156 'Finanzierungsangebot' resp='' + [DEL] type 122 'Geldzuwendungsbestätigung' resp='' + [DEL] type 157 'Guidelines' resp='' + [DEL] type 27 'Gutachten' resp='' + [DEL] type 136 'informational 
document' resp='' + [DEL] type 139 'Informationsschreiben' resp='' + [DEL] type 137 'Kaufvertrag' resp='' + [DEL] type 118 'Kontoauszug' resp='' + [DEL] type 117 'Kontoübersicht' resp='' + [DEL] type 145 'Kostenabrechnung' resp='' + [DEL] type 121 'Kostenvoranmeldung' resp='' + [DEL] type 142 'Kreditvertrag' resp='' + [DEL] type 114 'Kundeninformation' resp='' + [DEL] type 83 'Leistungsnachweis' resp='' + [DEL] type 135 'Loan Application and Agreement' resp='' + [DEL] type 66 'Lohnsteuerbescheinigung' resp='' + [DEL] type 147 'Mahnschreiben' resp='' + [DEL] type 140 'Medical Consent Form' resp='' + [DEL] type 150 'medical documentation' resp='' + [DEL] type 41 'Medizinisch' resp='' + [DEL] type 12 'Mitteilung' resp='' + [DEL] type 161 'Notarial Deed' resp='' + [DEL] type 159 'Notarized agreement with amendments' resp='' + [DEL] type 133 'Plan' resp='' + [DEL] type 131 'policy' resp='' + [DEL] type 116 'Questionnaire/Declaration Form' resp='' + [DEL] type 2 'Rechnung' resp='' + [DEL] type 149 'Rechnungs- und Vertragsinformation' resp='' + [DEL] type 125 'Rechtlicher Vertrag' resp='' + [DEL] type 155 'Rechtsgeschäft' resp='' + [DEL] type 126 'recommendation' resp='' + [DEL] type 152 'Schreiben des Finanzamts' resp='' + [DEL] type 119 'Steuerdokument' resp='' + [DEL] type 115 'Steuererklärung' resp='' + [DEL] type 124 'Tilgungsplan' resp='' + [DEL] type 88 'Umsatzsteuer-Voranmeldung' resp='' + [DEL] type 130 'Versicherungsbedingungen' resp='' + [DEL] type 67 'Versicherungsschein' resp='' + [DEL] type 40 'Vertrag' resp='' + [DEL] type 153 'Vertragsdokument' resp='' + [DEL] type 154 'Verwaltungsakt' resp='' + [DEL] type 146 'Werbung' resp='' + [DEL] type 73 'Wohnflächenberechnung' resp='' + [DEL] type 123 'Wohnflaechenberechnung' resp='' +done. 
diff --git a/infra/paperless/migrate_types.py b/infra/paperless/migrate_types.py
new file mode 100644
index 0000000..457534a
--- /dev/null
+++ b/infra/paperless/migrate_types.py
@@ -0,0 +1,279 @@
+"""
+Collapse Paperless document types 69 -> 10, per the mapping agreed in
+otto#429.
+
+Run locally on mDock against the live Paperless API. Default mode is
+DRY RUN: prints what would change without writing. Pass --apply to
+actually PATCH docs and DELETE old types.
+
+Usage:
+    python3 migrate_types.py # dry run
+    python3 migrate_types.py --apply # live
+"""
+import argparse
+import datetime
+import json
+import shlex
+import subprocess
+import sys
+
+# The 10 canonical target types (Paperless type IDs after Step 3).
+TARGET = {
+    "Invoice": 162,
+    "Contract": 163,
+    "Information": 164,
+    "Personal Correspondence": 165,
+    "Vollmacht": 22,
+    "Urkunde": 37,
+    "Steuerbescheid": 138,
+    "Anleitung": 76,
+    "Protokoll": 32,
+    "Formular": 80,
+}
+
+# Mapping: old type *name* -> target canonical name.
+# Built from the audit doc's mapping table. Anything not listed here
+# stays at its current type (and gets surfaced as "unmapped" so we
+# can decide manually).
+MAP = {
+    # ----- Invoice ------------------------------------------------
+    "Rechnung": "Invoice",
+    "Abrechnung": "Invoice",
+    "Mahnschreiben": "Invoice",
+    "Kontoauszug": "Invoice",
+    "Kontoübersicht": "Invoice",
+    "Kostenabrechnung": "Invoice",
+    "Kostenvoranmeldung": "Invoice",
+    "Umsatzsteuer-Voranmeldung": "Invoice",
+    "Tilgungsplan": "Invoice",
+    "Lohnsteuerbescheinigung": "Invoice",
+
+    # ----- Contract -----------------------------------------------
+    "Vertrag": "Contract",
+    "Versicherungsschein": "Contract",
+    "Kaufvertrag": "Contract",
+    "Kreditvertrag": "Contract",
+    "Notarial Deed": "Contract",
+    "agreement": "Contract",
+    "contract": "Contract",
+    "policy": "Contract",
+    "Vertragsdokument": "Contract",
+    "Rechtsgeschäft": "Contract",
+    "Rechtlicher Vertrag": "Contract",
+    "Versicherungsbedingungen": "Contract",
+    "Finanzierungsangebot": "Contract",
+    "Loan Application and Agreement": "Contract",
+    "Notarized agreement with amendments": "Contract",
+
+    # ----- Information --------------------------------------------
+    "Bescheid": "Information",
+    "Bescheinigung": "Information",
+    "Mitteilung": "Information",
+    "Verwaltungsakt": "Information",
+    "Schreiben des Finanzamts": "Information",
+    "Informationsschreiben": "Information",
+    "informational document": "Information",
+    "Kundeninformation": "Information",
+    "Werbung": "Information",
+    "Bestätigungsbrief": "Information",
+    "Geldzuwendungsbestätigung": "Information",
+    "Antrag": "Information",
+    "Erklarung": "Information",
+    "Leistungsnachweis": "Information",
+    "Beratungsprotokoll": "Information",
+    "Gutachten": "Information",
+    "Bericht": "Information",
+    "Berechnung": "Information",
+    "Wohnflaechenberechnung": "Information",
+    "Wohnflächenberechnung": "Information",
+    "Guidelines": "Information",
+    "Energy Performance Certificate": "Information",
+    "Einladung zur Eigentümerversammlung": "Information",
+    "Einladung / Vollmacht / Wirtschaftsplan": "Information",
+    "Steuerdokument": "Information",
+    "Steuererklärung": "Information",
+    "Plan": "Information",
+    "Einspruchsschreiben": "Information",
+    "Angebot": "Information",
+    "recommendation": "Information",
+    "Behandlungsplan und Risikoaufklärung": "Information",
+    "Medical Consent Form": "Information",
+    "Consent Form": "Information",
+    "Medizinisch": "Information",
+    "medical documentation": "Information",
+    "Questionnaire/Declaration Form": "Information",
+    "Rechnungs- und Vertragsinformation": "Information",
+
+    # ----- Personal Correspondence --------------------------------
+    # Per m's explicit answer: Brief defaults to Information.
+    # Personal Correspondence is opt-in for letters that are clearly
+    # from a private person; the AI applies it going forward on a
+    # case-by-case basis. For the migration of the 21 existing
+    # Briefe (none of which we can read here to distinguish), they
+    # land in Information, the safe default m chose.
+    "Brief": "Information",
+}
+
+
+def paperless_curl(token, path, method="GET", body=None):
+    inner_parts = [
+        "curl", "-s",
+        "-X", method,
+        "-H", f"Authorization: Token {token}",
+    ]
+    if body is not None:
+        inner_parts += ["-H", "Content-Type: application/json", "-d", json.dumps(body)]
+    inner_parts.append(f"http://localhost:8000/api{path}")
+    inner = " ".join(shlex.quote(p) for p in inner_parts)
+    full = f"docker exec paperless-webserver-1 {inner}"
+    out = subprocess.run(
+        ["ssh", "mdock", full], capture_output=True, text=True, timeout=120,
+    )
+    if out.returncode != 0:
+        raise RuntimeError(f"curl failed rc={out.returncode}: {out.stderr}")
+    return out.stdout
+
+
+def get_token():
+    out = subprocess.run(
+        ["ssh", "mdock", "docker exec paperless-ai sh -c 'grep ^PAPERLESS_API_TOKEN /app/data/.env | cut -d= -f2'"],
+        capture_output=True, text=True, timeout=15,
+    )
+    return out.stdout.strip()
+
+
+def fetch_all(token, path):
+    """GET path paged; returns flat list of results."""
+    results = []
+    page = 1
+    while True:
+        raw = paperless_curl(token, f"{path}?page={page}&page_size=200")
+        data = json.loads(raw)
+        results.extend(data.get("results", []))
+        if not data.get("next"):
+            break
+        page += 1
+    return results
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--apply", action="store_true", help="Actually write changes")
+    args = ap.parse_args()
+
+    token = get_token()
+    if not token:
+        sys.exit("no PAPERLESS_API_TOKEN found")
+
+    types = fetch_all(token, "/document_types/")
+    docs = fetch_all(token, "/documents/")
+    print(f"loaded {len(types)} types, {len(docs)} docs")
+
+    type_by_id = {t["id"]: t for t in types}
+    type_by_name = {t["name"]: t for t in types}
+
+    # Sanity: verify all 10 targets exist
+    for name, tid in TARGET.items():
+        t = type_by_id.get(tid)
+        if not t or t["name"] != name:
+            sys.exit(f"target type missing or mismatched: id={tid} expected name={name!r} got={t}")
+    print("all 10 target types verified")
+
+    # Build plan
+    moves = []  # list of (doc_id, current_type_name, new_type_id, new_type_name)
+    unmapped_types = []
+    delete_candidates = []
+
+    for t in types:
+        if t["id"] in TARGET.values():
+            continue  # keep
+        target_name = MAP.get(t["name"])
+        if target_name is None:
+            if t["document_count"] == 0:
+                delete_candidates.append(t)
+            else:
+                unmapped_types.append(t)
+            continue
+        new_tid = TARGET[target_name]
+        # Find docs with this type
+        for d in docs:
+            if d.get("document_type") == t["id"]:
+                moves.append((d["id"], t["name"], new_tid, target_name))
+        # Old type becomes deletable after all its docs are moved
+        delete_candidates.append(t)
+
+    print()
+    print("=== PLAN ===")
+    print(f"document moves: {len(moves)}")
+    print(f"types to delete (after moves): {len(delete_candidates)}")
+    print(f"types NOT mapped + nonzero docs (need manual call): {len(unmapped_types)}")
+    if unmapped_types:
+        print(" -- unmapped --")
+        for t in unmapped_types:
+            print(f" id={t['id']:3d} count={t['document_count']:3d} name={t['name']!r}")
+    print()
+    print("=== MOVES SUMMARY (per target type) ===")
+    counter = {}
+    for _, old_name, _, new_name in moves:
+        counter[new_name] = counter.get(new_name, {})
+        counter[new_name][old_name] = counter[new_name].get(old_name, 0) + 1
+    for new_name, src in sorted(counter.items()):
+        total = sum(src.values())
+        print(f" -> {new_name} (+{total} docs)")
+        for old_name, n in sorted(src.items(), key=lambda kv: -kv[1]):
+            print(f" {n:3d} from {old_name}")
+
+    print()
+    print("=== TYPES TO DELETE (after moves) ===")
+    for t in delete_candidates:
+        print(f" id={t['id']:3d} count={t['document_count']:3d} name={t['name']!r}")
+
+    if not args.apply:
+        print()
+        print("DRY RUN: re-run with --apply to write changes")
+        return
+
+    # Audit trail BEFORE writing
+    now = datetime.datetime.now(datetime.timezone.utc)
+    audit_path = f"/tmp/migrate_types_audit_{now.strftime('%Y%m%dT%H%M%S')}.json"
+    audit = {
+        "ts_utc": now.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
+        "types_snapshot": [
+            {"id": t["id"], "name": t["name"], "document_count": t["document_count"]}
+            for t in types
+        ],
+        "moves": [
+            {"doc_id": d_id, "old_type_name": old_name, "new_type_id": ntid, "new_type_name": nname}
+            for d_id, old_name, ntid, nname in moves
+        ],
+        "deletes": [
+            {"id": t["id"], "name": t["name"], "document_count_before": t["document_count"]}
+            for t in delete_candidates
+        ],
+    }
+    with open(audit_path, "w") as f:
+        json.dump(audit, f, indent=2, ensure_ascii=False)
+    print(f"audit trail written: {audit_path}")
+    print()
+    print("=== APPLY ===")
+    for doc_id, old_name, new_tid, new_name in moves:
+        r = paperless_curl(token, f"/documents/{doc_id}/", method="PATCH", body={"document_type": new_tid})
+        try:
+            d = json.loads(r)
+            ok = d.get("id") == doc_id
+        except Exception:
+            ok = False
+        flag = "OK " if ok else "ERR"
+        print(f" [{flag}] doc {doc_id}: {old_name!r} -> {new_name!r}")
+        if not ok:
+            sys.exit(f"doc {doc_id} was not moved; aborting before any type is deleted")
+    for t in delete_candidates:
+        r = paperless_curl(token, f"/document_types/{t['id']}/", method="DELETE")
+        # Paperless DELETE returns empty 204 on success
+        print(f" [DEL] type {t['id']} {t['name']!r}
resp={r[:80]!r}") + + print("done.") + + +if __name__ == "__main__": + main() diff --git a/infra/paperless/migrate_types_audit_20260513T085119.json b/infra/paperless/migrate_types_audit_20260513T085119.json new file mode 100644 index 0000000..18d578a --- /dev/null +++ b/infra/paperless/migrate_types_audit_20260513T085119.json @@ -0,0 +1,1715 @@ +{ + "ts_utc": "2026-05-13T08:51:19.869538Z", + "types_snapshot": [ + { + "id": 4, + "name": "Abrechnung", + "document_count": 11 + }, + { + "id": 160, + "name": "agreement", + "document_count": 1 + }, + { + "id": 13, + "name": "Angebot", + "document_count": 1 + }, + { + "id": 76, + "name": "Anleitung", + "document_count": 3 + }, + { + "id": 134, + "name": "Antrag", + "document_count": 1 + }, + { + "id": 141, + "name": "Behandlungsplan und Risikoaufklärung", + "document_count": 1 + }, + { + "id": 129, + "name": "Beratungsprotokoll", + "document_count": 1 + }, + { + "id": 143, + "name": "Berechnung", + "document_count": 1 + }, + { + "id": 148, + "name": "Bericht", + "document_count": 1 + }, + { + "id": 11, + "name": "Bescheid", + "document_count": 17 + }, + { + "id": 15, + "name": "Bescheinigung", + "document_count": 21 + }, + { + "id": 151, + "name": "Bestätigungsbrief", + "document_count": 1 + }, + { + "id": 30, + "name": "Brief", + "document_count": 21 + }, + { + "id": 127, + "name": "Consent Form", + "document_count": 0 + }, + { + "id": 163, + "name": "Contract", + "document_count": 0 + }, + { + "id": 144, + "name": "contract", + "document_count": 1 + }, + { + "id": 120, + "name": "Einladung / Vollmacht / Wirtschaftsplan", + "document_count": 0 + }, + { + "id": 113, + "name": "Einladung zur Eigentümerversammlung", + "document_count": 2 + }, + { + "id": 132, + "name": "Einspruchsschreiben", + "document_count": 0 + }, + { + "id": 158, + "name": "Energy Performance Certificate", + "document_count": 1 + }, + { + "id": 128, + "name": "Erklarung", + "document_count": 1 + }, + { + "id": 156, + "name": "Finanzierungsangebot", + 
"document_count": 1 + }, + { + "id": 80, + "name": "Formular", + "document_count": 6 + }, + { + "id": 122, + "name": "Geldzuwendungsbestätigung", + "document_count": 0 + }, + { + "id": 157, + "name": "Guidelines", + "document_count": 1 + }, + { + "id": 27, + "name": "Gutachten", + "document_count": 1 + }, + { + "id": 164, + "name": "Information", + "document_count": 0 + }, + { + "id": 136, + "name": "informational document", + "document_count": 1 + }, + { + "id": 139, + "name": "Informationsschreiben", + "document_count": 1 + }, + { + "id": 162, + "name": "Invoice", + "document_count": 1 + }, + { + "id": 137, + "name": "Kaufvertrag", + "document_count": 0 + }, + { + "id": 118, + "name": "Kontoauszug", + "document_count": 1 + }, + { + "id": 117, + "name": "Kontoübersicht", + "document_count": 1 + }, + { + "id": 145, + "name": "Kostenabrechnung", + "document_count": 1 + }, + { + "id": 121, + "name": "Kostenvoranmeldung", + "document_count": 1 + }, + { + "id": 142, + "name": "Kreditvertrag", + "document_count": 1 + }, + { + "id": 114, + "name": "Kundeninformation", + "document_count": 0 + }, + { + "id": 83, + "name": "Leistungsnachweis", + "document_count": 2 + }, + { + "id": 135, + "name": "Loan Application and Agreement", + "document_count": 1 + }, + { + "id": 66, + "name": "Lohnsteuerbescheinigung", + "document_count": 4 + }, + { + "id": 147, + "name": "Mahnschreiben", + "document_count": 1 + }, + { + "id": 140, + "name": "Medical Consent Form", + "document_count": 1 + }, + { + "id": 150, + "name": "medical documentation", + "document_count": 1 + }, + { + "id": 41, + "name": "Medizinisch", + "document_count": 2 + }, + { + "id": 12, + "name": "Mitteilung", + "document_count": 7 + }, + { + "id": 161, + "name": "Notarial Deed", + "document_count": 1 + }, + { + "id": 159, + "name": "Notarized agreement with amendments", + "document_count": 1 + }, + { + "id": 165, + "name": "Personal Correspondence", + "document_count": 0 + }, + { + "id": 133, + "name": "Plan", + 
"document_count": 0 + }, + { + "id": 131, + "name": "policy", + "document_count": 0 + }, + { + "id": 32, + "name": "Protokoll", + "document_count": 2 + }, + { + "id": 116, + "name": "Questionnaire/Declaration Form", + "document_count": 0 + }, + { + "id": 2, + "name": "Rechnung", + "document_count": 26 + }, + { + "id": 149, + "name": "Rechnungs- und Vertragsinformation", + "document_count": 1 + }, + { + "id": 125, + "name": "Rechtlicher Vertrag", + "document_count": 0 + }, + { + "id": 155, + "name": "Rechtsgeschäft", + "document_count": 1 + }, + { + "id": 126, + "name": "recommendation", + "document_count": 0 + }, + { + "id": 152, + "name": "Schreiben des Finanzamts", + "document_count": 1 + }, + { + "id": 138, + "name": "Steuerbescheid", + "document_count": 3 + }, + { + "id": 119, + "name": "Steuerdokument", + "document_count": 0 + }, + { + "id": 115, + "name": "Steuererklärung", + "document_count": 2 + }, + { + "id": 124, + "name": "Tilgungsplan", + "document_count": 0 + }, + { + "id": 88, + "name": "Umsatzsteuer-Voranmeldung", + "document_count": 6 + }, + { + "id": 37, + "name": "Urkunde", + "document_count": 3 + }, + { + "id": 130, + "name": "Versicherungsbedingungen", + "document_count": 1 + }, + { + "id": 67, + "name": "Versicherungsschein", + "document_count": 6 + }, + { + "id": 40, + "name": "Vertrag", + "document_count": 7 + }, + { + "id": 153, + "name": "Vertragsdokument", + "document_count": 1 + }, + { + "id": 154, + "name": "Verwaltungsakt", + "document_count": 1 + }, + { + "id": 22, + "name": "Vollmacht", + "document_count": 5 + }, + { + "id": 146, + "name": "Werbung", + "document_count": 1 + }, + { + "id": 73, + "name": "Wohnflächenberechnung", + "document_count": 0 + }, + { + "id": 123, + "name": "Wohnflaechenberechnung", + "document_count": 3 + } + ], + "moves": [ + { + "doc_id": 104, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 124, + "old_type_name": "Abrechnung", + "new_type_id": 162, + 
"new_type_name": "Invoice" + }, + { + "doc_id": 88, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 134, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 122, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 71, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 220, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 223, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 224, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 255, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 248, + "old_type_name": "Abrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 200, + "old_type_name": "agreement", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 222, + "old_type_name": "Angebot", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 98, + "old_type_name": "Antrag", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 91, + "old_type_name": "Behandlungsplan und Risikoaufklärung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 228, + "old_type_name": "Beratungsprotokoll", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 202, + "old_type_name": "Berechnung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 96, + "old_type_name": "Bericht", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 160, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 95, + "old_type_name": "Bescheid", + 
"new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 86, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 159, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 183, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 101, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 81, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 69, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 70, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 85, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 236, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 253, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 250, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 233, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 234, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 235, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 76, + "old_type_name": "Bescheid", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 260, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 182, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 100, + 
"old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 178, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 166, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 192, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 75, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 179, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 186, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 168, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 262, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 261, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 259, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 242, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 239, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 245, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 252, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 219, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 205, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 247, + 
"old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 230, + "old_type_name": "Bescheinigung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 152, + "old_type_name": "Bestätigungsbrief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 244, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 164, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 146, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 169, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 191, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 105, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 188, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 115, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 97, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 196, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 74, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 113, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 102, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 126, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 195, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 110, + 
"old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 170, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 180, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 116, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 127, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 149, + "old_type_name": "Brief", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 227, + "old_type_name": "contract", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 156, + "old_type_name": "Einladung zur Eigentümerversammlung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 119, + "old_type_name": "Einladung zur Eigentümerversammlung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 163, + "old_type_name": "Energy Performance Certificate", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 251, + "old_type_name": "Erklarung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 217, + "old_type_name": "Finanzierungsangebot", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 154, + "old_type_name": "Guidelines", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 158, + "old_type_name": "Gutachten", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 218, + "old_type_name": "informational document", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 185, + "old_type_name": "Informationsschreiben", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 189, + "old_type_name": "Kontoauszug", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 187, + 
"old_type_name": "Kontoübersicht", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 121, + "old_type_name": "Kostenabrechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 107, + "old_type_name": "Kostenvoranmeldung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 212, + "old_type_name": "Kreditvertrag", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 256, + "old_type_name": "Leistungsnachweis", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 241, + "old_type_name": "Leistungsnachweis", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 214, + "old_type_name": "Loan Application and Agreement", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 167, + "old_type_name": "Lohnsteuerbescheinigung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 254, + "old_type_name": "Lohnsteuerbescheinigung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 258, + "old_type_name": "Lohnsteuerbescheinigung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 249, + "old_type_name": "Lohnsteuerbescheinigung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 80, + "old_type_name": "Mahnschreiben", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 138, + "old_type_name": "Medical Consent Form", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 136, + "old_type_name": "medical documentation", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 135, + "old_type_name": "Medizinisch", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 197, + "old_type_name": "Medizinisch", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 109, + "old_type_name": "Mitteilung", + "new_type_id": 164, + "new_type_name": 
"Information" + }, + { + "doc_id": 144, + "old_type_name": "Mitteilung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 181, + "old_type_name": "Mitteilung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 111, + "old_type_name": "Mitteilung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 150, + "old_type_name": "Mitteilung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 184, + "old_type_name": "Mitteilung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 108, + "old_type_name": "Mitteilung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 206, + "old_type_name": "Notarial Deed", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 203, + "old_type_name": "Notarized agreement with amendments", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 151, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 90, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 93, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 92, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 161, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 140, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 132, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 155, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 73, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 162, + "old_type_name": "Rechnung", + "new_type_id": 162, + 
"new_type_name": "Invoice" + }, + { + "doc_id": 94, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 78, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 143, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 106, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 72, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 193, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 194, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 139, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 165, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 133, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 173, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 148, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 147, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 141, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 142, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 231, + "old_type_name": "Rechnung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 175, + "old_type_name": "Rechnungs- und Vertragsinformation", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 213, + "old_type_name": "Rechtsgeschäft", + "new_type_id": 163, + "new_type_name": 
"Contract" + }, + { + "doc_id": 79, + "old_type_name": "Schreiben des Finanzamts", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 246, + "old_type_name": "Steuererklärung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 77, + "old_type_name": "Steuererklärung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 257, + "old_type_name": "Umsatzsteuer-Voranmeldung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 237, + "old_type_name": "Umsatzsteuer-Voranmeldung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 238, + "old_type_name": "Umsatzsteuer-Voranmeldung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 240, + "old_type_name": "Umsatzsteuer-Voranmeldung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 243, + "old_type_name": "Umsatzsteuer-Voranmeldung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 204, + "old_type_name": "Umsatzsteuer-Voranmeldung", + "new_type_id": 162, + "new_type_name": "Invoice" + }, + { + "doc_id": 229, + "old_type_name": "Versicherungsbedingungen", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 129, + "old_type_name": "Versicherungsschein", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 112, + "old_type_name": "Versicherungsschein", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 130, + "old_type_name": "Versicherungsschein", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 128, + "old_type_name": "Versicherungsschein", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 226, + "old_type_name": "Versicherungsschein", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 131, + "old_type_name": "Versicherungsschein", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 118, + 
"old_type_name": "Vertrag", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 199, + "old_type_name": "Vertrag", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 87, + "old_type_name": "Vertrag", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 89, + "old_type_name": "Vertrag", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 232, + "old_type_name": "Vertrag", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 123, + "old_type_name": "Vertrag", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 190, + "old_type_name": "Vertrag", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 177, + "old_type_name": "Vertragsdokument", + "new_type_id": 163, + "new_type_name": "Contract" + }, + { + "doc_id": 82, + "old_type_name": "Verwaltungsakt", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 176, + "old_type_name": "Werbung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 216, + "old_type_name": "Wohnflaechenberechnung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 201, + "old_type_name": "Wohnflaechenberechnung", + "new_type_id": 164, + "new_type_name": "Information" + }, + { + "doc_id": 207, + "old_type_name": "Wohnflaechenberechnung", + "new_type_id": 164, + "new_type_name": "Information" + } + ], + "deletes": [ + { + "id": 4, + "name": "Abrechnung", + "document_count_before": 11 + }, + { + "id": 160, + "name": "agreement", + "document_count_before": 1 + }, + { + "id": 13, + "name": "Angebot", + "document_count_before": 1 + }, + { + "id": 134, + "name": "Antrag", + "document_count_before": 1 + }, + { + "id": 141, + "name": "Behandlungsplan und Risikoaufklärung", + "document_count_before": 1 + }, + { + "id": 129, + "name": "Beratungsprotokoll", + "document_count_before": 1 + }, + { + "id": 143, + "name": "Berechnung", + 
"document_count_before": 1 + }, + { + "id": 148, + "name": "Bericht", + "document_count_before": 1 + }, + { + "id": 11, + "name": "Bescheid", + "document_count_before": 17 + }, + { + "id": 15, + "name": "Bescheinigung", + "document_count_before": 21 + }, + { + "id": 151, + "name": "Bestätigungsbrief", + "document_count_before": 1 + }, + { + "id": 30, + "name": "Brief", + "document_count_before": 21 + }, + { + "id": 127, + "name": "Consent Form", + "document_count_before": 0 + }, + { + "id": 144, + "name": "contract", + "document_count_before": 1 + }, + { + "id": 120, + "name": "Einladung / Vollmacht / Wirtschaftsplan", + "document_count_before": 0 + }, + { + "id": 113, + "name": "Einladung zur Eigentümerversammlung", + "document_count_before": 2 + }, + { + "id": 132, + "name": "Einspruchsschreiben", + "document_count_before": 0 + }, + { + "id": 158, + "name": "Energy Performance Certificate", + "document_count_before": 1 + }, + { + "id": 128, + "name": "Erklarung", + "document_count_before": 1 + }, + { + "id": 156, + "name": "Finanzierungsangebot", + "document_count_before": 1 + }, + { + "id": 122, + "name": "Geldzuwendungsbestätigung", + "document_count_before": 0 + }, + { + "id": 157, + "name": "Guidelines", + "document_count_before": 1 + }, + { + "id": 27, + "name": "Gutachten", + "document_count_before": 1 + }, + { + "id": 136, + "name": "informational document", + "document_count_before": 1 + }, + { + "id": 139, + "name": "Informationsschreiben", + "document_count_before": 1 + }, + { + "id": 137, + "name": "Kaufvertrag", + "document_count_before": 0 + }, + { + "id": 118, + "name": "Kontoauszug", + "document_count_before": 1 + }, + { + "id": 117, + "name": "Kontoübersicht", + "document_count_before": 1 + }, + { + "id": 145, + "name": "Kostenabrechnung", + "document_count_before": 1 + }, + { + "id": 121, + "name": "Kostenvoranmeldung", + "document_count_before": 1 + }, + { + "id": 142, + "name": "Kreditvertrag", + "document_count_before": 1 + }, + { + "id": 114, 
+ "name": "Kundeninformation", + "document_count_before": 0 + }, + { + "id": 83, + "name": "Leistungsnachweis", + "document_count_before": 2 + }, + { + "id": 135, + "name": "Loan Application and Agreement", + "document_count_before": 1 + }, + { + "id": 66, + "name": "Lohnsteuerbescheinigung", + "document_count_before": 4 + }, + { + "id": 147, + "name": "Mahnschreiben", + "document_count_before": 1 + }, + { + "id": 140, + "name": "Medical Consent Form", + "document_count_before": 1 + }, + { + "id": 150, + "name": "medical documentation", + "document_count_before": 1 + }, + { + "id": 41, + "name": "Medizinisch", + "document_count_before": 2 + }, + { + "id": 12, + "name": "Mitteilung", + "document_count_before": 7 + }, + { + "id": 161, + "name": "Notarial Deed", + "document_count_before": 1 + }, + { + "id": 159, + "name": "Notarized agreement with amendments", + "document_count_before": 1 + }, + { + "id": 133, + "name": "Plan", + "document_count_before": 0 + }, + { + "id": 131, + "name": "policy", + "document_count_before": 0 + }, + { + "id": 116, + "name": "Questionnaire/Declaration Form", + "document_count_before": 0 + }, + { + "id": 2, + "name": "Rechnung", + "document_count_before": 26 + }, + { + "id": 149, + "name": "Rechnungs- und Vertragsinformation", + "document_count_before": 1 + }, + { + "id": 125, + "name": "Rechtlicher Vertrag", + "document_count_before": 0 + }, + { + "id": 155, + "name": "Rechtsgeschäft", + "document_count_before": 1 + }, + { + "id": 126, + "name": "recommendation", + "document_count_before": 0 + }, + { + "id": 152, + "name": "Schreiben des Finanzamts", + "document_count_before": 1 + }, + { + "id": 119, + "name": "Steuerdokument", + "document_count_before": 0 + }, + { + "id": 115, + "name": "Steuererklärung", + "document_count_before": 2 + }, + { + "id": 124, + "name": "Tilgungsplan", + "document_count_before": 0 + }, + { + "id": 88, + "name": "Umsatzsteuer-Voranmeldung", + "document_count_before": 6 + }, + { + "id": 130, + "name": 
"Versicherungsbedingungen", + "document_count_before": 1 + }, + { + "id": 67, + "name": "Versicherungsschein", + "document_count_before": 6 + }, + { + "id": 40, + "name": "Vertrag", + "document_count_before": 7 + }, + { + "id": 153, + "name": "Vertragsdokument", + "document_count_before": 1 + }, + { + "id": 154, + "name": "Verwaltungsakt", + "document_count_before": 1 + }, + { + "id": 146, + "name": "Werbung", + "document_count_before": 1 + }, + { + "id": 73, + "name": "Wohnflächenberechnung", + "document_count_before": 0 + }, + { + "id": 123, + "name": "Wohnflaechenberechnung", + "document_count_before": 3 + } + ] +} \ No newline at end of file diff --git a/infra/samba-canon/Dockerfile b/infra/samba-canon/Dockerfile new file mode 100644 index 0000000..d48fa69 --- /dev/null +++ b/infra/samba-canon/Dockerfile @@ -0,0 +1,18 @@ +FROM alpine:3.13 + +RUN apk add --no-cache \ + samba \ + samba-common-tools \ + shadow \ + && rm -rf /var/cache/apk/* + +RUN rm -rf /etc/samba/* /var/lib/samba/* /var/log/samba/* \ + && mkdir -p /etc/samba /var/lib/samba/private /var/log/samba /var/run/samba /inbox + +COPY smb.conf /etc/samba/smb.conf +COPY entrypoint.sh /entrypoint.sh +RUN chmod 0755 /entrypoint.sh + +EXPOSE 139 445 + +ENTRYPOINT ["/entrypoint.sh"] diff --git a/infra/samba-canon/README.md b/infra/samba-canon/README.md new file mode 100644 index 0000000..abc1c6d --- /dev/null +++ b/infra/samba-canon/README.md @@ -0,0 +1,120 @@ +# samba-canon — SMB bridge for the Canon MAXIFY MB5100 + +Old-Samba container on mDock that gives the Canon MB5100 (2014, SMB1 + +NTLMv1 only) a writable share. Scans land in `/mnt/mdms/inbox/` and are +picked up by Paperless within 60s via the existing consume-folder poll. + +## Why this exists + +The Canon MAXIFY MB5100 only supports SMB Shared Folder as a scan +destination (no FTP, no WebDAV — see the [official manual][canon-manual]). +It speaks SMB1 with NTLMv1 auth. 
+ +Direct scan-to-TrueNAS fails reproducibly even with `enable_smb1=true` + +`ntlmv1_auth=true` flipped on TrueNAS Core: the TrueNAS-Samba (4.19+) ships +extra SMB1 hardening that breaks the printer's handshake. `smb1_process.c:502` +logs `NT_STATUS_CONNECTION_RESET` — the printer closes the TCP socket before +the first SMB packet is processed. + +Rather than fight TrueNAS hardening, this container runs a deliberately old +Samba (4.13.17 on Alpine 3.13) on mDock, bound to mDock's LAN interface +only, and writes received files straight to the NFS-mounted Paperless +inbox. + +The TrueNAS SMB stack stays untouched — mBreeze and mPebble keep mounting +`mDMS` directly from TrueNAS as before. + +[canon-manual]: https://ij.manual.canon/ij/webmanual/Manual/All/MB5100%20series/EN/UG/ug_scanning0700.html + +## Layout + +| File | Purpose | +| ----------------- | ---------------------------------------------------------- | +| `Dockerfile` | `alpine:3.13` + samba 4.13.17, ~46 MiB image | +| `smb.conf` | NT1 server, NTLMv1 + LANMAN enabled, single `[inbox]` share | +| `entrypoint.sh` | Creates `canon` user at UID 1000, sets pw from env, runs smbd | +| `docker-compose.yml` | Binds 445/139 on the LAN IP only, mounts `/mnt/mdms/inbox` | + +These files are a **traceable copy** of what lives in `~/samba-canon/` on +mDock (same convention as `infra/paperless/`). If you change the live config +on mDock, sync the change here in the same commit. + +## Deploy + +```bash +scp infra/samba-canon/{Dockerfile,smb.conf,entrypoint.sh,docker-compose.yml} \ + mdock:~/samba-canon/ +ssh mdock 'cd ~/samba-canon && docker compose up -d --build' +``` + +The real `CANON_PASSWORD` lives in `~/samba-canon/.env` on mDock (chmod 600, +not committed). Rotate by editing `.env` and running `docker compose up -d`, which recreates the container (a plain `docker compose restart` does not re-read `.env`); +`entrypoint.sh` re-applies the password to the Samba TDB on every container start.
+ +## Canon Quick Utility Toolbox values + +Use these exact values in the printer's "Destination Settings → Folder" +entry (Canon Drucker Quick Utility Toolbox → Destination Folder Settings): + +| Field | Value | +| ---------------- | ---------------------------------------------- | +| Display name | `mDock Inbox` (any label) | +| SMB server name | `192.168.178.131` (mDock LAN IP — not `mdock`, the printer does no DNS) | +| Shared folder | `inbox` | +| Domain / Workgroup | leave blank, or `WORKGROUP` | +| User | `canon` | +| Password | (from `~/samba-canon/.env` on mDock — `CANON_PASSWORD`) | +| Port | leave default (445) — non-standard ports are not supported by the printer | + +The printer's connection-test should report success. + +## Verification (replayed during deploy) + +1. **`smbutil` share listing from a known-good client.** From mBreeze: + + ```bash + smbutil view -A "//canon:@192.168.178.131" + # → "Authenticate successfully with //canon:…@192.168.178.131" + ``` + +2. **Mount + write from mBreeze.** + + ```bash + mkdir -p /tmp/canon-test + mount -t smbfs "//canon:@192.168.178.131/inbox" /tmp/canon-test + touch /tmp/canon-test/probe.txt + ls -la /mnt/mdms/inbox/probe.txt # on mDock — should show m:m, mode 0664 + umount /tmp/canon-test + ``` + +3. **Toolbox connection test** — green tick (m runs this once during setup). + +4. **Real scan from the ADF** — PDF lands in `/mnt/mdms/inbox/`, Paperless + polls within 60 s, OCR + AI-typing run, file moves to + `//...` (existing Paperless pipeline, see `infra/paperless/`). + +5. **Survives mDock reboot.** `docker-compose.yml` sets + `restart: unless-stopped`. Checked via `docker restart samba-canon` — + container comes back up and shares are reachable within ~5 s. + +## Security notes + +- LAN-only. The compose binds `192.168.178.131:445` and `192.168.178.131:139`, + not `0.0.0.0`. The container is not reachable from Tailscale or the + internet. +- SMB1 + NTLMv1 are insecure by design.
Acceptable here because the threat + model is "untrusted devices on the home LAN", and the only client is the + printer. **Do not expose this share to anything except the Canon.** +- The `canon` user is a Samba-only account (`/sbin/nologin`, no system + password, no shell). It maps to UID 1000 inside the container so that + files written through SMB land as `m:m` on the host NFS mount. +- If `CANON_PASSWORD` leaks, rotate it: edit `~/samba-canon/.env` on mDock, + run `docker compose up -d samba-canon` (recreates the container so the new value is read), and re-enter the new password in + the Canon Toolbox. + +## Out of scope + +- TLS / encrypted SMB — incompatible with the printer; LAN-only mitigates. +- Multi-user — only the printer needs to write here. +- Replacing the TrueNAS SMB stack mBreeze/mPebble already use. +- Replacing the printer — m wants to keep the MB5100 working. diff --git a/infra/samba-canon/docker-compose.yml b/infra/samba-canon/docker-compose.yml new file mode 100644 index 0000000..fdd0f3c --- /dev/null +++ b/infra/samba-canon/docker-compose.yml @@ -0,0 +1,36 @@ +services: + samba-canon: + build: + context: . + dockerfile: Dockerfile + image: samba-canon:alpine3.13 + container_name: samba-canon + restart: unless-stopped + # The Canon MAXIFY MB5100 only speaks SMB on the standard ports — non-standard + # ports are not configurable in the printer. So we bind 445/139 on the LAN + # interface only (mDock's LAN IP), keeping Tailscale out of scope. + ports: + - "192.168.178.131:445:445/tcp" + - "192.168.178.131:139:139/tcp" + volumes: + # /mnt/mdms/inbox is NFS-mounted on mDock from TrueNAS (192.168.178.124). + # Paperless's consume folder polls /mnt/mdms/inbox every 60s, so scans + # land here and are picked up by Paperless without further wiring. + - /mnt/mdms/inbox:/inbox:rw + environment: + # canon user inside the container is created with this UID/GID at boot. + # 1000 = m on mDock, which also owns /mnt/mdms/inbox.
+ PUID: "1000" + PGID: "1000" + # Real password is in .env (gitignored); see README.md. + CANON_PASSWORD: "${CANON_PASSWORD:?CANON_PASSWORD must be set in .env}" + # smbd needs the full default cap set (SETUID/SETGID to honour `force user`, + # CHOWN/FOWNER/DAC_OVERRIDE for file creation, NET_BIND_SERVICE for <1024). + # We rely on Docker defaults rather than cap_drop ALL + a hand-picked list. + # Light healthcheck — smbd answers `smbclient -L` once it's up. + healthcheck: + test: ["CMD-SHELL", "smbclient -L //127.0.0.1 -U canon%${CANON_PASSWORD} -m SMB3 >/dev/null 2>&1 || smbclient -L //127.0.0.1 -U canon%${CANON_PASSWORD} -m NT1 >/dev/null 2>&1"] + interval: 60s + timeout: 10s + retries: 3 + start_period: 15s diff --git a/infra/samba-canon/entrypoint.sh b/infra/samba-canon/entrypoint.sh new file mode 100644 index 0000000..5aa6317 --- /dev/null +++ b/infra/samba-canon/entrypoint.sh @@ -0,0 +1,41 @@ +#!/bin/sh +set -eu + +# Map the in-container "canon" user to the same UID/GID as `m` on the host +# (UID 1000 / GID 1000). force user = canon in smb.conf then guarantees that +# every file written through SMB lands as m:m on the NFS-mounted /mnt/mdms/inbox. +TARGET_UID="${PUID:-1000}" +TARGET_GID="${PGID:-1000}" + +if ! getent group canon >/dev/null 2>&1; then + addgroup -g "${TARGET_GID}" canon +fi + +if ! getent passwd canon >/dev/null 2>&1; then + adduser -D -H -u "${TARGET_UID}" -G canon -s /sbin/nologin canon +fi + +if [ -z "${CANON_PASSWORD:-}" ]; then + echo "FATAL: CANON_PASSWORD env var is required" >&2 + exit 1 +fi + +# (Re)apply the Samba password every boot so rotating it = restart the container. +printf '%s\n%s\n' "${CANON_PASSWORD}" "${CANON_PASSWORD}" | smbpasswd -s -a canon >/dev/null +smbpasswd -e canon >/dev/null + +# Verify the bind-mounted /inbox exists and is writable from the container. +# smbd will drop privilege per session to the canon user (uid 1000), which +# matches m on the host — files therefore land as m:m on the NFS mount. +if ! 
test -d /inbox; then + echo "FATAL: /inbox missing — bind mount /mnt/mdms/inbox not set." >&2 + exit 1 +fi +if ! test -w /inbox; then + echo "FATAL: /inbox not writable. Check NFS mount + permissions on /mnt/mdms/inbox (must be writable by uid ${TARGET_UID})." >&2 + exit 1 +fi + +echo "samba-canon ready: smbd $(smbd --version | head -1), user=canon uid=${TARGET_UID} gid=${TARGET_GID}" + +exec smbd --foreground --no-process-group --log-stdout diff --git a/infra/samba-canon/smb.conf b/infra/samba-canon/smb.conf new file mode 100644 index 0000000..fce2a6e --- /dev/null +++ b/infra/samba-canon/smb.conf @@ -0,0 +1,49 @@ +[global] + workgroup = WORKGROUP + server string = Canon SMB bridge + netbios name = MDOCK-CANON + security = user + map to guest = Never + log file = /var/log/samba/log.%m + log level = 1 + max log size = 1000 + + # Old-school SMB1 + NTLMv1 — required by Canon MAXIFY MB5100 (2014, SMB1 only). + # LAN-only, no encryption — see infra/samba-canon/README.md. + server min protocol = NT1 + server max protocol = SMB3 + client min protocol = NT1 + client max protocol = SMB3 + ntlm auth = ntlmv1-permitted + lanman auth = yes + client lanman auth = yes + client plaintext auth = no + server signing = disabled + smb encrypt = disabled + server multi channel support = no + + # Performance / sanity for a single-share LAN bridge + load printers = no + printing = bsd + printcap name = /dev/null + disable spoolss = yes + dns proxy = no + usershare allow guests = no + panic action = /bin/sh -c 'echo "smbd panic at $(date)" >&2' + +[inbox] + comment = Canon scan inbox (writes to /mnt/mdms/inbox on TrueNAS via NFS) + path = /inbox + browseable = yes + writable = yes + read only = no + guest ok = no + valid users = canon + force user = canon + force group = canon + create mask = 0664 + directory mask = 0775 + force create mode = 0664 + force directory mode = 0775 + # The Canon writes single PDFs; vfs full_audit is overkill. + vfs objects =