chore: initial commit — spinout from m/otto
Spun out mDMS strategy + tooling from m/otto into its own repo on 2026-05-15.

Migrated:
- docs/strategy.md (was: m/otto:docs/mdms-strategy.md)
- infra/paperless/ (config + audit + migrate scripts)
- infra/samba-canon/ (Canon MB5100 SMB1 bridge container)

History in m/otto: issues #429–#438. Going forward, all mDMS issues are filed here. The sibling repo m/paperless (separate) remains the bare Docker Compose setup for Paperless-ngx itself.
36
CLAUDE.md
Normal file
@@ -0,0 +1,36 @@
# mDMS

Document-management strategy + tooling: Paperless-ngx + Paperless-AI + Canon SMB bridge.

**Memory group_id:** `mdms` (new; formerly `otto` for these issues)

**Project type:** infrastructure + AI-classification pipeline. No web frontend, no application server. Deploys live on mDock; data lives on mTrueNAS.

## Spinout context

Migrated out of `m/otto` on 2026-05-15. The strategy doc, the Paperless-AI tooling, and the samba-canon bridge moved here. The original implementation history is in `m/otto` issues #429–#438. Going forward, file all mDMS issues here.

## Layout

- `docs/strategy.md`: the bible. Taxonomy (10 types, 13 tags), filename conventions, OCR-pipeline decisions. Read first.
- `infra/paperless/`: AI-classification layer config (`SYSTEM_PROMPT.txt`, audit log, `migrate_types.py`).
- `infra/samba-canon/`: host-network Samba 4.10 SMB1 bridge for the Canon MB5100.

## Sibling repo

`m/paperless`: a separate, bare Docker Compose setup for Paperless-ngx itself. `~/paperless/` on mDock is its checkout. Use that repo for deployment; this repo is for *strategy*, *AI/classification*, and the *Canon bridge*.

## Live deployment touchpoints

- `mdock:8777`: Paperless-ngx (managed via `~/paperless/`, i.e. the `m/paperless` repo)
- `mdock:3077`: Paperless-AI (config in this repo: `infra/paperless/`)
- mDock `~/samba-canon/`: Canon SMB bridge (source in this repo: `infra/samba-canon/`)
- mDock `~/mdms-mover/`: age-gated inbox mover (source still in `m/otto` per issue #438; to be migrated here)

When the code in this repo and the live deployment drift apart, fix the repo first, then deploy.

## Conventions

- Audit JSON: `infra/paperless/<topic>_<isotimestamp>.json`. Keep these in the repo as a historical record (`migrate_types_audit_*.json`, etc.).
- Issues are filed here, not in `m/otto`.
- Per the global CLAUDE.md: always use `--netrc-file ~/.netrc-mai` for the Gitea API when acting as mAi.

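Here is a minimal Python sketch of the audit-file naming convention. It assumes the compact `%Y%m%dT%H%M%S` UTC stamp that `migrate_types.py` uses. `audit_filename` is a hypothetical helper and is not part of the repo. It also shows the timezone-aware replacement for the deprecated `datetime.datetime.utcnow()` calls flagged in `infra/paperless/migrate-apply-2026-05-13.log`.

```python
import datetime
from typing import Optional


def audit_filename(topic: str, now: Optional[datetime.datetime] = None) -> str:
    """Build an audit path following infra/paperless/<topic>_<isotimestamp>.json.

    The timestamp is timezone-aware UTC (instead of the deprecated
    datetime.datetime.utcnow()) in the compact %Y%m%dT%H%M%S form.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    if now.tzinfo is None:
        # A naive datetime would be ambiguous; refuse it instead of guessing.
        raise ValueError("timestamp must be timezone-aware")
    stamp = now.astimezone(datetime.timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"infra/paperless/{topic}_{stamp}.json"


ts = datetime.datetime(2026, 5, 13, 9, 30, tzinfo=datetime.timezone.utc)
print(audit_filename("migrate_types_audit", ts))
# infra/paperless/migrate_types_audit_20260513T093000.json
```

Converting to UTC before formatting keeps the names sortable even if the script runs on a host set to Europe/Berlin.
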
67
README.md
Normal file
@@ -0,0 +1,67 @@
# mDMS

m's document management: Paperless-ngx + AI-classification pipeline, Canon scanner SMB bridge, strategy + tooling.

Spun out from `m/otto` on 2026-05-15; issues #429–#438 in `m/otto` are the provenance trail. Going forward, all mDMS work lives here.

## Layout

```
mDMS/
├── docs/
│   └── strategy.md      # Taxonomy, layout, conventions (the bible)
├── infra/
│   ├── paperless/       # Paperless-AI config: SYSTEM_PROMPT, audit scripts,
│   │                    # migrate_types.py, deploy docker-compose
│   └── samba-canon/     # SMB1 bridge container for Canon MB5100 scanner
│                        # (host-network + nmbd, SMB1+NTLMv1 for old printer)
└── README.md
```

## Components

### Paperless-ngx (deployment)

The Compose setup lives in **`m/paperless`** (separate repo). That repo is the deployment artifact; `~/paperless/` on mDock is its checkout. This repo (`m/mDMS`) tracks the *AI classification* layer that sits on top of Paperless-ngx (`infra/paperless/SYSTEM_PROMPT.txt`, the type/tag/correspondent migration scripts, the audit pipeline).

### Paperless-AI

Runs on `mdock:3077` in front of Paperless-ngx (`mdock:8777`). It classifies each ingested document into one of the 10 canonical types and at most 2 of the 13 canonical tags. The system prompt and the migration scripts in `infra/paperless/` are the source of truth; keep this repo and the live Paperless-AI `aidata/.env` in sync.

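Paperless-AI kept inventing new document types even with `RESTRICT_TO_EXISTING_DOCUMENT_TYPES=yes` (see `infra/paperless/Dockerfile`), so a post-classification check can be useful. Below is a minimal Python sketch. `check_classification` is hypothetical. The allowed lists are copied from `infra/paperless/SYSTEM_PROMPT.txt` and must be kept in sync with it by hand. The limit of 2 tags follows the paragraph above.

```python
from typing import Iterable, List

# Copied from infra/paperless/SYSTEM_PROMPT.txt; keep both in sync manually.
ALLOWED_TYPES = {
    "Invoice", "Contract", "Information", "Personal Correspondence",
    "Vollmacht", "Urkunde", "Steuerbescheid", "Anleitung", "Protokoll",
    "Formular",
}
ALLOWED_TAGS = {
    "Anleitung", "Arbeit", "Erbschaft", "Finanzen", "Frist", "Gesundheit",
    "Gewaehrleistung", "Paul", "Steuer", "Versicherung", "Windscheid33",
    "Wohnung", "offen", "wichtig",
}
MAX_TAGS = 2


def check_classification(doc_type: str, tags: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the result is acceptable."""
    problems = []
    if doc_type not in ALLOWED_TYPES:
        problems.append(f"unknown document type: {doc_type!r}")
    tag_set = set(tags)
    unknown = sorted(tag_set - ALLOWED_TAGS)
    if unknown:
        problems.append(f"unknown tags: {unknown}")
    if len(tag_set) > MAX_TAGS:
        problems.append(f"too many tags: {len(tag_set)} > {MAX_TAGS}")
    return problems


print(check_classification("Invoice", ["Steuer"]))           # []
print(check_classification("Rechnung", ["Steuer", "2024"]))  # flags the type and the year tag
```
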
### Canon SMB bridge

`infra/samba-canon/` is the host-network Samba 4.10 container on mDock that the Canon MB5100 scans to. Files land in `/mnt/mdms/inbox/` (NFS from mTrueNAS), and Paperless polls every 60 s. The two-stage inbox (staging dir + age-gated mover) lives separately under `~/mdms-mover/` on mDock (see `m/otto` issue #438).

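The following minimal sketch shows one possible meaning of "age-gated". It assumes the mover takes files from `inbox/` into `toprocess/` (per the data layout below) once their modification time is older than a threshold. That way Paperless never picks up a scan the Canon is still writing over SMB. The real mover lives in `~/mdms-mover/` and may work differently. `move_aged_files` and the 120 s default are illustrative assumptions.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def move_aged_files(src: Path, dst: Path, min_age_s: float = 120.0,
                    now: Optional[float] = None) -> List[Path]:
    """Move regular files from src to dst once they are older than min_age_s.

    A file with a recent mtime may still be mid-write by the scanner,
    so it is left in place until a later run.
    """
    now = time.time() if now is None else now
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src.iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue  # skip subdirectories and hidden/temp files
        if now - path.stat().st_mtime < min_age_s:
            continue  # too fresh: the SMB write may not be finished
        target = dst / path.name
        if target.exists():
            continue  # never overwrite; leave name clashes for a human
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Run it from a timer, for example every minute. A file that is still being written simply waits for a later run.
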
## Data

NFS-mounted from mTrueNAS: `/mnt/mPool/mdms/` → `/mnt/mdms/` on all consumers. Layout:

```
/mnt/mPool/mdms/
├── inbox/       # SMB scanner target (Canon writes here)
├── toprocess/   # Age-gated staging → Paperless consumes here
├── paperless/   # Paperless storage (post-ingest)
├── archive/     # Long-term archive
├── templates/   # Document templates
└── export/      # Manual exports
```

## Reference

- `docs/strategy.md`: full strategy, taxonomy decisions, type/tag rationale
- `m/otto` issues #429–#438: original implementation history
- `m/paperless`: the bare Paperless-ngx Docker Compose setup
288
docs/strategy.md
Normal file
@@ -0,0 +1,288 @@
# mDMS: Document Management Strategy

## Current state (after the cleanup on 2026-04-06)

### Paperless-ngx (mDock)

- **129 documents** (PDFs), storage path active
- **41 correspondents**: cleaned up (OCR duplicates merged, junk removed)
- **13 document types**: Rechnung, Vertrag, Bescheid, Bescheinigung, Brief, Mitteilung, Abrechnung, Protokoll, Urkunde, Vollmacht, Gutachten, Angebot, Medizinisch
- **16 tags**, hierarchical: category (Steuer, Versicherung, Gesundheit, Wohnung, Arbeit, Finanzen, Erbschaft, Gewährleistung, Anleitung) + status (offen, wichtig, Frist) + context (Windscheid33, Paul)
- **1 storage path**: `{created_year}/{document_type}/{created} - {correspondent} - {title}`
- Files are structured, e.g. `2024/Rechnung/2024-03-15 - DAK - Beitragsrechnung.pdf`
- API user: `mAi`
- Docker Compose: `~/paperless/` on mDock, NFS mount `/mnt/paperless` from TrueNAS (`mPool/paperless`)

### What was cleaned up

- Deleted 68 WebP preview documents (no originals, only poor-quality preview images)
- 51 → 41 correspondents (OCR duplicates merged: Hogan Lovells, Matthias Siebels, Ammerländer, Schubert, eprimo, Helios, Paul Siebels, Versorgungswerk, etc.)
- 39 → 13 document types (merge mapping applied)
- 172 → 16 tags (noise deleted; the category mapping was applied before deletion)
- Deleted 13 broken SynoResource files from the consume directory
- Deleted 126 orphaned flat PDFs from originals
- Cleaned up 43 document titles (numbers → descriptive titles)
- Fixed 5 documents that carried a birthday (1987-02-22) instead of the actual document date

### mDocs (Gitea repo m/mDocs): MIGRATION PENDING

- **72 files**, 60 MB (taxes, insurance, Windscheid33)
- To be migrated into the Paperless inbox; delete the repo afterwards

### TrueNAS (mtruenas)

- Dataset `mPool/paperless` already exists
- NFS export to mDock (192.168.178.0/24)
- SMB share `mStash` serves as the reference for the mdms share

---

## Decisions

### 1. Storage path format

**Format: `{created_year}/{document_type}/{created} - {correspondent} - {title}.pdf`** ✓ Confirmed

Examples:

```
2024/Rechnung/2024-03-15 - DAK - Beitragsrechnung Q1.pdf
2024/Bescheid/2024-01-20 - Finanzamt - Grundsteuerbescheid.pdf
2023/Vertrag/2023-06-01 - Vodafone - GigaTV Vertragsverlängerung.pdf
2025/Abrechnung/2025-01-31 - Hogan Lovells - Gehaltsabrechnung Januar.pdf
```

**Why this format:**

- **Year as the top level**: chronological browsing; a whole year can be copied for the tax advisor
- **Type as the second level**: "show me all invoices from 2024" = `2024/Rechnung/`
- **Date + correspondent + title in the filename**: sortable, searchable, rich in context
- **At most 2 folder levels**: not too deep, Finder/Explorer-friendly
- **Navigable without Paperless**: a plain file browser works

**Rejected alternatives:**

- `{correspondent}/{year}/...`: too many sparse folders, poor for navigating by time
- `{year}-{month}/...`: too granular; monthly folders often hold only 1–2 documents
- Flat (`{created}-{correspondent}-{title}.pdf`): unusable at 500+ documents

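To make the template concrete, the following Python sketch renders the confirmed format for a single document. It only mirrors the placeholders; Paperless-ngx renders the real storage path itself. `storage_path` is a hypothetical helper. Rejecting `/` inside a component is an added assumption, because a slash would silently create an extra folder level.

```python
import datetime
from pathlib import PurePosixPath


def storage_path(created: datetime.date, document_type: str,
                 correspondent: str, title: str, ext: str = ".pdf") -> PurePosixPath:
    """Render {created_year}/{document_type}/{created} - {correspondent} - {title}.pdf."""
    for part in (document_type, correspondent, title):
        if "/" in part:
            raise ValueError(f"path component contains '/': {part!r}")
    filename = f"{created.isoformat()} - {correspondent} - {title}{ext}"
    return PurePosixPath(str(created.year), document_type, filename)


p = storage_path(datetime.date(2024, 3, 15), "Rechnung", "DAK", "Beitragsrechnung Q1")
print(p)  # 2024/Rechnung/2024-03-15 - DAK - Beitragsrechnung Q1.pdf
```
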
### 2. Dataset structure: mPool/mdms

```
/mnt/mPool/mdms/
├── paperless/          # Paperless storage (originals, archive, thumbnails)
│   ├── documents/
│   │   ├── originals/  # original files
│   │   └── archive/    # OCR versions
│   └── ...
├── inbox/              # Paperless consume: auto-import
│                       # mScan app, drag-and-drop, SFTP
├── templates/          # contract templates, forms, samples
│                       # not in Paperless: static reference documents
├── archive/            # documents that do not fit into Paperless:
│                       # large files (CAD, plans), collections, binaries
└── export/             # Paperless exports, backups, snapshots
```

### 3. Document routing

| Document | Destination | Rationale |
|----------|-------------|-----------|
| Invoices, official notices, letters | Paperless (inbox/) | OCR + AI classification + search |
| Contracts, deeds | Paperless | Long-term archive with full-text search |
| Tax documents | Paperless + tag "Steuer" | Filterable for export to the tax advisor |
| Payslips | Paperless + tag "Arbeit" | Retrievable chronologically |
| Doctor's letters, medical findings | Paperless + tag "Gesundheit" | Searchable, dated |
| Phone scans (mScan) | inbox/ → Paperless auto-import | Scan → done |
| Contract templates, forms | templates/ | No OCR needed, static reference |
| Building plans, CAD, large files | archive/ | Too large/specialized for Paperless |
| Photos of documents | Paperless | OCR also works on photos |

**Not in mDMS:**

- Photos in general → Immich
- Books, eBooks → Calibre (mCalibre)
- Employment-law documents (HL) → mWork vault (Obsidian, not mDMS)

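The routing table can be read as a lookup from document kind to destination. Here is a minimal Python sketch. It assumes the mDock mount point `/mnt/mdms` and uses made-up category keys (`invoice`, `tax`, `hl_work`, and so on). `route` is hypothetical. It only returns where a document should go and which tag to set in Paperless after import.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

ROOT = PurePosixPath("/mnt/mdms")  # mDock mount; the Macs see it as ~/mDMS

# Hypothetical category keys -> (folder under ROOT, tag to set in Paperless).
ROUTES = {
    "invoice": ("inbox", None),
    "contract": ("inbox", None),
    "tax": ("inbox", "Steuer"),
    "payslip": ("inbox", "Arbeit"),
    "medical": ("inbox", "Gesundheit"),
    "phone_scan": ("inbox", None),
    "template": ("templates", None),
    "cad": ("archive", None),
}
# Material that does not belong in mDMS at all -> the system that owns it.
ELSEWHERE = {"photo": "Immich", "ebook": "Calibre", "hl_work": "mWork vault"}


def route(kind: str) -> Tuple[str, Optional[str]]:
    """Return (destination, tag) for a document kind; raises KeyError if unknown."""
    if kind in ELSEWHERE:
        return (ELSEWHERE[kind], None)
    folder, tag = ROUTES[kind]
    return (str(ROOT / folder), tag)
```

An unknown kind raises `KeyError` on purpose, so a new kind of document forces a decision instead of landing somewhere by default.
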
### 4. Paperless taxonomy: cleanup

#### Document types (39 → 15)

Reduced to meaningful, stable categories:

| Keep | Merged from |
|------|-------------|
| **Rechnung** | Rechnung, Invoice, Beitragsrechnung |
| **Vertrag** | (new: for contracts and renewals) |
| **Bescheid** | Bescheid, Beitragsbescheid, Versicherungsbescheid |
| **Bescheinigung** | Bescheinigung, Lohnsteuerbescheinigung, Spendenbescheinigung |
| **Brief** | Brief, Anschreiben, Korrespondenz |
| **Mitteilung** | Mitteilung, Benachrichtigung, Information, Erinnerung |
| **Abrechnung** | Abrechnung, Entgeltabrechnung |
| **Protokoll** | Protokoll |
| **Urkunde** | Urkunde |
| **Vollmacht** | Vollmacht |
| **Gutachten** | Gutachten, Befund |
| **Angebot** | Angebot |
| **Energieausweis** | Energieausweis |
| **Schadenmeldung** | Schadenmeldung, Schadenanzeige |
| **Medizinisch** | Arbeitsunfähigkeitsbescheinigung, Aufklärungsbogen |

Remove: Empfehlung, Preisanpassungsschreiben, Kündigungsbestätigung, Eintragungsbekanntmachung, Auftragsbestätigung, Testament (→ tag), Einladung (→ Brief)

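The merge table is a many-to-one mapping from old type names to kept types. The sketch below shows one way to turn it into a move plan, in the spirit of the `=== PLAN ===` output in `infra/paperless/migrate-apply-2026-05-13.log`. It is not the code of `migrate_types.py`. `plan_moves` is hypothetical, and `MERGE` holds only an excerpt of the table. Types without a mapping are reported instead of guessed. The 2026-05-13 migration later merged in the other direction (`Rechnung` into `Invoice`); only the mapping changes, not the approach.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Excerpt of the merge table above: kept type -> types merged into it.
MERGE = {
    "Rechnung": ["Invoice", "Beitragsrechnung"],
    "Bescheid": ["Beitragsbescheid", "Versicherungsbescheid"],
    "Gutachten": ["Befund"],
    "Medizinisch": ["Arbeitsunfähigkeitsbescheinigung", "Aufklärungsbogen"],
}
# Every source name maps to its target; kept names map to themselves.
SOURCE_TO_TARGET = {src: tgt for tgt, srcs in MERGE.items() for src in srcs}
SOURCE_TO_TARGET.update({tgt: tgt for tgt in MERGE})


def plan_moves(doc_types: Dict[int, str]) -> Tuple[Dict[str, Counter], List[int]]:
    """Group documents by target type and collect ids whose type is unmapped.

    doc_types maps a document id to its current document-type name.
    Documents already on a kept type are not counted as moves.
    """
    moves: Dict[str, Counter] = defaultdict(Counter)
    unmapped = []
    for doc_id, current in sorted(doc_types.items()):
        target = SOURCE_TO_TARGET.get(current)
        if target is None:
            unmapped.append(doc_id)  # needs a manual call
        elif target != current:
            moves[target][current] += 1
    return dict(moves), unmapped
```
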
#### Tags (172 → ~25, manually curated)

Most auto-generated tags are noise. Goal: a few stable category tags plus manual curation.

**Category tags (mandatory, one per document):**

| Tag | For |
|-----|-----|
| `Steuer` | Everything tax-relevant |
| `Versicherung` | Policies, claims, premiums |
| `Gesundheit` | Doctors, hospitals, health insurer |
| `Wohnung` | Rent, property ownership, service charges, WEG (owners' association) |
| `Arbeit` | Salary, employer, professional chamber |
| `Finanzen` | Banking, loans, retirement savings |
| `Erbschaft` | Wills, estate matters |
| `Gewährleistung` | Receipts with warranty, complaints |
| `Anleitung` | User manuals, handbooks, datasheets |

**Action tags (optional):**

| Tag | Meaning |
|-----|---------|
| `wichtig` | Must be retained; key document |
| `Frist` | Has a deadline; check regularly |
| `offen` | Action still required |

**Context tags (sparingly, as needed):**

| Tag | For |
|-----|-----|
| `Windscheid33` | Property at Windscheidstr. 33 |
| `Paul` | Documents concerning Paul Siebels |

**Delete:**

- Year tags ("2022", "2025"): redundant with the created date
- Person tags ("Matthias Siebels"): belong in the correspondent field
- Ultra-granular tags ("Finger", "Hand", "Shimano", "Oral-B"): no benefit
- Duplicates ("Rechtsanwalt" + "Rechtsanwälte" + "Rechtsanwaltschaft")

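Here is a minimal check of the tag scheme for one document. It assumes the rule "exactly one category tag, optional action/context tags, nothing else". `tag_problems` is hypothetical. The sets use the spelling from this document (`Gewährleistung`). `infra/paperless/SYSTEM_PROMPT.txt` spells it `Gewaehrleistung`, so confirm which name the live Paperless tag has before relying on this.

```python
from typing import Iterable, List

CATEGORY_TAGS = {"Steuer", "Versicherung", "Gesundheit", "Wohnung", "Arbeit",
                 "Finanzen", "Erbschaft", "Gewährleistung", "Anleitung"}
ACTION_TAGS = {"wichtig", "Frist", "offen"}
CONTEXT_TAGS = {"Windscheid33", "Paul"}


def tag_problems(tags: Iterable[str]) -> List[str]:
    """Check one document's tags: exactly one category tag, optional
    action/context tags, and nothing outside the scheme."""
    tag_set = set(tags)
    problems = []
    categories = tag_set & CATEGORY_TAGS
    if len(categories) != 1:
        problems.append(f"expected exactly 1 category tag, found {len(categories)}")
    stray = tag_set - CATEGORY_TAGS - ACTION_TAGS - CONTEXT_TAGS
    if stray:
        problems.append(f"tags outside the scheme: {sorted(stray)}")
    return problems
```
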
#### Correspondents (51 → ~30)

Merge OCR duplicates:

- "Hogan Lovells International LLP" + "Hogan Lovells lnternational LLP" → **Hogan Lovells**
- "HELIÜS Klinikurn Duisburg" + "Helios Klinikum Duisburg" → **Helios Klinikum Duisburg**
- "Herr Matthias Siebels" + "Herrn Matthias Siebels" + "Matthias Siebels" + "Herrn Rechtsanwalt Matthias Siebels" → **Matthias Siebels** (own documents)
- "Ammerländer Versicherung VVaG" + "Ammerländer Versicherung WaG" → **Ammerländer Versicherung**
- "SCHUBERT GmbH" + "Schubert GmbH Haus- und Grundbesitzverwaltung" → **Schubert Hausverwaltung**
- "Dr. figegeberH lcankenkas*" → identify or delete (OCR junk)
- "Dr/Heikö Gemmel" → **Dr. Heiko Gemmel**
- "eprimo CmbH" + "eprimo GmbH" → **eprimo**
- "lndula Shopsystem GmbH" → **Indula Shopsystem**

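The variants are OCR errors (`lnternational` with a lowercase L, `CmbH`). For these, an explicit alias table is safer than fuzzy matching, which could also merge genuinely different organisations. The sketch below works under that assumption. `ALIASES` is an excerpt of the list above and `canonical_correspondent` is hypothetical. Unknown names return `None` so they go to manual review.

```python
import unicodedata
from typing import Dict, Optional

# Canonical name -> OCR variants seen in Paperless (excerpt of the list above).
ALIASES = {
    "Hogan Lovells": ["Hogan Lovells International LLP", "Hogan Lovells lnternational LLP"],
    "Helios Klinikum Duisburg": ["HELIÜS Klinikurn Duisburg"],
    "eprimo": ["eprimo CmbH", "eprimo GmbH"],
    "Indula Shopsystem": ["lndula Shopsystem GmbH"],
}


def _key(name: str) -> str:
    """Comparison key: Unicode-normalised (NFC), case-folded, whitespace collapsed."""
    return " ".join(unicodedata.normalize("NFC", name).casefold().split())


LOOKUP: Dict[str, str] = {}
for canonical, variants in ALIASES.items():
    for variant in [canonical, *variants]:
        LOOKUP[_key(variant)] = canonical


def canonical_correspondent(raw: str) -> Optional[str]:
    """Return the canonical correspondent, or None if the name is unknown."""
    return LOOKUP.get(_key(raw))
```
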
### 5. SMB share

**Yes: expose mdms as an SMB share, like mStash.**

Configuration on TrueNAS:

- Share name: `mdms`
- Dataset: `mPool/mdms`
- User: `m` (same as mStash)
- Mount on mBreeze/mPebble: `~/mDMS` (LaunchAgent, analogous to mStash)

Uses:

- `~/mDMS/inbox/` for drag-and-drop import (Paperless consumes automatically)
- `~/mDMS/templates/` for quick access to templates
- `~/mDMS/paperless/documents/originals/` for file-browser navigation (via the storage path)
- `~/mDMS/archive/` for large files

### 6. Confidential documents

**No separate encryption system needed.** ✓ Confirmed

- Everything runs on the home server (mforge/mtruenas) and is reachable only via Tailscale
- SMB with user authentication plus the Paperless login is sufficient

### 7. Obsidian integration

The storage path should be usable as part of an Obsidian vault. This means:

- Mount `mdms/paperless/documents/originals/` (or `archive/`) via SMB as a vault folder
- Obsidian shows the folder structure (`2024/Rechnung/...`) directly in its file browser
- PDFs can be viewed inline and linked in Obsidian (`![[2024-03-15 - DAK - Beitragsrechnung.pdf]]`)
- No special characters in filenames that break Obsidian links (spaces are fine)

**Implementation:**

- Option A: symlink `~/m2/mDMS/` → `~/mDMS/paperless/documents/originals/` inside the Obsidian vault
- Option B: a separate mini-vault just for documents
- Option C: mdms as a subfolder of the main vault (m2)

Recommendation: **Option A (symlink)**. No data overhead, the vault stays lean, and documents are still linkable. It needs only one symlink per machine.

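Below is a sketch of a title clean-up step for the "no link-breaking characters" rule. The character set is an assumption: the wikilink syntax characters `# ^ [ ] |`, plus characters that are invalid in filenames on common systems. `obsidian_safe` is hypothetical. Spaces and the ` - ` separators from the storage path format are left alone.

```python
import re

# Assumed set: wikilink syntax (# ^ [ ] |) plus characters that are invalid
# in filenames on common platforms (* " \ / < > : ?).
LINK_BREAKING = r'[#^\[\]|*"\\/<>:?]'


def obsidian_safe(title: str, replacement: str = "-") -> str:
    """Replace link-breaking characters; spaces are kept (they are fine in links)."""
    cleaned = re.sub(LINK_BREAKING, replacement, title)
    return re.sub(r"\s+", " ", cleaned).strip()
```

Applying it to `{title}` before the document is saved keeps the file-browser view and the `![[...]]` embeds consistent.
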
### 8. Email inbox

**docs@msbls.de**: alias of mail@msbls.de (Hostinger).

Paperless polls mail@msbls.de via IMAP and consumes attachments from emails sent to docs@msbls.de:

- IMAP: `imap.hostinger.com:993` (SSL/TLS), user: `mail@msbls.de`
- Mail rule: filter `To: docs@msbls.de`, attachments only, action: mark as read
- The correspondent is taken automatically from the sender
- The title comes from the filename

**Workflow:** forward the document as a PDF to docs@msbls.de → Paperless imports it automatically.

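Paperless-ngx evaluates mail rules itself. The following stdlib-only Python sketch just reproduces the rule's logic (match `To: docs@msbls.de`, keep only attachments). You could use it, for example, to check what a forwarded message would yield. `docs_attachments` is hypothetical. Like the rule, it looks only at `To`, not at `Cc`.

```python
from email.message import EmailMessage
from email.utils import getaddresses
from typing import List, Tuple

DOCS_ADDRESS = "docs@msbls.de"


def docs_attachments(msg: EmailMessage) -> List[Tuple[str, bytes]]:
    """Return (filename, payload) for each attachment if msg is addressed to docs@."""
    recipients = [addr.lower() for _, addr in getaddresses(msg.get_all("To", []))]
    if DOCS_ADDRESS not in recipients:
        return []  # not for the document inbox: ignore
    return [
        (part.get_filename() or "unnamed", part.get_content())
        for part in msg.iter_attachments()  # skips the message body itself
    ]
```
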
---

## Paperless-AI configuration

Paperless-AI (port 3077) is to take over classification. Configure it with:

- **Auto-assign correspondent** based on the OCR text (sender detection)
- **Auto-assign document type** from the 15 reduced types
- **Auto-assign 1–2 category tags** from the short list
- **Not**: auto-generated free-text tags (these are what caused the current chaos)

---

## Migration: step by step

### Phase 1: TrueNAS setup ✓ DONE

1. ~~Created dataset `mPool/mdms`~~ (LZ4, 1.24 TiB free)
2. ~~Created subfolders~~ (paperless, inbox, templates, archive, export)
3. ~~NFS export~~ (id:8, 192.168.178.0/24), mounted on mDock as `/mnt/mdms`
4. ~~SMB share `mDMS`~~ (id:3, user `m`)

### Phase 2: Paperless migration ✓ DONE

5. ~~Stopped Paperless on mDock~~
6. ~~Copied media, data, ai~~; pgdata kept as a local Docker volume (NFS ownership is incompatible with the Postgres uid 999)
7. ~~Copied consume → inbox~~
8. ~~Deleted SynoResource files~~
9. ~~NFS mount `/mnt/mdms` on mDock~~ (fstab via Proxmox agent)
10. ~~Updated Docker Compose~~ (`~/paperless/docker-compose.yml`)
11. ~~Configured storage path~~
12. ~~Verified Paperless~~: 129 docs, all metadata intact

**Note:** pgdata lives in the Docker volume `paperless_pgdata` on mDock (not on NFS). Plan a DB backup via `pg_dump` into `mdms/export/`.

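Here is a minimal sketch of the planned backup command. It assumes the command runs on mDock from `~/paperless/`, where the compose service is `db` and both the user and the database are `paperless` (as in `infra/paperless/docker-compose.yml`). `pg_dump_command`, the `paperless-db-<date>.dump` name, and the custom format (`-F c`, restorable with `pg_restore`) are choices made for this example. `-T` disables TTY allocation so the binary dump passes through `docker compose exec` unmodified. Scheduling is left open.

```python
import datetime
import shlex
from pathlib import PurePosixPath
from typing import List, Tuple

EXPORT_DIR = PurePosixPath("/mnt/mdms/export")  # mDock mount of mPool/mdms/export


def pg_dump_command(today: datetime.date) -> Tuple[List[str], PurePosixPath]:
    """Build the argv for a custom-format dump of the Paperless DB plus its target file.

    pg_dump runs inside the compose service `db` and writes to stdout;
    the caller redirects stdout into the returned target path.
    """
    argv = ["docker", "compose", "exec", "-T", "db",
            "pg_dump", "-U", "paperless", "-d", "paperless", "-F", "c"]
    target = EXPORT_DIR / f"paperless-db-{today.isoformat()}.dump"
    return argv, target


argv, target = pg_dump_command(datetime.date(2026, 5, 15))
print(f"cd ~/paperless && {shlex.join(argv)} > {shlex.quote(str(target))}")
```
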
### Phase 3: Cleanup ✓ DONE

13. ~~Merge Paperless correspondents~~ → 51 → 41
14. ~~Reduce document types~~ → 39 → 13
15. ~~Clean up tags~~ → 172 → 16 (hierarchical: category + status + context)
16. Configure Paperless-AI with the new taxonomy (TODO)

### Phase 4: mDocs migration

17. Clone the mDocs repo and copy all PDFs to mdms/inbox/
18. Paperless consumes and classifies them automatically
19. Verify manually: are correspondents, types, and tags correct?
20. Delete the mDocs repo on Gitea

### Phase 5: Client setup

21. SMB mount on mBreeze: `~/mDMS` (LaunchAgent, like mStash)
22. SMB mount on mPebble: `~/mDMS`
23. Point the mScan app at mdms/inbox/ (if SFTP/SMB upload is possible)

---

## Open items

- [x] Paperless admin credentials: mAi user created in Paperless
- [x] Paperless cleanup: correspondents, types, tags cleaned up
- [x] Storage path configured and files renamed
- [x] WebP previews deleted, SynoResource junk cleaned up
- [x] TrueNAS dataset `mPool/mdms` created (NFS id:8, SMB `mDMS` id:3)
- [x] Paperless media switched to mdms/paperless (Docker Compose updated)
- [x] SMB share `mDMS` set up on TrueNAS
- [x] mDocs migration: 69 PDFs in the Paperless inbox, consumption running
- [x] **docs@msbls.de**: email inbox for Paperless (IMAP polling, alias of mail@msbls.de, rule filters on To: docs@msbls.de)
- [ ] Configure Paperless-AI with the new taxonomy
- [ ] Regular export/backup job (Paperless → mdms/export/)
- [ ] `m doc` CLI subcommand for Paperless access? (search, list, tag)
- [ ] Obsidian vault symlink setup on mBreeze/mPebble
14
infra/paperless/Dockerfile
Normal file
@@ -0,0 +1,14 @@
# Thin overlay on clusterzx/paperless-ai:3.0.9, same digest as
# the :latest tag pulled on 2026-04-06, but pinned so future image
# refreshes do not silently wipe the type-restriction patches.
#
# Patch 1 (routes/setup.js): restrict-existing-document-types on
#   the manual processing route (already applied previously
#   by docker cp, but volatile across container recreation).
# Patch 2 (server.js): same restriction on the scheduled-scan
#   loop. Without this, new document types kept appearing
#   even with RESTRICT_TO_EXISTING_DOCUMENT_TYPES=yes.
FROM clusterzx/paperless-ai:3.0.9

COPY setup.js.patched /app/routes/setup.js
COPY server.js.patched /app/server.js
24
infra/paperless/README.md
Normal file
@@ -0,0 +1,24 @@
# paperless infra (snapshot)

These files are a **traceable copy** of what lives in `~/paperless/` on mDock. The live source of truth is on mDock; this directory exists so the configuration is git-readable for the next shift and for audits.

If you change the live config on mDock, sync the change here in the same commit. If you change the files here, deploy them by:

```bash
# The Dockerfile goes into the build context (compose uses `build: ./build`),
# the compose file into the project root; copy other changed files the same way.
scp Dockerfile mdock:/home/m/paperless/build/Dockerfile
scp docker-compose.yml mdock:/home/m/paperless/docker-compose.yml
ssh mdock 'cd /home/m/paperless && docker compose up -d --build'
```

The two patched JS files (`setup.js.patched`, `server.js.patched`) live only on mDock in `~/paperless/build/`; they are large and don't belong in the repo. Hashes:

| File | mDock path | md5 |
|---|---|---|
| setup.js.patched | ~/paperless/build/setup.js.patched | `04cb5fbfaed13a5f25612af0b79dd90c` |
| server.js.patched | ~/paperless/build/server.js.patched | `eadcbb86048127f2c80632ae77bbc2a0` |

See `docs/research/issue-429-paperless-pipeline.md` for the rationale.

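The following Python sketch checks that the files on mDock still match the recorded hashes. MD5 serves only as a drift check here, not as a security measure. Run it on mDock before `docker compose up -d --build`. `verify_patches` is hypothetical; `md5sum` on the command line produces the same digests.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# Expected digests from the table above (files live on mDock in ~/paperless/build/).
EXPECTED_MD5 = {
    "setup.js.patched": "04cb5fbfaed13a5f25612af0b79dd90c",
    "server.js.patched": "eadcbb86048127f2c80632ae77bbc2a0",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_patches(build_dir: Path) -> Dict[str, Optional[bool]]:
    """Map each patched file to True/False (hash matches or not) or None (missing)."""
    result: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = build_dir / name
        result[name] = (md5_of(path) == expected) if path.exists() else None
    return result


if __name__ == "__main__":
    print(verify_patches(Path.home() / "paperless" / "build"))
```
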
24
infra/paperless/SYSTEM_PROMPT.txt
Normal file
@@ -0,0 +1,24 @@
Du klassifizierst deutsche Dokumente fuer ein persoenliches Dokumentenmanagementsystem.

Erlaubte Document Types (NUR diese verwenden, keine neuen erfinden):
- Invoice — Rechnungen, Abrechnungen, Mahnschreiben, Kontoauszuege, Lohnsteuerbescheinigung, Umsatzsteuer-Voranmeldung, Steuererklaerung, Kostenrechnungen
- Contract — Vertraege, Versicherungsscheine, Kauf-/Kreditvertraege, unterschriebene Angebote, AGB
- Information — Behoerden- und Versicherer-Anschreiben, Bescheinigungen, Mitteilungen, Verwaltungsakte, medizinische Befunde, Berichte, Berechnungen, einseitige Informationen
- Personal Correspondence — Briefe von identifizierbaren Privatpersonen. Stammt der Brief von einer Institution, waehle stattdessen Information.
- Vollmacht — Vollmachten
- Urkunde — notarielle Urkunden
- Steuerbescheid — Steuerbescheide vom Finanzamt
- Anleitung — Bedienungsanleitungen, Datenblaetter, Manuals
- Protokoll — Sitzungs- und WEG-Protokolle
- Formular — Blanko-Formulare und Antraege

Im Zweifel waehle Information. Erfinde NIEMALS neue Document Types.

Erlaubte Tags (NUR diese verwenden, keine neuen erfinden):
Anleitung, Arbeit, Erbschaft, Finanzen, Frist, Gesundheit, Gewaehrleistung, Paul, Steuer, Versicherung, Windscheid33, Wohnung, offen, wichtig

Bei medizinischen Dokumenten Tag Gesundheit setzen.
Bei steuerrelevanten Dokumenten Tag Steuer setzen.
Bei Dokumenten mit Frist Tag Frist setzen.

Correspondents: Verwende den vollen offiziellen Namen der Organisation oder Person (z.B. "DAK-Gesundheit" nicht "DAK-Gesundheit Postzentrum, 22778 Hamburg"). Keine Adressen im Namen. Pruefe ob der Correspondent schon existiert bevor du einen neuen anlegst.
52
infra/paperless/docker-compose.yml
Normal file
@@ -0,0 +1,52 @@
services:
  broker:
    image: docker.io/library/redis:8
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:2.20.6
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - 8777:8000
    volumes:
      - /mnt/mdms/paperless/data:/usr/src/paperless/data
      - /mnt/mdms/paperless/media:/usr/src/paperless/media
      - /mnt/mdms/export:/usr/src/paperless/export
      - /mnt/mdms/inbox:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIME_ZONE: Europe/Berlin
      PAPERLESS_OCR_LANGUAGE: deu+eng
      PAPERLESS_CONSUMER_POLLING: 60
      PAPERLESS_CONSUMER_RECURSIVE: "true"

  paperless-ai:
    build: ./build
    image: mdock/paperless-ai:3.0.9-restrict-patch
    container_name: paperless-ai
    restart: unless-stopped
    ports:
      - 3077:3000
    volumes:
      - aidata:/app/data

volumes:
  redisdata:
  pgdata:
  aidata:
368
infra/paperless/migrate-apply-2026-05-13.log
Normal file
@@ -0,0 +1,368 @@
|
|||||||
|
/tmp/migrate_types.py:240: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
|
||||||
|
audit_path = f"/tmp/migrate_types_audit_{datetime.datetime.utcnow().strftime('%Y%m%dT%H%M%S')}.json"
|
||||||
|
/tmp/migrate_types.py:242: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
|
||||||
|
"ts_utc": datetime.datetime.utcnow().isoformat() + "Z",
|
||||||
|
loaded 73 types, 195 docs
|
||||||
|
all 10 target types verified
|
||||||
|
|
||||||
|
=== PLAN ===
|
||||||
|
document moves: 171
|
||||||
|
types to delete (after moves): 63
|
||||||
|
types NOT mapped + nonzero docs (need manual call): 0
|
||||||
|
|
||||||
|
=== MOVES SUMMARY (per target type) ===
|
||||||
|
-> Contract (+23 docs)
|
||||||
|
7 from Vertrag
|
||||||
|
6 from Versicherungsschein
|
||||||
|
1 from agreement
|
||||||
|
1 from contract
|
||||||
|
1 from Finanzierungsangebot
|
||||||
|
1 from Kreditvertrag
|
||||||
|
1 from Loan Application and Agreement
|
||||||
|
1 from Notarial Deed
|
||||||
|
1 from Notarized agreement with amendments
|
||||||
|
1 from Rechtsgeschäft
|
||||||
|
1 from Versicherungsbedingungen
|
||||||
|
1 from Vertragsdokument
|
||||||
|
-> Information (+96 docs)
|
||||||
|
21 from Bescheinigung
|
||||||
|
21 from Brief
|
||||||
|
17 from Bescheid
|
||||||
|
7 from Mitteilung
|
||||||
|
3 from Wohnflaechenberechnung
|
||||||
|
2 from Einladung zur Eigentümerversammlung
|
||||||
|
2 from Leistungsnachweis
|
||||||
|
2 from Medizinisch
|
||||||
|
2 from Steuererklärung
|
||||||
|
1 from Angebot
|
||||||
|
1 from Antrag
|
||||||
|
1 from Behandlungsplan und Risikoaufklärung
|
||||||
|
1 from Beratungsprotokoll
|
||||||
|
1 from Berechnung
|
||||||
|
1 from Bericht
|
||||||
|
1 from Bestätigungsbrief
|
||||||
|
1 from Energy Performance Certificate
|
||||||
|
1 from Erklarung
|
||||||
|
1 from Guidelines
|
||||||
|
1 from Gutachten
|
||||||
|
1 from informational document
|
||||||
|
1 from Informationsschreiben
|
||||||
|
1 from Medical Consent Form
|
||||||
|
1 from medical documentation
|
||||||
|
1 from Rechnungs- und Vertragsinformation
|
||||||
|
1 from Schreiben des Finanzamts
|
||||||
|
1 from Verwaltungsakt
|
||||||
|
1 from Werbung
|
||||||
|
-> Invoice (+52 docs)
|
||||||
|
26 from Rechnung
|
||||||
|
11 from Abrechnung
|
||||||
|
6 from Umsatzsteuer-Voranmeldung
|
||||||
|
4 from Lohnsteuerbescheinigung
|
||||||
|
1 from Kontoauszug
|
||||||
|
1 from Kontoübersicht
|
||||||
|
1 from Kostenabrechnung
|
||||||
|
1 from Kostenvoranmeldung
|
||||||
|
1 from Mahnschreiben
|
||||||
|
|
||||||
|
=== TYPES TO DELETE (after moves) ===
id= 4 count= 11 name='Abrechnung'
id=160 count= 1 name='agreement'
id= 13 count= 1 name='Angebot'
id=134 count= 1 name='Antrag'
id=141 count= 1 name='Behandlungsplan und Risikoaufklärung'
id=129 count= 1 name='Beratungsprotokoll'
id=143 count= 1 name='Berechnung'
id=148 count= 1 name='Bericht'
id= 11 count= 17 name='Bescheid'
id= 15 count= 21 name='Bescheinigung'
id=151 count= 1 name='Bestätigungsbrief'
id= 30 count= 21 name='Brief'
id=127 count= 0 name='Consent Form'
id=144 count= 1 name='contract'
id=120 count= 0 name='Einladung / Vollmacht / Wirtschaftsplan'
id=113 count= 2 name='Einladung zur Eigentümerversammlung'
id=132 count= 0 name='Einspruchsschreiben'
id=158 count= 1 name='Energy Performance Certificate'
id=128 count= 1 name='Erklarung'
id=156 count= 1 name='Finanzierungsangebot'
id=122 count= 0 name='Geldzuwendungsbestätigung'
id=157 count= 1 name='Guidelines'
id= 27 count= 1 name='Gutachten'
id=136 count= 1 name='informational document'
id=139 count= 1 name='Informationsschreiben'
id=137 count= 0 name='Kaufvertrag'
id=118 count= 1 name='Kontoauszug'
id=117 count= 1 name='Kontoübersicht'
id=145 count= 1 name='Kostenabrechnung'
id=121 count= 1 name='Kostenvoranmeldung'
id=142 count= 1 name='Kreditvertrag'
id=114 count= 0 name='Kundeninformation'
id= 83 count= 2 name='Leistungsnachweis'
id=135 count= 1 name='Loan Application and Agreement'
id= 66 count= 4 name='Lohnsteuerbescheinigung'
id=147 count= 1 name='Mahnschreiben'
id=140 count= 1 name='Medical Consent Form'
id=150 count= 1 name='medical documentation'
id= 41 count= 2 name='Medizinisch'
id= 12 count= 7 name='Mitteilung'
id=161 count= 1 name='Notarial Deed'
id=159 count= 1 name='Notarized agreement with amendments'
id=133 count= 0 name='Plan'
id=131 count= 0 name='policy'
id=116 count= 0 name='Questionnaire/Declaration Form'
id= 2 count= 26 name='Rechnung'
id=149 count= 1 name='Rechnungs- und Vertragsinformation'
id=125 count= 0 name='Rechtlicher Vertrag'
id=155 count= 1 name='Rechtsgeschäft'
id=126 count= 0 name='recommendation'
id=152 count= 1 name='Schreiben des Finanzamts'
id=119 count= 0 name='Steuerdokument'
id=115 count= 2 name='Steuererklärung'
id=124 count= 0 name='Tilgungsplan'
id= 88 count= 6 name='Umsatzsteuer-Voranmeldung'
id=130 count= 1 name='Versicherungsbedingungen'
id= 67 count= 6 name='Versicherungsschein'
id= 40 count= 7 name='Vertrag'
id=153 count= 1 name='Vertragsdokument'
id=154 count= 1 name='Verwaltungsakt'
id=146 count= 1 name='Werbung'
id= 73 count= 0 name='Wohnflächenberechnung'
id=123 count= 3 name='Wohnflaechenberechnung'
audit trail written: /tmp/migrate_types_audit_20260513T085119.json

=== APPLY ===
[OK ] doc 104: 'Abrechnung' -> 'Invoice'
[OK ] doc 124: 'Abrechnung' -> 'Invoice'
[OK ] doc 88: 'Abrechnung' -> 'Invoice'
[OK ] doc 134: 'Abrechnung' -> 'Invoice'
[OK ] doc 122: 'Abrechnung' -> 'Invoice'
[OK ] doc 71: 'Abrechnung' -> 'Invoice'
[OK ] doc 220: 'Abrechnung' -> 'Invoice'
[OK ] doc 223: 'Abrechnung' -> 'Invoice'
[OK ] doc 224: 'Abrechnung' -> 'Invoice'
[OK ] doc 255: 'Abrechnung' -> 'Invoice'
[OK ] doc 248: 'Abrechnung' -> 'Invoice'
[OK ] doc 200: 'agreement' -> 'Contract'
[OK ] doc 222: 'Angebot' -> 'Information'
[OK ] doc 98: 'Antrag' -> 'Information'
[OK ] doc 91: 'Behandlungsplan und Risikoaufklärung' -> 'Information'
[OK ] doc 228: 'Beratungsprotokoll' -> 'Information'
[OK ] doc 202: 'Berechnung' -> 'Information'
[OK ] doc 96: 'Bericht' -> 'Information'
[OK ] doc 160: 'Bescheid' -> 'Information'
[OK ] doc 95: 'Bescheid' -> 'Information'
[OK ] doc 86: 'Bescheid' -> 'Information'
[OK ] doc 159: 'Bescheid' -> 'Information'
[OK ] doc 183: 'Bescheid' -> 'Information'
[OK ] doc 101: 'Bescheid' -> 'Information'
[OK ] doc 81: 'Bescheid' -> 'Information'
[OK ] doc 69: 'Bescheid' -> 'Information'
[OK ] doc 70: 'Bescheid' -> 'Information'
[OK ] doc 85: 'Bescheid' -> 'Information'
[OK ] doc 236: 'Bescheid' -> 'Information'
[OK ] doc 253: 'Bescheid' -> 'Information'
[OK ] doc 250: 'Bescheid' -> 'Information'
[OK ] doc 233: 'Bescheid' -> 'Information'
[OK ] doc 234: 'Bescheid' -> 'Information'
[OK ] doc 235: 'Bescheid' -> 'Information'
[OK ] doc 76: 'Bescheid' -> 'Information'
[OK ] doc 260: 'Bescheinigung' -> 'Information'
[OK ] doc 182: 'Bescheinigung' -> 'Information'
[OK ] doc 100: 'Bescheinigung' -> 'Information'
[OK ] doc 178: 'Bescheinigung' -> 'Information'
[OK ] doc 166: 'Bescheinigung' -> 'Information'
[OK ] doc 192: 'Bescheinigung' -> 'Information'
[OK ] doc 75: 'Bescheinigung' -> 'Information'
[OK ] doc 179: 'Bescheinigung' -> 'Information'
[OK ] doc 186: 'Bescheinigung' -> 'Information'
[OK ] doc 168: 'Bescheinigung' -> 'Information'
[OK ] doc 262: 'Bescheinigung' -> 'Information'
[OK ] doc 261: 'Bescheinigung' -> 'Information'
[OK ] doc 259: 'Bescheinigung' -> 'Information'
[OK ] doc 242: 'Bescheinigung' -> 'Information'
[OK ] doc 239: 'Bescheinigung' -> 'Information'
[OK ] doc 245: 'Bescheinigung' -> 'Information'
[OK ] doc 252: 'Bescheinigung' -> 'Information'
[OK ] doc 219: 'Bescheinigung' -> 'Information'
[OK ] doc 205: 'Bescheinigung' -> 'Information'
[OK ] doc 247: 'Bescheinigung' -> 'Information'
[OK ] doc 230: 'Bescheinigung' -> 'Information'
[OK ] doc 152: 'Bestätigungsbrief' -> 'Information'
[OK ] doc 244: 'Brief' -> 'Information'
[OK ] doc 164: 'Brief' -> 'Information'
[OK ] doc 146: 'Brief' -> 'Information'
[OK ] doc 169: 'Brief' -> 'Information'
[OK ] doc 191: 'Brief' -> 'Information'
[OK ] doc 105: 'Brief' -> 'Information'
[OK ] doc 188: 'Brief' -> 'Information'
[OK ] doc 115: 'Brief' -> 'Information'
[OK ] doc 97: 'Brief' -> 'Information'
[OK ] doc 196: 'Brief' -> 'Information'
[OK ] doc 74: 'Brief' -> 'Information'
[OK ] doc 113: 'Brief' -> 'Information'
[OK ] doc 102: 'Brief' -> 'Information'
[OK ] doc 126: 'Brief' -> 'Information'
[OK ] doc 195: 'Brief' -> 'Information'
[OK ] doc 110: 'Brief' -> 'Information'
[OK ] doc 170: 'Brief' -> 'Information'
[OK ] doc 180: 'Brief' -> 'Information'
[OK ] doc 116: 'Brief' -> 'Information'
[OK ] doc 127: 'Brief' -> 'Information'
[OK ] doc 149: 'Brief' -> 'Information'
[OK ] doc 227: 'contract' -> 'Contract'
[OK ] doc 156: 'Einladung zur Eigentümerversammlung' -> 'Information'
[OK ] doc 119: 'Einladung zur Eigentümerversammlung' -> 'Information'
[OK ] doc 163: 'Energy Performance Certificate' -> 'Information'
[OK ] doc 251: 'Erklarung' -> 'Information'
[OK ] doc 217: 'Finanzierungsangebot' -> 'Contract'
[OK ] doc 154: 'Guidelines' -> 'Information'
[OK ] doc 158: 'Gutachten' -> 'Information'
[OK ] doc 218: 'informational document' -> 'Information'
[OK ] doc 185: 'Informationsschreiben' -> 'Information'
[OK ] doc 189: 'Kontoauszug' -> 'Invoice'
[OK ] doc 187: 'Kontoübersicht' -> 'Invoice'
[OK ] doc 121: 'Kostenabrechnung' -> 'Invoice'
[OK ] doc 107: 'Kostenvoranmeldung' -> 'Invoice'
[OK ] doc 212: 'Kreditvertrag' -> 'Contract'
[OK ] doc 256: 'Leistungsnachweis' -> 'Information'
[OK ] doc 241: 'Leistungsnachweis' -> 'Information'
[OK ] doc 214: 'Loan Application and Agreement' -> 'Contract'
[OK ] doc 167: 'Lohnsteuerbescheinigung' -> 'Invoice'
[OK ] doc 254: 'Lohnsteuerbescheinigung' -> 'Invoice'
[OK ] doc 258: 'Lohnsteuerbescheinigung' -> 'Invoice'
[OK ] doc 249: 'Lohnsteuerbescheinigung' -> 'Invoice'
[OK ] doc 80: 'Mahnschreiben' -> 'Invoice'
[OK ] doc 138: 'Medical Consent Form' -> 'Information'
[OK ] doc 136: 'medical documentation' -> 'Information'
[OK ] doc 135: 'Medizinisch' -> 'Information'
[OK ] doc 197: 'Medizinisch' -> 'Information'
[OK ] doc 109: 'Mitteilung' -> 'Information'
[OK ] doc 144: 'Mitteilung' -> 'Information'
[OK ] doc 181: 'Mitteilung' -> 'Information'
[OK ] doc 111: 'Mitteilung' -> 'Information'
[OK ] doc 150: 'Mitteilung' -> 'Information'
[OK ] doc 184: 'Mitteilung' -> 'Information'
[OK ] doc 108: 'Mitteilung' -> 'Information'
[OK ] doc 206: 'Notarial Deed' -> 'Contract'
[OK ] doc 203: 'Notarized agreement with amendments' -> 'Contract'
[OK ] doc 151: 'Rechnung' -> 'Invoice'
[OK ] doc 90: 'Rechnung' -> 'Invoice'
[OK ] doc 93: 'Rechnung' -> 'Invoice'
[OK ] doc 92: 'Rechnung' -> 'Invoice'
[OK ] doc 161: 'Rechnung' -> 'Invoice'
[OK ] doc 140: 'Rechnung' -> 'Invoice'
[OK ] doc 132: 'Rechnung' -> 'Invoice'
[OK ] doc 155: 'Rechnung' -> 'Invoice'
[OK ] doc 73: 'Rechnung' -> 'Invoice'
[OK ] doc 162: 'Rechnung' -> 'Invoice'
[OK ] doc 94: 'Rechnung' -> 'Invoice'
[OK ] doc 78: 'Rechnung' -> 'Invoice'
[OK ] doc 143: 'Rechnung' -> 'Invoice'
[OK ] doc 106: 'Rechnung' -> 'Invoice'
[OK ] doc 72: 'Rechnung' -> 'Invoice'
[OK ] doc 193: 'Rechnung' -> 'Invoice'
[OK ] doc 194: 'Rechnung' -> 'Invoice'
[OK ] doc 139: 'Rechnung' -> 'Invoice'
[OK ] doc 165: 'Rechnung' -> 'Invoice'
[OK ] doc 133: 'Rechnung' -> 'Invoice'
[OK ] doc 173: 'Rechnung' -> 'Invoice'
[OK ] doc 148: 'Rechnung' -> 'Invoice'
[OK ] doc 147: 'Rechnung' -> 'Invoice'
[OK ] doc 141: 'Rechnung' -> 'Invoice'
[OK ] doc 142: 'Rechnung' -> 'Invoice'
[OK ] doc 231: 'Rechnung' -> 'Invoice'
[OK ] doc 175: 'Rechnungs- und Vertragsinformation' -> 'Information'
[OK ] doc 213: 'Rechtsgeschäft' -> 'Contract'
[OK ] doc 79: 'Schreiben des Finanzamts' -> 'Information'
[OK ] doc 246: 'Steuererklärung' -> 'Information'
[OK ] doc 77: 'Steuererklärung' -> 'Information'
[OK ] doc 257: 'Umsatzsteuer-Voranmeldung' -> 'Invoice'
[OK ] doc 237: 'Umsatzsteuer-Voranmeldung' -> 'Invoice'
[OK ] doc 238: 'Umsatzsteuer-Voranmeldung' -> 'Invoice'
[OK ] doc 240: 'Umsatzsteuer-Voranmeldung' -> 'Invoice'
[OK ] doc 243: 'Umsatzsteuer-Voranmeldung' -> 'Invoice'
[OK ] doc 204: 'Umsatzsteuer-Voranmeldung' -> 'Invoice'
[OK ] doc 229: 'Versicherungsbedingungen' -> 'Contract'
[OK ] doc 129: 'Versicherungsschein' -> 'Contract'
[OK ] doc 112: 'Versicherungsschein' -> 'Contract'
[OK ] doc 130: 'Versicherungsschein' -> 'Contract'
[OK ] doc 128: 'Versicherungsschein' -> 'Contract'
[OK ] doc 226: 'Versicherungsschein' -> 'Contract'
[OK ] doc 131: 'Versicherungsschein' -> 'Contract'
[OK ] doc 118: 'Vertrag' -> 'Contract'
[OK ] doc 199: 'Vertrag' -> 'Contract'
[OK ] doc 87: 'Vertrag' -> 'Contract'
[OK ] doc 89: 'Vertrag' -> 'Contract'
[OK ] doc 232: 'Vertrag' -> 'Contract'
[OK ] doc 123: 'Vertrag' -> 'Contract'
[OK ] doc 190: 'Vertrag' -> 'Contract'
[OK ] doc 177: 'Vertragsdokument' -> 'Contract'
[OK ] doc 82: 'Verwaltungsakt' -> 'Information'
[OK ] doc 176: 'Werbung' -> 'Information'
[OK ] doc 216: 'Wohnflaechenberechnung' -> 'Information'
[OK ] doc 201: 'Wohnflaechenberechnung' -> 'Information'
[OK ] doc 207: 'Wohnflaechenberechnung' -> 'Information'
[DEL] type 4 'Abrechnung' resp=''
[DEL] type 160 'agreement' resp=''
[DEL] type 13 'Angebot' resp=''
[DEL] type 134 'Antrag' resp=''
[DEL] type 141 'Behandlungsplan und Risikoaufklärung' resp=''
[DEL] type 129 'Beratungsprotokoll' resp=''
[DEL] type 143 'Berechnung' resp=''
[DEL] type 148 'Bericht' resp=''
[DEL] type 11 'Bescheid' resp=''
[DEL] type 15 'Bescheinigung' resp=''
[DEL] type 151 'Bestätigungsbrief' resp=''
[DEL] type 30 'Brief' resp=''
[DEL] type 127 'Consent Form' resp=''
[DEL] type 144 'contract' resp=''
[DEL] type 120 'Einladung / Vollmacht / Wirtschaftsplan' resp=''
[DEL] type 113 'Einladung zur Eigentümerversammlung' resp=''
[DEL] type 132 'Einspruchsschreiben' resp=''
[DEL] type 158 'Energy Performance Certificate' resp=''
[DEL] type 128 'Erklarung' resp=''
[DEL] type 156 'Finanzierungsangebot' resp=''
[DEL] type 122 'Geldzuwendungsbestätigung' resp=''
[DEL] type 157 'Guidelines' resp=''
[DEL] type 27 'Gutachten' resp=''
[DEL] type 136 'informational document' resp=''
[DEL] type 139 'Informationsschreiben' resp=''
[DEL] type 137 'Kaufvertrag' resp=''
[DEL] type 118 'Kontoauszug' resp=''
[DEL] type 117 'Kontoübersicht' resp=''
[DEL] type 145 'Kostenabrechnung' resp=''
[DEL] type 121 'Kostenvoranmeldung' resp=''
[DEL] type 142 'Kreditvertrag' resp=''
[DEL] type 114 'Kundeninformation' resp=''
[DEL] type 83 'Leistungsnachweis' resp=''
[DEL] type 135 'Loan Application and Agreement' resp=''
[DEL] type 66 'Lohnsteuerbescheinigung' resp=''
[DEL] type 147 'Mahnschreiben' resp=''
[DEL] type 140 'Medical Consent Form' resp=''
[DEL] type 150 'medical documentation' resp=''
[DEL] type 41 'Medizinisch' resp=''
[DEL] type 12 'Mitteilung' resp=''
[DEL] type 161 'Notarial Deed' resp=''
[DEL] type 159 'Notarized agreement with amendments' resp=''
[DEL] type 133 'Plan' resp=''
[DEL] type 131 'policy' resp=''
[DEL] type 116 'Questionnaire/Declaration Form' resp=''
[DEL] type 2 'Rechnung' resp=''
[DEL] type 149 'Rechnungs- und Vertragsinformation' resp=''
[DEL] type 125 'Rechtlicher Vertrag' resp=''
[DEL] type 155 'Rechtsgeschäft' resp=''
[DEL] type 126 'recommendation' resp=''
[DEL] type 152 'Schreiben des Finanzamts' resp=''
[DEL] type 119 'Steuerdokument' resp=''
[DEL] type 115 'Steuererklärung' resp=''
[DEL] type 124 'Tilgungsplan' resp=''
[DEL] type 88 'Umsatzsteuer-Voranmeldung' resp=''
[DEL] type 130 'Versicherungsbedingungen' resp=''
[DEL] type 67 'Versicherungsschein' resp=''
[DEL] type 40 'Vertrag' resp=''
[DEL] type 153 'Vertragsdokument' resp=''
[DEL] type 154 'Verwaltungsakt' resp=''
[DEL] type 146 'Werbung' resp=''
[DEL] type 73 'Wohnflächenberechnung' resp=''
[DEL] type 123 'Wohnflaechenberechnung' resp=''
done.
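
A saved `--apply` log like the one above can be checked mechanically instead of by eye. The following is a minimal sketch. It assumes the log keeps the exact line shapes that `migrate_types.py` prints (`[OK ] doc N: 'Old' -> 'New'`, `[ERR] ...`, `[DEL] type N 'Name' resp=...`). `summarize_apply_log` is a hypothetical helper and is not part of the repo.

```python
import re
from collections import Counter

# Line shapes printed by migrate_types.py during --apply (assumed verbatim).
MOVE_RE = re.compile(r"\[(OK |ERR)\] doc (\d+): '(.+?)' -> '(.+?)'")
DEL_RE = re.compile(r"\[DEL\] type (\d+) '(.+?)' resp=")


def summarize_apply_log(text):
    """Tally [OK ] moves per target type and collect failed doc IDs and deleted type IDs."""
    moved = Counter()  # target type name -> number of successful moves
    failed = []        # doc IDs reported as [ERR]
    deleted = []       # document type IDs reported as [DEL]
    for line in text.splitlines():
        m = MOVE_RE.search(line)
        if m:
            flag, doc_id, _old_name, new_name = m.groups()
            if flag == "OK ":
                moved[new_name] += 1
            else:
                failed.append(int(doc_id))
            continue
        d = DEL_RE.search(line)
        if d:
            deleted.append(int(d.group(1)))
    return moved, failed, deleted


if __name__ == "__main__":
    sample = """\
[OK ] doc 104: 'Abrechnung' -> 'Invoice'
[OK ] doc 200: 'agreement' -> 'Contract'
[DEL] type 4 'Abrechnung' resp=''
"""
    print(summarize_apply_log(sample))
```

Compare the per-target counts against the `(+N docs)` totals in the moves summary. Any `[ERR]` entry means the matching old type should not have been deleted.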
279
infra/paperless/migrate_types.py
Normal file
@@ -0,0 +1,279 @@
"""
Collapse Paperless document types 69 -> 10, per the mapping agreed in
otto#429.

Run from a workstation with `ssh mdock` access. Every API call is executed
as curl inside the paperless-webserver-1 container against the live
Paperless API. Default mode is DRY RUN: it prints what would change without
writing. Pass --apply to actually PATCH docs and DELETE old types.

Usage:
    python3 migrate_types.py            # dry run
    python3 migrate_types.py --apply    # live
"""

import argparse
import datetime
import json
import shlex
import subprocess
import sys

# The 10 canonical target types (Paperless type IDs after Step 3).
TARGET = {
    "Invoice": 162,
    "Contract": 163,
    "Information": 164,
    "Personal Correspondence": 165,
    "Vollmacht": 22,
    "Urkunde": 37,
    "Steuerbescheid": 138,
    "Anleitung": 76,
    "Protokoll": 32,
    "Formular": 80,
}

# Mapping: old type *name* -> target canonical name.
# Built from the audit doc's mapping table. Unlisted types that still have
# documents keep their current type and are surfaced as "unmapped" so we
# can decide manually; unlisted types with zero documents are deleted.
MAP = {
    # ----- Invoice ------------------------------------------------
    "Rechnung": "Invoice",
    "Abrechnung": "Invoice",
    "Mahnschreiben": "Invoice",
    "Kontoauszug": "Invoice",
    "Kontoübersicht": "Invoice",
    "Kostenabrechnung": "Invoice",
    "Kostenvoranmeldung": "Invoice",
    "Umsatzsteuer-Voranmeldung": "Invoice",
    "Tilgungsplan": "Invoice",
    "Lohnsteuerbescheinigung": "Invoice",

    # ----- Contract -----------------------------------------------
    "Vertrag": "Contract",
    "Versicherungsschein": "Contract",
    "Kaufvertrag": "Contract",
    "Kreditvertrag": "Contract",
    "Notarial Deed": "Contract",
    "agreement": "Contract",
    "contract": "Contract",
    "policy": "Contract",
    "Vertragsdokument": "Contract",
    "Rechtsgeschäft": "Contract",
    "Rechtlicher Vertrag": "Contract",
    "Versicherungsbedingungen": "Contract",
    "Finanzierungsangebot": "Contract",
    "Loan Application and Agreement": "Contract",
    "Notarized agreement with amendments": "Contract",

    # ----- Information --------------------------------------------
    "Bescheid": "Information",
    "Bescheinigung": "Information",
    "Mitteilung": "Information",
    "Verwaltungsakt": "Information",
    "Schreiben des Finanzamts": "Information",
    "Informationsschreiben": "Information",
    "informational document": "Information",
    "Kundeninformation": "Information",
    "Werbung": "Information",
    "Bestätigungsbrief": "Information",
    "Geldzuwendungsbestätigung": "Information",
    "Antrag": "Information",
    "Erklarung": "Information",
    "Leistungsnachweis": "Information",
    "Beratungsprotokoll": "Information",
    "Gutachten": "Information",
    "Bericht": "Information",
    "Berechnung": "Information",
    "Wohnflaechenberechnung": "Information",
    "Wohnflächenberechnung": "Information",
    "Guidelines": "Information",
    "Energy Performance Certificate": "Information",
    "Einladung zur Eigentümerversammlung": "Information",
    "Einladung / Vollmacht / Wirtschaftsplan": "Information",
    "Steuerdokument": "Information",
    "Steuererklärung": "Information",
    "Plan": "Information",
    "Einspruchsschreiben": "Information",
    "Angebot": "Information",
    "recommendation": "Information",
    "Behandlungsplan und Risikoaufklärung": "Information",
    "Medical Consent Form": "Information",
    "Consent Form": "Information",
    "Medizinisch": "Information",
    "medical documentation": "Information",
    "Questionnaire/Declaration Form": "Information",
    "Rechnungs- und Vertragsinformation": "Information",

    # ----- Personal Correspondence --------------------------------
    # Per m's explicit answer: Brief defaults to Information.
    # Personal Correspondence is opt-in for letters that are clearly
    # from a private person; the AI applies it going forward on a
    # case-by-case basis. This script never reads document content, so
    # it cannot tell the 21 existing Briefe apart; they land in
    # Information, the safe default m chose.
    "Brief": "Information",
}


def paperless_curl(token, path, method="GET", body=None):
    """Call the Paperless REST API and return the raw response body.

    The request runs as curl inside the paperless-webserver-1 container on
    mDock (reached over ssh), so the API is addressed as localhost:8000.
    """
    inner_parts = [
        "curl", "-s",
        "-X", method,
        "-H", f"Authorization: Token {token}",
    ]
    if body is not None:
        inner_parts += ["-H", "Content-Type: application/json", "-d", json.dumps(body)]
    inner_parts.append(f"http://localhost:8000/api{path}")
    # ssh passes the command to the remote shell, so quote every argument.
    inner = " ".join(shlex.quote(p) for p in inner_parts)
    full = f"docker exec paperless-webserver-1 {inner}"
    out = subprocess.run(
        ["ssh", "mdock", full], capture_output=True, text=True, timeout=120,
    )
    if out.returncode != 0:
        raise RuntimeError(f"curl failed rc={out.returncode}: {out.stderr}")
    return out.stdout


def get_token():
    """Read PAPERLESS_API_TOKEN from the paperless-ai container's .env on mDock."""
    out = subprocess.run(
        ["ssh", "mdock",
         "docker exec paperless-ai sh -c 'grep ^PAPERLESS_API_TOKEN= /app/data/.env | cut -d= -f2-'"],
        capture_output=True, text=True, timeout=15,
    )
    return out.stdout.strip()


def fetch_all(token, path):
    """GET a paginated list endpoint and return all results as one flat list."""
    results = []
    page = 1
    while True:
        raw = paperless_curl(token, f"{path}?page={page}&page_size=200")
        data = json.loads(raw)
        results.extend(data.get("results", []))
        if not data.get("next"):
            break
        page += 1
    return results


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--apply", action="store_true", help="Actually write changes")
    args = ap.parse_args()

    token = get_token()
    if not token:
        sys.exit("no PAPERLESS_API_TOKEN found")

    types = fetch_all(token, "/document_types/")
    docs = fetch_all(token, "/documents/")
    print(f"loaded {len(types)} types, {len(docs)} docs")

    type_by_id = {t["id"]: t for t in types}

    # Sanity: verify all 10 targets exist under the expected names
    for name, tid in TARGET.items():
        t = type_by_id.get(tid)
        if not t or t["name"] != name:
            sys.exit(f"target type missing or mismatched: id={tid} expected name={name!r} got={t}")
    print("all 10 target types verified")

    # Build plan
    target_ids = set(TARGET.values())
    moves = []  # list of (doc_id, old_type_name, new_type_id, new_type_name)
    unmapped_types = []
    delete_candidates = []

    for t in types:
        if t["id"] in target_ids:
            continue  # keep
        target_name = MAP.get(t["name"])
        if target_name is None:
            # Unmapped: delete only if empty, otherwise leave for a manual call
            if t["document_count"] == 0:
                delete_candidates.append(t)
            else:
                unmapped_types.append(t)
            continue
        new_tid = TARGET[target_name]
        # Find docs with this type
        for d in docs:
            if d.get("document_type") == t["id"]:
                moves.append((d["id"], t["name"], new_tid, target_name))
        # Old type becomes deletable after all its docs are moved
        delete_candidates.append(t)

    print()
    print("=== PLAN ===")
    print(f"document moves: {len(moves)}")
    print(f"types to delete (after moves): {len(delete_candidates)}")
    print(f"types NOT mapped + nonzero docs (need manual call): {len(unmapped_types)}")
    if unmapped_types:
        print(" -- unmapped --")
        for t in unmapped_types:
            print(f" id={t['id']:3d} count={t['document_count']:3d} name={t['name']!r}")
    print()
    print("=== MOVES SUMMARY (per target type) ===")
    counter = {}
    for _, old_name, _, new_name in moves:
        counter.setdefault(new_name, {})
        counter[new_name][old_name] = counter[new_name].get(old_name, 0) + 1
    for new_name, src in sorted(counter.items()):
        total = sum(src.values())
        print(f" -> {new_name} (+{total} docs)")
        for old_name, n in sorted(src.items(), key=lambda kv: -kv[1]):
            print(f" {n:3d} from {old_name}")

    print()
    print("=== TYPES TO DELETE (after moves) ===")
    for t in delete_candidates:
        print(f" id={t['id']:3d} count={t['document_count']:3d} name={t['name']!r}")

    if not args.apply:
        print()
        print("DRY RUN — re-run with --apply to write changes")
        return

    # Audit trail BEFORE writing (one timestamp for both file name and record)
    now = datetime.datetime.now(datetime.timezone.utc)
    audit_path = f"/tmp/migrate_types_audit_{now.strftime('%Y%m%dT%H%M%S')}.json"
    audit = {
        "ts_utc": now.isoformat().replace("+00:00", "Z"),
        "types_snapshot": [
            {"id": t["id"], "name": t["name"], "document_count": t["document_count"]}
            for t in types
        ],
        "moves": [
            {"doc_id": d_id, "old_type_name": old_name, "new_type_id": ntid, "new_type_name": nname}
            for d_id, old_name, ntid, nname in moves
        ],
        "deletes": [
            {"id": t["id"], "name": t["name"], "document_count_before": t["document_count"]}
            for t in delete_candidates
        ],
    }
    with open(audit_path, "w", encoding="utf-8") as f:
        json.dump(audit, f, indent=2, ensure_ascii=False)
    print(f"audit trail written: {audit_path}")
    print()
    print("=== APPLY ===")
    failed_types = set()  # old type names with at least one failed move
    for doc_id, old_name, new_tid, new_name in moves:
        r = paperless_curl(token, f"/documents/{doc_id}/", method="PATCH", body={"document_type": new_tid})
        try:
            ok = json.loads(r).get("id") == doc_id
        except (ValueError, AttributeError):
            ok = False
        if not ok:
            failed_types.add(old_name)
        flag = "OK " if ok else "ERR"
        print(f" [{flag}] doc {doc_id}: {old_name!r} -> {new_name!r}")
    for t in delete_candidates:
        if t["name"] in failed_types:
            # Deleting the type now would strip it from the docs that failed to move.
            print(f" [SKIP] type {t['id']} {t['name']!r}: some moves failed")
            continue
        r = paperless_curl(token, f"/document_types/{t['id']}/", method="DELETE")
        # Paperless DELETE returns an empty 204 body on success
        print(f" [DEL] type {t['id']} {t['name']!r} resp={r[:80]!r}")

    print("done.")


if __name__ == "__main__":
    main()
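
The planning loop in `main()` can be exercised offline. Its rules are: keep the ten target types, remap mapped types, delete unmapped types only when they are empty, and flag the rest for a manual call. The sketch below restates those rules as a pure function over plain dicts shaped like the `/api/document_types/` and `/api/documents/` results. `build_plan` is a hypothetical name, not a function in `migrate_types.py`. Unlike the script, it groups documents by type once instead of rescanning the document list for every type, and it returns type IDs rather than full type dicts.

```python
def build_plan(types, docs, target, mapping):
    """Return (moves, delete_ids, unmapped_ids) using migrate_types.py's planning rules.

    types:   dicts with "id", "name", "document_count"
    docs:    dicts with "id", "document_type" (a type id or None)
    target:  canonical name -> type id (like TARGET)
    mapping: old type name -> canonical name (like MAP)
    """
    target_ids = set(target.values())
    # Group document IDs by current type once, preserving document order.
    docs_by_type = {}
    for d in docs:
        docs_by_type.setdefault(d.get("document_type"), []).append(d["id"])

    moves, delete_ids, unmapped_ids = [], [], []
    for t in types:
        if t["id"] in target_ids:
            continue  # canonical types are kept as-is
        new_name = mapping.get(t["name"])
        if new_name is None:
            # Unmapped: safe to drop only if nothing uses it, else needs a human.
            (delete_ids if t["document_count"] == 0 else unmapped_ids).append(t["id"])
            continue
        for doc_id in docs_by_type.get(t["id"], []):
            moves.append((doc_id, t["name"], target[new_name], new_name))
        delete_ids.append(t["id"])  # deletable once its docs have moved
    return moves, delete_ids, unmapped_ids


if __name__ == "__main__":
    # Toy data (hypothetical): one canonical, one mapped, one empty and one used unmapped type.
    types = [
        {"id": 162, "name": "Invoice", "document_count": 5},
        {"id": 2, "name": "Rechnung", "document_count": 2},
        {"id": 133, "name": "Plan", "document_count": 0},
        {"id": 999, "name": "Sonstiges", "document_count": 1},
    ]
    docs = [{"id": 10, "document_type": 2}, {"id": 11, "document_type": 2},
            {"id": 12, "document_type": 999}]
    print(build_plan(types, docs, {"Invoice": 162}, {"Rechnung": "Invoice"}))
    # -> ([(10, 'Rechnung', 162, 'Invoice'), (11, 'Rechnung', 162, 'Invoice')], [2, 133], [999])
```

Keeping the plan pure makes the dry-run logic testable without ssh or a live Paperless instance. Only the PATCH/DELETE phase then needs the network.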
1715
infra/paperless/migrate_types_audit_20260513T085119.json
Normal file
File diff suppressed because it is too large
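
For each move, the audit file records the document ID and the old type *name*. The old type IDs no longer exist after the deletes. The following is a minimal sketch of turning such a file into a rollback index (old type name → document IDs). It assumes the `moves`, `doc_id` and `old_type_name` keys that `migrate_types.py` writes. `load_audit` and `rollback_index` are hypothetical helpers. Re-creating the deleted types and re-assigning documents is deliberately left out.

```python
import json
from collections import defaultdict


def load_audit(path):
    """Read one migrate_types audit file (UTF-8 JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def rollback_index(audit):
    """Map each pre-migration type name to the sorted IDs of the docs that carried it."""
    index = defaultdict(list)
    for move in audit["moves"]:
        index[move["old_type_name"]].append(move["doc_id"])
    # Sorting makes the result independent of API paging order.
    return {name: sorted(ids) for name, ids in index.items()}
```

For example, `rollback_index(load_audit("infra/paperless/migrate_types_audit_20260513T085119.json"))["Brief"]` should list the 21 former `Brief` documents moved in that run.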
18
infra/samba-canon/Dockerfile
Normal file
@@ -0,0 +1,18 @@
FROM alpine:3.13

RUN apk add --no-cache \
        samba \
        samba-common-tools \
        shadow \
    && rm -rf /var/cache/apk/*

RUN rm -rf /etc/samba/* /var/lib/samba/* /var/log/samba/* \
    && mkdir -p /etc/samba /var/lib/samba/private /var/log/samba /var/run/samba /inbox

COPY smb.conf /etc/samba/smb.conf
COPY entrypoint.sh /entrypoint.sh
RUN chmod 0755 /entrypoint.sh

EXPOSE 139 445

ENTRYPOINT ["/entrypoint.sh"]
120
infra/samba-canon/README.md
Normal file
@@ -0,0 +1,120 @@
# samba-canon — SMB bridge for the Canon MAXIFY MB5100

Old-Samba container on mDock that gives the Canon MB5100 (2014, SMB1 +
NTLMv1 only) a writable share. Scans land in `/mnt/mdms/inbox/` and are
picked up by Paperless within 60 s via the existing consume-folder poll.

## Why this exists

The Canon MAXIFY MB5100 supports only "SMB Shared Folder" as a scan
destination (no FTP, no WebDAV; see the [official manual][canon-manual]).
It speaks SMB1 with NTLMv1 authentication.

Scanning directly to TrueNAS fails reproducibly, even with `enable_smb1=true`
and `ntlmv1_auth=true` enabled on TrueNAS Core. The Samba that TrueNAS ships
(4.19+) adds SMB1 hardening that breaks the printer's handshake: Samba logs
`NT_STATUS_CONNECTION_RESET` at `smb1_process.c:502`, because the printer
closes the TCP socket before the first SMB packet is processed.

Rather than fight the TrueNAS hardening, this container runs a deliberately old
Samba (4.13.17 on Alpine 3.13) on mDock, bound to mDock's LAN interface
only, and writes received files straight to the NFS-mounted Paperless
inbox.

The TrueNAS SMB stack stays untouched: mBreeze and mPebble keep mounting
`mDMS` directly from TrueNAS as before.

[canon-manual]: https://ij.manual.canon/ij/webmanual/Manual/All/MB5100%20series/EN/UG/ug_scanning0700.html

## Layout

| File                 | Purpose                                                       |
| -------------------- | ------------------------------------------------------------- |
| `Dockerfile`         | `alpine:3.13` + samba 4.13.17, ~46 MiB image                  |
| `smb.conf`           | NT1 server, NTLMv1 + LANMAN enabled, single `[inbox]` share   |
| `entrypoint.sh`      | Creates `canon` user at UID 1000, sets pw from env, runs smbd |
| `docker-compose.yml` | Binds 445/139 on the LAN IP only, mounts `/mnt/mdms/inbox`    |

These files are a **traceable copy** of what lives in `~/samba-canon/` on
mDock (same convention as `infra/paperless/`). If you change the live config
on mDock, sync the change here in the same commit.

## Deploy

```bash
scp infra/samba-canon/{Dockerfile,smb.conf,entrypoint.sh,docker-compose.yml} \
    mdock:~/samba-canon/
ssh mdock 'cd ~/samba-canon && docker compose up -d --build'
```

The real `CANON_PASSWORD` lives in `~/samba-canon/.env` on mDock (chmod 600,
not committed). To rotate it, edit `.env` and run `docker compose up -d`.
`entrypoint.sh` re-applies the password to the Samba TDB on every container
start. A plain `docker compose restart` reuses the existing container and its
old environment, so it does not pick up the new value.
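
Rotation only needs a fresh random value in `.env`. The following is a minimal sketch using Python's `secrets` module, with several assumptions:

- The helper names are hypothetical.
- The 14-character alphanumeric default is a guess. The printer's password-field limits aren't documented here, and LANMAN hashing, which `smb.conf` enables, only covers 14 characters.
- `set_env_var` assumes `.env` contains plain `KEY=value` lines.

```python
import os
import secrets
import string

# Letters and digits only: assumed safe for the printer's password field.
ALPHABET = string.ascii_letters + string.digits


def new_password(length=14):
    """Return a random alphanumeric password drawn from a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def set_env_var(path, key, value):
    """Set key=value in a dotenv-style file, keeping other lines; leaves the file at mode 0600."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        lines = []
    prefix = f"{key}="
    kept = [line for line in lines if not line.startswith(prefix)]
    kept.append(f"{prefix}{value}")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + "\n")
    os.chmod(path, 0o600)  # os.open's mode only applies when the file is newly created
```

On mDock, call `set_env_var(os.path.expanduser("~/samba-canon/.env"), "CANON_PASSWORD", new_password())`. Then recreate the container so `entrypoint.sh` sees the new value, and enter the same password on the printer.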

## Canon Quick Utility Toolbox values

Use these exact values in the printer's "Destination Settings → Folder"
entry (Canon Quick Utility Toolbox → Destination Folder Settings):

| Field              | Value |
| ------------------ | ----- |
| Display name       | `mDock Inbox` (any label) |
| SMB server name    | `192.168.178.131` (mDock LAN IP, not `mdock`: the printer does no DNS) |
| Shared folder      | `inbox` |
| Domain / Workgroup | leave blank, or `WORKGROUP` |
| User               | `canon` |
| Password           | `CANON_PASSWORD` from `~/samba-canon/.env` on mDock |
| Port               | leave default (445); the printer does not support non-standard ports |

The printer's connection test should report success.
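
Before running the printer's connection test, you can confirm from another LAN host that mDock answers on the SMB ports at its LAN IP. The following is a minimal standard-library sketch. It checks TCP reachability only, not SMB1 negotiation or authentication. `port_open` is a hypothetical helper.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout is an OSError subclass).
        return False
```

Run from a LAN client such as mBreeze, both `port_open("192.168.178.131", 445)` and `port_open("192.168.178.131", 139)` should return `True` once the container is up.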

## Verification (replayed during deploy)

1. **Authentication check from a known-good client.** From mBreeze
   (`smbutil` is the macOS SMB client):

   ```bash
   smbutil view -A "//canon:<pw>@192.168.178.131"
   # → "Authenticate successfully with //canon:…@192.168.178.131"
   ```

2. **Mount + write from mBreeze.**

   ```bash
   mkdir -p /tmp/canon-test
   mount -t smbfs "//canon:<pw>@192.168.178.131/inbox" /tmp/canon-test
   touch /tmp/canon-test/probe.txt
   ssh mdock 'ls -la /mnt/mdms/inbox/probe.txt'   # should show m:m, mode 0664
   umount /tmp/canon-test
   ```

3. **Toolbox connection test**: green tick (m runs this once during setup).

4. **Real scan from the ADF**: the PDF lands in `/mnt/mdms/inbox/`, Paperless
   polls within 60 s, OCR and AI typing run, and the file moves to
   `<year>/<type>/...` (existing Paperless pipeline, see `infra/paperless/`).

5. **Survives mDock reboot.** `docker-compose.yml` sets
   `restart: unless-stopped`. Checked via `docker restart samba-canon`
   (a container restart, not a full host reboot): the container comes back
   up and shares are reachable within ~5 s.
|
||||||
|
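
Steps 2 and 4 can be partly automated on mDock. The helper names in this sketch are hypothetical. It assumes GNU `stat`, as on a typical Linux host. It also assumes that a file Paperless has consumed is no longer in the inbox, which is the "file moves" behaviour described in step 4.

```bash
# Hypothetical helpers for verification steps 2 and 4, run on mDock.

# Usage: check_owner_mode FILE UID GID MODE
# Succeeds if FILE is owned by UID:GID with octal permission bits MODE (e.g. 664).
check_owner_mode() {
  actual=$(stat -c '%u:%g %a' "$1") || return 1
  [ "$actual" = "$2:$3 $4" ]
}

# Usage: wait_until_consumed FILE TIMEOUT_SECONDS
# Polls once per second until FILE is gone; returns 1 if it is still there
# after TIMEOUT_SECONDS.
wait_until_consumed() {
  waited=0
  while [ -e "$1" ]; do
    if [ "$waited" -ge "$2" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, `check_owner_mode /mnt/mdms/inbox/probe.txt 1000 1000 664` can replace the `ls -la` check in step 2. After a real scan, `wait_until_consumed /mnt/mdms/inbox/<scan>.pdf 90` waits for consumption. Here `<scan>.pdf` stands for whatever filename the printer chose, and 90 s leaves some slack over the 60 s poll interval.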

## Security notes

- LAN-only. The compose file binds `192.168.178.131:445` and `192.168.178.131:139`,
  not `0.0.0.0`, so the container is not reachable from Tailscale or the
  internet.
- SMB1 + NTLMv1 (and the LANMAN auth that `smb.conf` also enables) are insecure
  by design. Acceptable here because the threat model is "untrusted devices on
  the home LAN", and the only client is the printer. **Do not expose this
  share to anything except the Canon.**
- The `canon` user is a Samba-only account (`/sbin/nologin`, no system
  password, no shell). It maps to UID 1000 inside the container so that
  files written through SMB land as `m:m` on the host NFS mount.
- If `CANON_PASSWORD` leaks, rotate it: edit `~/samba-canon/.env` on mDock,
  run `docker compose up -d samba-canon` (a plain `restart` keeps the old
  environment), and re-enter the new password in the Canon Toolbox.
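
The LAN-only claim can be re-checked after any compose change by inspecting the listening sockets on mDock. This is a minimal sketch with a hypothetical helper name. It assumes iproute2's `ss` is available; its `-H` flag suppresses the header. It also assumes the published ports show up as host listeners, which is what Docker's default userland proxy does. If that proxy is disabled, the bindings may not appear in `ss` at all.

```bash
# Hypothetical check: fail if SMB ports 445/139 listen on anything other
# than mDock's LAN IP.
LAN_IP=${LAN_IP:-192.168.178.131}

# Usage: ss -ltnH | smb_binds_ok
# Reads `ss -ltnH` lines on stdin (4th column = local address:port) and
# returns 1 if port 445 or 139 is bound to an address other than LAN_IP.
smb_binds_ok() {
  ok=0
  while read -r _state _recvq _sendq laddr _peer; do
    port=${laddr##*:}
    addr=${laddr%:*}
    case "$port" in
      445|139)
        if [ "$addr" != "$LAN_IP" ]; then
          echo "exposed: $laddr" >&2
          ok=1
        fi
        ;;
    esac
  done
  return "$ok"
}
```

Run `ss -ltnH | smb_binds_ok && echo "445/139 are LAN-only"` on mDock. Wildcard binds such as `0.0.0.0:445`, `*:445` or `[::]:139` are reported as exposed.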

## Out of scope

- TLS / encrypted SMB — incompatible with the printer; LAN-only mitigates.
- Multi-user — only the printer needs to write here.
- Replacing the TrueNAS SMB stack mBreeze/mPebble already use.
- Replacing the printer — m wants to keep the MB5100 working.
36
infra/samba-canon/docker-compose.yml
Normal file
@@ -0,0 +1,36 @@
services:
  samba-canon:
    build:
      context: .
      dockerfile: Dockerfile
    image: samba-canon:alpine3.13
    container_name: samba-canon
    restart: unless-stopped
    # The Canon MAXIFY MB5100 only speaks SMB on the standard ports — non-standard
    # ports are not configurable in the printer. So we bind 445/139 on the LAN
    # interface only (mDock's LAN IP), keeping Tailscale out of scope.
    ports:
      - "192.168.178.131:445:445/tcp"
      - "192.168.178.131:139:139/tcp"
    volumes:
      # /mnt/mdms/inbox is NFS-mounted on mDock from TrueNAS (192.168.178.124).
      # Paperless's consume folder polls /mnt/mdms/inbox every 60s, so scans
      # land here and are picked up by Paperless without further wiring.
      - /mnt/mdms/inbox:/inbox:rw
    environment:
      # canon user inside the container is created with this UID/GID at boot.
      # 1000 = m on mDock, which also owns /mnt/mdms/inbox.
      PUID: "1000"
      PGID: "1000"
      # Real password is in .env (gitignored); see README.md.
      CANON_PASSWORD: "${CANON_PASSWORD:?CANON_PASSWORD must be set in .env}"
    # smbd needs the full default cap set (SETUID/SETGID to honour `force user`,
    # CHOWN/FOWNER/DAC_OVERRIDE for file creation, NET_BIND_SERVICE for <1024).
    # We rely on Docker defaults rather than cap_drop ALL + a hand-picked list.
    # Light healthcheck — smbd answers `smbclient -L` once it's up.
    healthcheck:
      test: ["CMD-SHELL", "smbclient -L //127.0.0.1 -U canon%${CANON_PASSWORD} -m SMB3 >/dev/null 2>&1 || smbclient -L //127.0.0.1 -U canon%${CANON_PASSWORD} -m NT1 >/dev/null 2>&1"]
      interval: 60s
      timeout: 10s
      retries: 3
      start_period: 15s
41
infra/samba-canon/entrypoint.sh
Normal file
@@ -0,0 +1,41 @@
#!/bin/sh
set -eu

# Map the in-container "canon" user to the same UID/GID as `m` on the host
# (UID 1000 / GID 1000). force user = canon in smb.conf then guarantees that
# every file written through SMB lands as m:m on the NFS-mounted /mnt/mdms/inbox.
TARGET_UID="${PUID:-1000}"
TARGET_GID="${PGID:-1000}"

if ! getent group canon >/dev/null 2>&1; then
  addgroup -g "${TARGET_GID}" canon
fi

if ! getent passwd canon >/dev/null 2>&1; then
  adduser -D -H -u "${TARGET_UID}" -G canon -s /sbin/nologin canon
fi

if [ -z "${CANON_PASSWORD:-}" ]; then
  echo "FATAL: CANON_PASSWORD env var is required" >&2
  exit 1
fi

# (Re)apply the Samba password every boot; rotate = edit .env + recreate (docker compose up -d).
printf '%s\n%s\n' "${CANON_PASSWORD}" "${CANON_PASSWORD}" | smbpasswd -s -a canon >/dev/null
smbpasswd -e canon >/dev/null

# Verify the bind-mounted /inbox exists and is writable from the container.
# smbd will drop privilege per session to the canon user (uid 1000), which
# matches m on the host — files therefore land as m:m on the NFS mount.
if ! test -d /inbox; then
  echo "FATAL: /inbox missing — bind mount /mnt/mdms/inbox not set." >&2
  exit 1
fi
if ! test -w /inbox; then
  echo "FATAL: /inbox not writable. Check NFS mount + permissions on /mnt/mdms/inbox (must be writable by uid ${TARGET_UID})." >&2
  exit 1
fi

echo "samba-canon ready: smbd $(smbd --version | head -1), user=canon uid=${TARGET_UID} gid=${TARGET_GID}"

exec smbd --foreground --no-process-group --log-stdout
49
infra/samba-canon/smb.conf
Normal file
@@ -0,0 +1,49 @@
[global]
workgroup = WORKGROUP
server string = Canon SMB bridge
netbios name = MDOCK-CANON
security = user
map to guest = Never
log file = /var/log/samba/log.%m
log level = 1
max log size = 1000

# Old-school SMB1 + NTLMv1 — required by Canon MAXIFY MB5100 (2014, SMB1 only).
# LAN-only, no encryption — see infra/samba-canon/README.md.
server min protocol = NT1
server max protocol = SMB3
client min protocol = NT1
client max protocol = SMB3
ntlm auth = ntlmv1-permitted
lanman auth = yes
client lanman auth = yes
client plaintext auth = no
server signing = disabled
smb encrypt = disabled
server multi channel support = no

# Performance / sanity for a single-share LAN bridge
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes
dns proxy = no
usershare allow guests = no
panic action = /bin/sh -c 'echo "smbd panic at $(date)" >&2'

[inbox]
comment = Canon scan inbox (writes to /mnt/mdms/inbox on TrueNAS via NFS)
path = /inbox
browseable = yes
writable = yes
read only = no
guest ok = no
valid users = canon
force user = canon
force group = canon
create mask = 0664
directory mask = 0775
force create mode = 0664
force directory mode = 0775
# The Canon writes single PDFs; vfs full_audit is overkill.
vfs objects =