tools/anti-ai-lint.py: Python linter (stdlib + yq) that checks every build/<domain>/index.html against the blacklist in tools/anti-ai-blacklist.yaml. The HTML is reduced to visible text via html.parser (scripts and styles are ignored), then vocabulary substrings (DE+EN, case-insensitive) and regex patterns are matched. Severity warn = build proceeds, fail = build aborts.

Whitelist mechanisms:
- HTML comment in the markup: <!-- anti-ai-allow: term1, term2 -->
- Per site in site.yaml: anti_ai_allow: [term1, term2]

Integrated into build.sh as step 4/4, with --skip-lint for emergencies. The Dockerfile additionally installs python3, but only in the builder stage, so the Caddy image is unaffected.

Tests via tools/test-anti-ai-lint.sh: a synthetic AI fixture is flagged correctly, whitelists suppress hits, fail severity triggers exit 1, neutral text exits 0.

Initial run on the 59 existing sites: 2 warn (killusion.de "revolutionär" in an ironic context, kilofant.de "robust"), 0 fail. Cleanup is a follow-up issue.

README + docs/geo-seo-guideline.md updated with the concrete position of the tool.
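The text-extraction step and the HTML-comment whitelist described above can be sketched as follows. This is a minimal illustration, not the code in tools/anti-ai-lint.py, which is not shown here. The names `VisibleText` and `extract` are hypothetical, and lowercasing the allowed terms is an assumption.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect visible text and anti-ai-allow terms (hypothetical helper)."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []      # visible text fragments
        self.allowed = set()  # terms whitelisted via HTML comments
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

    def handle_comment(self, data):
        # Comments never reach handle_data, so they are not part of the text.
        body = data.strip()
        prefix = "anti-ai-allow:"
        if body.lower().startswith(prefix):
            terms = body[len(prefix):].split(",")
            # Assumption: allowed terms are compared case-insensitively.
            self.allowed.update(t.strip().lower() for t in terms if t.strip())


def extract(html):
    """Return (visible_text, allowed_terms) for one HTML document."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    return text, parser.allowed
```

Calling `extract()` on a page yields the text to lint and the set of terms whitelisted by that page; the per-site anti_ai_allow list from site.yaml would be merged into that set before matching.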
98 lines
2.8 KiB
YAML
# Anti-AI lint rules: textual fingerprints typical of LLM-generated content.
#
# Severity:
#   warn: build proceeds, message printed
#   fail: build aborts (exit 1) unless build.sh --skip-lint
#
# Whitelisting matches:
#   In an HTML file:        <!-- anti-ai-allow: term -->
#                           <!-- anti-ai-allow: term1, term2 -->
#   Per site (site.yaml):   anti_ai_allow:
#                             - leverage
#                             - em-dash-3-bullet
#
# Vocab matches are case-insensitive substring matches against the visible
# text of the rendered HTML (script/style/comments stripped). Pattern matches
# are regex (Python re), case-insensitive by default, against the same.
#
# Source: docs/geo-seo-guideline.md §3.6 (Wikipedia AI-content signals).

vocab:
  de:
    warn:
      - "nahtlos"
      - "robust"
      - "umfassend"
      - "ganzheitlich"
      - "fungiert als"
      - "dient als Brücke"
      - "Symbiose"
      - "im Bereich der"
      - "in der heutigen schnelllebigen"
      - "ein Meilenstein"
      - "ein Beweis für"
      - "hat Spuren hinterlassen"
      - "Es ist wichtig zu erwähnen"
      - "Es ist wichtig zu beachten"
      - "bahnbrechend"
      - "revolutionär"
    fail:
      - "in der sich entwickelnden Landschaft"
      - "Herausforderungen und Zukunftsaussichten"
      - "Herausforderungen und Perspektiven"

  en:
    warn:
      - "delve"
      - "tapestry"
      - "testament"
      - "intricate"
      - "garnered"
      - "bolstered"
      - "enduring"
      - "robust"
      - "comprehensive"
      - "meticulous"
      - "interplay"
      - "pivotal"
      - "underscore"
      - "moreover"
      - "furthermore"
      - "additionally"
      - "crucial"
      - "showcasing"
      - "highlighting"
      - "leverage"
      - "streamline"
      - "holistic"
      - "seamless"
      - "unleash"
      - "ecosystem"
      - "in the realm of"
      - "dive into"
      - "It's important to note that"
      - "It is important to note that"
      - "In this article, we'll"
    fail:
      - "in today's evolving landscape"
      - "in the ever-evolving landscape"
      - "Challenges and Future Prospects"

patterns:
  - name: em-dash-3-bullet
    description: |
      Three "Word: text — Word: text — Word: …" segments in one block.
      Classic AI bullet pattern.
    regex: '(\w[\w\s]{0,30}:\s+[^—\n]{2,80}—\s*){2,}\w[\w\s]{0,30}:'
    severity: warn

  - name: not-only-but-also
    description: '"not only X, but also Y" / "nicht nur X, sondern auch Y" filler.'
    regex: '\b(?:not only|nicht nur)\b[^.,;\n]{1,80}\b(?:but also|sondern auch)\b'
    severity: warn

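  - name: as-an-ai
    description: Leftover AI self-disclosure.
    # The article is optional but must be followed by whitespace, so that
    # "als KI", "als eine KI" and "als ein Sprachmodell" all match.
    regex: '\b(?:as an? (?:AI|language model)|als (?:eine?\s+)?(?:KI|Sprachmodell))\b'
    severity: fail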
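
How rules like the ones in this file might be applied to the extracted text can be sketched as below. This is a hedged illustration rather than the actual linter: `VOCAB` and `PATTERNS` are a hand-copied excerpt of the file (the real tool reads the YAML), and the function names `lint` and `exit_code` are hypothetical. It assumes whitelist entries name either a vocab term or a pattern name, as the site.yaml example in the header comment suggests.

```python
import re

# Excerpt of the rule file as plain Python data (assumed in-memory layout).
VOCAB = {
    "warn": ["robust", "nahtlos", "It is important to note that"],
    "fail": ["in the ever-evolving landscape"],
}
PATTERNS = [
    {
        "name": "not-only-but-also",
        "regex": r"\b(?:not only|nicht nur)\b[^.,;\n]{1,80}\b(?:but also|sondern auch)\b",
        "severity": "warn",
    },
]


def lint(text, allowed=frozenset()):
    """Return (severity, rule, matched_text) hits, skipping whitelisted rules."""
    allowed = {a.lower() for a in allowed}
    lowered = text.lower()
    hits = []
    # Vocab: case-insensitive substring match.
    for severity, terms in VOCAB.items():
        for term in terms:
            key = term.lower()
            if key not in allowed and key in lowered:
                hits.append((severity, term, term))
    # Patterns: Python regex, case-insensitive.
    for rule in PATTERNS:
        if rule["name"].lower() in allowed:
            continue
        match = re.search(rule["regex"], text, re.IGNORECASE)
        if match:
            hits.append((rule["severity"], rule["name"], match.group(0)))
    return hits


def exit_code(hits):
    """1 if any hit has severity fail, else 0 (warn hits do not break the build)."""
    return 1 if any(severity == "fail" for severity, _, _ in hits) else 0
```

Because vocab entries are plain substrings, a short entry such as "robust" also matches inside longer words such as "robustness"; the whitelist is the escape hatch for legitimate uses.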