Files
paliad/cmd/server/main_smoke_test.go
mAi c901293c9c
Some checks failed
Paliad CI gate / build (push) Has been cancelled
Paliad CI gate / test-go (push) Has been cancelled
Paliad CI gate / deploy (push) Has been cancelled
feat(cicd): Slice A — pre-deploy gate + role-split migration smoke
Adds .gitea/workflows/test.yaml that gates every push on `go build`,
`bun run build`, `go vet`, the migration coordination check, and the
role-split end-to-end migration smoke. On push to main + green, calls
Dokploy's compose.deploy API and polls /health/ready until 200.

t-paliad-282 / m/paliad#114. Design: docs/design-cicd-pre-deploy-gate-2026-05-25.md
(inventor shift on mai/cronus/inventor-ci-cd-pre).

Catches all three of today's outage classes:

  brunel (~13:20) slot collision     -> TestMigrations_NoDuplicateSlot
  hermes (~16:05) dropped-col refs   -> TestBootSmoke
  mig 129 (~14:56) 42501 ownership   -> TestMigrations_EndToEndAsAppRole

Snapshot approach. internal/db/testdata/prod-snapshot.sql is a pg_dump
of youpc-supabase paliad schema + applied_migrations rows. CI restores
this into a fresh `supabase/postgres:15.8.1.060` (same image, same role
topology as prod) and runs ApplyMigrations as the `postgres` role
(which is NOT a superuser on supabase/postgres, matching prod). Existing
migrations are skipped (already in applied_migrations); only NEW migs
from the PR run end-to-end. This sidesteps the fresh-DB idempotence
debt in some historical migrations (mig 037 missing pg_trgm, mig 051
inner COMMIT) — those are tracked separately and don't block the gate.

Sub-changes:

- internal/handlers/handlers.go — new /health/ready endpoint distinct
  from /healthz. /healthz stays liveness (process alive, no DB); /ready
  is readiness (DB pool pings within 2 s). Returns 503 when svc or pool
  is nil (DB-less deploys are intentionally not-ready). svc.Pool added
  to handlers.Services, wired in cmd/server/main.go.

- internal/db/migrate_test.go — TestMigrations_NoDuplicateSlot (pure
  unit, catches brunel) and TestMigrations_EndToEndAsAppRole (snapshot-
  gated, catches the 42501 class).

- cmd/server/main_smoke_test.go — TestBootSmoke now also asserts
  /health/ready returns 503 with a nil svc. New TestHealthReady_Live
  asserts 200 against a live pool.

- internal/db/migrations/024_rename_department_columns.up.sql and
  027_rename_to_partner_units.up.sql — ALTER INDEX / ALTER POLICY
  exception handlers now catch undefined_object OR undefined_table OR
  duplicate_object. Old handler only caught undefined_object; Postgres
  raises undefined_table when source object never existed, and
  duplicate_object when destination already exists. The expanded
  handlers make these migrations truly idempotent across all plausible
  starting states.

- Makefile — verify-mig-app, test-frontend, refresh-snapshot targets.
  refresh-snapshot pg_dumps youpc-supabase prod (needs PALIAD_PROD_DATABASE_URL),
  strips pg16 \restrict commands for pg15 restore compat, and filters
  applied_migrations rows to this branch's max on-disk version.

- internal/db/testdata/README.md — explains the snapshot's purpose,
  refresh procedure, and how to verify locally.

- docs/cicd-runner-setup-2026-05-25.md — one-time admin steps for
  registering a Gitea Actions runner on mriver and wiring DOKPLOY_TOKEN
  as a repo secret. Documents soft-launch plan per m's Q11.4 (keep
  Dokploy's autoDeploy=true webhook alive for one week, disable after
  the workflow has gated 5 successful deploys).

Build clean. Full go test ./internal/... ./cmd/... green without
TEST_DATABASE_URL. With TEST_DATABASE_URL + TEST_APP_DATABASE_URL set
to a supabase/postgres scratch + snapshot restored:
TestMigrations_NoDuplicateSlot, TestMigrations_EndToEndAsAppRole,
TestBootSmoke, TestHealthReady_Live all pass. Live-DB service tests in
internal/services/* fail under supabase/postgres 15.8 with a 42P08
parameter-binding error (unrelated to Slice A — tracked as a follow-up).
2026-05-25 17:42:06 +02:00

256 lines
8.5 KiB
Go

// Boot smoke test — assert paliad reaches a serving state.
//
// Three checks against TEST_DATABASE_URL:
//
// 1. db.ApplyMigrations does not panic and returns nil.
// 2. paliad.applied_migrations covers every on-disk *.up.sql — no
// migration was silently skipped, no version is missing. The set
// contract is stronger than the old single-counter check: applied
// set must EQUAL on-disk set, not just reach the max version.
// 3. The handler mux (with /healthz mounted) responds 200 to GET /healthz.
//
// This is the lightweight cousin of the migration dry-run gate
// (internal/db/migrate_test.go): the dry-run catches per-migration syntax
// errors before merge; this smoke confirms the apply+bind path the
// container actually runs at boot. Together they cover the mig-098 /
// mig-099 class of crash-loops end-to-end, plus the mig-103 parallel-merge
// skip-hole that t-paliad-218 closed (m/paliad#44).
//
// Skipped without TEST_DATABASE_URL — matches the rest of the live-DB tests.
//
// Design: docs/design-paliad-test-strategy-2026-05-19.md §5 Slice 1 and
// docs/design-migration-runner-applied-set-2026-05-20.md §6.
package main
import (
"database/sql"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"sort"
"strconv"
"strings"
"testing"
_ "github.com/lib/pq"
"mgit.msbls.de/m/paliad/internal/auth"
"mgit.msbls.de/m/paliad/internal/db"
"mgit.msbls.de/m/paliad/internal/handlers"
)
func TestBootSmoke(t *testing.T) {
url := os.Getenv("TEST_DATABASE_URL")
if url == "" {
t.Skip("TEST_DATABASE_URL not set — skipping boot smoke")
}
// (1) Apply migrations end-to-end. The same code path the prod
// container runs at boot before `http.ListenAndServe`. A regression
// like mig-098's digit-regex would surface here as a non-nil error.
if err := db.ApplyMigrations(url); err != nil {
t.Fatalf("db.ApplyMigrations: %v", err)
}
// (2) Assert the applied set equals the on-disk set. The new runner
// tracks applied state per-migration; a silently-skipped version
// would surface as a row missing from paliad.applied_migrations even
// though max(version) matches. Comparing sets — not just max —
// catches the failure mode the t-paliad-218 post-mortem documented.
onDisk := embeddedMigrationVersions(t)
applied := appliedMigrationVersions(t, url)
if missing := setDiff(onDisk, applied); len(missing) > 0 {
t.Errorf("paliad.applied_migrations missing %d on-disk versions: %v "+
"(a migration was skipped — investigate before deploying)",
len(missing), missing)
}
if extra := setDiff(applied, onDisk); len(extra) > 0 {
t.Errorf("paliad.applied_migrations has %d versions with no on-disk file: %v "+
"(orphan rows — either restore the file or DELETE the row)",
len(extra), extra)
}
// (3) Mount the public handlers (the same Register call main() makes,
// minus the DB-backed Services bundle which the /healthz route doesn't
// need) and assert /healthz returns 200. This is the bind-and-serve
// half of the smoke: catches a regression that would make /healthz
// 404 or break the mux registration order.
//
// We deliberately do not boot the full main() — that would require
// SUPABASE_URL, SUPABASE_ANON_KEY, SUPABASE_JWT_SECRET, an open
// listening socket and a real auth client. The /healthz handler is
// auth-independent by design, and Register registers it on the outer
// mux before any DB-backed route, so this minimal setup exercises the
// exact code path main() takes.
mux := http.NewServeMux()
authClient := auth.NewClient("https://test.invalid", "anon-key", []byte("test-secret"))
handlers.Register(mux, authClient, "", nil)
rec := httptest.NewRecorder()
req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
mux.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Errorf("GET /healthz: status=%d, body=%q; want 200 OK", rec.Code, rec.Body.String())
}
if body := strings.TrimSpace(rec.Body.String()); body != "ok" {
t.Errorf("GET /healthz: body=%q; want \"ok\"", body)
}
// (4) Readiness probe. With a nil Services bundle the endpoint MUST
// report 503 — that's the contract documented in handlers/handlers.go.
// A separate svc-with-Pool case is exercised in TestHealthReady (live).
rec = httptest.NewRecorder()
req = httptest.NewRequest(http.MethodGet, "/health/ready", nil)
mux.ServeHTTP(rec, req)
if rec.Code != http.StatusServiceUnavailable {
t.Errorf("GET /health/ready (nil svc): status=%d; want 503", rec.Code)
}
}
// TestHealthReady_Live asserts the readiness probe answers 200 when the
// pool is reachable, 503 when it isn't. Requires TEST_DATABASE_URL.
//
// Why a separate test: TestBootSmoke runs Register with svc=nil to keep
// its setup minimal; the pool-reachable path needs the pool wired in
// through svc.Pool. Two tests, two assertions, no entanglement.
func TestHealthReady_Live(t *testing.T) {
url := os.Getenv("TEST_DATABASE_URL")
if url == "" {
t.Skip("TEST_DATABASE_URL not set — skipping live readiness probe")
}
if err := db.ApplyMigrations(url); err != nil {
t.Fatalf("db.ApplyMigrations: %v", err)
}
pool, err := db.OpenPool(url)
if err != nil {
t.Fatalf("open pool: %v", err)
}
mux := http.NewServeMux()
authClient := auth.NewClient("https://test.invalid", "anon-key", []byte("test-secret"))
handlers.Register(mux, authClient, "", &handlers.Services{Pool: pool})
rec := httptest.NewRecorder()
req := httptest.NewRequest(http.MethodGet, "/health/ready", nil)
mux.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Errorf("GET /health/ready (live pool): status=%d, body=%q; want 200", rec.Code, rec.Body.String())
}
if body := strings.TrimSpace(rec.Body.String()); body != "ready" {
t.Errorf("GET /health/ready (live pool): body=%q; want \"ready\"", body)
}
}
// embeddedMigrationVersions returns every N where N_*.up.sql exists in
// internal/db/migrations/ on disk. The boot smoke compares this set
// against paliad.applied_migrations to detect skipped or orphan
// migrations.
//
// Read from disk (not the embed.FS inside the db package — it's unexported)
// since the test runs from the repo. The two views must agree for the
// build to be self-consistent; if they diverge, the smoke test is the
// wrong place to learn about it (the build is). We trust them to match.
func embeddedMigrationVersions(t *testing.T) []int {
t.Helper()
root, err := repoRoot()
if err != nil {
t.Fatalf("locate repo root: %v", err)
}
dir := filepath.Join(root, "internal", "db", "migrations")
entries, err := os.ReadDir(dir)
if err != nil {
t.Fatalf("read migrations dir %s: %v", dir, err)
}
var versions []int
for _, e := range entries {
name := e.Name()
if !strings.HasSuffix(name, ".up.sql") {
continue
}
base := strings.TrimSuffix(name, ".up.sql")
underscore := strings.IndexByte(base, '_')
if underscore <= 0 {
continue
}
v, err := strconv.Atoi(base[:underscore])
if err != nil {
continue
}
versions = append(versions, v)
}
if len(versions) == 0 {
t.Fatalf("no *.up.sql files found in %s", dir)
}
sort.Ints(versions)
return versions
}
// appliedMigrationVersions reads paliad.applied_migrations and returns
// the sorted list of versions. Fails the test if the table doesn't exist —
// db.ApplyMigrations is supposed to have created it by this point.
func appliedMigrationVersions(t *testing.T, url string) []int {
t.Helper()
conn, err := sql.Open("postgres", url)
if err != nil {
t.Fatalf("open: %v", err)
}
defer conn.Close()
rows, err := conn.Query(`SELECT version FROM paliad.applied_migrations ORDER BY version`)
if err != nil {
t.Fatalf("read applied_migrations: %v", err)
}
defer rows.Close()
var out []int
for rows.Next() {
var v int
if err := rows.Scan(&v); err != nil {
t.Fatalf("scan: %v", err)
}
out = append(out, v)
}
if err := rows.Err(); err != nil {
t.Fatalf("rows: %v", err)
}
return out
}
// setDiff returns the elements of a that are not in b. Inputs are sorted
// ascending; output preserves that ordering.
func setDiff(a, b []int) []int {
bset := make(map[int]bool, len(b))
for _, v := range b {
bset[v] = true
}
var out []int
for _, v := range a {
if !bset[v] {
out = append(out, v)
}
}
return out
}
// repoRoot walks upward from the test binary's working directory until it
// finds a go.mod. `go test` runs in the package dir, so we typically have
// to climb a couple of levels.
func repoRoot() (string, error) {
dir, err := os.Getwd()
if err != nil {
return "", err
}
for {
if _, err := os.Stat(filepath.Join(dir, "go.mod")); err == nil {
return dir, nil
}
parent := filepath.Dir(dir)
if parent == dir {
return "", os.ErrNotExist
}
dir = parent
}
}