Adds .gitea/workflows/test.yaml that gates every push on `go build`, `bun run build`, `go vet`, the migration coordination check, and the role-split end-to-end migration smoke. On push to main + green, calls Dokploy's compose.deploy API and polls /health/ready until 200.

t-paliad-282 / m/paliad#114. Design: docs/design-cicd-pre-deploy-gate-2026-05-25.md (inventor shift on mai/cronus/inventor-ci-cd-pre).

Catches all three of today's outage classes:

  brunel (~13:20)   slot collision    -> TestMigrations_NoDuplicateSlot
  hermes (~16:05)   dropped-col refs  -> TestBootSmoke
  mig 129 (~14:56)  42501 ownership   -> TestMigrations_EndToEndAsAppRole

Snapshot approach. internal/db/testdata/prod-snapshot.sql is a pg_dump of youpc-supabase paliad schema + applied_migrations rows. CI restores this into a fresh `supabase/postgres:15.8.1.060` (same image, same role topology as prod) and runs ApplyMigrations as the `postgres` role (which is NOT a superuser on supabase/postgres, matching prod). Existing migrations are skipped (already in applied_migrations); only NEW migs from the PR run end-to-end. This sidesteps the fresh-DB idempotence debt in some historical migrations (mig 037 missing pg_trgm, mig 051 inner COMMIT); those are tracked separately and don't block the gate.

Sub-changes:

- internal/handlers/handlers.go: new /health/ready endpoint distinct from /healthz. /healthz stays liveness (process alive, no DB); /health/ready is readiness (DB pool pings within 2 s). Returns 503 when svc or pool is nil (DB-less deploys are intentionally not-ready). svc.Pool added to handlers.Services, wired in cmd/server/main.go.
- internal/db/migrate_test.go: TestMigrations_NoDuplicateSlot (pure unit, catches brunel) and TestMigrations_EndToEndAsAppRole (snapshot-gated, catches the 42501 class).
- cmd/server/main_smoke_test.go: TestBootSmoke now also asserts /health/ready returns 503 with a nil svc. New TestHealthReady_Live asserts 200 against a live pool.
- internal/db/migrations/024_rename_department_columns.up.sql and 027_rename_to_partner_units.up.sql: ALTER INDEX / ALTER POLICY exception handlers now catch undefined_object OR undefined_table OR duplicate_object. The old handler only caught undefined_object; Postgres raises undefined_table when the source object never existed, and duplicate_object when the destination already exists. The expanded handlers make these migrations truly idempotent across all plausible starting states.
- Makefile: verify-mig-app, test-frontend, refresh-snapshot targets. refresh-snapshot pg_dumps youpc-supabase prod (needs PALIAD_PROD_DATABASE_URL), strips pg16 \restrict commands for pg15 restore compat, and filters applied_migrations rows to this branch's max on-disk version.
- internal/db/testdata/README.md: explains the snapshot's purpose, refresh procedure, and how to verify locally.
- docs/cicd-runner-setup-2026-05-25.md: one-time admin steps for registering a Gitea Actions runner on mriver and wiring DOKPLOY_TOKEN as a repo secret. Documents the soft-launch plan per m's Q11.4 (keep Dokploy's autoDeploy=true webhook alive for one week, disable after the workflow has gated 5 successful deploys).

Build clean. Full go test ./internal/... ./cmd/... green without TEST_DATABASE_URL. With TEST_DATABASE_URL + TEST_APP_DATABASE_URL set to a supabase/postgres scratch + snapshot restored: TestMigrations_NoDuplicateSlot, TestMigrations_EndToEndAsAppRole, TestBootSmoke, TestHealthReady_Live all pass. Live-DB service tests in internal/services/* fail under supabase/postgres 15.8 with a 42P08 parameter-binding error (unrelated to Slice A; tracked as a follow-up).
197 lines
7.6 KiB
Go
// Package db tests — migration dry-run gate.
//
// This is the test that catches mig-N crash-loops before they reach prod.
// The new runner tracks applied state as a set in paliad.applied_migrations
// (one row per migration; see migrate.go). A migration that compiles cleanly
// but fails on apply (typo, missing column, wrong CHECK shape) crashes the
// Dokploy container loop before paliad.de finishes binding :8080, and the
// only way to learn about it today is to watch the deploy log.
//
// TestMigrations_DryRun closes that gap: for every *.up.sql in this
// directory whose version is NOT present in paliad.applied_migrations on
// the scratch DB, it opens a transaction, runs the SQL, and ROLLBACKs.
// Any error fails the test with the file name + Postgres error. Always
// non-destructive — the ROLLBACK runs even on success, so the scratch DB
// stays at its starting set.
//
// "Pending" means: a version that's on disk but not in applied_migrations.
// In CI against a fresh scratch DB (where applied_migrations either
// doesn't exist or is empty), every migration is pending and gets
// verified. On a developer laptop whose scratch DB is already at HEAD,
// no migrations are pending and the test logs and passes — the protection
// only kicks in the moment a new *.up.sql lands in the tree before the
// developer runs `db.ApplyMigrations` against the same scratch DB.
//
// Requires TEST_DATABASE_URL (same pattern as the rest of the live-DB
// tests). Skipped without it.
//
// Design: docs/design-paliad-test-strategy-2026-05-19.md §5 Slice 1 and
// docs/design-migration-runner-applied-set-2026-05-20.md §6.

package db

import (
	"database/sql"
	"fmt"
	"os"
	"strings"
	"testing"

	_ "github.com/lib/pq"
)

// TestMigrations_DryRun walks every pending *.up.sql in numeric order,
// applies each inside its own BEGIN/ROLLBACK against the scratch DB, and
// fails the test on the first SQL error. Reports per-file as a sub-test so
// `go test -v` shows which migration failed.
func TestMigrations_DryRun(t *testing.T) {
	url := os.Getenv("TEST_DATABASE_URL")
	if url == "" {
		t.Skip("TEST_DATABASE_URL not set — skipping migration dry-run")
	}

	conn, err := sql.Open("postgres", url)
	if err != nil {
		t.Fatalf("open: %v", err)
	}
	defer conn.Close()
	if err := conn.Ping(); err != nil {
		t.Fatalf("ping: %v", err)
	}

	// The paliad schema must exist before migration 001 runs against it,
	// mirroring the bootstrap step in ApplyMigrations. Without this, a
	// fresh scratch DB would fail migration 001's CREATE TABLE paliad.*
	// statements inside the BEGIN/ROLLBACK probe with "schema paliad does
	// not exist" — a false negative that distracts from real errors.
	if _, err := conn.Exec(`CREATE SCHEMA IF NOT EXISTS paliad`); err != nil {
		t.Fatalf("ensure paliad schema: %v", err)
	}

	applied, err := readAppliedVersions(conn)
	if err != nil {
		t.Fatalf("read applied_migrations: %v", err)
	}

	onDisk, err := scanEmbeddedMigrations()
	if err != nil {
		t.Fatalf("scan embedded migrations: %v", err)
	}

	var pending []migration
	for _, m := range onDisk {
		if !applied[m.version] {
			pending = append(pending, m)
		}
	}

	if len(pending) == 0 {
		t.Logf("no pending migrations — scratch DB applied set covers every on-disk version (%d total)",
			len(onDisk))
		return
	}
	t.Logf("scratch DB has %d/%d on-disk migrations applied; walking %d pending",
		len(applied), len(onDisk), len(pending))

	for _, m := range pending {
		t.Run(fmt.Sprintf("%03d_%s", m.version, m.name), func(t *testing.T) {
			body, err := migrationFS.ReadFile("migrations/" + m.filename)
			if err != nil {
				t.Fatalf("read %s: %v", m.filename, err)
			}
			tx, err := conn.Begin()
			if err != nil {
				t.Fatalf("begin: %v", err)
			}
			// Always rollback; the dry-run must not leave the scratch
			// DB at a different applied set than where it started.
			// Rollback is safe after a failed Exec — Postgres aborts
			// the transaction internally on the first error.
			defer func() { _ = tx.Rollback() }()

			if _, err := tx.Exec(string(body)); err != nil {
				t.Fatalf("migration %s failed dry-run: %v", m.filename, err)
			}
		})
	}
}

// TestMigrations_NoDuplicateSlot is a free-standing pre-flight check that
// scanEmbeddedMigrations refuses to walk a tree where two *.up.sql files
// claim the same NNN slot. This is the brunel-slot-collision class of
// outage (m/paliad#114, 2026-05-25 ~13:20): a worker writes a migration
// at slot N while another shipped slot N from a separate branch, both
// merge, both end up in the embed.FS, and the runner refuses to start.
//
// Catching this at CI time (no DB needed) lets the second PR fail before
// it merges, instead of breaking prod at the next deploy. Pure unit test;
// runs even on developer laptops that don't set TEST_DATABASE_URL.
func TestMigrations_NoDuplicateSlot(t *testing.T) {
	if _, err := scanEmbeddedMigrations(); err != nil {
		t.Fatalf("scanEmbeddedMigrations: %v "+
			"(two migrations share the same NNN slot — coordinate with head "+
			"and rename one of them before merging)", err)
	}
}

// TestMigrations_EndToEndAsAppRole applies every embedded migration in
// numeric order against a scratch DB connected as a NON-SUPERUSER role.
// This is the prod-shape smoke that the per-mig BEGIN/ROLLBACK dry-run
// (TestMigrations_DryRun) cannot deliver: the dry-run runs each
// migration in isolation and rolls back, so it cannot reproduce the
// mig-129-class outage (m/paliad#114, 2026-05-25 ~14:56 — pq: must be
// owner of table project_event_choices, SQLSTATE 42501) where a
// migration assumes ownership the deploy role doesn't have.
//
// Requires TEST_APP_DATABASE_URL — a Postgres URL whose role is NOT a
// superuser and does NOT own the `paliad` schema (m's Q11.2 pick:
// generic two-role model, see docs/design-cicd-pre-deploy-gate-2026-05-25.md
// §6.2(a)). The CI workflow creates the role + schema split before
// invoking the test; a developer who wants to reproduce the gate locally
// runs the same SQL preamble (see Makefile target `verify-migrations`).
//
// Skipped without TEST_APP_DATABASE_URL — keeps `go test ./...` green
// on machines that haven't set up the role split.
func TestMigrations_EndToEndAsAppRole(t *testing.T) {
	url := os.Getenv("TEST_APP_DATABASE_URL")
	if url == "" {
		t.Skip("TEST_APP_DATABASE_URL not set — skipping role-split end-to-end migration smoke")
	}
	if err := ApplyMigrations(url); err != nil {
		t.Fatalf("ApplyMigrations as app role failed: %v "+
			"(a migration assumes more privilege than the deploy role has — "+
			"common cases: ALTER TABLE on a schema-owner table, CREATE EXTENSION "+
			"without grants, SET ROLE without permission. Fix the migration to "+
			"work as the deploy role, or arrange for the schema to be owned by "+
			"the deploy role)", err)
	}
}

// readAppliedVersions returns the set of versions present in
// paliad.applied_migrations on the scratch DB. Missing table → empty set
// (fresh-DB path; the table only exists after the runner has been called).
//
// We don't pre-create the table here because the dry-run is supposed to be
// a passive observer — it must not mutate the scratch DB outside of its
// own per-mig BEGIN/ROLLBACK probes. A "table doesn't exist" outcome is
// the right read against a virgin scratch DB.
func readAppliedVersions(conn *sql.DB) (map[int]bool, error) {
	rows, err := conn.Query(`SELECT version FROM paliad.applied_migrations`)
	if err != nil {
		if strings.Contains(err.Error(), "does not exist") {
			return map[int]bool{}, nil
		}
		return nil, err
	}
	defer rows.Close()
	out := map[int]bool{}
	for rows.Next() {
		var v int
		if err := rows.Scan(&v); err != nil {
			return nil, err
		}
		out[v] = true
	}
	return out, rows.Err()
}