Mastodon v4.5.x regression sweep — baseline-drift over a release line

Date: 2026-05-21. Rigor: v0.1.8 (master, [Unreleased]). Target: mastodon/mastodon, 16 tags v4.5.0-beta.1 → v4.5.10.

Goal

Validate two things against a real “normal development flow”:

How much does a project’s Rigor error count grow as ordinary development proceeds, when a baseline is taken once and frozen?
How realistic is the rigor-project-init acknowledge-mode environment — does it behave the way ADR-22 promises (adopt once, surface only regressions)?

Method: the rigor-regression-sweep procedure — baseline at the first tag, then rigor check every later tag against that frozen baseline + a frozen config.
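
The procedure can be sketched in Ruby as follows. This is a minimal illustration, not the contents of sweep.sh: `generate_baseline` and `check` are hypothetical callables supplied by the caller, standing in for whatever actually invokes Rigor at a checked-out tag, and `SweepRow` is a name introduced here only for the example.

```ruby
# One row of the sweep table. `surfaced` is whatever the frozen baseline
# did not absorb.
SweepRow = Struct.new(:tag, :raw, :silenced) do
  def surfaced
    raw - silenced
  end
end

# tags:              release-ordered tag names
# generate_baseline: hypothetical callable, tag -> baseline object
# check:             hypothetical callable, (tag, baseline) -> [raw, silenced]
def sweep(tags, generate_baseline:, check:)
  raise ArgumentError, "need at least one tag" if tags.empty?

  # The baseline is taken once, at the first tag, and never regenerated.
  baseline = generate_baseline.call(tags.first).freeze

  # The first tag is checked too, as a sanity row: it should surface nothing.
  tags.map do |tag|
    raw, silenced = check.call(tag, baseline)
    SweepRow.new(tag, raw, silenced)
  end
end

# Usage with stand-in callables (no Rigor involved):
rows = sweep(
  %w[v4.5.0-beta.1 v4.5.0-beta.2],
  generate_baseline: ->(_tag) { { count: 30 } },
  check: ->(_tag, baseline) { [30, baseline[:count]] }
)
rows.map(&:surfaced) # each row's raw minus silenced
```

Passing the Rigor invocation in as a callable keeps the loop itself independent of any particular command line.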

Setup

Blobless clone of mastodon/mastodon; 16 tags in release order: v4.5.0-beta.1, -beta.2, -rc.1, -rc.2, -rc.3, v4.5.0, v4.5.1 … v4.5.10.
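
Release order is not the same as lexical order for these tag names (a plain string sort puts v4.5.10 before v4.5.2 and v4.5.0 before its own betas). One way to get release order, sketched below, is to sort by `Gem::Version`, which ranks prerelease versions below the corresponding release. The expansion of the patch range assumes every tag from v4.5.1 through v4.5.10 exists, which matches the stated total of 16; `release_order` is a helper name introduced for this example.

```ruby
require "rubygems" # loaded by default in most Ruby setups; explicit for clarity

# The 16 tags of the sweep, with the elided patch range expanded
# (assumption: v4.5.1 .. v4.5.10 were all tagged).
TAGS = %w[
  v4.5.0-beta.1 v4.5.0-beta.2
  v4.5.0-rc.1 v4.5.0-rc.2 v4.5.0-rc.3
  v4.5.0
] + (1..10).map { |patch| "v4.5.#{patch}" }

# Sort tag names into release order. Gem::Version treats "-beta.1" and
# "-rc.1" as prerelease suffixes that sort before the plain release.
def release_order(tags)
  tags.sort_by { |tag| Gem::Version.new(tag.delete_prefix("v")) }
end

ordered = release_order(TAGS.reverse)
```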
Frozen config, held identical across every tag so that any delta is attributable to Mastodon’s code rather than to config: paths: [app, lib]; exclude: [vendor, tmp]; severity_profile: lenient; signature_paths pointing at the rigor-activesupport-core-ext sig/ bundle (absolute path). The rigor-* plugin gems are omitted: they are not published to RubyGems in v0.1.x (ROADMAP § “Out of scope today”), so a faithful external-user config cannot include them yet.
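
A cheap guard for the "held identical" property is to fingerprint the config once and compare the fingerprint at every tag. The sketch below assumes the config is representable as a plain hash and that `signature_paths` is a list; the signature path shown is a placeholder, since the absolute path used in the run is not recorded here, and `config_fingerprint` is a helper name introduced for this example.

```ruby
require "digest"
require "yaml"

# Keys and values as listed above. The signature path is a PLACEHOLDER.
FROZEN_CONFIG = {
  "paths" => %w[app lib],
  "exclude" => %w[vendor tmp],
  "severity_profile" => "lenient",
  "signature_paths" => ["/ABSOLUTE/PATH/TO/rigor-activesupport-core-ext/sig"]
}.freeze

# Stable digest of the serialized config; any edit changes the digest.
def config_fingerprint(config)
  Digest::SHA256.hexdigest(YAML.dump(config))
end

expected = config_fingerprint(FROZEN_CONFIG)

# At each tag, before running the check, fail fast if the config drifted.
def assert_frozen!(config, expected)
  actual = config_fingerprint(config)
  raise "config drifted: #{actual} != #{expected}" unless actual == expected
end

assert_frozen!(FROZEN_CONFIG, expected)
```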
Baseline generated at v4.5.0-beta.1: 30 diagnostics / 25 buckets (rule-ID mode).
app + lib scope: 1,218 .rb files at beta.1, 1,219 at v4.5.10.

Result — the error-increase curve is flat at zero

Tag	raw	silenced	surfaced
v4.5.0-beta.1	30	30	0
v4.5.0-beta.2	30	30	0
v4.5.0-rc.1 … -rc.3	30	30	0
v4.5.0	30	30	0
v4.5.1 … v4.5.10	30	30	0

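Every tag: surfaced = 0, because every raw diagnostic was silenced by the baseline. The frozen baseline absorbed the entire release line: ordinary development from v4.5.0-beta.1 through v4.5.10 (two betas, three release candidates, the v4.5.0 release and 10 patch releases) introduced zero new Rigor diagnostics in the analysed scope and removed none of the 30 baselined ones.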

This is not a no-op artefact. Real churn occurred over the window:

119 .rb files changed in app + lib (git diff v4.5.0-beta.1 v4.5.10: 603 insertions / 265 deletions; 762 files / 17.3k insertions repo-wide).
6 of the diagnostic-bearing baselined files were edited — activitypub/activity/create.rb, linked_data_signature.rb, feed_manager.rb, signature_parser.rb, account.rb, cli/statuses.rb — and their (file, rule, count) buckets still matched. Line moves did not break the baseline.
A cold --no-cache --no-baseline run at v4.5.10 independently confirms raw = 30 (9 error / 12 warning / 9 info), ruling out cache masking.
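
Why line moves leave the buckets intact can be shown with a small sketch of (file, rule, count) matching. The semantics here are an assumption, not a description of Rigor’s implementation: a bucket is taken to surface only the diagnostics in excess of its baselined count, and a bucket missing from the baseline surfaces in full. The file name and line numbers are made up for illustration.

```ruby
# Group diagnostics into [file, rule] => count buckets. The line number is
# deliberately not part of the key, so moving a diagnostic within a file
# does not change its bucket.
def bucketize(diagnostics)
  diagnostics.map { |d| [d[:file], d[:rule]] }.tally
end

# Assumed semantics: report only the excess over the baselined count.
def surfaced(baseline, diagnostics)
  bucketize(diagnostics).each_with_object({}) do |(key, count), excess|
    over = count - baseline.fetch(key, 0)
    excess[key] = over if over.positive?
  end
end

# Illustrative data only (hypothetical file and lines).
at_baseline = [{ file: "app/example.rb", rule: "call.undefined-method", line: 10 }]
after_edit  = [{ file: "app/example.rb", rule: "call.undefined-method", line: 42 }]
regression  = after_edit + [{ file: "app/example.rb", rule: "call.undefined-method", line: 57 }]

baseline = bucketize(at_baseline)
moved_only = surfaced(baseline, after_edit)  # same bucket, same count
one_new    = surfaced(baseline, regression)  # count exceeds the baseline by one
```

The trade-off of count-based buckets is that a fix and a new diagnostic of the same rule in the same file cancel out; the sweep above cannot distinguish that case from no change.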

v4.5.10 raw composition (unchanged from beta.1)

Severity	Rules
error ×9	`call.undefined-method` ×9
warning ×12	`call.possible-nil-receiver` ×9, `call.argument-type-mismatch` ×3
info ×9	`flow.always-truthy-condition` ×8, `rbs.coverage.missing-gem` ×1
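
As a cross-check, the per-rule counts in the table can be rolled up by severity and compared with the 9 / 12 / 9 split and the raw total of 30 reported for the cold run. The sketch below only re-adds the numbers from the table; `totals_by_severity` is a helper name introduced for this example.

```ruby
# Rule ID => [severity, count], transcribed from the table above.
COMPOSITION = {
  "call.undefined-method"        => [:error, 9],
  "call.possible-nil-receiver"   => [:warning, 9],
  "call.argument-type-mismatch"  => [:warning, 3],
  "flow.always-truthy-condition" => [:info, 8],
  "rbs.coverage.missing-gem"     => [:info, 1]
}.freeze

# Sum the per-rule counts within each severity.
def totals_by_severity(composition)
  composition.each_value.with_object(Hash.new(0)) do |(severity, count), totals|
    totals[severity] += count
  end
end

totals = totals_by_severity(COMPOSITION)
grand_total = totals.values.sum
```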

The 8 flow.always-truthy-condition diagnostics are exactly the cluster-4 G1/G2 flow-folding false positives triaged in 20260521-mastodon-cluster4-flow-folding-triage.md. They are still present, still queued, and unchanged across the line.

Verdict

ADR-22 acknowledge mode is empirically validated on this release line. A project that adopted Rigor at v4.5.0-beta.1 would have passed through the entire v4.5.x line, including all 10 patch releases, with zero baseline maintenance and zero false CI failures. That is precisely the “adopt the current state; ordinary coding does not increase errors” contract that the rigor-project-init acknowledge mode promises.
The (file, rule, count) granularity is refactor-robust in practice. Six diagnostic-bearing files were edited without breaking their buckets — the WD1 line-move-robustness claim holds on real churn.
rigor-project-init’s workflow shape is sound for a Rails project of this size; the surfaced-count metric is meaningful and stable.

Caveats / limits of this run

Released tags are a post-spec-gate population, so this sweep measures baseline stability, not Rigor’s bug detection. Every tag here is post-CI: a mature project’s spec suite catches obvious errors before merge, and genuine bugs are fixed before a tag is cut. So surfaced = 0 over a released-tag line is the expected, healthy result, because the bugs Rigor would catch belong to the same class the specs have already removed. This run strongly validates acknowledge-mode stability (no false regressions from maintenance churn) but says nothing about whether Rigor catches new bugs, and that holds for any released-tag range, feature-spanning or not. Testing detection needs sampling finer than release tags: per-commit on main, PR-head commits, or commits known to contain a bug (the commit immediately before a known fix). The rigor-regression-sweep SKILL § “Phase 1” records this as the headline sampling guidance.
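
Two of the finer-grained sampling strategies map onto standard git revision syntax, sketched below. The helper names are introduced for this example, and nothing here is taken from the SKILL itself. PR-head sampling is left out because it depends on how the hosting service exposes pull-request refs.

```ruby
require "open3"

# Per-commit sampling: every commit in from..to that touched the analysed
# paths, oldest first.
def per_commit_argv(from, to, paths)
  ["git", "rev-list", "--reverse", "#{from}..#{to}", "--", *paths]
end

# Pre-fix sampling: the first parent of a known fix commit, i.e. the last
# revision that still contains the bug.
def pre_fix_rev(fix_sha)
  "#{fix_sha}^"
end

# Run a git command inside the clone and return its output lines.
def list_commits(repo_dir, argv)
  stdout, status = Open3.capture2(*argv, chdir: repo_dir)
  raise "git failed: #{argv.join(' ')}" unless status.success?

  stdout.lines.map(&:chomp)
end

argv = per_commit_argv("v4.5.0-beta.1", "v4.5.10", %w[app lib])
# list_commits("/path/to/mastodon-clone", argv) would return one SHA per line.
```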
Config omits the rigor-* plugins (not published in v0.1.x). With the Rails plugin set active the absolute counts would differ; the delta methodology is unaffected (config stays frozen).
severity_profile: lenient downgrades call.possible-nil-receiver, call.argument-type-mismatch and flow.*, so the severity split above reflects that profile; the 9 error-severity diagnostics are all call.undefined-method.
Single project, single release line. The SKILL exists so this becomes a repeatable corpus.

Reproduction

~/repo/ruby/rigor-survey/_mastodon-sweep/ holds sweep.sh, tabulate.rb, the frozen baseline.yml, and reports/<tag>.json. The procedure is the rigor-regression-sweep SKILL.