Mastodon v4.5.x regression sweep — baseline-drift over a release line

Date: 2026-05-21. Rigor: v0.1.8 (master, [Unreleased]). Target: mastodon/mastodon, 16 tags, v4.5.0-beta.1 → v4.5.10.

Validate two things against a real “normal development flow”:

  1. How much does a project’s Rigor error count grow as ordinary development proceeds, when a baseline is taken once and frozen?
  2. How realistic is the rigor-project-init acknowledge-mode environment — does it behave the way ADR-22 promises (adopt once, surface only regressions)?

Method: the rigor-regression-sweep procedure — baseline at the first tag, then rigor check every later tag against that frozen baseline + a frozen config.

  • Blobless clone of mastodon/mastodon; 16 tags in release order: v4.5.0-beta.1, -beta.2, -rc.1, -rc.2, -rc.3, v4.5.0, v4.5.1 … v4.5.10.
  • Frozen config (held identical across every tag — the delta is then attributable to Mastodon’s code, not config): paths: [app, lib], exclude: [vendor, tmp], severity_profile: lenient, signature_paths: → the rigor-activesupport-core-ext sig/ bundle (absolute path). The rigor-* plugin gems are omitted — not RubyGems-published in v0.1.x (ROADMAP § “Out of scope today”), so a faithful external-user config cannot include them yet.
  • Baseline generated at v4.5.0-beta.1: 30 diagnostics / 25 buckets (rule-ID mode).
  • app + lib scope: 1,218 .rb files at beta.1, 1,219 at v4.5.10.
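The sweep depends on visiting the tags in release order, which a plain string sort gets wrong (v4.5.10 would land before v4.5.2, and the final v4.5.0 before its own betas). A minimal sketch of one way to order them, assuming the tags keep the `v` prefix and that RubyGems' prerelease ordering (beta before rc before final) matches the project's tagging scheme. The helper name `release_order` is hypothetical and not part of sweep.sh:

```ruby
require "rubygems"

# Hypothetical helper: sort "v"-prefixed tags into release order.
# Gem::Version treats "4.5.0-beta.1" as a prerelease of 4.5.0, so
# beta < rc < final, and compares 4.5.2 < 4.5.10 numerically.
def release_order(tags)
  tags.sort_by { |tag| Gem::Version.new(tag.delete_prefix("v")) }
end

tags = %w[v4.5.10 v4.5.0 v4.5.0-rc.1 v4.5.2 v4.5.0-beta.2 v4.5.0-beta.1]
puts release_order(tags)
# Prints, one per line:
#   v4.5.0-beta.1, v4.5.0-beta.2, v4.5.0-rc.1, v4.5.0, v4.5.2, v4.5.10
```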

Result — the error-increase curve is flat at zero

| Tag | raw | silenced | surfaced |
| --- | --- | --- | --- |
| v4.5.0-beta.1 | 30 | 30 | 0 |
| v4.5.0-beta.2 | 30 | 30 | 0 |
| v4.5.0-rc.1 … -rc.3 | 30 | 30 | 0 |
| v4.5.0 | 30 | 30 | 0 |
| v4.5.1 … v4.5.10 | 30 | 30 | 0 |

Every tag: surfaced = 0. The frozen baseline absorbed the entire release line: normal development from beta.1 through the ten patch releases introduced zero new Rigor diagnostics in the analysed scope and removed none of the 30 baselined ones.
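The three columns can be read as a per-bucket comparison against the frozen baseline. The sketch below is one plausible reading, not Rigor's actual implementation: it assumes a diagnostic is a hash with `:file`, `:rule` and `:line`, that the baseline maps `[file, rule]` to an allowed count, and that a bucket silences up to its baselined count and surfaces the excess. The method name `tabulate` and the file paths are illustrative only.

```ruby
# Assumed shapes (hypothetical, not Rigor's real report schema):
#   diagnostic: { file:, rule:, line: }
#   baseline:   { [file, rule] => allowed_count }
def tabulate(diagnostics, baseline)
  current = diagnostics.map { |d| [d[:file], d[:rule]] }.tally
  # Each bucket silences at most the count recorded in the baseline.
  silenced = current.sum { |bucket, n| [n, baseline.fetch(bucket, 0)].min }
  raw = diagnostics.size
  { raw: raw, silenced: silenced, surfaced: raw - silenced }
end

baseline = {
  ["app/example_a.rb", "call.undefined-method"] => 2,
  ["lib/example_b.rb", "call.possible-nil-receiver"] => 1,
}

unchanged = [
  { file: "app/example_a.rb", rule: "call.undefined-method", line: 10 },
  { file: "app/example_a.rb", rule: "call.undefined-method", line: 42 },
  { file: "lib/example_b.rb", rule: "call.possible-nil-receiver", line: 7 },
]
# A regression: one more diagnostic than the bucket allows.
regressed = unchanged + [
  { file: "app/example_a.rb", rule: "call.undefined-method", line: 99 },
]

p tabulate(unchanged, baseline) # raw 3, silenced 3, surfaced 0
p tabulate(regressed, baseline) # raw 4, silenced 3, surfaced 1
```

Under this reading, a row of 30 / 30 / 0 means every bucket stayed at or below its baselined count, and the raw total matching the baseline total means none shrank either.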

This is not a no-op artefact. Real churn occurred over the window:

  • 119 .rb files changed in app + lib (git diff v4.5.0-beta.1 v4.5.10: 603 insertions / 265 deletions; 762 files / 17.3k insertions repo-wide).
  • 6 of the diagnostic-bearing baselined files were edited (activitypub/activity/create.rb, linked_data_signature.rb, feed_manager.rb, signature_parser.rb, account.rb, cli/statuses.rb), and their (file, rule, count) buckets still matched. Line moves did not break the baseline.
  • A cold --no-cache --no-baseline run at v4.5.10 independently confirms raw = 30 (9 error / 12 warning / 9 info), ruling out cache masking.
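Why line moves leave the buckets intact can be shown with a small sketch. It assumes, as an illustration only, that a diagnostic carries `:file`, `:rule` and `:line` and that a bucket is keyed on file and rule with the line ignored. The names `bucket_counts` and `line_keys` and the file path are hypothetical.

```ruby
# Bucket key ignores the line number: (file, rule) => count.
def bucket_counts(diagnostics)
  diagnostics.map { |d| [d[:file], d[:rule]] }.tally
end

# For contrast: a baseline keyed on exact (file, rule, line) positions.
def line_keys(diagnostics)
  diagnostics.map { |d| d.values_at(:file, :rule, :line) }.sort
end

before = [
  { file: "app/example.rb", rule: "call.undefined-method", line: 10 },
  { file: "app/example.rb", rule: "call.undefined-method", line: 42 },
]
# The same two diagnostics after an edit inserted five lines above them.
after = before.map { |d| d.merge(line: d[:line] + 5) }

puts bucket_counts(before) == bucket_counts(after) # true: buckets still match
puts line_keys(before) == line_keys(after)         # false: a line-keyed baseline would break
```

The trade-off of dropping the line from the key is that a fixed diagnostic and a newly introduced one of the same rule in the same file cancel out, which is one reason the raw count is worth confirming independently.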

v4.5.10 raw composition (unchanged from beta.1)

| Severity | Rules |
| --- | --- |
| error ×9 | call.undefined-method ×9 |
| warning ×12 | call.possible-nil-receiver ×9, call.argument-type-mismatch ×3 |
| info ×9 | flow.always-truthy-condition ×8, rbs.coverage.missing-gem ×1 |
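As a quick consistency check, the per-rule counts can be summed back to the per-severity and raw totals. The figures below are copied from the composition listed here; the constant name `COMPOSITION` is only illustrative.

```ruby
# Per-rule counts at v4.5.10, grouped by severity (from the composition above).
COMPOSITION = {
  error:   { "call.undefined-method" => 9 },
  warning: { "call.possible-nil-receiver" => 9, "call.argument-type-mismatch" => 3 },
  info:    { "flow.always-truthy-condition" => 8, "rbs.coverage.missing-gem" => 1 },
}.freeze

by_severity = COMPOSITION.transform_values { |rules| rules.values.sum }
total = by_severity.values.sum

by_severity.each { |severity, count| puts "#{severity}: #{count}" }
puts "raw: #{total}"
# Prints error: 9, warning: 12, info: 9, raw: 30
```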

The 8 flow.always-truthy-condition are exactly the cluster-4 G1/G2 flow-folding false positives triaged in 20260521-mastodon-cluster4-flow-folding-triage.md — still present, still queued, unchanged across the line.

  1. ADR-22 acknowledge mode is empirically validated. A project that adopted Rigor at v4.5.0-beta.1 would have sailed through the entire v4.5.x line — 10 patch releases — with zero baseline maintenance and zero false CI failures. That is precisely the “adopt the current state; ordinary coding does not increase errors” contract the rigor-project-init acknowledge mode sells.
  2. The (file, rule, count) granularity is refactor-robust in practice. Six diagnostic-bearing files were edited without breaking their buckets — the WD1 line-move-robustness claim holds on real churn.
  3. rigor-project-init’s workflow shape is sound for a Rails project of this size; the surfaced-count metric is meaningful and stable.

  • Released tags are a post-spec-gate population, so this sweep measures baseline stability, not Rigor’s bug-detection. Every tag here is post-CI: a mature project’s spec suite catches obvious errors before merge, and genuine bugs are fixed before a tag is cut. surfaced = 0 over a released-tag line is therefore the expected, healthy result, because the bugs Rigor would catch are the same class the specs already removed. This run strongly validates acknowledge-mode stability (no false regressions from maintenance churn) but says nothing about whether Rigor catches new bugs, and that limitation holds for any released-tag range, feature-spanning or not. Testing detection needs sampling finer than release tags: per-commit on main, PR-head commits, or bug-introducing commits (the commit before a known fix). The rigor-regression-sweep SKILL § “Phase 1” records this as the headline sampling guidance.
  • Config omits the rigor-* plugins (not published in v0.1.x). With the Rails plugin set active the absolute counts would differ; the delta methodology is unaffected (config stays frozen).
  • severity_profile: lenient downgrades possible-nil-receiver / argument-type-mismatch / flow.*; the 9 error-severity diagnostics are all call.undefined-method.
  • Single project, single release line. The SKILL exists so this becomes a repeatable corpus.

~/repo/ruby/rigor-survey/_mastodon-sweep/ holds sweep.sh, tabulate.rb, the frozen baseline.yml, and reports/<tag>.json. The procedure is the rigor-regression-sweep SKILL.

© 2026 TypedDuck. Licensed under CC BY-SA 4.0.