Mastodon v4.5.x regression sweep — baseline-drift over a release line
Date: 2026-05-21. Rigor: v0.1.8 (master, [Unreleased]).
Target: mastodon/mastodon, 16 tags v4.5.0-beta.1 → v4.5.10.
Validate two things against a real “normal development flow”:
- How much does a project’s Rigor error count grow as ordinary development proceeds, when a baseline is taken once and frozen?
- How realistic is the `rigor-project-init` acknowledge-mode environment: does it behave the way ADR-22 promises (adopt once, surface only regressions)?
Method: the rigor-regression-sweep
procedure — baseline at the first tag, then rigor check every
later tag against that frozen baseline + a frozen config.
- Blobless clone of `mastodon/mastodon`; 16 tags in release order: `v4.5.0-beta.1`, `-beta.2`, `-rc.1`, `-rc.2`, `-rc.3`, `v4.5.0`, `v4.5.1` … `v4.5.10`.
- Frozen config (held identical across every tag, so the delta is attributable to Mastodon’s code, not config): `paths: [app, lib]`, `exclude: [vendor, tmp]`, `severity_profile: lenient`, `signature_paths:` → the `rigor-activesupport-core-ext` `sig/` bundle (absolute path). The `rigor-*` plugin gems are omitted: they are not RubyGems-published in v0.1.x (ROADMAP § “Out of scope today”), so a faithful external-user config cannot include them yet.
- Baseline generated at `v4.5.0-beta.1`: 30 diagnostics / 25 buckets (rule-ID mode).
- `app + lib` scope: 1,218 `.rb` files at beta.1, 1,219 at v4.5.10.
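As a rough illustration of the tag ordering described above, the 16 tags can be rebuilt programmatically and split into the baseline tag and the tags checked against it. This is only a sketch: `sweep_tags` is a hypothetical helper and is not taken from the actual `sweep.sh`.

```ruby
# Hypothetical helper (not from the real sweep script): rebuild the 16 tag
# names of the v4.5.x line in release order.
def sweep_tags
  prereleases = %w[beta.1 beta.2 rc.1 rc.2 rc.3].map { |suffix| "v4.5.0-#{suffix}" }
  releases = (0..10).map { |patch| "v4.5.#{patch}" } # v4.5.0 plus 10 patch releases
  prereleases + releases
end

# The first tag is where the baseline is taken; every later tag is checked
# against that frozen baseline and the frozen config.
baseline_tag, *later_tags = sweep_tags

puts "baseline at #{baseline_tag}"
later_tags.each { |tag| puts "check #{tag}" }
```

Keeping the list explicit rather than sorting `git tag` output avoids depending on version-sort behaviour for the pre-release suffixes.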
Result — the error-increase curve is flat at zero
| Tag | raw | silenced | surfaced |
|---|---|---|---|
| v4.5.0-beta.1 | 30 | 30 | 0 |
| v4.5.0-beta.2 | 30 | 30 | 0 |
| v4.5.0-rc.1 … -rc.3 | 30 | 30 | 0 |
| v4.5.0 | 30 | 30 | 0 |
| v4.5.1 … v4.5.10 | 30 | 30 | 0 |
Every tag: surfaced = 0. The frozen baseline absorbed the entire
release line — normal development across beta.1 → 10 patch
releases introduced zero new Rigor diagnostics in the analysed
scope, and removed none of the 30 baselined ones.
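The raw / silenced / surfaced split in the table can be modelled as a count comparison per bucket. The sketch below is an assumed model, not Rigor's actual implementation: it treats the baseline as a map from `[file, rule]` to an allowed count and surfaces only diagnostics beyond that allowance. The `Diagnostic` struct, the function names, and the file names are all hypothetical.

```ruby
# Assumed model of a frozen baseline (not Rigor's real code): each bucket
# [file, rule] has an allowed count; anything above the allowance surfaces.
Diagnostic = Struct.new(:file, :rule, :line)

# Count diagnostics per [file, rule] bucket, ignoring line numbers.
def bucket_counts(diagnostics)
  diagnostics.group_by { |d| [d.file, d.rule] }.transform_values(&:size)
end

# Split a run into raw / silenced / surfaced against the frozen baseline.
def split_against_baseline(diagnostics, baseline)
  raw = diagnostics.size
  surfaced = bucket_counts(diagnostics).sum do |key, count|
    [count - baseline.fetch(key, 0), 0].max
  end
  { raw: raw, silenced: raw - surfaced, surfaced: surfaced }
end

# Hypothetical data: one bucket with an allowance of 2.
baseline = { ["lib/example.rb", "call.undefined-method"] => 2 }
run = [
  Diagnostic.new("lib/example.rb", "call.undefined-method", 14),
  Diagnostic.new("lib/example.rb", "call.undefined-method", 52)
]

result = split_against_baseline(run, baseline)
puts "raw=#{result[:raw]} silenced=#{result[:silenced]} surfaced=#{result[:surfaced]}"
```

Under this model a row such as `30 | 30 | 0` means every bucket stayed at or below its baselined count.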
This is not a no-op artefact. Real churn occurred over the window:
- 119 `.rb` files changed in `app + lib` (`git diff v4.5.0-beta.1 v4.5.10`: 603 insertions / 265 deletions; 762 files / 17.3k insertions repo-wide).
- 6 of the diagnostic-bearing baselined files were edited (`activitypub/activity/create.rb`, `linked_data_signature.rb`, `feed_manager.rb`, `signature_parser.rb`, `account.rb`, `cli/statuses.rb`) and their `(file, rule, count)` buckets still matched. Line moves did not break the baseline.
- A cold `--no-cache --no-baseline` run at v4.5.10 independently confirms raw = 30 (9 error / 12 warning / 9 info), ruling out cache masking.
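Why line moves leave the buckets intact can be seen by comparing two fingerprinting schemes on the same diagnostics before and after an edit shifts them down the file. This is a minimal sketch with hypothetical names and data; it illustrates the idea of a `(file, rule, count)` key and makes no claim about how Rigor stores its baseline.

```ruby
# Hypothetical diagnostic record; the line number is the only field that
# changes when code above the diagnostic is inserted or removed.
Diagnostic = Struct.new(:file, :rule, :line)

# Line-sensitive fingerprint: breaks whenever a diagnostic moves.
def line_keys(diagnostics)
  diagnostics.map { |d| [d.file, d.rule, d.line] }.sort
end

# Bucket fingerprint: (file, rule, count), with no line numbers involved.
def bucket_keys(diagnostics)
  diagnostics.group_by { |d| [d.file, d.rule] }
             .map { |(file, rule), group| [file, rule, group.size] }
             .sort
end

before = [
  Diagnostic.new("lib/example.rb", "call.possible-nil-receiver", 12),
  Diagnostic.new("lib/example.rb", "call.possible-nil-receiver", 30)
]
# Simulate an edit that inserts 7 lines above both diagnostics.
after = before.map { |d| Diagnostic.new(d.file, d.rule, d.line + 7) }

puts line_keys(before) == line_keys(after)     # false
puts bucket_keys(before) == bucket_keys(after) # true
```

The trade-off of the coarser key is that a fixed diagnostic and a new one of the same rule in the same file cancel out in the count.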
v4.5.10 raw composition (unchanged from beta.1)
| Severity | Rules |
|---|---|
| error ×9 | call.undefined-method ×9 |
| warning ×12 | call.possible-nil-receiver ×9, call.argument-type-mismatch ×3 |
| info ×9 | flow.always-truthy-condition ×8, rbs.coverage.missing-gem ×1 |
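The composition table can be cross-checked against the raw total with a few lines of arithmetic. The sketch below only re-adds the figures from the table above; the constant name is illustrative.

```ruby
# Rule counts per severity, copied from the composition table.
COMPOSITION = {
  "error"   => { "call.undefined-method" => 9 },
  "warning" => { "call.possible-nil-receiver" => 9, "call.argument-type-mismatch" => 3 },
  "info"    => { "flow.always-truthy-condition" => 8, "rbs.coverage.missing-gem" => 1 }
}.freeze

# Sum the rules within each severity, then across severities.
per_severity = COMPOSITION.transform_values { |rules| rules.values.sum }
total = per_severity.values.sum

per_severity.each { |severity, count| puts "#{severity}: #{count}" }
puts "total: #{total}"
```

The per-severity sums come to 9 / 12 / 9 and the total to 30, matching the cold-run figures quoted earlier.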
The 8 flow.always-truthy-condition are exactly the cluster-4 G1/G2
flow-folding false positives triaged in
20260521-mastodon-cluster4-flow-folding-triage.md
— still present, still queued, unchanged across the line.
Verdict
- ADR-22 acknowledge mode is empirically validated. A project that adopted Rigor at `v4.5.0-beta.1` would have sailed through the entire v4.5.x line (10 patch releases) with zero baseline maintenance and zero false CI failures. That is precisely the “adopt the current state; ordinary coding does not increase errors” contract the `rigor-project-init` acknowledge mode sells.
- The `(file, rule, count)` granularity is refactor-robust in practice. Six diagnostic-bearing files were edited without breaking their buckets; the WD1 line-move-robustness claim holds on real churn.
- `rigor-project-init`’s workflow shape is sound for a Rails project of this size; the surfaced-count metric is meaningful and stable.
Caveats / limits of this run
- Released tags are a post-spec-gate population: this sweep measures baseline stability, not Rigor’s bug-detection. Every tag here is post-CI: a mature project’s spec suite catches obvious errors before merge, and genuine bugs are fixed before a tag is cut. So `surfaced = 0` over a released-tag line is the expected, healthy result, because the bugs Rigor would catch are the same class the specs already removed. This run strongly validates acknowledge-mode stability (no false regressions from maintenance churn) but says nothing about whether Rigor catches new bugs; that holds for any released-tag range, feature-spanning or not. Testing detection needs sampling finer than release tags: per-commit on `main`, PR-head commits, or bug-introducing commits (the commit before a known fix). The rigor-regression-sweep SKILL § “Phase 1” records this as the headline sampling guidance.
- Config omits the `rigor-*` plugins (not published in v0.1.x). With the Rails plugin set active the absolute counts would differ; the delta methodology is unaffected (config stays frozen).
- `severity_profile: lenient` downgrades `possible-nil-receiver` / `argument-type-mismatch` / `flow.*`; the 9 `error`-severity diagnostics are all `call.undefined-method`.
- Single project, single release line. The SKILL exists so this becomes a repeatable corpus.
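For the finer-grained sampling mentioned in the first caveat, the sample points could be enumerated with plain git. The following is a sketch under assumptions: the helper names are hypothetical, it uses only standard `git rev-list` options, and it says nothing about how the SKILL itself selects commits.

```ruby
require "open3"

# Argument vector for listing every commit between two refs, oldest first.
# --first-parent keeps to the mainline and skips commits merged in from
# side branches.
def per_commit_argv(from_ref, to_ref)
  ["git", "rev-list", "--reverse", "--first-parent", "#{from_ref}..#{to_ref}"]
end

# "The commit before a known fix": the first parent of the fix commit.
def commit_before(fix_ref)
  "#{fix_ref}^"
end

# Runs the listing inside repo_dir and returns the commit hashes.
def list_commits(repo_dir, from_ref, to_ref)
  out, status = Open3.capture2(*per_commit_argv(from_ref, to_ref), chdir: repo_dir)
  raise "git rev-list failed" unless status.success?

  out.lines.map(&:chomp)
end
```

Each returned hash would then be checked out and analysed the same way a tag is, with the baseline and config still frozen.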
Reproduction
`~/repo/ruby/rigor-survey/_mastodon-sweep/` holds `sweep.sh`, `tabulate.rb`, the frozen `baseline.yml`, and `reports/<tag>.json`.
The procedure is the rigor-regression-sweep SKILL.
© 2026 TypedDuck. Licensed under CC BY-SA 4.0.