# Mastodon regression sweeps: re-run on Rigor v0.1.9
Date: 2026-05-23. Rigor: v0.1.9 (master, [Unreleased]).
Target: mastodon/mastodon.
Two sweeps:

- Patch line: 16 tags, v4.5.0-beta.1 → v4.5.10. A re-run of the 2026-05-21 v0.1.8 sweep on the current engine.
- Cross-version: 9 tags, v3.5.19 → v4.5.10, spanning seven minor release lines (v3.5 through v4.5). Added 2026-05-23.
§§ 1–6 below cover sweep 1; § “Cross-version sweep” covers sweep 2.
Re-run the 2026-05-21 Mastodon v4.5.x sweep — same target, same 16 tags, same frozen config — against the current engine. The original run was on Rigor v0.1.8; the engine has since advanced to v0.1.9. This re-check answers one question the first run could not: does the v0.1.8 → v0.1.9 engine change itself perturb a real project’s diagnostic stream?
Method unchanged: the rigor-regression-sweep
procedure — baseline at the first tag, then rigor check every
later tag against that frozen baseline + frozen config.
- Same blobless clone of `mastodon/mastodon`. Same 16 tags in release order (v4.5.0-beta.1 … v4.5.10), all verified present.
- Same frozen config as the v0.1.8 run, held identical across every tag: `paths: [app, lib]`, `exclude: [vendor, tmp]`, `severity_profile: lenient`, and `signature_paths:` pointing at the `rigor-activesupport-core-ext` `sig/` bundle (absolute path). The `rigor-*` plugin gems remain omitted because they are not published to RubyGems in v0.1.x.
- Cache wiped before the run. The shared content-hashed cache from the v0.1.8 sweep was deleted, so v0.1.9 recomputes every file from cold and a stale cross-engine-version entry cannot mask a changed result.
- Baseline regenerated at v4.5.0-beta.1 with v0.1.9: 30 diagnostics / 25 buckets (rule-ID mode). These are exactly the counts the v0.1.8 run produced. The `app` + `lib` scope holds 1,218 `.rb` files at beta.1 and 1,219 at v4.5.10, unchanged from the v0.1.8 run.
## Result: still flat at zero, identical to v0.1.8

| Tag | raw | silenced | surfaced |
|---|---|---|---|
| v4.5.0-beta.1 | 30 | 30 | 0 |
| v4.5.0-beta.2 | 30 | 30 | 0 |
| v4.5.0-rc.1 … -rc.3 | 30 | 30 | 0 |
| v4.5.0 | 30 | 30 | 0 |
| v4.5.1 … v4.5.10 | 30 | 30 | 0 |
Every tag: surfaced = 0, raw = 30, silenced = 30. The v0.1.9
engine reproduces the v0.1.8 sweep exactly — same baseline
zero point, same flat-at-zero curve over all 16 tags.
Churn cross-check (re-measured, identical to the v0.1.8 run):

- 119 `.rb` files changed in `app` + `lib` (`git diff v4.5.0-beta.1 v4.5.10`: 603 insertions / 265 deletions; 762 files / 17.3k insertions repo-wide). Real development moved across the window, so the flat curve is not a no-op artefact.
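You can re-derive the churn figures from `git diff --numstat`, which prints one `added<TAB>deleted<TAB>path` line per file. The following is a minimal Ruby sketch, not the sweep's own tooling. It rests on two assumptions:

- The scope filter mirrors `paths: [app, lib]`.
- `--no-renames` is passed, so renamed files appear as plain delete/add paths rather than `{old => new}` strings. The author's exact invocation may count renames differently.

The helper name `churn_summary` and the sample input are hypothetical.

```ruby
# Hypothetical helper: totals .rb churn under app/ and lib/ from the
# output of `git diff --numstat --no-renames FROM TO`.
ChurnSummary = Struct.new(:files, :insertions, :deletions)

def churn_summary(numstat, scope: %w[app lib], ext: ".rb")
  summary = ChurnSummary.new(0, 0, 0)
  numstat.each_line do |line|
    added, deleted, path = line.chomp.split("\t", 3)
    next if path.nil? || !path.end_with?(ext)
    next unless scope.any? { |dir| path.start_with?("#{dir}/") }

    summary.files += 1
    # Binary files report "-" for both counts; "-".to_i is 0.
    summary.insertions += added.to_i
    summary.deletions += deleted.to_i
  end
  summary
end

# Real use inside the mastodon clone (shells out to git):
#   out = IO.popen(%w[git diff --numstat --no-renames v4.5.0-beta.1 v4.5.10], &:read)
#   p churn_summary(out)

# Hypothetical sample: the spec file falls outside app/ + lib/.
sample = "12\t3\tapp/models/account.rb\n" \
         "5\t0\tspec/models/account_spec.rb\n" \
         "7\t2\tlib/mastodon/cli/statuses.rb\n"
s = churn_summary(sample)
puts "#{s.files} files, +#{s.insertions} / -#{s.deletions}" # 2 files, +19 / -5
```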
### v4.5.10 raw composition (cold-cache spot check)

A cold `--no-cache --no-baseline` run at v4.5.10 on v0.1.9
independently confirms raw = 30, with a composition identical to
the v0.1.8 note:
| Severity | Rules |
|---|---|
| error ×9 | call.undefined-method ×9 |
| warning ×12 | call.possible-nil-receiver ×9, call.argument-type-mismatch ×3 |
| info ×9 | flow.always-truthy-condition ×8, rbs.coverage.missing-gem ×1 |
The 8 `flow.always-truthy-condition` diagnostics are still the cluster-4 G1/G2
flow-folding false positives triaged in
`20260521-mastodon-cluster4-flow-folding-triage.md`.
The v0.1.9 engine leaves them unchanged.
## Verdict

- The v0.1.8 → v0.1.9 engine change is diagnostic-neutral on this corpus. The baseline at beta.1, the per-tag surfaced curve, and the cold v4.5.10 raw composition are all identical across the two engine versions. Whatever shipped between v0.1.8 and v0.1.9 did not add, drop, or reword a single diagnostic in Mastodon’s `app` + `lib` scope under the frozen config. That is a useful no-regression signal for the release line.
- ADR-22 acknowledge mode remains empirically validated. A project that adopted Rigor at v4.5.0-beta.1 would still sail through the entire v4.5.x line with zero baseline maintenance and zero false CI failures. The conclusion now holds across two engine versions, not one.
- The `(file, rule, count)` baseline is engine-version-robust. A baseline generated on v0.1.8 and one generated on v0.1.9 cover the same 25 buckets; nothing in the rule-ID-mode keying drifted across the patch bump.
## Caveats / limits of this run

- This re-run inherits every limit of the original. Released tags are a post-spec-gate population, so surfaced = 0 over the line measures baseline stability, not Rigor’s bug-detection power. Testing detection still needs sampling finer than release tags (per-commit, PR-head or bug-introducing commits); see the v0.1.8 note § Caveats and the SKILL § “Phase 1”.
- Config still omits the `rigor-*` plugins (not published in v0.1.x), and `severity_profile: lenient` still downgrades `possible-nil-receiver` / `argument-type-mismatch` / `flow.*`.
- Single project, single release line: the corpus is still two data points (this run and the v0.1.8 one) over the same project.
## Tooling note

The first invocation failed: the rigor checkout’s `vendor/bundle`
held native extensions (json, prism, rbs) compiled against an
older Ruby ABI — dlopen … Symbol not found: _rb_cObject. Fixed
with bundle pristine inside the Nix dev shell, which recompiled
every native gem against the current Ruby 4.0.5. Worth a line in the
SKILL if the sweep is run after a Ruby bump; not a Rigor bug.
## Cross-version sweep: v3.5.19 → v4.5.10

A second sweep, run the same day on the same v0.1.9 engine, spans
seven minor release lines (v3.5 through v4.5) in 9 tags: v3.5.19, v4.0.15, v4.1.15, v4.2.7, v4.3.0-beta.1, v4.3.0, v4.4.0, v4.5.0, v4.5.10. Per the SKILL
§ “Phase 1” this is a feature-spanning released-tag range — it
measures whether new released code adds standing diagnostics the
baseline did not already cover, a different question from the
patch-line sweep’s baseline-stability one.
- Same frozen config shape. The baseline was taken once at v3.5.19 and frozen. This sweep uses a separate artefact dir (`_mastodon-major-sweep/`), and the cache was wiped before the run. The `app` + `lib` scope grows substantially over the range, from 875 `.rb` files at v3.5.19 to 1,219 at v4.5.10.
- One config delta from the patch sweep, forced by the data; see the parse-error-floor note below.
### The parse-error floor and the config fix

The first cross-version run showed a constant 7 rule-less
error diagnostics at every tag from v3.5.19 through v4.1.15,
including the baseline tag itself. That is the exact signature the
SKILL § “Phase 8” documents: parse errors carry no rule, so
Baseline never buckets them and they surface forever. The source
was lib/templates/rails/post_deployment_migration/migration.rb —
a Rails generator ERB template carrying a .rb extension
(`class <%= … %>` fails to parse). Upstream removed it by
v4.2.7, the first swept tag without it, which is why the floor vanishes there.
Per the SKILL the fix is config-side: add the generator template to
`exclude:`. One detail is worth recording. Each `exclude:` entry is a
`File.fnmatch?` glob applied without `FNM_PATHNAME` and matched against an
absolute path, so a bare `lib/templates` matches nothing. The working form
mirrors the built-in excludes: `"**/lib/templates/**/*.rb"`. With that added
to the frozen `exclude:`, the curve below is clean.
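The matching rule above is easy to check directly in Ruby. This minimal sketch assumes, as described, that each `exclude:` entry is passed to `File.fnmatch?` with no flags and tested against the file's absolute path. The helper `excluded?` and the checkout root are hypothetical, not Rigor's code. Without `FNM_PATHNAME`, `*` also matches `/`, which is why a leading `**` can absorb the absolute prefix.

```ruby
# Sketch of the exclude rule described above (an assumption, not Rigor's
# code): each exclude glob goes to File.fnmatch? with no flags, so no
# FNM_PATHNAME, and is tested against the file's absolute path.
def excluded?(abs_path, patterns)
  # Without FNM_PATHNAME, "*" also matches "/", so "**" acts like "*".
  patterns.any? { |glob| File.fnmatch?(glob, abs_path) }
end

ROOT = "/home/dev/mastodon" # hypothetical checkout location
TEMPLATE = "#{ROOT}/lib/templates/rails/post_deployment_migration/migration.rb"
MODEL = "#{ROOT}/app/models/account.rb"

# A bare directory must match the *whole* absolute path, so it never does.
puts excluded?(TEMPLATE, ["lib/templates"])            # false
# A relative glob fails too: the path starts with "/", not "lib".
puts excluded?(TEMPLATE, ["lib/templates/**/*.rb"])    # false
# The built-in-exclude shape: the leading "**" absorbs the absolute prefix.
puts excluded?(TEMPLATE, ["**/lib/templates/**/*.rb"]) # true
# ...while ordinary app code stays in scope.
puts excluded?(MODEL, ["**/lib/templates/**/*.rb"])    # false
```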
### Result: a rising surfaced curve

| Tag | files | raw | silenced | surfaced |
|---|---|---|---|---|
| v3.5.19 | 875 | 22 | 22 | 0 |
| v4.0.15 | 955 | 23 | 22 | 1 |
| v4.1.15 | 961 | 25 | 22 | 3 |
| v4.2.7 | 1022 | 25 | 20 | 5 |
| v4.3.0-beta.1 | 1112 | 28 | 18 | 10 |
| v4.3.0 | 1096 | 31 | 18 | 13 |
| v4.4.0 | 1188 | 29 | 15 | 14 |
| v4.5.0 | 1218 | 30 | 15 | 15 |
| v4.5.10 | 1219 | 30 | 15 | 15 |
Unlike the flat-at-0 patch line, surfaced climbs monotonically
0 → 15 across the seven release lines. This is the expected
shape for a feature-spanning range: code added between v3.5 and v4.5
carries diagnostics the v3.5.19 baseline could not have known about.
The final v4.5.10 set (15 surfaced) is a subset of the 30 diagnostics
the patch sweep baselined at v4.5.0-beta.1:

- `call.possible-nil-receiver` ×5
- `call.undefined-method` ×4
- `flow.always-truthy-condition` ×3
- `call.argument-type-mismatch` ×3

In other words, the v3.5.19 baseline “sees” only 15 of v4.5.10’s 30
diagnostics as new; it already covered the other 15.
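The following is a quick arithmetic check of the table, in the spirit of the sweep's `tabulate.rb` (whose actual contents are not shown here). It confirms two things:

- Every row satisfies raw = silenced + surfaced.
- "Monotonically" here means non-decreasing, since v4.5.0 and v4.5.10 both sit at 15.

The values are copied from the table; the checks are a sketch.

```ruby
# Rows copied from the cross-version table above.
Row = Struct.new(:tag, :files, :raw, :silenced, :surfaced)

ROWS = [
  Row.new("v3.5.19",        875, 22, 22,  0),
  Row.new("v4.0.15",        955, 23, 22,  1),
  Row.new("v4.1.15",        961, 25, 22,  3),
  Row.new("v4.2.7",        1022, 25, 20,  5),
  Row.new("v4.3.0-beta.1", 1112, 28, 18, 10),
  Row.new("v4.3.0",        1096, 31, 18, 13),
  Row.new("v4.4.0",        1188, 29, 15, 14),
  Row.new("v4.5.0",        1218, 30, 15, 15),
  Row.new("v4.5.10",       1219, 30, 15, 15),
].freeze

# Every diagnostic is either silenced by the frozen baseline or surfaced.
ROWS.each do |r|
  raise "#{r.tag}: raw != silenced + surfaced" unless r.raw == r.silenced + r.surfaced
end

# Per-step deltas, plus the non-decreasing check on the surfaced curve.
ROWS.each_cons(2) do |a, b|
  puts format("%-14s surfaced %+d, silenced %+d",
              b.tag, b.surfaced - a.surfaced, b.silenced - a.silenced)
end
non_decreasing = ROWS.each_cons(2).all? { |a, b| b.surfaced >= a.surfaced }
puts "non-decreasing: #{non_decreasing}"
```

The silenced deltas make the baseline erosion discussed below visible step by step.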
### Two effects are entangled here: read the curve with care

A cross-major sweep does not cleanly measure “error increase”. Two things move at once:

- Genuinely new standing diagnostics. New files and new code paths (the `fasp/` workers, `interaction_policy_concern`, `signature_parser`) carry diagnostics. These make up the bulk of the rise.
- Rename artefacts. The baseline keys on `(file, rule, count)`. Commit `b6b4ea4c` (“Move the mastodon/*_cli files to mastodon/cli/*”, #24139) renamed every CLI file, for example `lib/mastodon/statuses_cli.rb` → `lib/mastodon/cli/statuses.rb`. That file’s baselined `call.undefined-method` ×2 bucket therefore goes `:cleared`, and the new path surfaces the same 2 diagnostics. The result is a surfaced += 2 that is pure churn, not a regression. This is the SKILL § “Phase 8” rename caveat, observed live.
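The rename artefact falls out of how a `(file, rule, count)` baseline matches. This minimal Ruby model assumes three things:

- A bucket silences up to its recorded count of matching diagnostics.
- Any excess surfaces.
- A bucket left with no matches counts as cleared.

The names `build_baseline` and `apply_baseline` are hypothetical, and this is not Rigor's implementation.

```ruby
Diagnostic = Struct.new(:file, :rule)

# Bucket diagnostics by (file, rule) and record how many each bucket holds.
def build_baseline(diagnostics)
  diagnostics.group_by { |d| [d.file, d.rule] }.transform_values(&:size)
end

# Silence up to each bucket's count; everything else surfaces.
def apply_baseline(baseline, diagnostics)
  remaining = baseline.dup
  silenced = 0
  surfaced = []
  diagnostics.each do |d|
    key = [d.file, d.rule]
    if remaining.fetch(key, 0).positive?
      remaining[key] -= 1
      silenced += 1
    else
      surfaced << d
    end
  end
  # A bucket none of whose entries matched is reported as cleared.
  cleared = baseline.keys.select { |key| remaining[key] == baseline[key] }
  { silenced: silenced, surfaced: surfaced, cleared: cleared }
end

rule = "call.undefined-method"
baseline = build_baseline(Array.new(2) { Diagnostic.new("lib/mastodon/statuses_cli.rb", rule) })

# After the rename the same two diagnostics are reported under the new path:
# nothing is silenced, both surface, and the old-path bucket is cleared.
after = Array.new(2) { Diagnostic.new("lib/mastodon/cli/statuses.rb", rule) }
result = apply_baseline(baseline, after)
puts "silenced=#{result[:silenced]} surfaced=#{result[:surfaced].size} cleared=#{result[:cleared].inspect}"
```

Keying on the file path is what makes the baseline cheap and stable on a fixed tree. The same choice is why a file move looks like "one bucket cleared, N diagnostics new".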
The silenced column tells the same story from the other side: it
decays 22 → 15 as the codebase moves away from v3.5.19 — baseline
buckets stop matching when their files are renamed, deleted, or
edited past their count. Over 1,191 changed .rb files
(30.7k insertions / 12.8k deletions in app + lib; 6,238 files
repo-wide) that erosion is unavoidable.
Takeaway: the patch-line sweep (stable file tree, frozen
baseline) is the clean instrument for baseline stability. A
cross-major sweep is useful for “does newer code carry standing
diagnostics” (yes — 15 by v4.5.10) but its surfaced count must be
read as new diagnostics + rename artefacts, not as a regression
count. When surfaced jumps, diff the tag and rule out renames
first.
### What is increasing, by rule and cause

The author traced every surfaced diagnostic to its first-appearance tag and read the source. On that basis, the 15-diagnostic v4.5.10 set breaks down as follows:
| Cluster | Rule | N | Cause | Verdict |
|---|---|---|---|---|
| `signature_parser.rb` | argument-type-mismatch | 3 | `StringScanner#[]` called with a Symbol named capture (`scanner[:key]`); Rigor’s RBS carries only the Integer overload | FP (RBS gap) |
| `account.rb` | undefined-method | 1 | in `scope :duplicate_uris, -> { select(...).group(:uri) }` Rigor resolves `select` to `Enumerable#select` → `Array[String]`, so `.group` reads as undefined | misinference (AR scope body) |
| `block_domain_service.rb` | undefined-method | 1 | `@domain_block_event.affected_local_accounts` after a `return if @domain_block_event.nil?` guard; the ivar is still typed `nil` | misinference (ivar typing) |
| `cli/statuses.rb` | undefined-method | 2 | `table_name=` on a `Class`; the `statuses_cli.rb` → `cli/statuses.rb` rename artefact (§ above) | churn artefact |
| ActivityPub ×3 files | flow.always-truthy-condition | 3 | flow-folding over-claims a condition constant in `create.rb` / `linked_data_signature.rb` / `link_details_extractor.rb` | FP (flow-folding) |
| CLI + workers + models | possible-nil-receiver | 5 | AR query results / associations typed nilable, called un-guarded | needs per-site triage |
The first-appearance trace (surfaced 0 → 1 → 3 → 5 → 10 → 13 → 14 → 15 → 15)
shows that each step is new code (or, at v4.2.7, the CLI rename), not a
regression in old code:

- v4.0–v4.1: `media_cli` / `accounts_cli` nil-receivers.
- v4.2.7: the CLI-rename `table_name=` pair.
- v4.3.0-beta.1: the `signature_parser` + `block_domain_service` + `with_recursive` cluster.
- v4.3.0: the `account.rb` `group` call plus two flow-folding sites.
- v4.4.0: the `fasp/` worker + `signed_request`.
- v4.5.0: `interaction_policy_concern`.

One `flow.dead-assignment` appeared at v4.1.15 and was cleared upstream by
v4.3. It was a genuine refactor smell that Rigor caught and Mastodon then fixed. A
`lib/active_record/with_recursive.rb` cluster (`with_recursive!` on `Integer` plus a
flow-folding site) surfaced at v4.3.0 and vanished by v4.4.0.
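The first-appearance trace amounts to a single pass over the per-tag surfaced sets in release order. This minimal sketch uses hypothetical `[file, rule]` keys and toy tags. A real run would build the sets from the sweep's per-tag reports, whose schema is not shown here. If a key disappears and later reappears, this sketch keeps only its first disappearance.

```ruby
# per_tag: [[tag, [[file, rule], ...]], ...] in release order.
# Returns [key, first_tag, first_tag_where_absent_or_nil] for each key.
def first_appearance_trace(per_tag)
  first_seen = {}
  gone_at = {}
  previous = []
  per_tag.each do |tag, keys|
    keys.each { |key| first_seen[key] ||= tag }
    # Keys present at the previous tag but missing now were cleared here.
    (previous - keys).each { |key| gone_at[key] ||= tag }
    previous = keys
  end
  first_seen.map { |key, tag| [key, tag, gone_at[key]] }
end

# Toy data (hypothetical files and tags), shaped like the dead-assignment
# case: a diagnostic appears at one tag and is fixed upstream later.
per_tag = [
  ["t1", [%w[a.rb call.possible-nil-receiver]]],
  ["t2", [%w[a.rb call.possible-nil-receiver], %w[b.rb flow.dead-assignment]]],
  ["t3", [%w[a.rb call.possible-nil-receiver], %w[b.rb flow.dead-assignment]]],
  ["t4", [%w[a.rb call.possible-nil-receiver]]],
]

first_appearance_trace(per_tag).each do |key, first, gone|
  puts "#{key.join(' ')}: first at #{first}#{gone ? ", gone at #{gone}" : ''}"
end
```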
Reading the increase: the rise is dominated by new code
hitting Rigor’s known precision gaps, not by new bugs. Of the 15:
the 3 argument-type-mismatch and 3 flow.always-truthy-condition
are outright false positives; 2 are misinferences (an AR scope body and an ivar nil-guard); 2 are a
rename artefact. Only the 5 possible-nil-receiver warnings are
candidate genuine findings, and those still need per-site triage
(the rigor-baseline-reduce SKILL’s job). So a cross-major
surfaced curve grows with code volume largely because more code
means more contact with the same finite set of engine precision
gaps — another reason it is not a regression count.
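Here is the breakdown arithmetic as a short Ruby tally. The counts and verdicts are copied from the table above; the verdict symbols are shorthand introduced here. It reproduces the 6 / 2 / 2 / 5 split and the 13 that remain once the rename pair is set aside.

```ruby
# [site, rule, count, verdict] rows from the "What is increasing" table.
BREAKDOWN = [
  ["signature_parser.rb",     "call.argument-type-mismatch",  3, :false_positive],
  ["account.rb",              "call.undefined-method",        1, :misinference],
  ["block_domain_service.rb", "call.undefined-method",        1, :misinference],
  ["cli/statuses.rb",         "call.undefined-method",        2, :rename_artefact],
  ["ActivityPub (3 files)",   "flow.always-truthy-condition", 3, :false_positive],
  ["CLI + workers + models",  "call.possible-nil-receiver",   5, :needs_triage],
].freeze

# Sum diagnostic counts per verdict.
def tally(rows)
  rows.group_by { |row| row[3] }.transform_values { |group| group.sum { |row| row[2] } }
end

counts = tally(BREAKDOWN)
total = counts.values.sum
p counts
puts "total: #{total}, excluding rename artefacts: #{total - counts.fetch(:rename_artefact, 0)}"
```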
### Cold-cache spot check

A cold `--no-cache --no-baseline` run at v4.5.10 confirms raw =
30 (error ×9 / warning ×12 / info ×9) — identical to the patch
sweep’s v4.5.10 and to the v0.1.8 note. The shared cache masked
nothing; the two sweeps agree on the absolute diagnostic set.
### Cross-version verdict

- The v3.5.19 baseline covers exactly half of v4.5.10’s diagnostics (15 of 30). Adopting Rigor on an old Mastodon and never refreshing the baseline would surface 15 diagnostics by v4.5.10. Two of those (the `cli/statuses.rb` pair) are CLI-rename artefacts, so the genuine standing-diagnostic increase is 13.
- The rename caveat is real and measurable. A team running a frozen baseline across a major upgrade should regenerate it after large refactors (file moves), exactly as ADR-22 / the `rigor-baseline-reduce` SKILL anticipate.
- The parse-error floor is real and config-fixable. A Rails project carrying generator `.rb` templates needs them in `exclude:`, and `rigor-project-init` should emit that exclude for Rails stacks (queued follow-up, see below).
## Follow-ups surfaced by these sweeps

### Config / tooling

- `rigor-project-init` should add `"**/lib/templates/**/*.rb"` (or the broader generator-template path) to the `exclude:` it emits for a Rails stack. Otherwise the parse-error floor is a guaranteed first-run papercut on any Rails app with generators.
- The `rigor-regression-sweep` SKILL § “Phase 8” parse-error-floor paragraph could note the `File.fnmatch?`-without-`FNM_PATHNAME` + absolute-path matching rule. A bare directory name does not work as an `exclude:` entry; use the `**/dir/**` built-in-exclude shape instead.
### False positives / misinferences: TODO to resolve

The cross-version § “What is increasing” table isolates four
engine-side defects. Recorded as TODOs in
docs/CURRENT_WORK.md § “Open engineering
items” → “Mastodon cross-version sweep — FP findings”; summarised
here:
- `StringScanner#[]` Symbol overload (FP, 3 sites). `scanner[:key]` passes a valid Symbol named-capture argument, yet it trips `call.argument-type-mismatch` because Rigor’s RBS for `StringScanner#[]` has only `(Integer) -> String?`. Already tracked: `CURRENT_WORK.md` § “Stdlib RBS coverage-gap pattern” names the `references/rbs` branch `widen-strscan-resolv-stdlib-sigs`, which widens exactly this signature. This sweep is the empirical confirmation; the fix is the staged upstream RBS PR.
- AR scope-body method resolution (misinference, 1 site). Inside `scope :duplicate_uris, -> { select(...).group(:uri) }` the lambda’s `self` is the model class. `select` should resolve to `ActiveRecord::Querying#select` (→ a relation) but resolves to `Enumerable#select` (→ `Array[String]`), so the chained `.group` reads as `undefined-method`. This is the empirical case for ADR-26 (`ActiveRecord::Relation` typing). Note that the sweep config omits the `rigor-activerecord` plugin, which is where the model-class query surface is meant to be typed.
- Ivar nil-guard / ivar-write typing (misinference, 1 site). `block_domain_service.rb` calls `@domain_block_event.affected_local_accounts` after a `return if @domain_block_event.nil?` guard, yet the ivar is typed `nil`. The result is an `undefined-method ... for nil`, not a nilable `possible-nil-receiver`. This is the same family as flow-folding gap G2, where an ivar’s type is taken from its literal writes and not refreshed. The guard clause does not narrow the ivar, and the ivar’s non-`nil` assignment is invisible to inference. Needs an ivar-narrowing / ivar-write-inference fix.
- Flow-folding over-claim (FP, 3 sites). `flow.always-truthy-condition` in `activity/create.rb`, `linked_data_signature.rb` and `link_details_extractor.rb`. Already tracked: `CURRENT_WORK.md` § “Flow-folding — loop-mutation tracking (gaps G1 / G2)” and the cluster-4 triage. This sweep confirms the cluster persists across the v3.5 → v4.5 line.
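The call shape behind the `signature_parser.rb` false positives runs fine on stock Ruby. `strscan` accepts a named-capture Symbol or String in `StringScanner#[]`, alongside the Integer index that Rigor's RBS does cover. This is a minimal sketch; the input string and the `scan_pair` helper are hypothetical and not taken from Mastodon's `signature_parser.rb`.

```ruby
require "strscan"

# Hypothetical helper: scan one key="value" pair with named captures.
def scan_pair(text)
  scanner = StringScanner.new(text)
  scanner.scan(/(?<key>\w+)="(?<value>[^"]*)"/)
  scanner
end

scanner = scan_pair('keyid="abc"')
puts scanner[:key]    # Symbol name: keyid
puts scanner["value"] # String name: abc
puts scanner[1]       # Integer index, the only overload in Rigor's RBS: keyid
```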
The 5 call.possible-nil-receiver warnings are not recorded as
engine TODOs — they are candidate genuine findings and belong in
per-site triage (rigor-baseline-reduce), not the FP backlog.
## Reproduction

The patch-line sweep lives in `~/repo/ruby/rigor-survey/_mastodon-sweep/`
(sweep.sh, tabulate.rb, regenerated baseline.yml,
reports/<tag>.json; the v0.1.8 reports preserved as
reports-v0.1.8/). The cross-version sweep lives in the sibling
_mastodon-major-sweep/ with the same file layout and its own
v3.5.19 baseline.yml.
The procedure is the rigor-regression-sweep
SKILL.
© 2026 TypedDuck. Licensed under CC BY-SA 4.0.