Mastodon regression sweeps — re-run on Rigor v0.1.9

Date: 2026-05-23. Rigor: v0.1.9 (master, [Unreleased]). Target: mastodon/mastodon.

Two sweeps:

  1. Patch line: 16 tags v4.5.0-beta.1 → v4.5.10, a re-run of the 2026-05-21 v0.1.8 sweep on the current engine.
  2. Cross-version: 9 tags v3.5.19 → v4.5.10 spanning five minor/major release lines (added 2026-05-23).

§§ 1–6 below cover sweep 1; § “Cross-version sweep” covers sweep 2.

Re-run the 2026-05-21 Mastodon v4.5.x sweep — same target, same 16 tags, same frozen config — against the current engine. The original run was on Rigor v0.1.8; the engine has since advanced to v0.1.9. This re-check answers one question the first run could not: does the v0.1.8 → v0.1.9 engine change itself perturb a real project’s diagnostic stream?

Method unchanged: the rigor-regression-sweep procedure — baseline at the first tag, then rigor check every later tag against that frozen baseline + frozen config.

  • Same blobless clone of mastodon/mastodon; same 16 tags in release order (v4.5.0-beta.1 … v4.5.10) — all verified present.
  • Same frozen config as the v0.1.8 run (held identical across every tag): paths: [app, lib], exclude: [vendor, tmp], severity_profile: lenient, signature_paths: → the rigor-activesupport-core-ext sig/ bundle (absolute path). The rigor-* plugin gems remain omitted (not RubyGems-published in v0.1.x).
  • Cache wiped before the run. The shared content-hashed cache from the v0.1.8 sweep was deleted so v0.1.9 recomputes every file from cold — a stale cross-engine-version entry cannot mask a changed result.
  • Baseline regenerated at v4.5.0-beta.1 with v0.1.9: 30 diagnostics / 25 buckets (rule-ID mode) — byte-for-byte the same count the v0.1.8 run produced.
  • app + lib scope: 1,218 .rb files at beta.1, 1,219 at v4.5.10 (unchanged from the v0.1.8 run).

Result — still flat at zero, identical to v0.1.8

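| Tag | raw | silenced | surfaced |
| --- | --- | --- | --- |
| v4.5.0-beta.1 | 30 | 30 | 0 |
| v4.5.0-beta.2 | 30 | 30 | 0 |
| v4.5.0-rc.1 … -rc.3 | 30 | 30 | 0 |
| v4.5.0 | 30 | 30 | 0 |
| v4.5.1 … v4.5.10 | 30 | 30 | 0 |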

Every tag: surfaced = 0, raw = 30, silenced = 30. The v0.1.9 engine reproduces the v0.1.8 sweep exactly — same baseline zero point, same flat-at-zero curve over all 16 tags.

Churn cross-check (re-measured, identical to the v0.1.8 run):

  • 119 .rb files changed in app + lib (git diff v4.5.0-beta.1 v4.5.10: 603 insertions / 265 deletions; 762 files / 17.3k insertions repo-wide). Real development moved across the window; the flat curve is not a no-op artefact.

v4.5.10 raw composition (cold-cache spot check)

A cold --no-cache --no-baseline run at v4.5.10 on v0.1.9 independently confirms raw = 30, with a composition identical to the v0.1.8 note:

| Severity | Rules |
| --- | --- |
| error ×9 | call.undefined-method ×9 |
| warning ×12 | call.possible-nil-receiver ×9, call.argument-type-mismatch ×3 |
| info ×9 | flow.always-truthy-condition ×8, rbs.coverage.missing-gem ×1 |

The 8 flow.always-truthy-condition are still the cluster-4 G1/G2 flow-folding false positives triaged in 20260521-mastodon-cluster4-flow-folding-triage.md — unchanged by the v0.1.9 engine.

  1. The v0.1.8 → v0.1.9 engine change is diagnostic-neutral on this corpus. The baseline at beta.1, the per-tag surfaced curve, and the cold v4.5.10 raw composition are all identical across the two engine versions. Whatever shipped between v0.1.8 and v0.1.9 did not add, drop, or reword a single diagnostic in Mastodon’s app + lib scope under the frozen config — a useful no-regression signal for the release line.
  2. ADR-22 acknowledge mode remains empirically validated. A project that adopted Rigor at v4.5.0-beta.1 would still sail through the entire v4.5.x line with zero baseline maintenance and zero false CI failures — and the conclusion now holds across two engine versions, not one.
  3. The (file, rule, count) baseline is engine-version-robust. A baseline generated on v0.1.8 and one generated on v0.1.9 cover the same 25 buckets; nothing in the rule-ID-mode keying drifted across the patch bump.
  • This re-run inherits every limit of the original. Released tags are a post-spec-gate population, so surfaced = 0 over the line measures baseline stability, not Rigor’s bug-detection power. Testing detection still needs sampling finer than release tags (per-commit / PR-head / bug-introducing commits) — see the v0.1.8 note § Caveats and the SKILL § “Phase 1”.
  • Config still omits the rigor-* plugins (not published in v0.1.x); severity_profile: lenient still downgrades possible-nil-receiver / argument-type-mismatch / flow.*.
  • Single project, single release line — the corpus is still two data points (this run and the v0.1.8 one) over the same project.

The first invocation failed: the rigor checkout’s vendor/bundle held native extensions (json, prism, rbs) compiled against an older Ruby ABI — dlopen … Symbol not found: _rb_cObject. Fixed with bundle pristine inside the Nix dev shell, which recompiled every native gem against the current Ruby 4.0.5. Worth a line in the SKILL if the sweep is run after a Ruby bump; not a Rigor bug.

Cross-version sweep — v3.5.19 → v4.5.10

A second sweep, run the same day on the same v0.1.9 engine, spans five release lines: 9 tags v3.5.19, v4.0.15, v4.1.15, v4.2.7, v4.3.0-beta.1, v4.3.0, v4.4.0, v4.5.0, v4.5.10. Per the SKILL § “Phase 1” this is a feature-spanning released-tag range — it measures whether new released code adds standing diagnostics the baseline did not already cover, a different question from the patch-line sweep’s baseline-stability one.

  • Same frozen config shape; baseline taken once at v3.5.19 and frozen. Separate artefact dir (_mastodon-major-sweep/), cache wiped before the run.
  • app + lib scope grows substantially over the range: 875 .rb files at v3.5.19 → 1,219 at v4.5.10.
  • One config delta from the patch sweep, forced by the data — see the parse-error-floor note below.

The parse-error floor — and the config fix

The first cross-version run showed a constant 7 rule-less error diagnostics at every tag from v3.5.19 through v4.1.15, including the baseline tag itself. That is the exact signature the SKILL § “Phase 8” documents: parse errors carry no rule, so Baseline never buckets them and they surface forever. The source was lib/templates/rails/post_deployment_migration/migration.rb — a Rails generator ERB template carrying a .rb extension (class <%= … %> fails to parse). It was removed upstream at v4.2.7, which is why the floor vanishes there.
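A minimal sketch of why such a file can never be baselined: an ERB-style class header is not valid Ruby, so the parser rejects it before any rule can run. The template text below is an illustrative stand-in, not the contents of the Mastodon file, and `parses_as_ruby?` is a hypothetical helper that uses MRI's own compiler rather than Rigor.

```ruby
# Hypothetical helper: true when MRI accepts the source as Ruby.
def parses_as_ruby?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

# Illustrative stand-in for a generator template saved with a .rb extension.
# The placeholder name is made up; only the `class <%= ... %>` shape matters.
erb_template = <<~TEMPLATE
  class <%= placeholder_class_name %> < ActiveRecord::Migration
    def change
    end
  end
TEMPLATE

# The same shape once the ERB tag has been rendered to a constant name.
rendered = <<~RUBY
  class PlaceholderMigration
    def change
    end
  end
RUBY

template_ok = parses_as_ruby?(erb_template)
rendered_ok = parses_as_ruby?(rendered)

puts "template parses as Ruby: #{template_ok}"
puts "rendered parses as Ruby: #{rendered_ok}"
```

The unrendered template fails at the class header, which is consistent with a rule-less diagnostic that no (file, rule, count) bucket can absorb.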

Per the SKILL the fix is config-side: exclude: the generator template. Worth recording — the exclude is a File.fnmatch? glob without FNM_PATHNAME, and matched against an absolute path, so a bare lib/templates does nothing; the working form mirrors the built-in excludes: "**/lib/templates/**/*.rb". With that added to the frozen exclude:, the curve below is clean.
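The matching rule can be checked directly with Ruby's `File.fnmatch?`. This is a sketch under the assumptions stated in the note above (no `FNM_PATHNAME`, absolute candidate path); the `/srv/mastodon` checkout root is made up for illustration.

```ruby
# Assumed absolute path; /srv/mastodon is a hypothetical checkout root.
path = "/srv/mastodon/lib/templates/rails/post_deployment_migration/migration.rb"

# A bare directory name must match the whole string, so it cannot match a file path.
bare = File.fnmatch?("lib/templates", path)

# A relative glob still fails: the candidate starts with "/" and the pattern does not.
relative = File.fnmatch?("lib/templates/**", path)

# The built-in-exclude shape: a leading wildcard absorbs the absolute prefix.
working = File.fnmatch?("**/lib/templates/**/*.rb", path)

puts "bare directory:      #{bare}"
puts "relative glob:       #{relative}"
puts "**/dir/**/*.rb form: #{working}"
```

Only the third pattern matches, which is why the frozen `exclude:` needs the `**/lib/templates/**/*.rb` form.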

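| Tag | files | raw | silenced | surfaced |
| --- | --- | --- | --- | --- |
| v3.5.19 | 875 | 22 | 22 | 0 |
| v4.0.15 | 955 | 23 | 22 | 1 |
| v4.1.15 | 961 | 25 | 22 | 3 |
| v4.2.7 | 1022 | 25 | 20 | 5 |
| v4.3.0-beta.1 | 1112 | 28 | 18 | 10 |
| v4.3.0 | 1096 | 31 | 18 | 13 |
| v4.4.0 | 1188 | 29 | 15 | 14 |
| v4.5.0 | 1218 | 30 | 15 | 15 |
| v4.5.10 | 1219 | 30 | 15 | 15 |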

Unlike the flat-at-0 patch line, surfaced climbs monotonically 0 → 15 across the five release lines. This is the expected shape for a feature-spanning range: code added between v3.5 and v4.5 carries diagnostics the v3.5.19 baseline could not have known about. The final v4.5.10 set (15 surfaced) is the same diagnostic cluster the patch sweep baselined at v4.5.0-beta.1 — call.possible-nil-receiver ×5, call.undefined-method ×4, flow.always-truthy-condition ×3, call.argument-type-mismatch ×3 — i.e. the v3.5.19 baseline “sees” only 15 of v4.5.10’s 30 diagnostics as new; the other 15 it already covered.
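The table's arithmetic can be checked mechanically. The sketch below re-enters the nine rows by hand (the `Row` struct and variable names are illustrative, not part of the sweep tooling) and verifies that every row satisfies raw = silenced + surfaced and that surfaced never decreases.

```ruby
Row = Struct.new(:tag, :files, :raw, :silenced, :surfaced)

# Values copied from the cross-version table.
ROWS = [
  Row.new("v3.5.19",       875,  22, 22, 0),
  Row.new("v4.0.15",       955,  23, 22, 1),
  Row.new("v4.1.15",       961,  25, 22, 3),
  Row.new("v4.2.7",        1022, 25, 20, 5),
  Row.new("v4.3.0-beta.1", 1112, 28, 18, 10),
  Row.new("v4.3.0",        1096, 31, 18, 13),
  Row.new("v4.4.0",        1188, 29, 15, 14),
  Row.new("v4.5.0",        1218, 30, 15, 15),
  Row.new("v4.5.10",       1219, 30, 15, 15)
].freeze

# Every raw diagnostic is either silenced by the baseline or surfaced.
consistent = ROWS.all? { |r| r.raw == r.silenced + r.surfaced }

# "Monotonic" here means non-decreasing: the last two tags both sit at 15.
non_decreasing = ROWS.each_cons(2).all? { |a, b| b.surfaced >= a.surfaced }

# Per-tag increase in surfaced diagnostics relative to the previous tag.
steps = ROWS.each_cons(2).map { |a, b| [b.tag, b.surfaced - a.surfaced] }

puts "rows consistent: #{consistent}"
puts "non-decreasing:  #{non_decreasing}"
steps.each { |tag, delta| puts format("%-14s +%d", tag, delta) }
```

The largest single step (+5) lands at v4.3.0-beta.1, the tag where the table's file count first passes 1,100.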

Two effects are entangled here — read the curve with care

A cross-major sweep does not cleanly measure “error increase”. Two things move at once:

  1. Genuinely new standing diagnostics — new files / new code paths (the fasp/ workers, interaction_policy_concern, signature_parser) carrying diagnostics. The bulk of the rise.
  2. Rename artefacts. The baseline keys on (file, rule, count). Commit b6b4ea4c (“Move the mastodon/*_cli files to mastodon/cli/*”, #24139) renamed every CLI file, e.g. lib/mastodon/statuses_cli.rb → lib/mastodon/cli/statuses.rb. Its baselined call.undefined-method ×2 bucket therefore goes :cleared and the new path surfaces the same 2 diagnostics: a surfaced += 2 that is pure churn, not a regression. This is the SKILL § “Phase 8” rename caveat, observed live.
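The rename effect can be reproduced with a toy model of (file, rule, count) keying. This is a sketch, not Rigor's implementation: it assumes a bucket silences up to `count` diagnostics with the same file and rule, and that a bucket with no matching diagnostics counts as cleared. `Diagnostic`, `build_baseline`, and `apply_baseline` are hypothetical names.

```ruby
Diagnostic = Struct.new(:file, :rule)

# Group diagnostics into { [file, rule] => count } buckets.
def build_baseline(diagnostics)
  diagnostics.map { |d| [d.file, d.rule] }.tally
end

# Assumed semantics: each bucket silences up to `count` matching diagnostics;
# anything beyond that, or under an unknown key, surfaces.
def apply_baseline(baseline, diagnostics)
  remaining = baseline.dup
  surfaced = diagnostics.reject do |d|
    key = [d.file, d.rule]
    next false unless remaining.fetch(key, 0).positive?

    remaining[key] -= 1
    true
  end
  seen_keys = diagnostics.map { |d| [d.file, d.rule] }.uniq
  {
    surfaced: surfaced,
    silenced: diagnostics.size - surfaced.size,
    cleared: baseline.keys - seen_keys
  }
end

old_path = "lib/mastodon/statuses_cli.rb"
new_path = "lib/mastodon/cli/statuses.rb"
rule     = "call.undefined-method"

# Two diagnostics baselined under the old path, then the file is renamed.
baseline = build_baseline(Array.new(2) { Diagnostic.new(old_path, rule) })
result   = apply_baseline(baseline, Array.new(2) { Diagnostic.new(new_path, rule) })

puts "surfaced: #{result[:surfaced].size}"
puts "silenced: #{result[:silenced]}"
puts "cleared:  #{result[:cleared].inspect}"
```

The diagnostics themselves are unchanged, yet both surface and the old bucket is cleared: the key moved, not the code.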

The silenced column tells the same story from the other side: it decays 22 → 15 as the codebase moves away from v3.5.19 — baseline buckets stop matching when their files are renamed, deleted, or edited past their count. Over 1,191 changed .rb files (30.7k insertions / 12.8k deletions in app + lib; 6,238 files repo-wide) that erosion is unavoidable.

Takeaway: the patch-line sweep (stable file tree, frozen baseline) is the clean instrument for baseline stability. A cross-major sweep is useful for “does newer code carry standing diagnostics” (yes — 15 by v4.5.10) but its surfaced count must be read as new diagnostics + rename artefacts, not as a regression count. When surfaced jumps, diff the tag and rule out renames first.

Tracing every surfaced diagnostic to its first-appearance tag and reading the source, the 15-diagnostic v4.5.10 set breaks down as:

| Cluster | Rule | N | Cause | Verdict |
| --- | --- | --- | --- | --- |
| signature_parser.rb | argument-type-mismatch | 3 | StringScanner#[] called with a Symbol named-capture (scanner[:key]); Rigor’s RBS carries only the Integer overload | FP — RBS gap |
| account.rb | undefined-method | 1 | in scope :duplicate_uris, -> { select(...).group(:uri) } Rigor resolves select to Enumerable#select → Array[String], so .group reads as undefined | misinference — AR scope body |
| block_domain_service.rb | undefined-method | 1 | @domain_block_event.affected_local_accounts after a return if @domain_block_event.nil? guard; the ivar is still typed nil | misinference — ivar typing |
| cli/statuses.rb | undefined-method | 2 | table_name= on a Class — the statuses_cli.rb → cli/statuses.rb rename artefact (§ above) | churn artefact |
| ActivityPub ×3 files | flow.always-truthy-condition | 3 | flow-folding over-claims a condition constant in create.rb / linked_data_signature.rb / link_details_extractor.rb | FP — flow-folding |
| CLI + workers + models | possible-nil-receiver | 5 | AR query results / associations typed nilable, called un-guarded | needs per-site triage |

The first-appearance trace (surfaced 0→1→3→5→10→13→14→15→15): each step is new code, not a regression in old code — media_cli / accounts_cli nil-receivers (v4.0–v4.1), the CLI-rename table_name= pair (v4.2.7), the signature_parser + block_domain_service + with_recursive cluster (v4.3.0-beta.1), account.rb group + two flow-folding sites (v4.3.0), the fasp/ worker + signed_request (v4.4.0), interaction_policy_concern (v4.5.0). One flow.dead-assignment appeared at v4.1.15 and was cleared upstream by v4.3 — a genuine refactor smell Rigor caught, then Mastodon fixed. A lib/active_record/with_recursive.rb cluster (with_recursive! on Integer + a flow-folding site) surfaced at v4.3.0 and vanished by v4.4.0.

Reading the increase: the rise is dominated by new code hitting Rigor’s known precision gaps, not by new bugs. Of the 15: the 3 argument-type-mismatch and 3 flow.always-truthy-condition are outright false positives; 2 are misinferences (the AR scope body in account.rb and the ivar typing in block_domain_service.rb); 2 are a rename artefact. Only the 5 possible-nil-receiver warnings are candidate genuine findings, and those still need per-site triage (the rigor-baseline-reduce SKILL’s job). So a cross-major surfaced curve grows with code volume largely because more code means more contact with the same finite set of engine precision gaps, which is another reason it is not a regression count.
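The breakdown can be tallied from the cluster table. The sketch below re-enters the six rows by hand; the verdict symbols are shorthand chosen here, not labels Rigor emits.

```ruby
# Rows copied from the cluster table; :verdict values are illustrative shorthand.
CLUSTERS = [
  { cluster: "signature_parser.rb",     rule: "call.argument-type-mismatch",  n: 3, verdict: :false_positive },
  { cluster: "account.rb",              rule: "call.undefined-method",        n: 1, verdict: :misinference },
  { cluster: "block_domain_service.rb", rule: "call.undefined-method",        n: 1, verdict: :misinference },
  { cluster: "cli/statuses.rb",         rule: "call.undefined-method",        n: 2, verdict: :churn_artefact },
  { cluster: "ActivityPub (3 files)",   rule: "flow.always-truthy-condition", n: 3, verdict: :false_positive },
  { cluster: "CLI + workers + models",  rule: "call.possible-nil-receiver",   n: 5, verdict: :needs_triage }
].freeze

# Sum the N column grouped by an arbitrary key.
def tally_by(rows, key)
  rows.group_by { |row| row[key] }.transform_values { |group| group.sum { |row| row[:n] } }
end

by_verdict = tally_by(CLUSTERS, :verdict)
by_rule    = tally_by(CLUSTERS, :rule)
total      = CLUSTERS.sum { |row| row[:n] }

# Surfaced diagnostics that are not explained by the CLI rename.
excluding_rename = total - by_verdict.fetch(:churn_artefact)

puts "total surfaced:   #{total}"
by_verdict.each { |verdict, n| puts format("%-16s %d", verdict, n) }
puts "excluding rename: #{excluding_rename}"
```

Grouping by rule instead of verdict gives the ×5 / ×4 / ×3 / ×3 split quoted for the final v4.5.10 set.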

A cold --no-cache --no-baseline run at v4.5.10 confirms raw = 30 (error ×9 / warning ×12 / info ×9) — identical to the patch sweep’s v4.5.10 and to the v0.1.8 note. The shared cache masked nothing; the two sweeps agree on the absolute diagnostic set.

  1. The v3.5.19 baseline covers exactly half of v4.5.10’s diagnostics (15 of 30). Adopting Rigor on an old Mastodon and never refreshing the baseline would, by v4.5.10, surface 15 diagnostics — but ~2 of those are CLI-rename artefacts, so the genuine standing-diagnostic increase is ~13.
  2. The rename caveat is real and measurable. A team running a frozen baseline across a major upgrade should regenerate it after large refactors (file moves), exactly as ADR-22 / the rigor-baseline-reduce SKILL anticipate.
  3. The parse-error floor is real and config-fixable. A Rails project carrying generator .rb templates needs them in exclude:; rigor-project-init should emit that exclude for Rails stacks (queued follow-up — see below).
  • rigor-project-init should add "**/lib/templates/**/*.rb" (or the broader generator-template path) to the exclude: it emits for a Rails stack — the parse-error floor is otherwise a guaranteed first-run papercut on any Rails app with generators.
  • The rigor-regression-sweep SKILL § “Phase 8” parse-error-floor paragraph could note the File.fnmatch?-without-FNM_PATHNAME + absolute-path matching rule (a bare directory name does not work as an exclude: entry; use the **/dir/** built-in-exclude shape).

False positives / misinferences — TODO to resolve

The cross-version § “What is increasing” table isolates four engine-side defects. Recorded as TODOs in docs/CURRENT_WORK.md § “Open engineering items” → “Mastodon cross-version sweep — FP findings”; summarised here:

  1. StringScanner#[] Symbol overload (FP, 3 sites). scanner[:key] — a Symbol named-capture argument, valid since Ruby 3.x — trips call.argument-type-mismatch because Rigor’s RBS for StringScanner#[] has only (Integer) -> String?. Already tracked: CURRENT_WORK.md § “Stdlib RBS coverage-gap pattern” names the references/rbs branch widen-strscan-resolv-stdlib-sigs that widens exactly this signature. This sweep is the empirical confirmation; the fix is the staged upstream RBS PR.
  2. AR scope-body method resolution (misinference, 1 site). Inside scope :duplicate_uris, -> { select(...).group(:uri) } the lambda’s self is the model class; select should resolve to ActiveRecord::Querying#select (→ a relation) but resolves to Enumerable#select (→ Array[String]), so the chained .group reads as undefined-method. The empirical case for ADR-26 (ActiveRecord::Relation typing) — note the sweep config omits the rigor-activerecord plugin, which is where the model-class query surface is meant to be typed.
  3. Ivar nil-guard / ivar-write typing (misinference, 1 site). block_domain_service.rb calls @domain_block_event.affected_local_accounts after a return if @domain_block_event.nil? guard, yet the ivar is typed nil (an undefined-method ... for nil, not a nilable possible-nil-receiver). Same family as flow-folding gap G2 (an ivar’s type is taken from its literal writes and not refreshed) — the guard clause does not narrow the ivar, and the ivar’s non-nil assignment is invisible to inference. Needs an ivar-narrowing / ivar-write-inference fix.
  4. Flow-folding over-claim (FP, 3 sites). flow.always-truthy-condition in activity/create.rb, linked_data_signature.rb, link_details_extractor.rb. Already tracked: CURRENT_WORK.md § “Flow-folding — loop-mutation tracking (gaps G1 / G2)” and the cluster-4 triage. This sweep confirms the cluster persists across the v3.5→v4.5 line.
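For item 1, a short runtime check that `StringScanner#[]` accepts a named-capture key as well as an Integer index. The input string and regex are made up for illustration and do not reproduce signature_parser.rb.

```ruby
require "strscan"

# Illustrative input; not taken from the Mastodon source.
scanner = StringScanner.new('keyId="abc123"')

# A pattern with two named captures, matched at the scan pointer.
matched = scanner.scan(/(?<key>\w+)="(?<value>[^"]*)"/)

by_index  = scanner[1]        # Integer index
by_symbol = scanner[:key]     # Symbol named-capture
by_string = scanner["value"]  # String named-capture

puts "matched:   #{matched.inspect}"
puts "by index:  #{by_index.inspect}"
puts "by symbol: #{by_symbol.inspect}"
puts "by string: #{by_string.inspect}"
```

All three lookups succeed at runtime, so a signature that admits only `(Integer) -> String?` is narrower than the method's actual behaviour.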

The 5 call.possible-nil-receiver warnings are not recorded as engine TODOs — they are candidate genuine findings and belong in per-site triage (rigor-baseline-reduce), not the FP backlog.

The patch-line sweep lives in ~/repo/ruby/rigor-survey/_mastodon-sweep/ (sweep.sh, tabulate.rb, regenerated baseline.yml, reports/<tag>.json; the v0.1.8 reports preserved as reports-v0.1.8/). The cross-version sweep lives in the sibling _mastodon-major-sweep/ with the same file layout and its own v3.5.19 baseline.yml. The procedure is the rigor-regression-sweep SKILL.

© 2026 TypedDuck. Licensed under CC BY-SA 4.0.