ADR-41 — Inference budget design (wiring, on-hit policy, measurement-gated defaults)
Status: Proposed, 2026-06-03. Records the target design for Rigor’s
inference budgets after a survey established that the spec’s budgets:
table is normative-for-v1 but largely unwired, and that the
categories that drive real cost on large apps are exactly the unwired
ones. Nothing here is implemented yet; the work is sequenced as Layer 1
(doc/spec hygiene) and Layer 2 (wire the load-bearing budgets) below.
Grounding:
docs/notes/20260603-inference-budget-reality-survey.md
(the Rigor measurement) and
docs/notes/20260603-inference-cutoff-prior-art.md
(how six other checkers solve it). Normative spec:
docs/type-specification/inference-budgets.md.
Context
Section titled “Context”What the spec promises vs. what the engine does
Section titled “What the spec promises vs. what the engine does”inference-budgets.md
defines a ten-row configurable budgets: table (recursion_depth,
call_graph_width, operator_ambiguity, union_size,
structural_growth, interface_candidates, hash_erasure_keys,
hash_erasure_values, negative_fact_display) plus a static.*
incomplete-inference diagnostic family. A 2026-06-03 survey found:
- No
budgets:key is parsed and no row is enforced. The only operative cutoffs are three hard-coded, silent, non-configurable guards — the(receiver, method)recursion re-entry guard (effective depth 1 →Dynamic[top]), the 100-node ancestor-walk cap, and the ADR-20 HKT reducer’s 64-step fuel — plus ADR-10’s per-gembudget_per_gem(the one configurable budget, opt-in only). - The wired guards are orthogonal to the real cost cliff. They fire
on small recursive-descent parsers (haml 421, jbuilder 126 hits) and
keep them fast; the slowest corpus project (Redmine: 331 files, 172 s,
1.5 GB) fires the recursion guard only 71 times and sails past all
three guards uncapped. The memory profile points straight at unbounded
union / structural growth — the unwired
union_size/structural_growthcategories. - The manual mis-documents
budget_per_gem(claims “time budget in ms, default 1000”; it is a method-definition count, default 5000), and there is no user-facing explanation of budgets at all.
What the prior-art survey established
Section titled “What the prior-art survey established”Rigor is camp (b) — whole-program / deep inference without required signatures (the recursion guard exists because of this) — with a camp-(a) escape hatch (an RBS / inline-RBS / generated boundary signature stops the deep inference). Among the six surveyed tools:
- Sorbet / Steep / mypy (camp a, or camp-a-for-returns) sidestep the problem: mandatory signatures at boundaries, local-only inference. No union-size / iteration budgets needed. mypy additionally widens with join to a common supertype, trading precision for a fast fixed point.
- TypeScript / Pyright (camp b) use dense counting budgets and
mostly error on overflow (TS2589 / TS2590; Pyright
maxCodeComplexity). - PHPStan / Psalm (camp b) use widening of value-tracked types (array → general array at 256 keys, etc.), always silent. PHPStan’s removal of a constant-scalar union cap caused OOM regressions — proof the caps are load-bearing.
- TypeProf (camp b, the closest analogue: whole-program, no required
sigs) widens to
untypedatunion_width_limit = 10andtype_depth_limit = 5, never errors, and made its fixpoint incremental in v2 so the budget is re-spent per-edit.
The decisive observations: (1) Rigor’s design twins TypeProf, whose
union_width_limit = 10 is tighter than Rigor’s spec union_size = 24; (2) silent widening is mainstream but universally criticized for
opacity; (3) caps must exist but their values come from the observed
distribution, not a guess.
Working Decision
Section titled “Working Decision”WD1 — Keep the camp-(b)-with-escape-hatch architecture; budgets are intrinsic, not a bolt-on
Section titled “WD1 — Keep the camp-(b)-with-escape-hatch architecture; budgets are intrinsic, not a bolt-on”Rigor will continue to infer user-method returns deeply (no mandatory
signatures — that is the product). Therefore inference budgets are a
permanent, intrinsic part of the engine, not a temporary scaffold:
deep inference without budgets cannot terminate on real code. The
boundary contract (a return annotation / RBS / sig-gen stub) stays the
first-class escape hatch — the same role explicit return types play
in TypeScript and mandatory sigs play in Sorbet. A budget hit is
always resolvable by the user supplying a contract, never only by editing
a Rigor-only knob.
WD2 — On-hit policy: widen and diagnose (the differentiator)
Section titled “WD2 — On-hit policy: widen and diagnose (the differentiator)”Every budget hit degrades to a conservative type (Dynamic[top] / the
declared bound / untyped) — widening, like TypeProf / PHPStan / mypy,
never erroring like TypeScript. This preserves the
false-positive-discipline value: a budget must never turn working code
into a reported error.
But Rigor will also emit a static.* incomplete-inference diagnostic
(:info by default) recording which budget fired and where. This is
the gap every surveyed widening tool leaves open — mypy’s silent join to
object, Pyright’s silent “pin”, TypeProf’s anonymous untyped — and
the one the spec already calls for. The product claim is: widen like
TypeProf, but tell the user where inference stopped, like nobody else
does, so a Dynamic[top] is never indistinguishable between “genuine
open surface” and “budget cutoff.” RIGOR_BUDGET_TRACE (shipped
2026-06-03) is the aggregate-only debugging precursor; the per-site
static.* diagnostic is the real surface.
WD3 — Defaults are measurement-gated, not guessed
Section titled “WD3 — Defaults are measurement-gated, not guessed”For any budget whose value trades precision for tractability
(union_size, structural_growth, and any future value/shape cap), the
default MUST be chosen from an observed distribution on real
projects, not reasoned from first principles. Rationale: PHPStan’s
OOM-on-removal and the FP-discipline value both show the cost of a wrong
cap is asymmetric — set it too low and a genuine A | B | … | N union
collapses to top, manufacturing a false positive or silently losing
checking. Procedure: instrument the actual join-arity and shape-member
growth distributions (the BudgetTrace approach, extended from counters
to histograms) on Redmine + Mastodon, then set the default at a
percentile of the observed tail that captures the pathological cases
without clipping legitimate ones. The spec’s current 24 / 16 are
placeholders pending this measurement, and the prior-art anchors
(TypeProf 10 / 5) suggest the true values may be lower than 24.
WD4 — Two budget styles, assigned per category
Section titled “WD4 — Two budget styles, assigned per category”- Termination guards stay counting + hard (non-negotiable for correctness): recursion re-entry (the depth floor), ancestor-walk (100), HKT fuel (64). These cannot be disabled below their floor; a project cannot opt into non-termination.
- Precision budgets are widening + configurable:
union_size,structural_growth, and thehash_erasure_*display caps generalize to a supertype and are user-tunable within a validated range (the spec already specifies ranges). This mirrors Psalm’s model — the one surveyed tool that exposes precision budgets as documented settings — including a Psalm-style “tuning these changes precision, not correctness; surprises are not bugs” caveat.
WD5 — recursion_depth: split termination floor from precision-unroll depth
Section titled “WD5 — recursion_depth: split termination floor from precision-unroll depth”The spec’s recursion_depth = 5 conflates two things the engine does
not. Redefine:
- a termination floor (hard, ≥1): the re-entry guard that prevents non-termination. Wired today at depth 1.
- an optional precision-unroll depth (default 1 = off): how many
levels of a recursive method Rigor unrolls to gain return precision
before widening. None of the surveyed camp-(b) tools unroll general
recursion for precision (TypeProf widens to
anyat depth 5 with no precision claim), so unrolling is a deferred, demand-driven refinement, not part of the floor.
Align the documented recursion_depth default to the implemented reality
(1) and document the two meanings separately.
WD6 — Complete the documented table
Section titled “WD6 — Complete the documented table”Add the two real-but-unlisted guards — ancestor_walk (100) and
hkt_fuel (64) — to the budget table so the documented set matches the
enforced set. Neither fired anywhere in the corpus, so the values are
generous; keep them hard-coded (not configurable) until a project shows
they bind.
Evaluation verdict (2026-06-03)
Section titled “Evaluation verdict (2026-06-03)”Asked “maintain the current thresholds vs. search for better defaults vs. can’t even evaluate until the unwired mechanisms are wired” — the honest answer is that this is not one global choice; it partitions by budget category, and applying any single verdict to the whole table is wrong.
| budget group | wired? | verdict | why |
|---|---|---|---|
| termination guards (recursion depth-1, ancestor 100, HKT fuel 64) | yes | maintain | zero corpus evidence they misbehave: the recursion guard keeps parsers fast and is FP-safe (widens to Dynamic[top]), ancestor + fuel never fired. Tuning has no observed benefit. |
budget_per_gem | yes | maintain value, fix the doc | the per-gem demo showed 5000 fully covers the tested gem (plateau at 5000 = 20000); the default is sound, only the manual is wrong. |
union_size / structural_growth | no | un-evaluable until measured | nothing enforces them, so there is no observable effect to judge 24 / 16 against, and “searching for a better default” is searching over a disconnected knob. |
call_graph_width, overload_candidates, operator_ambiguity, interface_candidates | no | defer | never bound any corpus project. |
Dominant conclusion. For the only budgets that matter to the problem that motivated this work (the large-app cost cliff — Redmine 1.5 GB), the “better-defaults” question is currently unanswerable, and maintaining the status quo leaves the cliff unaddressed. So the gating action is neither threshold tuning nor status-quo acceptance — it is the read-only distribution measurement (Slice 2a): instrument the actual join-arity and shape-growth distributions on Redmine / Mastodon without yet enforcing a cap. That is the cheapest step that converts “un-evaluable” into “evaluable”, validates whether 24 is even in the right range (the TypeProf prior is 10), and supplies the observed tail that WD3 requires before any default is chosen. Picking a value before 2a would repeat PHPStan’s cap-removal-OOM mistake in reverse.
Caveat. “Maintain” for the termination guards means “no evidence to
change”, not “proven optimal”: the recursion guard fired 421× on haml,
but whether those Dynamic[top] widenings cost real precision / FPs
was not measured (only firing counts were). If the guards are ever
revisited, that too needs a distribution-of-impact measurement, not a
guess — the same discipline as Layer 2.
Slice 2a outcome (2026-06-03) — the union_size hypothesis is refuted
Section titled “Slice 2a outcome (2026-06-03) — the union_size hypothesis is refuted”Slice 2a ran (union-arity histogram, read-only). It overturned the
verdict’s working assumption that union_size is the Layer 2 lever
(Survey 4 in the survey note):
- Memory does not correlate with union width. kramdown produces a 932-member union at 124 MB; Redmine’s widest union is 37 members at 1.5 GB. Redmine’s blow-up is not wide unions.
- Capping at 24 would touch 20 of Redmine’s 254,584 unions for ~0 memory benefit, while adding a false-positive surface on those 20 presumably-legitimate joins — the exact asymmetric-cost failure WD3 guards against, caught before wiring.
- The prior-art reasoning is inverted by data: TypeProf’s 10 would
clip 95–117 unions per large project;
union_size = 24is if anything too low. If wired at all it is a display/pathology valve at ~40–64 (above Redmine’s max 37, clipping only the kramdown-932 / mastodon-184 tail), not a memory fix.
Revised Layer 2. Drop “wire union_size” as the entry point. The
load-bearing question is now: what actually allocates Redmine’s 1.5 GB?
— something that scales with the type universe but not union width
(structural_growth, fact-store / narrowing accumulation, the RBS env, or
retained per-method scopes). The next step is a memory profile of
Redmine (heap-allocation attribution), not a cap. union_size, if
wired, becomes a low-priority high-cap valve. This is Slice 2a doing its
job: one read-only measurement stopped us from wiring the wrong budget at
a harmful default.
Update — Slice 2b then resolved the cliff entirely (see § Slices, 2b):
the 1.5 GB was 4.2 M retained Strings from one unmemoized failure in
rigor-activerecord, not a budget at all. Fixed; Redmine dropped to
217 MB / 84 s. The memory cliff and the budget table are decoupled, and
Layer 2 budget wiring is now demand-deferred — no corpus project
demonstrates a budget-shaped cost.
Slices
Section titled “Slices”- Layer 1 — doc/spec hygiene (cheap, high-confidence, no new
measurement).
- Fix the
docs/manual/03-configuration.mdbudget_per_gembug (unit = method-def count, default 5000). - Reconcile
recursion_depthper WD5 (default 1; split the two meanings). - Add
ancestor_walk/hkt_fuelto the table (WD6). - Author the missing user-facing budget explanation (placement TBD — handbook appendix vs. manual section).
- Fix the
- Layer 2 — find and wire the actual cost lever (consequential,
measurement-gated).
- 2a. Done (2026-06-03). Extended
BudgetTraceto a read-only union-arity distribution; ran Redmine + Mastodon. Outcome: union width is not the memory lever (see “Slice 2a outcome” above) —union_sizedemoted. Shape-member-growth instrumentation still pending (a follow-on once the structural-growth site is located). - 2b (revised). Done (2026-06-03). Heap-profiled Redmine
(
RIGOR_HEAP_PROFILE/RIGOR_HEAP_TRACE). Outcome: the 1.5 GB was not a budget problem at all — 89 % of the live heap was String, and 98.5 % of those traced to a single line inrigor-activerecord(schema_table_or_nilmemoized only success, so a missing schema file re-appended an error string per AR call site → 4.2 M retained strings). Fixed by memoizing the failure; Redmine dropped to 217 MB / 84 s (−86 % / −51 %), Mastodon was never affected because it shipsschema.rb. See Survey 5 in the survey note. - Net effect on Layer 2: the large-app cost cliff that motivated
wiring a memory budget is resolved and was orthogonal to the budget
table. No corpus project currently demonstrates a budget-shaped
memory cost. So
structural_growth/union_sizewiring is demand- deferred — there is no measured cost to wire against. If a future project shows a genuinely type-structural blow-up, re-run the 2a-style distribution probe first (WD3). - 2c (if revived). Wire the budget a future profile implicates with
widening + the
static.*diagnostic (WD2), default from its observed distribution. - 2d (if revived). Make the precision budgets user-configurable under
budgets:with range validation + the Psalm-style caveat (WD4). union_size, if ever wired, is a low-priority high-cap (~40–64) display/pathology valve, not a memory fix.
- 2a. Done (2026-06-03). Extended
- Deferred: the remaining unwired rows (
call_graph_width,overload_candidates,operator_ambiguity,interface_candidates) never bound a corpus project; wire on demand. Precision-unroll depth (WD5) is demand-driven.
Relationship to other ADRs
Section titled “Relationship to other ADRs”- ADR-4 (inference engine) and ADR-5 (robustness principle) — budgets are the operational form of “strict on returns, lenient on inputs, never frighten working code”: at a budget, Rigor widens rather than reports. WD2’s diagnose-don’t-error is the ADR-5 reading.
- ADR-10 (dependency-source inference) —
budget_per_gemis the one already-wired budget and the template for WD4’s configurable + ranged + diagnostic-emitting model (dynamic.dependency-source.budget-exceededis the existing precedent for thestatic.*surface WD2 generalizes). - ADR-20 (lightweight HKT) — the reducer’s fuel budget is one of the termination guards WD4/WD6 keep hard-coded.
Rejected / deferred alternatives
Section titled “Rejected / deferred alternatives”- Move Rigor to camp (a) (require signatures, infer locally). Rejected: deep inference without mandatory annotations is the product (the robustness principle / “works on unannotated Ruby” promise). Camp (a) would dissolve the budget problem but is a different tool (Sorbet / Steep already occupy it).
- Error on budget hit (TypeScript model). Rejected for the default
path: erroring on a budget violates false-positive discipline on
working code. The
static.*diagnostic is:info, not an error. - Pick
union_size/structural_growthdefaults now by reasoning. Deferred to measurement (WD3) — the PHPStan OOM-on-removal and the asymmetric cost of a wrong cap make a guessed value the higher risk. - Make every budget configurable. Rejected: termination guards (WD4) must not be disable-able below their floor; only precision budgets are user-tunable.
Consequences
Section titled “Consequences”- A clear, intrinsic role for budgets: permanent termination machinery for a deep-inference engine, with a user-resolvable escape hatch and a per-site diagnostic that no surveyed competitor provides.
- A concrete, low-risk Layer 1 (doc/spec corrections) that can land immediately, and a measurement-gated Layer 2 that resolves the actual large-app cost cliff without risking false positives.
- The documented budget table will finally match the enforced set, and
budget_per_gemwill be correctly described. - Open question carried forward: the exact
union_size/structural_growthdefaults await the Layer 2a distribution measurement; the prior-art anchors (TypeProf 10 / 5) are the working priors.
© 2026 TypedDuck. Licensed under CC BY-SA 4.0.