Part 6 — Hash and array types

In the last chapter (Part 5) we narrowed Unions by case, slimming the type per branch of an if. Now we turn to the contents of a value. Give structural types to hash and array literals (HashShape / Tuple). Then we find the type of reading a value out of them. Ruby code is awash in “symbol-keyed hashes,” so handling this well makes things practical in one leap.

6-1. Raising a type from a literal — HashShape and Tuple

What’s the type of { foo: 1, bar: "a" }? “Hash” is too coarse. We want to remember which key holds which type. That’s HashShape:

module Type
  HashShape = Data.define(:fields) do   # fields: { foo: Const[1], bar: Const["a"] }
    def to_s = "{" + fields.map { |k, v| "#{k}: #{v}" }.join(", ") + "}"
  end

  Tuple = Data.define(:elements) do     # remember an array by per-position type
    def to_s = "[" + elements.map(&:to_s).join(", ") + "]"
  end
end

We just add two cases to type_of. In Prism a hash is a HashNode (each pair an AssocNode, a symbol key a SymbolNode), an array an ArrayNode:

when Prism::HashNode
  fields = node.elements.to_h { |a| [a.key.unescaped.to_sym, type_of(a.value, scope, diag)] }
  Type::HashShape[fields]
when Prism::ArrayNode
  Type::Tuple[node.elements.map { |el| type_of(el, scope, diag) }]

type_of(parse(%q[{ foo: 1, bar: "a" }]))   # => {foo: 1, bar: "a"}
type_of(parse(%q[[1, "x"]]))               # => [1, "x"]

Laid out in the three perspectives:

① Type theory: a type that gathers several values by label = a record type (『しくみ』 ch. 5).
② In Ruby: symbol-keyed hashes are everywhere. Arrays are used tuple-like too ([name, age]).
③ In Rigor: don’t round to Hash; remember the type per key and per position (an extension of Part 1’s “remember finely”).

6-2. Reading out — `h[:foo]` and `a[0]`

Since the type holds “which key is which type,” reading out is straightforward. h[:foo] is a [] method send in Prism (h.[](:foo)). If the argument is a literal symbol/integer, we can look it up from the type:

def read_index(receiver, arg_node)
  if receiver.is_a?(Type::HashShape) && arg_node.is_a?(Prism::SymbolNode)
    # unknown key is nil (because real Ruby returns nil. Don't error)
    return receiver.fields.fetch(arg_node.unescaped.to_sym, Type::Const[nil])
  end
  if receiver.is_a?(Type::Tuple) && arg_node.is_a?(Prism::IntegerNode)
    return receiver.elements.fetch(arg_node.value, Type::Const[nil])
  end
  nil   # can't special-case it → fall through to ordinary dispatch
end

# when h : {foo: 1, bar: "a"}
h[:foo]   # => 1     (Const[1])
h[:zzz]   # => nil   (★ don't error)
a[0]      # => 1
a[9]      # => nil

The point is that h[:zzz] doesn’t error. The reason is simple — real Ruby returns nil for {foo: 1}[:zzz]. Reading a nonexistent key is not a “bug”; “nil is returned” is the correct behavior. The type follows suit and returns nil.

6-3. Open or closed — allowing extra keys

This is the climax of Part 6. Consider this Ruby:

def greet(user)        # suppose user is expected to be { name: ... }
  "Hello, #{user[:name]}"
end

greet({ name: "Alice", admin: true })   # ★ admin is in there too, besides name

What greet wants is just name. But the hash passed in also has admin. Should this be a fit, or not a fit?

If it’s type equality, “the properties must match exactly” — but with subtyping it’s a different story: “you can pass {name:, admin:} where {name:} is wanted” is sound, and this is called width subtyping (『しくみ』, too, adopts this width subtyping in ch. 7, allowing extra properties).

Rigor, too, makes HashShape a fit. What differs is the motive and scope. It’s not handling statically-written records for soundness; the counterpart is Ruby’s options hashes, and the aim is not producing false positives:

In Ruby, “build a big options hash, and each method picks just the keys it needs” is the standard move.
If you balked every time there was an extra key, properly working code would go bright red.

So Rigor’s HashShape, seen from the expecting side, is “as long as it has at least these keys, it’s fine” (open). Extras don’t matter. Only when a needed key is missing is it a problem. This is how “never frighten working code” shows up in a structural type.

Figure 6-1 — an open HashShape: allow extras, blame only what's missing

▼ Figure 6-1 — an open HashShape: allow extras, blame only what’s missing

Laid out in the three perspectives:

① Type theory: in record width subtyping, the side with more keys is the subtype （『しくみ』ch. 7 uses the same width subtyping）.
- It can read backwards, but the relation is: “you can pass {name:, admin:} where {name:} is wanted (name is duly there); you can’t pass the reverse.”
- The side with more keys passes more demands, so it’s the subtype. “Subtype” is taken up in the next chapter, Part 7, as “does it fit in the box.”
② In Ruby: an extra key in an options hash is everyday. Forcing an exact match doesn’t fit reality.
③ In Rigor: the expectation is open (“at least”). Allow extras, blame only what’s missing = avoid false positives.

6-4. This chapter’s summary

What we added is the type carriers HashShape / Tuple, two cases in type_of, and the read read_index. There’s almost no new judgment logic (the read is just fetch’s second argument); the difficulty is concentrated in the concept — “the open policy.”

Running it:

Chibirigor.annotate("h = {foo: 1, bar: \"a\"}\nh[:foo]\nh[:bar]\nh[:zzz]\n").each { |a| puts "#{a[:line]}: #{a[:type]}" }

1: {foo: 1, bar: "a"}
2: 1
3: "a"
4: nil

h is a HashShape that remembers each key’s type. h[:foo] and h[:bar] return the remembered types respectively, and the unknown key h[:zzz] returns nil without blame (the open policy of allowing “at least”).

This chapter’s three perspectives:

	Content
① Type theory（『しくみ』ch. 5 / TAPL ch. 11 §11.8）	Gather values by label = a record type. The side with more keys is the subtype
② Ruby / RBS	Symbol-keyed options hashes are heavily used. An exact match doesn’t fit reality
③ Rigor’s implementation problem	The expectation is open (at least). Allow extras, blame only what’s missing = apply width subtyping to a dynamic hash and avoid false positives

Handed to the sequel:

full support for keyword arguments (def f(name:, **opts)). The main volume stops at handling them as hash values.
the type difference between map and filter_map: in Rigor, tuple.map { |x| f(x) } keeps the per-position types (applying f’s return type to each). filter_map, by contrast, has a result size that varies with the predicate, so it can’t keep per-position information and is forcibly widened to Array[T]. A natural consequence of type theory: “only an operation that doesn’t change positions can keep a Tuple’s precision.”
the depth of record subtyping (comparing recursively down to value types), read-only, and other RBS-record details.
types raised from Struct / Data.define (real Rigor’s DataClass / DataInstance).

Exercises

What type does a nested hash { a: { b: 1 } } become? Confirm with annotate.
Confirm that a = [1, "x"]\na[99] becomes nil, and explain why it doesn’t error.
How is a string key { "a" => 1 } handled now (only symbol keys are supported)? What caution is needed if you widen the support?

Next chapter (Part 7): at last we build accepts, which judges whether types fit each other. With the three values :yes / :no / :maybe, the “open” policy decided here comes into real effect too.

This chapter’s implementation (and answer key for the exercises) → impls/dist/part6/lib