Paul Blasucci's Weblog

weblog index

You Really Wanna Put a Union There? You Sure?

Published: Tuesday, 04 May 2021, at 16:05 UTC +02:00

(TL;DR -- I don't often use single-case discriminated unions. Jump down to the flowchart to see what I commonly use instead.)

In F#, single-case discriminated unions crop up fairly often. However, in many cases, there are other alternatives one might consider. In this post I want to sketch some of these alternatives, providing an account of the possibilities and -- more importantly -- the trade-offs involved. In order to keep this post to a reasonable length, I will consider three very common scenarios for which one might use single-case discriminated unions. Specifically, they are:

As Marker Values
As Tagged Primitives
As Value Objects

!!! DISCLAIMER !!!

The advice in this post is meant to be a set of guidelines. A starting point. A way of consciously evaluating decisions (instead of just blindly copy-pasting some code). Further, after nearly 15 years of doing F# professionally, I've developed a lot of 'instinctual decision making'. This blog is very much an attempt at unpacking some of that, and codifying it in a way others might leverage. In short, it represents how I approach things. This might not necessarily “work” for someone else. And it certainly won't cover all possible scenarios one might encounter. Still, I hope you find it useful.

Just take everything herein with a grain of salt, OK? 😉

Marker Values

For the purposes of this discussion, let's define a “marker value” as: a type only ever manifested in a single instance. Python's None is one example. The standard library in Pony also uses marker values to great effect. Though unlike Python, where None is a built-in construct, Pony leverages its flexible “primitive” construct. In F#, you might see a single-case discriminated union used for this purpose. In fact, one might argue any non-data-carrying variant of any discriminated union is an example of a marker value (but I don't want to wade too far into the weeds here).

type AmbientAuth = AmbientAuth

In any case, the main things one wants out of a marker value are:

It should not carry data
It should not support null-ness
It should be an actual type
It should have easy ergonomics (at the call site)
~~It should have only one instance~~ All occurrences should be semantically equivalent

So, what other alternatives are there in F#?

Plot Twist!!! SCDUs are the perfect type for this in F# (but do consider decorating them with the StructAttribute). Really?! Well, yes... and no. There is one caveat. If you intend to have your F# “marker value” be consumed by other .NET languages (e.g. C#, VB), you may want to consider a struct instead.

type AmbientAuth = struct (* MARKER *) end

Why something different for other languages? Because F# discriminated unions, regardless of the number of variants, generate some extra code (as part of the machinery encoding them onto the CLR). F# neatly hides this detail from you. However, it's plainly visible in, for example, C# consumers. So, using a struct produces a more language-neutral solution. But it will mean slightly more cumbersome invocation for F# consumers.

let auth = AmbientAuth
(* instantiation: ↑↑↑ Union vs Struct ↓↓↓ *)
let auth = AmbientAuth()

(*
!!! LOOK OUT !!!

let auth = AmbientAuth   ⮜ partially applied constructor -- NOT a value!
forgetting the parens ⮝⮝ is a subtle source of confusion
*)

So, to recap Marker Values present the following options and trade-offs:

Single-case Union	CLR Struct
Pro: Hits a syntactic 'sweet spot'	Pro: Language neutral
Con: Funky when used outside F#	Con: Instantiation cruft in F#

Tagged Primitives

Perhaps the most common usage of single-case unions -- or at least the one that leaps foremost into people's minds -- is that of a “Tagged Primitive”. In other words, a simple wrapper of some other basic type, usually meant to provide more context. Again, we can find similar constructs in other languages. Most notably, a newtype in Haskell. Some fairly common examples of such a thing in F# might be:

type ClientId = ClientId of Guid // ⮜ this should probably be a Value Object!

type Pixels = Pixels of uint64
    // NOTE other operators omitted for brevity
    static member ( + ) (Pixels l, Pixels r) = Pixels(l + r)

In fact, if you squint a bit, those even (sort of) look like a newtype. Alas, F# is not Haskell. Let me repeat that one more time for the folks in the cheap seats:

F# IS NOT HASKELL.

It really isn't. And it certainly doesn't have newtype. The above examples behave rather differently (than newtype) in terms of what the compiler emits. Requests to add such a feature have, to date, languished in discussion. So, instead, we have the previous example, or a few other alternatives. But first, let's have some criteria for why we'd want (need?) a newtype-like construct.

It should provide contextual information about the role of some code.
The 'new' type should be type-checked distinctly from the 'wrapped' type.
There are no behavioral restrictions imposed on the underlying type.

This seem like very useful goals. However, item 2 means we can not use type abbreviations. And item 3 requires a significant amount of 'boilerplate' code to do correctly (as we want to have a lot of behavior pass through to the underlying primitive). Also, let's pause to appreciate: if we want the opposite of item 3, then we are almost certainly talking about a “Value Object” (discussed later in this post). So what's left (besides single-case unions)? Depending on the type being wrapped, and the target usage, there are a few alternatives. Let's consider each in turn.

Units of Measure

Units of Measure meet all our given criteria. And syntactically they are very light-weight, as well. Further, they enforce the fundamental rules of mathematics at compile time (e.g. quantities of dissimilar units can be multiplied together -- but not combined via addition).

[<Measure>]
type pixels

let offset = 240UL<pixels>

However, they carry some heavy restrictions. They are limited to only numeric primitives (int, float, et cetera). They are erased at compile time (so no reflective meta-programming support -- and no visibility in other .NET languages). Additionally, since many operations in .NET are not “units aware”, it's not uncommon to have to explicitly temporarily discard the units for certain operations (only to re-apply them later). This unwrap-compute-rewrap dance has come to upset many an F# developer.

Generic Tags

It turns out, with just a little bit of hackery, we can actually get something like Units of Measure -- but for non-numeric types. We call this a Generic Tag. I won't go into the specific mechanics of it, though. There are a few different ways to achieve it. And all of them are definitely advanced (not to mention a bit... circumspect). However, there's a library which hides the details for many (most?) common non-numeric primitives. So I will should you an example.

open System

#r "nuget: FSharp.UMX"
open FSharp.UMX

[<Measure>]
type ClientId

let current : Guid<ClientId> = UMX.tag (Guid.NewGuid())

As with everything, there are some more trade-offs here. In fact, other than lifting the restriction to numerics, generic tags have all the same caveats as units of measure. Further, their are at least some “units aware” functions (abs, min, et cetera) in the core F# library. However, those are (again) limited to numeric types. For generic tags, you may well find yourself stripping off the tag frequently-enough that its usage becomes questionable. It's worth exploring -- but “individual milage may vary”.

Records

Since both Units of Measure and Generic Tags are erased at compile time, there really is no way to expose them to other .NET Languages. And, as mentioned in the section on Marker Values, even a Single-Case Union will surface some extra cruft when consumed from, e.g., Visual Basic. So, in cases where a (CLR) language-neutral implementation is required. We fallback to records, the bread-and-butter of F# data types.

type ClientId = { Value : Guid } // ⮜ this should probably be a Value Object!

type Pixels = { Value : uint64 }
    // NOTE other operators omitted for brevity
    static member ( + ) (Pixels l, Pixels r) = Pixels(l + r)

The drawback here is, clearly, the same as with single-case unions: more boilerplate, as we'd like all our behavior to pass through to the underlying primitive -- but need to code it ourselves.

Before moving on, the trade-offs for various Tagged Primitives can be summarized as follows:

Single-case Union	Units of Measure	Generic Tag	Record
Pro: Feel like a Haskeller 🙄	Pro: Low-syntax, No-overhead	Pro: Low-syntax, No-overhead	Pro: Language neutral
Con: Requires boilerplate	Con: Numeric types only	Con: Dodgy use of type system	Con: Requires boilerplate
Con: Funky when used outside F#	Con: Erased at compile-time	Con: Erased at compile-time
		Con: Standard library isn't “tag aware”

Value Objects

Earlier, I mentioned how a “primitive with customized behavior” is its own kind of construct (and the closely related notion: maybe your InvoiceIds shouldn't be divisible by any number?). But this is a very well-explored area of software development. It's called a “Value Object”. It's an essential part of Domain-Driven Design. And was first defined by Eric Evans in his seminal work on the subject. But let me re-iterate, here, the main aspects of a value object:

It lacks identity
It has no lifecycle
It is self-consistent

Unpacking these a bit...

“lacks identity” means there's no well-know identifier for this thing (e.g. row id = 12345). In practical terms, it means a thing with structural equality semantics. These are plentiful in F# (yay!).
“no lifecycle” means it's not data which evolves over time. Again, concretely, this means it's immutable. Cool. Lots of immutable constructs in F#.
“self-consistent” means if you've got an instance of it, that instance can be reliably assumed to contain valid (for a given domain) data. Oh, hmm... So this one doesn't come for free. In F#, we use carefully controlled construction functions to realize self-consistent values.

Fortunately, there are several ways to encode a value object in F#. The most common are as a discriminated union or as a record. Rarely, it will be beneficial to model the internal state of the value object as one of a set of mutually exclusive alternatives. In that (very infrequent) case, there's clearly only one tool for the job -- a multi-case union. However, in this post, we're focused on single-case unions. And they can be used to address the most common scenario of a value object simply wrapping one (or a few) field(s).

type Paint =
    private | Paint' of volume : float * pigment : Color

    member me.Volume =
        let (Paint' (volume=value)) = me in value

    member me.Pigment =
        let (Paint' (pigment=value)) = me in value

    // NOTE other members omitted

    static member Of(volume : float, color : string) : Result<Paint, PaintError> =
        // NOTE validation omitted

However, it's equally valid to use a record for this. In fact, the code is nearly identical. The only differences being how you define and access the underlying data.

type Paint =
    private { volume : float; pigment : Color }

    member me.Volume = me.volume
    member me.Pigment = me.pigment

    // NOTE other members omitted

    static member Of(volume : float, color : string) : Result<Paint, PaintError> =
        // NOTE validation omitted

In either case, the 'recipe' for a Value Object is roughly as follows:

Decide between a discriminated union or a record
Mark constructor and field(s) private
Add public construction (either by static methods or companion module)
Add any public read-only properties or operators (as needed)
Add any extra operations (either as methods or in a companion module)

But why pick a union? When should you prefer a record? Honestly, this comes down to personal preference and sense of style. Me? Personally, I prefer the record. Performing a decomposition just to access the internal state feels... gratuitous. But to each his own. I certainly wouldn't balk at code that chose to go with the union approach.

For completeness sake, the following table lists the trade-offs between unions and records for encoding a Value Object (though it's a very silly difference):

Single-case Union	Record
Con: Awkward access to underlying data	Pro: Direct access underlying data

Why lock the door if you're gonna leave the window wide open?!

One thing I've seen (far too often) is code like the following:

type Quality =
  private { value' : byte }

  member me.Value = me.value' // ⮜ This is... not amazing  :-(

  static member Of(raw) =
    { value' = min raw 50uy }

Personally, this seems like a big let down... especially after going through all the trouble of avoiding the so-called “primitive obsession”. It discourages consumers from thinking about the value object as a construct unto itself. And it doesn't provide any contextual guidance around the intended consumption of the underlying value.

So, how might we handle this instead?

Ideally, you can express behavior while never exposing the details of the underlying data. In F#, this could be realized via a module of associated functions, or some additional instance members, or even some active patterns. In any case, this drives the consumer to think about the object itself (rather than its implementation details).

type Quality =
  private Quality' of byte

[<RequireQualifiedAccess>]
module Quality =
  let MaxValue = Quality' 50uy

  let atLevel amount =
    let (Quality' limit) = MaxValue
    if amount < limit then Quality' amount else Quality.MaxValue

  let increaseBy amount (Quality' value) =
    let sum = value + amount
    let didWrap = sum < value
    if didWrap then MaxValue else atLevel sum

Please note: the code, as shown, only works because the union and the module are in the same “namespace declaration group”. They both have the same immediate parent 'container' (i.e. the same module or a namespace). Put differently, the private modifier, when used on records and unions, does not mean, “restrict to the defining type”. Rather it means, “restrict to the enclosing module or namespace”. By way of contrast, here's the same code using a record with instance members.

type Quality =
    private { level' : byte }

    member me.IncreaseBy(amount) =
      let sum = me.level' + amount
      let didWrap = sum < me.level'
      if didWrap then Quality.MaxValue else Quality.AtLevel(sum)

    static member AtLevel(level) =
      if level < Quality.MaxValue.level' then
        { level' = level }
      else Quality.MaxValue

    static member MaxValue = { level' = 50uy }

However, sometimes a more generic approach is warranted. In those cases, you can use an appropriately named function (or method), or even an explicit cast, to stress the notion of transitioning between types. This, again, centers the value object (from a consumer's point-of-view).

type Quality =
  private { value' : byte }

  override me.ToString() = $"%02i{me.value'}"

  // NOTE other behavior elided

  static member op_Explicit(quality : Quality) : int = int quality.value'

(* ... elsewhere ... *)

let quality = Quality.Of(7)
printfn $"%s{string quality}, or %i{int quality}"

Conclusion

Hopefully, by now, I've at least got you thinking about the various alternative approaches one might use when trying to 'level up' from simple primitives in F#. To further help, and for easier reference, I've also created the following flow chart (“How Paul Pick's His Primitives”, so to speak).

Have fun and happy coding!

My Instincts -- Visualized — Starting at the blue circle, follow the arrows.
Each yellow rectangle is a different use case.

The amber diamonds represent a yes-or-no decision.
By answering these, you arrive at a green capsule naming the F# feature I'd likely employ (given the previous constraints).