Paul Blasucci's Weblog

Thoughts on software development and sundry other topics


You Really Wanna Put a Union There? You Sure?


(TL;DR -- I don't often use single-case discriminated unions. Jump down to the flowchart to see what I commonly use instead.)

 

In F#, single-case discriminated unions crop up fairly often. However, in many cases, there are other alternatives one might consider. In this post I want to sketch some of these alternatives, providing an account of the possibilities and -- more importantly -- the trade-offs involved. In order to keep this post to a reasonable length, I will consider three very common scenarios for which one might use single-case discriminated unions. Specifically, they are:

  1. As Marker Values
  2. As Tagged Primitives
  3. As Value Objects

!!! DISCLAIMER !!!

The advice in this post is meant to be a set of guidelines. A starting point. A way of consciously evaluating decisions (instead of just blindly copy-pasting some code). Further, after nearly 15 years of doing F# professionally, I've developed a lot of 'instinctual decision making'. This blog is very much an attempt at unpacking some of that, and codifying it in a way others might leverage. In short, it represents how I approach things. This might not necessarily “work” for someone else. And it certainly won't cover all possible scenarios one might encounter. Still, I hope you find it useful.

Just take everything herein with a grain of salt, OK? 😉

Marker Values

For the purposes of this discussion, let's define a “marker value” as: a type only ever manifested in a single instance. Python's None is one example. The standard library in Pony also uses marker values to great effect. Though unlike Python, where None is a built-in construct, Pony leverages its flexible “primitive” construct. In F#, you might see a single-case discriminated union used for this purpose. In fact, one might argue any non-data-carrying variant of any discriminated union is an example of a marker value (but I don't want to wade too far into the weeds here).

type AmbientAuth = AmbientAuth

In any case, the main things one wants out of a marker value are:

  1. It should not carry data
  2. It should not support null-ness
  3. It should be an actual type
  4. It should have easy ergonomics (at the call site)
  5. All occurrences should be semantically equivalent

So, what other alternatives are there in F#?

Plot Twist!!! SCDUs are the perfect type for this in F# (but do consider decorating them with the StructAttribute). Really?! Well, yes... and no. There is one caveat. If you intend to have your F# “marker value” be consumed by other .NET languages (e.g. C#, VB), you may want to consider a struct instead.

type AmbientAuth = struct (* MARKER *) end

Why something different for other languages? Because F# discriminated unions, regardless of the number of variants, generate some extra code (as part of the machinery encoding them onto the CLR). F# neatly hides this detail from you. However, it's plainly visible in, for example, C# consumers. So, using a struct produces a more language-neutral solution. But it will mean slightly more cumbersome invocation for F# consumers.

let auth = AmbientAuth
(* instantiation: ↑↑↑ Union vs Struct ↓↓↓ *)
let auth = AmbientAuth()

(*
!!! LOOK OUT !!!

let auth = AmbientAuth   ⮜ partially applied constructor -- NOT a value!
forgetting the parens ⮝⮝ is a subtle source of confusion
*)
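To make the comparison a bit more concrete, here is a minimal sketch of both encodings side by side. The names AmbientAuthStruct, listen, and listenNeutral are made up purely for illustration (the struct is renamed only so both definitions can live in one file).

```fsharp
// Marker as a single-case union; [<Struct>] makes it a value type.
[<Struct>]
type AmbientAuth = AmbientAuth

// Marker as a plain CLR struct (hypothetical name, to avoid a clash).
type AmbientAuthStruct = struct end

// Hypothetical functions which demand a marker as proof of authority.
let listen (_ : AmbientAuth) (port : int) = sprintf "listening on %d" port
let listenNeutral (_ : AmbientAuthStruct) (port : int) = sprintf "listening on %d" port

let viaUnion = listen AmbientAuth 8080                      // no parens needed
let viaStruct = listenNeutral (AmbientAuthStruct()) 8080    // constructor call

// Neither carries data, so any two occurrences compare as equal.
let unionsEqual = (AmbientAuth = AmbientAuth)
let structsEqual = (AmbientAuthStruct() = AmbientAuthStruct())
```

Both calls produce the same string. The only visible difference, from the F# side, is the constructor call needed for the struct.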

So, to recap, Marker Values present the following options and trade-offs:

Single-case Union                  | CLR Struct
Pro: Hits a syntactic 'sweet spot' | Pro: Language neutral
Con: Funky when used outside F#    | Con: Instantiation cruft in F#

Tagged Primitives

Perhaps the most common usage of single-case unions -- or at least the one that leaps foremost into people's minds -- is that of a “Tagged Primitive”. In other words, a simple wrapper of some other basic type, usually meant to provide more context. Again, we can find similar constructs in other languages. Most notably, a newtype in Haskell. Some fairly common examples of such a thing in F# might be:

type ClientId = ClientId of Guid // ⮜ this should probably be a Value Object!

type Pixels = Pixels of uint64
    // NOTE other operators omitted for brevity
    static member ( + ) (Pixels l, Pixels r) = Pixels(l + r)

In fact, if you squint a bit, those even (sort of) look like a newtype. Alas, F# is not Haskell. Let me repeat that one more time for the folks in the cheap seats:

F# IS NOT HASKELL.

It really isn't. And it certainly doesn't have newtype. The above examples behave rather differently (than newtype) in terms of what the compiler emits. Requests to add such a feature have, to date, languished in discussion. So, instead, we have the previous example, or a few other alternatives. But first, let's have some criteria for why we'd want (need?) a newtype-like construct.

  1. It should provide contextual information about the role of some code.
  2. The 'new' type should be type-checked distinctly from the 'wrapped' type.
  3. There are no behavioral restrictions imposed on the underlying type.

These seem like very useful goals. However, item 2 means we cannot use type abbreviations. And item 3 requires a significant amount of 'boilerplate' code to do correctly (as we want to have a lot of behavior pass through to the underlying primitive). Also, let's pause to appreciate: if we want the opposite of item 3, then we are almost certainly talking about a “Value Object” (discussed later in this post). So what's left (besides single-case unions)? Depending on the type being wrapped, and the target usage, there are a few alternatives. Let's consider each in turn.
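As a quick illustration of why type abbreviations fail item 2, consider the following sketch. The OrderId abbreviation and the lookupClient function are hypothetical, invented just for this example.

```fsharp
open System

// Abbreviations are just alternate names for the very same type.
type ClientId = Guid
type OrderId = Guid   // hypothetical, for illustration only

// Hypothetical function which is meant to accept only client identifiers.
let lookupClient (id : ClientId) = id.ToString()

let order : OrderId = Guid.NewGuid()

// This compiles without complaint: the compiler only ever sees a Guid.
let result = lookupClient order
```

The names add context for the reader, but the compiler treats all three as interchangeable.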

Units of Measure

Units of Measure meet all our given criteria. And syntactically they are very light-weight, as well. Further, they enforce the fundamental rules of mathematics at compile time (e.g. quantities of dissimilar units can be multiplied together -- but not combined via addition).

[<Measure>]
type pixels

let offset = 240UL<pixels>

However, they carry some heavy restrictions. They are limited to only numeric primitives (int, float, et cetera). They are erased at compile time (so no reflective meta-programming support -- and no visibility in other .NET languages). Additionally, since many operations in .NET are not “units aware”, it's not uncommon to have to explicitly temporarily discard the units for certain operations (only to re-apply them later). This unwrap-compute-rewrap dance has come to upset many an F# developer.
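Here's a small sketch of both the compile-time checking and the unwrap-compute-rewrap dance. The seconds measure and all of the values are assumptions for the sake of the example, and I've used float (rather than uint64) so that System.Math.Round applies.

```fsharp
[<Measure>] type pixels
[<Measure>] type seconds   // hypothetical second unit, for illustration

let offset = 240.0<pixels>
let width = 80.0<pixels>

// Like units can be added together...
let total = offset + width            // float<pixels>

// ...and unlike units can be multiplied or divided.
let speed = total / 4.0<seconds>      // float<pixels/seconds>

// But this line, if uncommented, is rejected by the compiler:
// let oops = offset + 4.0<seconds>

// System.Math.Round is not "units aware", so we strip the unit,
// do the computation, and then re-apply the unit afterwards.
let precise = 12.7<pixels>
let raw : float = float precise
let rounded = System.Math.Round(raw) * 1.0<pixels>
```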

Generic Tags

It turns out, with just a little bit of hackery, we can actually get something like Units of Measure -- but for non-numeric types. We call this a Generic Tag. I won't go into the specific mechanics of it, though. There are a few different ways to achieve it. And all of them are definitely advanced (not to mention a bit... suspect). However, there's a library which hides the details for many (most?) common non-numeric primitives. So I will show you an example.

open System

#r "nuget: FSharp.UMX"
open FSharp.UMX

[<Measure>]
type ClientId

let current : Guid<ClientId> = UMX.tag (Guid.NewGuid())

As with everything, there are some more trade-offs here. In fact, other than lifting the restriction to numerics, generic tags have all the same caveats as units of measure. Further, while the core F# library has at least some “units aware” functions (abs, min, et cetera), those are (again) limited to numeric types. For generic tags, you may well find yourself stripping off the tag frequently enough that its usage becomes questionable. It's worth exploring -- but “individual mileage may vary”.
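To give a feel for that tag-stripping, here's a rough sketch. It assumes the FSharp.UMX package exposes UMX.untag as the counterpart to the UMX.tag call shown earlier. The OrderId measure is hypothetical.

```fsharp
#r "nuget: FSharp.UMX"
open System
open FSharp.UMX

[<Measure>] type ClientId
[<Measure>] type OrderId   // hypothetical second tag, for illustration

let client : Guid<ClientId> = UMX.tag (Guid.NewGuid())

// This line, if uncommented, is rejected because the tags differ:
// let wrong : Guid<OrderId> = client

// Formatting and parsing know nothing about tags, so we strip the tag...
let raw : Guid = UMX.untag client
let text = raw.ToString("N")

// ...and re-apply it once we have a plain Guid again.
let restored : Guid<ClientId> = UMX.tag (Guid.Parse text)
```

If most of your code ends up looking like those last few lines, the tag probably isn't earning its keep.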

Records

Since both Units of Measure and Generic Tags are erased at compile time, there really is no way to expose them to other .NET Languages. And, as mentioned in the section on Marker Values, even a Single-Case Union will surface some extra cruft when consumed from, e.g., Visual Basic. So, in cases where a (CLR) language-neutral implementation is required, we fall back to records, the bread-and-butter of F# data types.

type ClientId = { Value : Guid } // ⮜ this should probably be a Value Object!

type Pixels = { Value : uint64 }
    // NOTE other operators omitted for brevity
    static member ( + ) (l : Pixels, r : Pixels) : Pixels = { Value = l.Value + r.Value }

The drawback here is, clearly, the same as with single-case unions: more boilerplate, as we'd like all our behavior to pass through to the underlying primitive -- but need to code it ourselves.
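To show what that boilerplate looks like in practice, here's a minimal sketch. The Zero member and the sample values are my own additions for illustration. Equality and comparison come for free with records, but anything arithmetic has to be written by hand.

```fsharp
type Pixels =
    { Value : uint64 }

    // Hand-written pass-through to the underlying primitive.
    static member ( + ) (l : Pixels, r : Pixels) : Pixels =
        { Value = l.Value + r.Value }

    // List.sum needs both ( + ) and Zero to be present.
    static member Zero : Pixels = { Value = 0UL }

let widths = [ { Value = 240UL }; { Value = 80UL } ]

let total = List.sum widths                        // uses ( + ) and Zero
let ordered = { Value = 80UL } < { Value = 240UL } // structural comparison
```

Every further operator (subtraction, multiplication, and so on) means yet another member like the ones above.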

Before moving on, the trade-offs for various Tagged Primitives can be summarized as follows:

Single-case Union               | Units of Measure             | Generic Tag                             | Record
Pro: Feel like a Haskeller 🙄    | Pro: Low-syntax, No-overhead | Pro: Low-syntax, No-overhead            | Pro: Language neutral
Con: Requires boilerplate       | Con: Numeric types only      | Con: Dodgy use of type system           | Con: Requires boilerplate
Con: Funky when used outside F# | Con: Erased at compile-time  | Con: Erased at compile-time             |
                                |                              | Con: Standard library isn't “tag aware” |

Value Objects

Earlier, I mentioned how a “primitive with customized behavior” is its own kind of construct (and the closely related notion: maybe your InvoiceIds shouldn't be divisible by any number?). But this is a very well-explored area of software development. It's called a “Value Object”. It's an essential part of Domain-Driven Design. And it was first defined by Eric Evans in his seminal work on the subject. But let me reiterate, here, the main aspects of a value object:

  1. It lacks identity
  2. It has no lifecycle
  3. It is self-consistent

Unpacking these a bit...

Fortunately, there are several ways to encode a value object in F#. The most common are as a discriminated union or as a record. Rarely, it will be beneficial to model the internal state of the value object as one of a set of mutually exclusive alternatives. In that (very infrequent) case, there's clearly only one tool for the job -- a multi-case union. However, in this post, we're focused on single-case unions. And they can be used to address the most common scenario of a value object simply wrapping one (or a few) field(s).

type Paint =
    private | Paint' of volume : float * pigment : Color

    member me.Volume =
        let (Paint' (volume=value)) = me in value

    member me.Pigment =
        let (Paint' (pigment=value)) = me in value

    // NOTE other members omitted

    static member Of(volume : float, color : string) : Result<Paint, PaintError> =
        // NOTE validation omitted

However, it's equally valid to use a record for this. In fact, the code is nearly identical. The only differences being how you define and access the underlying data.

type Paint =
    private { volume : float; pigment : Color }

    member me.Volume = me.volume
    member me.Pigment = me.pigment

    // NOTE other members omitted

    static member Of(volume : float, color : string) : Result<Paint, PaintError> =
        // NOTE validation omitted

In either case, the 'recipe' for a Value Object is roughly as follows:

  1. Decide between a discriminated union or a record
  2. Mark constructor and field(s) private
  3. Add public construction (either by static methods or companion module)
  4. Add any public read-only properties or operators (as needed)
  5. Add any extra operations (either as methods or in a companion module)
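As a rough sketch of that recipe carried all the way through, here's the record version with the omitted pieces filled in. To be clear: the Color cases, the PaintError cases, and the validation rules are all invented for illustration. They are stand-ins, not the 'real' definitions.

```fsharp
// Hypothetical stand-in for the Color type used above.
type Color = Red | Yellow | Blue

// Hypothetical failure cases.
type PaintError =
    | NonPositiveVolume of float
    | UnknownPigment of string

type Paint =
    private { volume : float; pigment : Color }

    member me.Volume = me.volume
    member me.Pigment = me.pigment

    // The only public way to build a Paint, so every instance is valid.
    static member Of(volume : float, color : string) : Result<Paint, PaintError> =
        let parsed =
            match color.Trim().ToLowerInvariant() with
            | "red" -> Some Red
            | "yellow" -> Some Yellow
            | "blue" -> Some Blue
            | _ -> None
        match parsed with
        | _ when not (volume > 0.0) -> Error (NonPositiveVolume volume)
        | None -> Error (UnknownPigment color)
        | Some pigment -> Ok { volume = volume; pigment = pigment }

let first = Paint.Of(2.5, "red")
let second = Paint.Of(2.5, " RED ")
let same = (first = second)           // equal by value, not by identity
let invalid = Paint.Of(-1.0, "red")   // rejected by validation
```

One caveat: in a single script file, private only hides the fields from code outside the enclosing module or namespace. So, to actually see the compiler forbid direct construction, the type needs to live in its own module or assembly.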

But why pick a union? When should you prefer a record? Honestly, this comes down to personal preference and sense of style. Me? Personally, I prefer the record. Performing a decomposition just to access the internal state feels... gratuitous. But to each his own. I certainly wouldn't balk at code that chose to go with the union approach.

For completeness' sake, the following table lists the trade-offs between unions and records for encoding a Value Object (though it's a very silly difference):

Single-case Union                      | Record
Con: Awkward access to underlying data | Pro: Direct access to underlying data

Conclusion

Hopefully, by now, I've at least got you thinking about the various alternative approaches one might use when trying to 'level up' from simple primitives in F#. To further help, and for easier reference, I've also created the following flow chart (“How Paul Picks His Primitives”, so to speak).

Have fun and happy coding!

 

Starting at the blue circle, follow the arrows.

Each yellow rectangle is a different use case.

The amber diamonds represent a yes-or-no decision.

By answering these, you arrive at a green capsule naming the F# feature I'd likely employ (given the previous constraints).

My Instincts -- Visualized