Paul Blasucci's Weblog

Thoughts on software development and sundry other topics

Getting JSON.NET to "Talk" F#, Part 1: Tuples

JavaScript Object Notation (hereafter, JSON) has become a very popular means of encoding data in a "plain text" format. And, as with so many things, there are several implementations of JSON-encoding libraries available in the .NET ecosystem. One such library, which I've used quite a bit, is the Newtonsoft JSON.NET library. It's both simple and efficient. However, it has a bit of trouble understanding some of F#'s "bread and butter" data types. Fortunately, the JSON.NET library also provides several extensibility points. In this post, we'll extend this library to support one of F#'s most fundamental types -- the tuple. (Please note: I've assumed you already have a good working knowledge of F# and the .NET run-time.)

Before diving into the "meat" of our converter, let's look at a sample of it in action, taken from an F# interactive session (where I've added some white space for the sake of clarity).

> open System;;
> open Newtonsoft.Json;;
> open Newtonsoft.Json.FSharp;;

> let employee = (012345,("Bob","Smith"),28500.00,DateTime.Today);;
val employee : int * (string * string) * float * DateTime = (12345, ("Bob", "Smith"), 28500.0, 7/4/2011 12:00:00 AM)

> let converters : JsonConverter[] = [| TupleConverter() |];;
val converters : JsonConverter [] = [|FSI_0006.Newtonsoft.Json.FSharp.TupleConverter|]

> let rawData = JsonConvert.SerializeObject(employee,converters);;
val rawData : string = "{\"Item1\":12345,\"Item2\":{\"Item1\":\"Bob\",\"Item2\":\"Smith\"},\"Item"+[49 chars]

> let backAgain : (int * (string * string) * float * DateTime) = JsonConvert.DeserializeObject(rawData,converters);;
val backAgain : int * (string * string) * float * DateTime = (12345, ("Bob", "Smith"), 28500.0, 7/4/2011 12:00:00 AM)

> printfn "%b" (employee = backAgain);;
true
val it : unit = ()

As alluded to in the previous example, we can encode (and decode) tuples of any length by enriching JSON.NET with a custom type converter. This may seem involved, but we'll break the actual code into logical, easy-to-digest "chunks". First, we've got some "boiler-plate" code which wires our class into the JSON.NET machinery.

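open System
open Microsoft.FSharp.Reflection // FSharpType and FSharpValue live here
open Newtonsoft.Json             // JsonConverter, JsonToken, etc.

type TupleConverter() =
  inherit JsonConverter()

  override __.CanRead  = true
  override __.CanWrite = true

  override __.CanConvert(vType) = vType |> FSharpType.IsTuple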

We start by inheriting from JsonConverter, which is the abstract base class provided by the Newtonsoft library for building custom type converters. As part of inheriting this class, we must tell JSON.NET whether our class is meant to be used for serialization (i.e. CanWrite = true), deserialization (i.e. CanRead = true), or both. We also provide an implementation of the CanConvert method. This method will be invoked (potentially very frequently) at run-time when JSON.NET wants to know if it should transfer control to us. Our logic here is very simple: if the input type is a tuple, we want it and return true; otherwise, we're not interested and return false. Of course, the "is it a tuple?" check is delegated to a helper function provided by the F# run-time. Next, we've got to implement the methods for doing the actual encoding and decoding.
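As a quick aside on that tuple check: FSharpType.IsTuple can be tried on its own, with no JSON.NET involved. The following is only a minimal sketch; isTupleType and checks are hypothetical names used for illustration and are not part of the converter.

```fsharp
open System
open Microsoft.FSharp.Reflection

// Hypothetical stand-in for the body of CanConvert: true only for tuple types.
let isTupleType (vType : Type) = FSharpType.IsTuple vType

let checks =
  [ typeof<int * string>             // a two-element tuple
    typeof<int * (string * string)>  // a tuple which contains another tuple
    typeof<string>                   // not a tuple
    typeof<int list> ]               // not a tuple
  |> List.map (fun t -> t.Name, isTupleType t)

// show each type's CLR name next to the result of the check
checks |> List.iter (fun (name, result) -> printfn "%s -> %b" name result)
```

Only the first two types should be claimed by a converter built on this check; everything else is left to JSON.NET's default handling.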

  override __.WriteJson(writer,value,serializer) =

Overriding the WriteJson method allows us to turn tuple instances into JSON. The Newtonsoft machinery passes three values into our method. The first, writer, is the stream to which we should write encoded data. Next up, value is the actual tuple instance to be serialized. Third comes serializer, a context object which is threaded through the entire serialization process.

The algorithm for encoding is actually very simple and aligns with the way tuples appear when used in other .NET languages (e.g. C#). Specifically, the tuple is turned into an object with a property for each tuple element. The name for each property is the word "Item" suffixed by the tuple element's one-based index. So, the value

("paul",32)

will be encoded to

{ "Item1" : "paul", "Item2" : 32 }

To realize this algorithm, we use reflection to get the list of tuple fields. Then we iterate over those fields, writing each value to the output after being sure to emit the appropriate property name.

  let fields = value |> FSharpValue.GetTupleFields
  fields |> Array.iteri (fun i v ->
    // emit name based on value's position in tuple
    let n = sprintf "Item%i" (i + 1)
    writer.WritePropertyName(n)
    // emit value or reference thereto, if necessary
    if v <> null && serializer.HasReference(v)
      then writer.WriteReference(serializer,v)
      else serializer.Serialize(writer,v))
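To see the naming scheme in isolation, here is a minimal sketch which pairs each tuple element with its "ItemN" property name using nothing but F# reflection. The namedFields helper is a hypothetical name introduced for illustration; the real converter writes to the JSON.NET writer instead of building an array.

```fsharp
open Microsoft.FSharp.Reflection

// Hypothetical helper: pairs each tuple element with its "ItemN" property name.
let namedFields (tuple : obj) =
  tuple
  |> FSharpValue.GetTupleFields
  |> Array.mapi (fun i v -> sprintf "Item%i" (i + 1), v)

// the same sample value used in the text above
let pairs = namedFields (box ("paul", 32))

pairs |> Array.iter (fun (name, value) -> printfn "%s = %A" name value)
```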

Of course, these values need to be wrapped in curly braces (i.e. WriteStartObject and WriteEndObject). Also, in case any users of our converter want to use JSON.NET's instance tracking feature, we'll add a one-liner which optionally records the existence of the tuple being processed (i.e. WriteIdentity). Finally, we'll include a bit of defensive coding, leaving the implementation of WriteJson as follows.

  override __.WriteJson(writer,value,serializer) =
    match value with
    | null -> nullArg "value" // a 'null' tuple doesn't make sense!
    | data ->
        writer.WriteStartObject()
        let fields = value |> FSharpValue.GetTupleFields
        if fields.Length > 0 then
          // emit "system" metadata, if necessary
          if serializer.IsTracking then
            writer.WriteIdentity(serializer,value)

          fields |> Array.iteri (fun i v ->
            // emit name based on value's position in tuple
            let n = sprintf "Item%i" (i + 1)
            writer.WritePropertyName(n)
            // emit value or reference thereto, if necessary
            if v <> null && serializer.HasReference(v)
              then writer.WriteReference(serializer,v)
              else serializer.Serialize(writer,v))
        writer.WriteEndObject()

Now on to the most complex portion of this converter -- deserialization.

  override __.ReadJson(reader,vType,_,serializer) =

We'll again override a method; this time it's ReadJson. The JSON.NET runtime will pass us four pieces of data when invoking our override. The first, reader, is the stream of JSON tokens from which we'll build a tuple instance. Second, we have the CLR type which JSON.NET thinks we should return. Next up is any existing value the Newtonsoft machinery might have for us. We'll be ignoring this parameter, as it's not useful for our purposes. The last piece of input is serializer, which we've already seen in the WriteJson method.

In order to generate a tuple properly, we need all of its constituent values up front. However, the Newtonsoft machinery is designed around advancing through the input stream one-token-at-a-time. To make this work, we'll read the entire object (all of the key/value pairs between the curly braces) into a Map<string,obj> instance, via a recursive helper function.

  let readProperties (fields:Type[]) =
    let rec readProps index pairs =
      match reader.TokenType with
      | JsonToken.EndObject -> pairs // no more pairs, return map
      | JsonToken.PropertyName ->
          // get the key of the next key/value pair
          let name = readName ()
          let value,index' = match name with
                              // for "system" metadata, process normally
                              | JSON_ID | JSON_REF -> decode (),index
                              // for tuple data...
                              // use type info for current field
                              // bump offset to the next type info
                              | _ -> decode' fields.[index],index+1
          advance ()
          // add decoded key/value pair to map and continue to next pair
          readProps (index') (pairs |> Map.add name value)
      | _ -> reader |> invalidToken
    advance ()
    readProps 0 Map.empty

One of the interesting aspects of the readProperties function is its input. When called, we'll give it an array of the CLR types which comprise the tuple. Then, while stepping through the JSON tokens, we can match each "raw" value to its CLR type as part of the deserialization process. This introduces a subtle wrinkle, though. We should ignore this type information when we encounter any Newtonsoft "metadata" in the input stream. We accomplish this by keeping track of an offset into the type array, which will only get incremented when the key/value pair under scrutiny is not "metadata". Now with the actual JSON traversal finished, we can analyse our Map<string,obj> and take appropriate action.
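The offset bookkeeping in readProperties can be tried in isolation, without a JSON reader. The sketch below walks a plain list of property names and hands out element types only to the non-metadata entries. It assumes the metadata property names are "$id" and "$ref" (the names JSON.NET uses for reference tracking); assignTypes is a hypothetical helper, not part of the converter.

```fsharp
open System

// Assumption: these mirror the property names JSON.NET emits when tracking references.
let JSON_ID  = "$id"
let JSON_REF = "$ref"

// Hypothetical helper: walks property names in input order, pairing each
// non-metadata name with the next tuple element type.
let assignTypes (fields : Type[]) (names : string list) =
  let rec loop index acc remaining =
    match remaining with
    | [] -> List.rev acc
    | name :: rest when name = JSON_ID || name = JSON_REF ->
        // metadata: no element type is consumed, so the offset stays put
        loop index ((name, None) :: acc) rest
    | name :: rest ->
        // tuple data: use the current element type, then bump the offset
        loop (index + 1) ((name, Some (fields.[index])) :: acc) rest
  loop 0 [] names

let fields = [| typeof<string>; typeof<int> |]
let typed  = assignTypes fields [ "$id"; "Item1"; "Item2" ]

typed |> List.iter (fun (name, t) -> printfn "%s : %A" name t)
```

Because the offset only moves on tuple data, a leading "$id" entry does not shift "Item1" onto the wrong element type.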

If the map is simply a reference to data which has already been decoded, it will contain only the identifier of that data. We can use this identifier to fetch the tuple instance from the JSON.NET run-time context.

  | Ref(trackingId) ->
      // tuple value is a reference, de-reference to actual value
      serializer.GetReference(string trackingId)

If the map holds a more sophisticated set of key/value pairs, we'll use it as input to the construction of a new tuple instance.

  | Map(data) ->
      let inputs =
        data
          // strip away "system" meta data
          |> Seq.filter (fun (KeyValue(k,_)) -> k <> JSON_ID)
          // discard keys, retain values
          |> Seq.map (fun (KeyValue(_,v)) -> v)
          // merge values with type info
          |> Seq.zip fields
          // marshal values to correct data types
          |> Seq.map (fun (t,v) -> v |> coerceType t)
          |> Seq.toArray
      // create tuple instance
      let value = FSharpValue.MakeTuple(inputs,vType)

This bit of logic simply massages the map into an array of the appropriate values, and uses a simple helper function from the F# run-time to instantiate the tuple. Finally, we'll put this code together with some helper methods, and some caching logic (again, in case any users of our converter want to use JSON.NET's instance tracking feature), which leaves the complete method as follows.

  override __.ReadJson(reader,vType,_,serializer) =
    let decode,decode',advance,readName = makeHelpers reader serializer

    let readProperties (fields:Type[]) =
      let rec readProps index pairs =
        match reader.TokenType with
        | JsonToken.EndObject -> pairs // no more pairs, return map
        | JsonToken.PropertyName ->
            // get the key of the next key/value pair
            let name = readName ()
            let value,index' = match name with
                                // for "system" metadata, process normally
                                | JSON_ID | JSON_REF -> decode (),index
                                // for tuple data...
                                // use type info for current field
                                // bump offset to the next type info
                                | _ -> decode' fields.[index],index+1
            advance ()
            // add decoded key/value pair to map and continue to next pair
            readProps (index') (pairs |> Map.add name value)
        | _ -> reader |> invalidToken
      advance ()
      readProps 0 Map.empty

    match reader.TokenType with
    | JsonToken.StartObject ->
        let fields = vType |> FSharpType.GetTupleElements
        // read all key/value pairs, reifying with tuple field types
        match readProperties fields with
        | Ref(trackingId) ->
            // tuple value is a reference, de-reference to actual value
            serializer.GetReference(string trackingId)
        | Map(data) ->
            let inputs =
              data
                // strip away "system" meta data
                |> Seq.filter (fun (KeyValue(k,_)) -> k <> JSON_ID)
                // discard keys, retain values
                |> Seq.map (fun (KeyValue(_,v)) -> v)
                // merge values with type info
                |> Seq.zip fields
                // marshal values to correct data types
                |> Seq.map (fun (t,v) -> v |> coerceType t)
                |> Seq.toArray
            // create tuple instance (and cache it if necessary)
            let value = FSharpValue.MakeTuple(inputs,vType)
            if serializer.IsTracking then
              match data |> Map.tryFindKey (fun k _ -> k = JSON_ID) with
              // input carried an identifier, so register the tuple under it
              | Some(k) -> serializer.AddReference(string data.[k],value)
              // input carried no identifier, so have the serializer make one
              | None -> serializer.MakeReference(value) |> ignore
            value
        | _ -> raise InvalidPropertySet
    | _ -> reader |> invalidToken
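The tuple-construction step inside ReadJson can also be exercised without JSON.NET. What follows is a minimal sketch under a few stated assumptions: coerce is a hypothetical stand-in for the coerceType helper (here it simply defers to Convert.ChangeType), and itemIndex and makeTuple are names invented for illustration. One detail worth checking in any variation on this code: Map<string,obj> enumerates its keys in sorted order, and ordinal string comparison puts "Item10" before "Item2", so the sketch orders values by their numeric suffix before zipping them with the element types. That only matters once a tuple reaches ten or more elements.

```fsharp
open System
open Microsoft.FSharp.Reflection

// Hypothetical helper: extracts N from a key such as "Item10".
let itemIndex (key : string) = int (key.Substring(4))

// Hypothetical stand-in for the post's coerceType helper.
let coerce (t : Type) (v : obj) = Convert.ChangeType(v, t)

// Builds a tuple of the given type from "ItemN" keyed values
// (assumes any "system" metadata has already been filtered out).
let makeTuple (tupleType : Type) (data : Map<string,obj>) =
  let fields = FSharpType.GetTupleElements tupleType
  let inputs =
    data
    |> Map.toArray
    // order by numeric position rather than by key text
    |> Array.sortBy (fun (k, _) -> itemIndex k)
    // discard keys, retain values
    |> Array.map snd
    // merge values with type info
    |> Array.zip fields
    // marshal values to correct data types
    |> Array.map (fun (t, v) -> coerce t v)
  FSharpValue.MakeTuple(inputs, tupleType)

// integers are supplied as int64 here, so coerce has real work to do
let data   = Map.ofList [ "Item2", box 32L; "Item1", box "paul" ]
let person = makeTuple typeof<string * int> data |> unbox<string * int>

printfn "%A" person
```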

And that's what's needed to get JSON.NET to properly understand tuples of any length. Hopefully, this post has helped to shed some light on an important -- but relatively undocumented -- aspect of one of the better libraries currently available in the .NET ecosystem. (It should be noted, however, that there is one "feature" of JSON.NET which this converter does NOT support: embedded type information. In brief, this is one feature I wish had never been added to any JSON library... but that rant could be a whole separate blog entry.) In future posts, I will (hopefully) provide similar coverage of converters for other idiomatic F# types like discriminated unions and linked lists.

The complete source code for this class, as well as some other useful code for combining F# and JSON.NET, can be found in a GitHub repository.