High-performance C#: a test pattern for ref structs
C# 7.2 added support for ref struct types, and as I've discussed before, these are critical to achieving high throughput in certain kinds of code by minimizing GC overhead and copying. But you pay a price in flexibility. In that last article, I showed how to work around the fact that you cannot use a ref struct inside an async method. In this article I'll discuss an issue we ran into when writing tests for the Ais.Net parser I recently blogged about.
We have numerous tests for each AIS message type our parser supports. We typically define a test for each element of the message that our parser can extract, and we normally supply multiple examples. For example, in a Position Report Class A (one of the most common formats in which vessels report their location, heading, and other information about their progress), one of the fields is the Repeat Indicator, a common element found at the start of many AIS messages. The test looks like this:
You can see it in context here.
We're using SpecFlow, which enables us to write tests in the Cucumber language. It's more commonly used for test specifications that express requirements in terms of an application's business domain, and it's slightly unusual to use it for a unit test. However, we found it made for highly readable tests for this library. In particular, Scenario Outlines were very well suited to testing Ais.Net—they enable us to write a single test definition and then to define multiple sets of inputs and the corresponding expected outputs. Here you can see we've got 4 tests with obviously faked up payloads (evident from the fact that they're mostly 0), one for each of the possible values for this field, and then 4 examples taken from real messages transmitted by actual vessels, also covering all 4 possible values. We find Cucumber to be a more convenient and readable way to express this sort of thing than the data-driven features of any of the popular .NET test frameworks. It's one of the reasons we like SpecFlow a lot at endjin.
But I digress. The important point here is that we repeatedly execute the same step, When I parse '<payload>' with padding <padding> as a Position Report Class A, with different inputs each time. In fact, if you go and look at the full feature file you'll see that all the tests use that same step, because all the tests entail parsing a message.
SpecFlow will execute all the steps in this Scenario Outline once for each row from the Examples: table. We want to pass the first two columns of each row to the constructor for the message parser, like this:
Normally, when writing the code for this sort of test step, you'd just store the result of this expression either in the SpecFlow ScenarioContext or in a field of the step bindings class. However, we can't do that here because NmeaAisPositionReportClassAParser is a ref struct.
So that's the challenge this whole post is about: how can we write tests for ref struct types in the way we want, given the restrictions these types impose?
As you may recall from the previous blog posts linked to above, ref struct types have some desirable performance characteristics. They are a key part of the features added to C# 7.2 that make it possible to write libraries such as Ais.Net that can perform high speed parsing with minimal copying of data and very low GC overhead. But you pay for this efficiency in constraints: in particular a ref struct type can only live on the stack. (And in case you're having a knee-jerk "No, C# structs don't always live on the stack" reaction, yes, I know that, but ref struct types are an exception: they really do absolutely have to live on the stack in the current .NET runtime implementations.) This means we can't store it in a context or step binding object, because those live on the heap.
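To make the restriction concrete, here is a minimal sketch. DemoParser is a hypothetical stand-in for a parser such as NmeaAisPositionReportClassAParser, and StepBindings and UseAsLocal are names invented for this illustration; none of them come from Ais.Net or SpecFlow. The commented-out lines show the kinds of storage the compiler rejects.

```csharp
using System;

// Hypothetical stand-in for a ref struct parser; not the Ais.Net type.
public readonly ref struct DemoParser
{
    private readonly ReadOnlySpan<byte> ascii;

    public DemoParser(ReadOnlySpan<byte> ascii, uint padding)
    {
        this.ascii = ascii;
        this.Padding = padding;
    }

    public uint Padding { get; }

    public int Length => this.ascii.Length;
}

public class StepBindings
{
    // Uncommenting this field produces a compile error: instances of a class
    // live on the heap, so a class cannot have a field of a ref struct type.
    // private DemoParser parser;

    // Boxing is rejected as well, which rules out storing the parser in an
    // object-typed dictionary such as a scenario context.
    // private object boxed = new DemoParser(default, 0);

    public static int UseAsLocal(byte[] data)
    {
        // A local variable is fine, because it lives on the stack.
        var parser = new DemoParser(data, 0);
        return parser.Length;
    }
}
```

Calling StepBindings.UseAsLocal with a three-byte array returns 3; the point is that the parser never outlives the method in which it was created.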
For the test to work, we're going to need to execute the test specified by the Then clause in such a way that the NmeaAisPositionReportClassAParser under test is above it on the stack. This is not totally straightforward, because SpecFlow implements each step in a test as a method that runs to completion before the next step starts. SpecFlow requires our When clause to complete before it will start the Then clause.
We therefore need to defer the work specified in the When clause until SpecFlow is ready to run the Then clause. The code implementing our Then steps typically looks something like this:
This passes a callback containing the test to a helper method that will construct the parser in the manner previously described by the When clause, and then pass that into the callback. And the code for those When clauses typically looks like this:
So rather than constructing the parser, we create a callback which, when invoked, will construct the parser using whatever arguments the test requires. This is what enables the deferred operation. The When helper used here just stores the callback in a field, which the Then helper then uses when it's time to run the test for real:
The ParserMaker type here is a delegate type defined by the test class:
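The pieces described above can be sketched end to end as follows. This is a minimal, self-contained illustration under stated assumptions, not the Ais.Net test code: DemoParser and its FirstByte property are hypothetical stand-ins for the real parser and its fields, the names DemoSteps, WhenIParse, ThenFirstByteIs and ParserTest are invented here, and the SpecFlow binding attributes and assertion library are left out so that the sketch compiles without any packages.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for NmeaAisPositionReportClassAParser.
public readonly ref struct DemoParser
{
    private readonly ReadOnlySpan<byte> ascii;

    public DemoParser(ReadOnlySpan<byte> ascii, uint padding)
    {
        this.ascii = ascii;
        this.Padding = padding;
    }

    public uint Padding { get; }

    // Made-up "field": simply the first byte of the payload.
    public byte FirstByte => this.ascii[0];
}

public class DemoSteps
{
    // Dedicated delegate types, one pair per parser type, because a ref
    // struct cannot be used as a generic type argument.
    private delegate DemoParser ParserMaker();
    public delegate void ParserTest(DemoParser parser);

    private ParserMaker makeParser;

    // Plays the role of a When step: records how to build the parser
    // without building it yet. Only the string and the uint are captured.
    public void WhenIParse(string payload, uint padding)
    {
        this.When(() => new DemoParser(Encoding.ASCII.GetBytes(payload), padding));
    }

    // Plays the role of a Then step: supplies the check as a callback.
    public void ThenFirstByteIs(byte expected)
    {
        this.Then(parser =>
        {
            if (parser.FirstByte != expected)
            {
                throw new InvalidOperationException(
                    $"Expected {expected} but got {parser.FirstByte}");
            }
        });
    }

    private void When(ParserMaker makeParser) => this.makeParser = makeParser;

    private void Then(ParserTest test)
    {
        // The parser is constructed here, so it sits on the stack above the
        // callback for as long as the check runs.
        DemoParser parser = this.makeParser();
        test(parser);
    }
}
```

In this sketch, calling WhenIParse("A", 0) followed by ThenFirstByteIs(65) completes without throwing, since the ASCII code for 'A' is 65, whereas ThenFirstByteIs(66) throws. The field holds only a delegate, which is an ordinary heap object, so nothing of a ref struct type is ever stored between steps.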
We need to define one of these for each parser type. You might be wondering why we don't just use a generic delegate type here, e.g. Func<NmeaAisPositionReportClassAParser>. It's because you cannot use a ref struct type as a generic type argument. The reason is that there are all sorts of constraints on what you can do with ref struct types, but if you could just plug them into any old generic type, that might let you bypass these restrictions. For example, suppose some generic type declares a variable of type T in an async method. If the compiler let you use a ref struct as the argument for that type parameter T, that would provide a sneaky way to use a ref struct in an async method. Since the compiler blocks use of ref struct in these situations for good reasons, it would be bad to be able to bypass the restrictions.
The obvious way to fix this would be for C# to introduce a new kind of generic constraint. You could imagine writing class <T> where T : ref struct. Any type or method declared with such a constraint would prevent you from using the type parameter anywhere that a ref struct is not allowed, and with that guarantee in place it would then become safe to supply a ref struct as a type argument. Unfortunately, no such generic constraint exists today. (And even if it did, it wouldn't enable us to use Func<T> because that type wouldn't have this constraint anyway.)
So we have to define a dedicated non-generic delegate type, something that's very rarely necessary.
With these elements in place, we can write tests in the obvious way, with separate steps for describing what we want to do and what the outcome should be, while fitting into the constraints imposed by a ref struct.
The effect of the test steps above is as though we'd written this code:
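As a rough sketch of that direct form, assuming the same kind of hypothetical stand-in type (DemoParser, FirstByte, DirectTest and Check are all names invented for this illustration, and a plain exception stands in for a test framework assertion):

```csharp
using System;
using System.Text;

// Hypothetical stand-in for NmeaAisPositionReportClassAParser.
public readonly ref struct DemoParser
{
    private readonly ReadOnlySpan<byte> ascii;

    public DemoParser(ReadOnlySpan<byte> ascii, uint padding)
    {
        this.ascii = ascii;
        this.Padding = padding;
    }

    public uint Padding { get; }

    // Made-up "field": simply the first byte of the payload.
    public byte FirstByte => this.ascii[0];
}

public static class DirectTest
{
    // Setup and verification in a single method: the parser is a local, so
    // the stack-only rule is satisfied without any deferral.
    public static void Check(string payload, uint padding, byte expected)
    {
        var parser = new DemoParser(Encoding.ASCII.GetBytes(payload), padding);
        if (parser.FirstByte != expected)
        {
            throw new InvalidOperationException(
                $"Expected {expected} but got {parser.FirstByte}");
        }
    }
}
```

Note that Check has to take the inputs and the expected value together, which is exactly the coupling of setup and assertion that separate When and Then steps are meant to avoid.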
That seems a lot simpler, and you might be wondering why we didn't just write that in the first place. But it's not quite that simple: where do payload and padding come from here? In SpecFlow tests, we separate out the setup and the expected outcome: When steps define the setup, and Then steps define the particular thing we want our test to verify. This means that the inputs to the test normally aren't directly available to the step that performs the assertion. We could of course modify our feature file so that we pass everything into the Then step, but that would mean making the test specifications look weird just to work around some technical constraints. I prefer to keep test specification files as readable as possible, so that's not a good option. (This separation of concerns is, after all, part of the point of writing tests this way.) Or we could make the When step store those inputs in fields, so that the Then step has access to them, and can construct the NmeaAisPositionReportClassAParser itself. But I don't really like that either: while it leaves the feature file looking clean, it would make the associated step bindings harder to follow because we would have moved the setup out of the step that's supposed to be defining the setup.
So the advantage of this technique is that it enables feature files to read naturally, and lets setup and test code go where you'd expect them to, while fitting around the constraints imposed by ref struct types.