C# 8.0 nullable references: defeating the point with empty strings
In this post, I'll explain why you should avoid the temptation to apply a quick and seemingly easy fix for a particular warning that can occur when you start using C#'s nullable references feature. You can silence the compiler by initializing string properties to an empty string, but you really shouldn't.
If you enable nullable references on an existing project, you will often see large numbers of warnings. A particularly common one is CS8618, complaining that a non-nullable property is uninitialized.
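A minimal sketch of code that produces this warning, assuming a `Person` class with a single `FavouriteColour` property (the names this post uses later on):

```csharp
#nullable enable

public class Person
{
    // With nullable reference types enabled, the compiler reports CS8618 here:
    // the property is declared non-nullable, but it has no initializer and no
    // constructor assigns it, so it is still null when construction completes.
    public string FavouriteColour { get; set; }
}
```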
With a `string` property, you might be tempted to initialize it with an empty string.
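As an illustration of that tempting fix, again assuming a `Person` class with a `FavouriteColour` property:

```csharp
#nullable enable

public class Person
{
    // The initializer silences CS8618, but "" is only a placeholder:
    // nothing here forces any caller to supply a real colour.
    public string FavouriteColour { get; set; } = "";
}
```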
This will make the warning go away, but it defeats the point of enabling nullable reference types. The whole reason C# 8 introduced its Nullable Reference Types feature was to avoid situations in which properties and other variables do not have a usable value. Before nullable references were available, we had to do one of the following:
- Use ad hoc means to attempt to ensure that variables have useful values by the time we use them, with no help from the compiler.
- Check for `null` before attempting to use variables and then decide what on earth to do if we get a null.
With nullable reference types enabled, we can indicate explicitly the cases in which it is useful to our application to be able to represent the absence of a value, but by default the compiler will give us a great deal of assistance in ensuring that we don't accidentally try to use something that isn't there.
Initializing a property (or any other variable) with an empty string typically undoes any benefit of enabling nullable reference types, because the empty string effectively becomes null by another name. The fundamental problem remains: we have variables which might sometimes contain a special value that signifies the absence of a real value; we're just using a different special value now. And unlike null, the empty string gets no special analysis from the compiler.
The basic problem that CS8618 is warning us about above remains: it's possible to finish constructing an instance of our `Person` class without having provided a meaningful value for `FavouriteColour`. (If having no favourite colour is a position you need your application to support, you can just use `string?`, because nullability is a reasonable way to indicate that. A better way than using an empty string, in fact.)
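To illustrate the `string?` option, here is a minimal sketch. The `Describer` class and its `Describe` method are hypothetical names introduced only for this example; the point is that the compiler tracks the missing value for us, which it cannot do for an empty string:

```csharp
#nullable enable

public class Person
{
    // Nullable on purpose: having no favourite colour is a supported state.
    public string? FavouriteColour { get; set; }
}

public static class Describer
{
    public static string Describe(Person person)
    {
        // Without this check, dereferencing person.FavouriteColour below
        // would produce warning CS8602 (dereference of a possibly null reference).
        if (person.FavouriteColour is null)
        {
            return "No favourite colour";
        }

        return person.FavouriteColour.ToUpperInvariant();
    }
}
```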
If your application absolutely needs a property to be present, the best way to do this is to use the correct-by-construction technique described in my recent blog on nullable references and serialization, in which you define constructors that force code to supply all required values.
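A minimal sketch of that idea, assuming the same `Person` class; the exact shape used in the other post may differ, and the runtime null check is an addition of this sketch:

```csharp
#nullable enable
using System;

public class Person
{
    public Person(string favouriteColour)
    {
        // The runtime check covers callers compiled without nullable
        // reference types, or callers that suppress the warning with `!`.
        FavouriteColour = favouriteColour
            ?? throw new ArgumentNullException(nameof(favouriteColour));
    }

    // No setter and no initializer: the only way to obtain a Person is to
    // supply a colour, so CS8618 does not arise.
    public string FavouriteColour { get; }
}
```

With this in place, `new Person()` no longer compiles, and `new Person(null)` produces a compiler warning as well as throwing at run time.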
If constraints prevent you from using this technique (e.g., some frameworks require types to supply a zero-argument constructor) then that blog describes some alternatives that still enable you to retain the fundamental benefit of nullable reference types.
It's not really about null
The critical insight here is that none of this is really about null. It's about knowing whether a valid value is available at a particular point in your code. C# has always gone to some lengths to try to ensure this with its definite assignment rules, but it was easy to undermine the compiler's attempts to keep you safe: assigning null to a reference type variable would satisfy the definite assignment rules. These rules were not designed to determine whether the value assigned was actually usable.
In fact the same problem exists for value types. The compiler will require a local variable of type `int` to be initialized before you attempt to read it, but it won't stop you from picking some arbitrary value just to keep the compiler quiet. If you set an `int` to 0, the compiler has no way of knowing whether that 0 correctly reflects something about the application's state, or is just a convenient default to use until the real value is available.
Using 0 as a placeholder value in an `int` variable before you have the real value is conceptually no different from initializing a reference variable to null. Practically it has one difference: at least a null reference will fail conspicuously if you erroneously attempt to use it before it is ready; with an `int`, if you accidentally end up using something that was meant to be a placeholder value, you probably won't get a runtime error. You'll just get incorrect behaviour.
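The placeholder problem for value types can be sketched as follows. The `OrderMaths` class, its `Total` method, and the variable names are hypothetical, chosen only to make the failure mode concrete:

```csharp
#nullable enable

public static class OrderMaths
{
    public static int Total(int unitPrice)
    {
        // Without an initializer, the compiler would reject the read below
        // with error CS0165 (use of unassigned local variable).
        int quantity = 0; // placeholder; the real quantity is never assigned

        // Compiles and runs without complaint, but the result is silently wrong.
        return quantity * unitPrice;
    }
}
```

Definite assignment is satisfied, yet `OrderMaths.Total(250)` returns 0 rather than failing.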
And that's what's so insidious about initializing a `string` variable with an empty string as a lazy way to handle the fact that you don't have the right value yet. You've still got the same basic problem you had before, but it will now fail in less conspicuous ways, increasing the chances of incorrect program behaviour going undetected. Better to crash quickly before your program's logic error can cause lasting damage.
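As a rough illustration of the difference, consider a hypothetical consumer (`BadgePrinter.Label` is an invented name) that assumes it has been given a real colour:

```csharp
#nullable enable

public static class BadgePrinter
{
    // Assumes `colour` is a genuine value, as its non-nullable type promises.
    public static string Label(string colour) =>
        "Colour: " + colour.ToUpperInvariant();
}
```

Passing the placeholder, `BadgePrinter.Label("")`, quietly returns a label with no colour in it. Passing a null (forced through with `null!`) throws a `NullReferenceException` at the point of use, which is much harder to miss.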
If you want to get the benefits that nullable reference types offer, you can't just do the minimum required to keep the compiler happy. (If you want to take that approach, the quickest path is just to turn the feature off again.) To reap the potential value, you need to use techniques such as those shown in my recent nullable references and serialization blog. Let the compiler guide you in reworking your code to reduce the chances of getting unintended nulls. It is by working with, not against, the nullable references feature that you will see the greatest improvements.