C# faux amis 1: discards and underscores
In the introduction, I described the idea of faux amis in spoken languages and suggested that they can also afflict programming languages.
An obvious example in C# is the underscore character ('_'). Several programming languages let you use this character in places where the formal language structure requires an identifier, but you don't wish to supply one.
For example, imagine an API that requires you to write a callback function that takes two arguments, but in your application, you have no use for one of them. Instead of giving a name to the unused argument, many languages let you write _ to indicate that the argument is unused.
C# 7.0 picked up on this in a few places. For example, you can discard a value that a method returns through an out parameter. Here's the signature of such a method:
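A minimal sketch of a method with that shape follows. The names DataSource and Read and the trivial body are hypothetical; only the buffer and bytesRead parameters come from the text.

```csharp
using System;

public static class DataSource
{
    // Hypothetical method: fills 'buffer' and reports how many bytes
    // it wrote through the out parameter 'bytesRead'.
    public static void Read(byte[] buffer, out int bytesRead)
    {
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = (byte)i;
        }
        bytesRead = buffer.Length;
    }

    public static void Main()
    {
        var buffer = new byte[4];
        Read(buffer, out int bytesRead);
        Console.WriteLine(bytesRead); // prints 4
    }
}
```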
Suppose that, in the context in which you're calling this method, it will always fill the entire buffer with data. This means you wouldn't need to inspect the value returned through the bytesRead argument. Before C# 7, you would need to declare a variable just to hold this unwanted result:
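A sketch of that older style, assuming a hypothetical Read method with an out int parameter. The variable name ignored is also just for illustration; it exists only to give the out argument somewhere to go.

```csharp
using System;

public static class BeforeDiscards
{
    // Hypothetical stand-in for an API that always fills the buffer.
    static void Read(byte[] buffer, out int bytesRead)
    {
        bytesRead = buffer.Length;
    }

    public static byte[] ReadAll()
    {
        var buffer = new byte[4];
        int ignored;               // declared only to satisfy the out parameter
        Read(buffer, out ignored); // the value is never inspected afterwards
        return buffer;
    }

    public static void Main()
    {
        Console.WriteLine(ReadAll().Length); // prints 4
    }
}
```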
Since C# 7.0, you have been able to write this instead:
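A sketch of the discard form, assuming a hypothetical Read method whose second parameter is an out int:

```csharp
using System;

public static class WithDiscard
{
    // Hypothetical stand-in for an API that always fills the buffer.
    static void Read(byte[] buffer, out int bytesRead)
    {
        bytesRead = buffer.Length;
    }

    public static byte[] ReadAll()
    {
        var buffer = new byte[4];
        Read(buffer, out _); // no variable is declared for the unwanted result
        return buffer;
    }

    public static void Main()
    {
        Console.WriteLine(ReadAll().Length); // prints 4
    }
}
```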
You can also discard elements during deconstruction, e.g.:
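A sketch of a deconstruction that keeps the first element of a tuple and discards the second. The names FirstOf, p, and x are chosen for illustration.

```csharp
using System;

public static class DeconstructionDiscard
{
    public static int FirstOf((int, int) p)
    {
        (int x, _) = p; // declares x, discards the second element
        return x;
    }

    public static void Main()
    {
        Console.WriteLine(FirstOf((1, 2))); // prints 1
    }
}
```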
We can use discards in patterns too. For example, a switch statement (using the support for patterns that C# 7 introduced) might include this:
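A sketch of such a case label in context. The Describe method and its return strings are hypothetical; the relevant part is the case int _ line.

```csharp
using System;

public static class SwitchDiscard
{
    public static string Describe(object o)
    {
        switch (o)
        {
            case int _: // matches any int; the value itself is not captured
                return "an int";
            default:
                return "something else";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe(42));    // prints an int
        Console.WriteLine(Describe("text")); // prints something else
    }
}
```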
This pattern will match if the argument to the switch is of type int, and it does nothing with the value. You can use the same pattern in an is expression (although in this example you probably wouldn't because you can remove the _ here without changing the effect of the code):
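A sketch showing the pattern in an is expression next to the plain type test it is equivalent to. The method names are hypothetical.

```csharp
using System;

public static class IsPatternDiscard
{
    // Type pattern with a discard designation.
    public static bool IsIntWithDiscard(object o) => o is int _;

    // Plain type test; same result without the underscore.
    public static bool IsIntPlain(object o) => o is int;

    public static void Main()
    {
        Console.WriteLine(IsIntWithDiscard(42)); // prints True
        Console.WriteLine(IsIntPlain(42));       // prints True
    }
}
```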
It's also common to write a lambda that ignores one or more of its arguments, e.g.:
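A sketch of a lambda that ignores its single argument. The delegate type and the name AlwaysFortyTwo are chosen for illustration.

```csharp
using System;

public static class IgnoredArgument
{
    // The argument is named '_' to signal that the body never uses it.
    public static readonly Func<string, int> AlwaysFortyTwo = _ => 42;

    public static void Main()
    {
        Console.WriteLine(AlwaysFortyTwo("anything")); // prints 42
    }
}
```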
But did you spot the odd one out?
One of these things is not like the others, in a way that becomes clear if we need to discard more than one thing. With most of these examples you can just add more discards, using the '_' character every time. But if you try that with a lambda it will fail:
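A sketch of the failing form, left commented out so that the rest compiles. This assumes a C# 7.x or 8.0 compiler, which reports error CS0100 (duplicate parameter name) for it; C# 9.0 and later accept repeated underscores in a lambda parameter list, so the behavior depends on your language version. The names here are hypothetical.

```csharp
using System;

public static class DuplicateUnderscores
{
    // A single underscore is accepted by every compiler that supports lambdas.
    public static readonly Func<int, int> OneIgnored = _ => 42;

    // Two underscores: rejected by C# 7.x and 8.0 compilers because
    // both parameters would be named '_'. Uncomment to see the error.
    // public static readonly Func<int, int, int> TwoIgnored = (_, _) => 42;

    public static void Main()
    {
        Console.WriteLine(OneIgnored(7)); // prints 42
    }
}
```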
This will not compile, and not just because it resembles a bench in a nudist colony as seen from behind. (In fact, the use of ASCII art of this kind to represent a special value called 'bottom' has a long and rich history in certain parts of computer science.) If you try to write a lambda in this way, the compiler will complain that you've used the same argument name twice, meaning that we're left with this rather ungainly idiom:
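A sketch of that idiom, with hypothetical names. Each ignored parameter gets a distinct run of underscores so that the names stay unique.

```csharp
using System;

public static class UngainlyIdiom
{
    // '_' and '__' are two different, perfectly ordinary parameter names.
    public static readonly Func<int, int, int> TwoIgnored = (_, __) => 42;

    public static void Main()
    {
        Console.WriteLine(TwoIgnored(1, 2)); // prints 42
    }
}
```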
And so on to triple, quadruple, etc. underscores if you need to ignore more arguments. (Gallingly, although StyleCop was fixed to recognize both _ and __ as ad hoc discards, for some reason it was not fixed for the general case, so StyleCop will still complain if you write (_, __, ___).)
Why can we use multiple discards all with '_' in the other scenarios but not here? It's because an underscore lambda parameter is a faux ami: it's not actually a discard, it just looks like one. It's really a parameter called _. You can use it in the lambda's body:
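A sketch that reads the underscore parameter inside the lambda body, which would be impossible if it were a true discard. The name Doubler is hypothetical.

```csharp
using System;

public static class UnderscoreIsAName
{
    // '_' is an ordinary parameter here, so the body can use its value.
    public static readonly Func<int, int> Doubler = _ => _ * 2;

    public static void Main()
    {
        Console.WriteLine(Doubler(21)); // prints 42
    }
}
```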
Lambdas are different because they were added to C# before discards. (Lambdas arrived in C# 3.0, discards in C# 7.0.) A plain underscore has always been legal as an identifier in C#, so there would have been no apparent reason to prevent anyone from using _ as a lambda parameter name back in C# 3.0. And since lambda parameters must have unique names, it is illegal to use _ twice in a single parameter list.
The compiler's-eye view
It can sometimes be easier to understand code if you know how the compiler processes it. So in this section I've annotated the various examples from the preceding section with how they appear in the compiler's syntax tree. I used the Syntax Visualizer in Visual Studio to discover this (although sadly, Microsoft removed the feature that would turn the syntax tree into a picture for you, so these are all drawn by hand). If you can't find the Syntax Visualizer in your copy of Visual Studio, run the installer, and ensure that .NET Compiler Platform SDK is checked in the Individual Components section.
Here's how Roslyn (the compiler API) sees an out discard:
The syntax tree is the same regardless of whether out is followed by a discard or an existing variable. Only the semantic tree will tell you directly whether the identifier represents a discard.
Here's a deconstructing assignment:
Again, the syntax tree doesn't tell us that the identifier signifies a discard here. That is only made explicit in the semantic tree, when we ask for the symbol associated with the identifier.
Here's a type pattern in a switch statement's case:
And here's the same pattern in an if statement:
And here's the lambda:
Even in the world of the compiler's syntax tree, it's not always immediately obvious that you might be dealing with a discard. In some constructs, it is syntactically unambiguous: when a type pattern (or, as Roslyn calls it, a DeclarationPattern) uses _ as the variable designator, that shows up as a DiscardDesignation. And in a parameter list, it is never a discard. But in the other cases we had to dig further, into the semantic tree, to understand when a discard is present. This is because even where discards are supported, there may be complications caused by backwards compatibility constraints.
Further complications
No matter where we might like to use them, discards are only supported in language constructs introduced in or after C# 7.0. (I don't know if the language designers considered retrofitting discards to lambdas. Since attempting to use _ in more than one place causes a compiler error today, this doesn't seem like it would risk changing the meaning of existing programs. However, it might break tools such as analyzers and code fixers that are expecting lambda parameter names to be distinct.)
Even within new language features designed to support discards, the fact that _ is a valid identifier causes some complication. Consider this example of a deconstruction assignment (a language feature added in C# 7.0, and as such, one with discard support fully baked in):
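A sketch of such an assignment, wrapped in a hypothetical method so that it compiles on its own. The statement of interest is (x, _) = p; here nothing named _ is in scope, so the underscore is a discard.

```csharp
using System;

public static class AssignmentDiscard
{
    public static int FirstOf((int, int) p)
    {
        int x;
        (x, _) = p; // assigns to the existing x; no '_' is in scope, so it is a discard
        return x;
    }

    public static void Main()
    {
        Console.WriteLine(FirstOf((1, 2))); // prints 1
    }
}
```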
This is similar to the deconstruction assignment I showed earlier, but this does not declare any new variables. To be able to compile, this requires the symbol x to identify something assignable. (It could be a local variable, but it would also be acceptable for x to be a property or a field.) But here's the potentially surprising part: we can't tell from this line of code alone whether the _ is a discard. Here's an example containing that statement, preceded by a couple of variable declarations:
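A sketch of that situation, with hypothetical surrounding code. Because a local variable named _ is declared first, the same statement now writes to that variable, and its value can be read back afterwards.

```csharp
using System;

public static class UnderscoreVariable
{
    public static int SecondOf((int, int) p)
    {
        int x;
        int _;      // an ordinary local variable that happens to be called '_'
        (x, _) = p; // assigns the second element to the local '_', not a discard
        return _;
    }

    public static void Main()
    {
        Console.WriteLine(SecondOf((1, 2))); // prints 2
    }
}
```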
In this case, the _ is not a discard. Instead, the final statement assigns the second part of the deconstructed p into the local variable named _. A lone underscore like this will only be treated as a discard if no symbol named _ is in scope. (This is why in the Roslyn compiler API, we needed to dig into the semantic tree. Syntactically, that underscore will be represented in the syntax tree as an IdentifierName, and until the compiler has done the analysis to find out whether it refers to an existing variable in scope, there's no way to know whether or not it's a discard. That's why you need to look up its symbol in the semantic tree and check if it's a DiscardSymbol to discover whether it represents a discard.)
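The lookup described above can be sketched with the Roslyn API. This assumes a project that references the Microsoft.CodeAnalysis.CSharp NuGet package; the DiscardInspector class and its Report method are hypothetical helpers, and the public interface for the discard symbol is IDiscardSymbol. Treat it as an illustration rather than a definitive recipe.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class DiscardInspector
{
    // Hypothetical helper: for each lone '_' identifier in the source,
    // asks the semantic model what the identifier actually refers to.
    public static void Report(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        CSharpCompilation compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        SemanticModel model = compilation.GetSemanticModel(tree);

        // Syntactically, every one of these is just an IdentifierName.
        var underscores = tree.GetRoot()
            .DescendantNodes()
            .OfType<IdentifierNameSyntax>()
            .Where(id => id.Identifier.ValueText == "_");

        foreach (IdentifierNameSyntax id in underscores)
        {
            ISymbol symbol = model.GetSymbolInfo(id).Symbol;
            string description = symbol is IDiscardSymbol
                ? "discard"
                : (symbol == null ? "unresolved" : symbol.Kind.ToString());
            Console.WriteLine(id.Kind() + ": " + description);
        }
    }

    public static void Main()
    {
        // No '_' in scope: the underscore should bind to a discard symbol.
        Report("class A { void M((int, int) p) { int x; (x, _) = p; } }");
        // A local named '_' is in scope: the underscore should bind to that local.
        Report("class B { void M((int, int) p) { int x; int _; (x, _) = p; } }");
    }
}
```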
But I digress. Although there is a lexical ambiguity between discards and variables in deconstruction assignments, these aren't strictly faux amis by my definition. (Technically, it's the same construct in both cases.) Moreover, this only tends to bite if you like using _ as a symbol name. It falls into the broader category of ways in which programming languages get increasingly weird and complex corner cases as they age. Don't we all?
My main complaint here is the ugliness of ignoring multiple lambda arguments brought on by a faux ami: an idiom intended to look like a discard, but which isn't.