Fog Creek Software
Discussion Board




Joel on Software

Why does a string have to be nullable?

This question occurred in the thread "Why is Use of unassigned local variable an error?"

A string is the only base type which is nullable. Why? When do we need to know when a string is not assigned? And why don't we need to know when ints or bools are unassigned?

Whenever I have a string I expect it to be a string, that means I can do any string operation on it. But that's not the case when the string is null.

What follows is code littered with zillions of

    if (aString == null)

and the silliest of them all, the empty string test:

    if (aString == null || aString == "")

If the concept of uninitialized variables is that important, why is it not built into the language/framework itself, like:

    if (string.IsEmptyOrNull(aString))

But that would not be necessary if a string variable was guaranteed to have a value.

Thomas Eyde
Saturday, October 18, 2003

A string can be null because it's a reference type. It's a reference type because of performance concerns.

Simple as that.

Brad Wilson (dotnetguy.techieswithcats.com)
Saturday, October 18, 2003

Optimized how? Is a value type creation more expensive than a reference type creation? I don't buy that. How much memory does an empty string need?

What about programming efficiency? I don't believe it's impossible to build the string as a value type but still be optimized.

It's about having something which looks like a duck, walks like a duck, really be a duck. I don't have to initialize a string variable with

    new String();

hence, in behaviour, the string is a value. It should not be null.

Thomas Eyde
Sunday, October 19, 2003

Strings are reference types because their immutability allows a string "copy" to be a simple copy of the reference. If the string itself were a value type, then such an optimization wouldn't be possible (copies would mean copying the entire string itself, instead of just a reference).

Brad Wilson (dotnetguy.techieswithcats.com)
Sunday, October 19, 2003
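
A minimal C# sketch of the point about copying, using only the base class library. The class name StringCopySketch and the helper Copy are made up for illustration. Assigning one string variable to another copies only the reference, so both names end up referring to the same object however long the string is.

```csharp
using System;

public static class StringCopySketch
{
    // Hypothetical helper: "copies" a string the way plain assignment does.
    public static string Copy(string source)
    {
        string copy = source;   // copies the reference, not the characters
        return copy;
    }

    public static void Main()
    {
        string original = new string('x', 1000000); // one million characters
        string copy = Copy(original);

        // Both variables refer to the same object, so this prints True.
        Console.WriteLine(object.ReferenceEquals(original, copy));
    }
}
```

Because strings are immutable, sharing the object this way is safe: neither variable can change the characters the other one sees.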

Use VB.NET and ignore the difference between null and "". Just don't use any String methods without checking for Nothing first... :-(

Mark Hurd
Monday, October 20, 2003

On the subject of evaluating an empty string, comparing it to "" or string.Empty is not efficient according to MS.

They say: "Comparing strings using the System.String.Length property is significantly faster than using Object.Equals".

So, instead of

    if (s1 == "")

they recommend

    if (s1 != null && s1.Length == 0)

Why the compiler doesn't do it for you (empty string comparison => .Length substitution) is anyone's guess.

More details:
http://www.gotdotnet.com/Team/FxCop/docs/rules/Performance/EmptyStringCompare.html

id
Monday, October 20, 2003
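
A small C# sketch of how the two tests discussed so far could be wrapped in one place. The class StringChecks and both method names are hypothetical (IsEmptyOrNull is simply the name suggested in the first post). Later versions of the framework, starting with .NET 2.0, ship a built-in String.IsNullOrEmpty that does the same job.

```csharp
using System;

public static class StringChecks
{
    // Hypothetical helper: true when the string is null or has no characters.
    public static bool IsEmptyOrNull(string s)
    {
        return s == null || s.Length == 0;
    }

    // The Length-based empty test from the post above: false for null.
    public static bool IsEmpty(string s)
    {
        return s != null && s.Length == 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsEmptyOrNull(null));   // True
        Console.WriteLine(IsEmptyOrNull(""));     // True
        Console.WriteLine(IsEmptyOrNull("abc"));  // False
        Console.WriteLine(IsEmpty(null));         // False
        Console.WriteLine(IsEmpty(""));           // True
    }
}
```

Checking for null first matters: reading Length on a null reference would throw a NullReferenceException.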

I've taken my stand on VB.NET. It's not an option.

It is the checking on null/Nothing I want to get rid of. So switching to VB.NET gains nothing.

Simple copying: A value object can have internal reference objects, can't it? I see no reason why a string can't be a value, but still optimized.

In the good old VB6 days it was also more efficient to compare against the string length rather than the empty string. I have chosen to compare against the empty string anyway, because of readability: the statement (myString == "") says clearly that I want to know if a string is empty.

Thomas Eyde
Tuesday, October 21, 2003

Thomas,

With all due respect, please stop being such a whiny little kid. Seriously. This board is populated with your petulant gripes about the way you would or wouldn't have done something. Too bad. Get over it. If you're so unhappy, do us all a favor and go find a new thing to learn and whine about so we don't have to listen to it.

Brad Wilson (dotnetguy.techieswithcats.com)
Tuesday, October 21, 2003

Brad, I'll do you the favour and stop. I don't get a real answer anyway. The only answers so far are variations of "string is a reference type, null is here to stay, get over it".

No one really bothered to investigate the mismatch between implementation and actual usage, which is what I care about.

A whiner? Perhaps. You don't have to read my posts if you don't like them. I guess I stepped on someone's toes here because of the activity and the fact that someone believes it is necessary to ask me to stop.

One last comment before I leave: Brad, if you are so smart and not immature like me, why didn't you give a decent answer a long time ago? Most of my posts are attempts to explain what I really mean. Obviously I failed.

Thanks anyway. It has been interesting.

Thomas Eyde
Thursday, October 23, 2003

Thomas...
Here's what it is: a leaky abstraction.
It dovetails nicely with Ori's 'every line of code is a liability' comment on another thread.

Strings are a reference type because they're big. No value type can be as large (several KB is easy, and you can do several MB), and no value type is duplicated as often and called different names. (How many instances of "INPUT" does an HTML parser have?)

But to be nice to you, because strings are used so often, the designers of C# let you pretend it's a value type. Every once in a while, things break down, so there are really three types: values, references, and strings.

That's really all there is to it. Or at least all I see to it, not having anything to do with the design of the language, or even having read about it. It just feels that way after having used 'pointer' languages (C, etc) for years and C# more recently.

But as has been said in other threads, maybe this doesn't feel comfortable to you. Pick another language, there's no reason not to. Many people understand and are happy with the way C# works, but it's become a bit of a liability because you (and probably others) don't agree with the decision made.

mb
Friday, October 24, 2003

Another factor in designing strings to be reference types is the need for a null String. In many usage scenarios, the empty string is a valid value, distinct from null.  (A function that accepts every string, including the empty one, is called a total function: it is defined over all values of its domain.)  In these cases it is important to have available a value that is not in the domain to indicate the absence of information.  Implementing this kind of thing with value types requires you to designate a special value, otherwise legal, to provide the same information.  That is a poor design.

As far as checking all over the place for a null string, a good approach is to narrow down the spots where a null string can appear.  In other words, build the software so that a null string cannot find its way into a place where you do not want it to be.  Put the burden of assuring this condition on the caller by making this assumption explicit in your documentation.

string cheese
Sunday, October 26, 2003
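
One way the "narrow down the spots" advice could look in C#, as a sketch only. The Customer class and its members are hypothetical. The constructor is the single place where a null can be rejected, so the rest of the class can use the field without any further null tests.

```csharp
using System;

public sealed class Customer
{
    // Never null once the constructor has returned.
    private readonly string name;

    public Customer(string name)
    {
        // The one null check: callers must pass a real string.
        if (name == null)
            throw new ArgumentNullException("name");
        this.name = name;
    }

    // No null test needed here; the constructor guarantees a value.
    public int NameLength
    {
        get { return name.Length; }
    }

    public static void Main()
    {
        Customer c = new Customer("Thomas");
        Console.WriteLine(c.NameLength); // 6

        try
        {
            new Customer(null);
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("null rejected at the boundary");
        }
    }
}
```

Note that the empty string is still accepted here; whether it should be is a separate decision for the caller's contract.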
