Fog Creek Software
Discussion Board




Why are .NET strings immutable?

I understand that the .NET string class is immutable and therefore that once the value of a string is set it can't be changed.

I also understand that if lots of string modifications are involved I need to use a StringBuilder object; otherwise I'll get bad performance from allocating lots of string objects that then have to be garbage collected.

What I don't understand is why MS designed the .NET string class to be immutable in the first place.

Disadvantages I see are:
1) Code will be longer and harder to read, with StringBuilders created all over the place and a final ToString() call required, rather than simple, easy-to-read operators on string objects.
2) Many less knowledgeable (or perhaps just plain lazy) programmers will not use StringBuilders and will get very bad performance in loops over string operations without the cause being obvious.

Advantages:
Are there any?

mutabled
Wednesday, February 25, 2004

Because internally a string is represented by a char array, which can't be resized! They have no choice but to make it immutable, except maybe for changing chars at different positions in the array.

the artist formerly known as prince
Wednesday, February 25, 2004

Here's a good discussion of the reasons - it's Java based, but anyway...  http://www.churchillobjects.com/c/11027b.html

R1ch
Wednesday, February 25, 2004

Immutable strings are a performance enhancement. When I write:

string foo = "Hello, there!";
string bar = foo;

I have a single string in memory with two references. If strings were mutable, then I either couldn't do this, or I'd have to employ a copy-on-write system, with its associated performance penalties.
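A minimal Java sketch of the sharing described above (Java rather than C#, but both runtimes behave the same way here). The class and method names are illustrative only: two variables refer to one string object, and "changing" one of them actually binds it to a brand-new string, leaving the other reference untouched.

```java
class SharedStrings {
    static String[] demo() {
        String foo = "Hello, there!";
        String bar = foo;                  // no copy: both variables refer to one object
        boolean sameObject = (foo == bar); // reference comparison, true here

        foo = foo + " Bye.";               // creates a NEW string; the original is untouched
        // bar still refers to the original, unchanged text
        return new String[] { String.valueOf(sameObject), foo, bar };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0]); // true
        System.out.println(r[1]); // Hello, there! Bye.
        System.out.println(r[2]); // Hello, there!
    }
}
```

Because nothing can modify the shared object in place, handing out `bar` never requires a defensive copy or a copy-on-write check.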

Brad Wilson (dotnetguy.techieswithcats.com)
Wednesday, February 25, 2004

I think this topic is also well covered in "Effective Java".  My guess is that this book would also be useful for C# programmers.

name withheld out of cowardice
Wednesday, February 25, 2004

The problem is not that they are immutable, but that the performance vis a vis StringBuilder exposes the implementation of the string to the programmer.

I should be able to say:
string foo;

for( int i = 0; i < 1000; i++ )
{
  foo = i.ToString( );
  // do some other stuff with foo here
}

and the *optimizing* compiler, recognizing that the StringBuilder is more efficient, should use that instead.

MR
Wednesday, February 25, 2004

"The problem is not that they are immutable, but that the performance vis a vis StringBuilder exposes the implementation of the string to the programmer."

The Java explanation linked to by R1ch above shows why having the String immutable is not directly related to performance. That makes any discussion of performance somewhat moot.

Almost Anonymous
Wednesday, February 25, 2004

I'm sure that I've read somewhere that Sun's Java compiler can convert String concatenation to StringBuffer calls in simple cases anyway.
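For a single expression such as `a + n + c`, javac did do roughly this: before Java 5 it emitted `StringBuffer` calls, Java 5 through 8 emitted `StringBuilder` calls, and Java 9 and later use an `invokedynamic`-based strategy (JEP 280) instead. The hand-written `desugared` method below is only an approximation of the pre-Java 9 output, and the names are made up for illustration.

```java
class ConcatDesugar {
    // What you write:
    static String plain(String a, int n, String c) {
        return a + n + c;
    }

    // Roughly what Java 5-8 javac generated for the expression above:
    // one StringBuilder per expression, appended to, then converted once.
    static String desugared(String a, int n, String c) {
        return new StringBuilder().append(a).append(n).append(c).toString();
    }

    public static void main(String[] args) {
        System.out.println(plain("row ", 42, "!"));     // row 42!
        System.out.println(desugared("row ", 42, "!")); // row 42!
    }
}
```

Note that this rewrite is per expression: it avoids intermediate strings within one `+` chain, not across loop iterations.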

r1ch
Wednesday, February 25, 2004

I've read that too.  I'm sure .NET does something similar.

Almost Anonymous
Wednesday, February 25, 2004

Back when I was playing around with C#, I had an application that did something like this:

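// assumes sometable is a System.Data.DataTable and writer is an open System.IO.StreamWriter
string temp;

foreach (DataRow row in sometable.Rows)
{
  temp = "";

  foreach (DataColumn column in sometable.Columns)
    temp += row[column] + Environment.NewLine;

  writer.Write(temp);
}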

I noticed that performance in this loop scaled poorly (it wasn't quite n^2, but it was definitely n^1.xx). I asked a colleague about it, and he suggested I switch to StringBuilder.

I did, and performance improved dramatically. A 10,000-row dump took something like 30-40 seconds with strings vs. about 2-3 seconds with StringBuilder.
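A Java sketch of the same per-row change, assuming each row's column values are already available as a `List<String>` (the `RowDump` class and its methods are hypothetical). With `+=`, every append copies everything accumulated so far, so the copying grows roughly quadratically with the number of columns; the builder appends into one growable buffer and creates the final string once.

```java
import java.util.List;

class RowDump {
    // Naive version: each += allocates a new string and copies the old contents into it.
    static String naive(List<String> columns) {
        String temp = "";
        for (String data : columns) {
            temp += data + "\n";
        }
        return temp;
    }

    // StringBuilder version: appends go into one buffer; toString() copies once at the end.
    static String withBuilder(List<String> columns) {
        StringBuilder temp = new StringBuilder();
        for (String data : columns) {
            temp.append(data).append('\n');
        }
        return temp.toString();
    }

    public static void main(String[] args) {
        List<String> row = List.of("id=7", "name=Ann", "city=Oslo");
        System.out.print(withBuilder(row));
        System.out.println(naive(row).equals(withBuilder(row))); // true
    }
}
```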

MR
Wednesday, February 25, 2004

"because internally a string is represented by a char array which cant be resized!"

That's not a fundamental problem for C++'s std::string, not that I'm claiming it's perfect.
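A fixed-size array is not really a barrier to a mutable string: growable string types (C++'s `std::string`, .NET's and Java's `StringBuilder`) allocate a larger array when they run out of room and copy the old contents over. `GrowableString` below is a hypothetical toy, not any library's implementation; it doubles its capacity so that appends are amortized constant time, whereas real implementations pick their own growth policies.

```java
import java.util.Arrays;

// Hypothetical, minimal growable string: a char array plus a length.
class GrowableString {
    private char[] buf = new char[4];
    private int len = 0;

    void append(String s) {
        int needed = len + s.length();
        if (needed > buf.length) {
            // Out of room: allocate a bigger array (at least double) and copy the old chars.
            int newCap = Math.max(needed, buf.length * 2);
            buf = Arrays.copyOf(buf, newCap);
        }
        s.getChars(0, s.length(), buf, len); // copy s into the free space
        len = needed;
    }

    void setCharAt(int i, char c) {
        if (i < 0 || i >= len) throw new IndexOutOfBoundsException("index " + i);
        buf[i] = c; // in-place modification: no new object
    }

    int capacity() { return buf.length; }

    @Override
    public String toString() { return new String(buf, 0, len); }

    public static void main(String[] args) {
        GrowableString s = new GrowableString();
        s.append("abc");
        s.append("defgh"); // needs 8 chars: buffer grows from 4 to 8
        s.setCharAt(0, 'A');
        System.out.println(s + " " + s.capacity()); // Abcdefgh 8
    }
}
```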


"If strings were mutable, then I either couldn't do this, or I'd have to employ a copy-on-write system, with its associated performance penalties."

I hate to refer to C++ again, because I'm a strong advocate of not making programming any harder than it has to be, but I seemed to survive just fine being able to modify strings or copy/reference them as I pleased. I'm more concerned about the 'invisible' and non-deterministic performance penalty of extra garbage collection from using strings instead of StringBuilders, because that overhead is very hard to see by looking at the code.


"Here's a good discussion of the reasons - it's Java based"

Interesting. To avoid the problems mentioned, in a mutable-string world I would want the hashtable in the example to hash from the value of the string and not the object itself (potentially by calling a GetHashCode() method on the key object). That way, later changing the value of the key object would not affect the mapping in the hash table, because the hashtable isn't referencing the key object at all. This would add the overhead of copying the key's value, but it would be no more memory-expensive than holding on to the original key object, as Java actually does. Copying large objects could be expensive, although surely there's some way to take a hash key of a complex object plus enough data to do the equality test without copying the whole thing? I realise that perhaps this is wanting to have my cake and eat it.

Re: thread safety: I agree about avoiding race conditions, and that is obviously desirable, but if it is such a strong motivation, why don't we see immutability practically everywhere?

My concern is that immutable strings force the knowing programmer to use StringBuilder to get decent performance, which negates any of the benefits of thread safety etc. while making the code lengthy and hard to read. Conversely, the unknowing or lazy programmer will write the code the obvious way but get poor performance, with little chance of spotting the reason, because it's caused by the underlying implementation, something OO was meant to help us abstract away. I think it's quite a high price to pay for thread-safe initialisation and not having to copy the values of hash table keys.


" the *optimizing* compiler, recognizing that the StringBuilder is more efficient, should use that instead"

Given the number of people talking about the performance improvement from explicitly using StringBuilder over String, I find it hard to believe this works particularly well; otherwise we'd all happily forget my original concerns and just use strings everywhere. I've compiled a release build of the following C#:

String myString = "";
for(int x=0; x<1000; x++)
{
    myString += x;
}
System.Console.WriteLine(myString);

and got the following from ILDasm. I'll leave it to someone more experienced than me to work out what the heck it's doing and whether that constitutes using a StringBuilder, but it doesn't look like it to me...

.locals init (string V_0,
          int32 V_1)
  IL_0000:  ldstr      ""
  IL_0005:  stloc.0
  IL_0006:  ldc.i4.0
  IL_0007:  stloc.1
  IL_0008:  br.s      IL_001b
  IL_000a:  ldloc.0
  IL_000b:  ldloc.1
  IL_000c:  box        [mscorlib]System.Int32
  IL_0011:  call      string [mscorlib]System.String::Concat(object,
                                                              object)
  IL_0016:  stloc.0
  IL_0017:  ldloc.1
  IL_0018:  ldc.i4.1
  IL_0019:  add
  IL_001a:  stloc.1
  IL_001b:  ldloc.1
  IL_001c:  ldc.i4    0x3e8
  IL_0021:  blt.s      IL_000a
  IL_0023:  ldloc.0
  IL_0024:  call      void [mscorlib]System.Console::WriteLine(string)
  IL_0029:  ret
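The IL above boxes `x` and calls `String::Concat(object, object)` once per iteration, so each pass allocates a new string and copies everything built so far. A compiler rewrite of the kind discussed earlier in the thread has the same limit if it works per expression: for `s += x` inside a loop, Java 5-8 javac produced code roughly like `looped` below, with a fresh `StringBuilder` on every iteration. The `hoisted` version, with one builder for the whole loop, is the change such a compiler does not make for you. Class and method names are illustrative.

```java
class LoopConcat {
    // Roughly what javac (Java 5-8) generated for:  for (...) s += x;
    static String looped(int count) {
        String s = "";
        for (int x = 0; x < count; x++) {
            // A new builder each time: all of s is copied in, then copied out again.
            s = new StringBuilder().append(s).append(x).toString();
        }
        return s;
    }

    // Hand-hoisted version: one builder for the whole loop, one final copy.
    static String hoisted(int count) {
        StringBuilder sb = new StringBuilder();
        for (int x = 0; x < count; x++) {
            sb.append(x);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(looped(1000).equals(hoisted(1000))); // true
        System.out.println(hoisted(12)); // 01234567891011
    }
}
```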




Surely there has to be some crushing argument for why immutable strings are so much better than mutable strings, to justify the performance overhead and their adoption in both Java and C#?

mutabled
Wednesday, February 25, 2004

Designs are compromises.
The article on churchillobjects.com sums up the advantages of this compromise.
If I'm correct, having both immutable and mutable sequence types is not particular to Java or .NET.

GP
Wednesday, February 25, 2004

The main reason for immutable Strings: you can't trust other programmers to cooperate.  Which is also the same purpose for private methods, local variables, final classes, and other restrictions.

Say you maintain a String as a private variable.  You also have a get() method that returns the value of that String.  For example:

//your class
String getName() {
  return name;
}
------------
//somebody else's class
String s = yourObject.getName();

If Strings were mutable, the code that receives the String from your getName() method could then do something like s.setContent("xyz") which would also change the contents of the string which you are holding as a private variable.

That has obvious dangers and unintended consequences, so to prevent that you would be forced to always create a copy of the String before returning it to anybody else.  Creating that copy every time would get annoying and some programmers would forget to do it.  Hence the benefits of immutable Strings.
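Java has no mutable `String`, and `setContent` above is hypothetical, but `StringBuilder` can stand in for a mutable string. The sketch below (all class names made up for illustration) shows the leak described above and the defensive copy that prevents it; with an immutable `String`, the plain `return name;` is already safe.

```java
class LeakyPerson {
    private final StringBuilder name = new StringBuilder("Alice");
    // Returns the internal object itself: callers can change our private state.
    StringBuilder getName() { return name; }
}

class CarefulPerson {
    private final StringBuilder name = new StringBuilder("Alice");
    // Defensive copy: each caller gets its own object to modify.
    StringBuilder getName() { return new StringBuilder(name); }
}

class ImmutablePerson {
    private final String name = "Alice";
    // Safe without copying: nobody can modify a String in place.
    String getName() { return name; }
}

class GetterDemo {
    public static void main(String[] args) {
        LeakyPerson leaky = new LeakyPerson();
        leaky.getName().replace(0, 5, "xyz"); // mutates leaky's private field
        System.out.println(leaky.getName());  // xyz

        CarefulPerson careful = new CarefulPerson();
        careful.getName().replace(0, 5, "xyz"); // only changes the copy
        System.out.println(careful.getName());  // Alice

        System.out.println(new ImmutablePerson().getName()); // Alice
    }
}
```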

Of course, the same issues also apply to other compound objects that are held as private variables.  However, in a good design there should not be a common need for returning such an object to an outside class while also maintaining a reference privately.  But Strings are conceptually used like primitives, so there will always be a need to pass them around frequently.

Another such class that is commonly used like a primitive but is NOT immutable is Java's Date class.  It turns out that the lack of an immutable Date class was an admitted mistake by Sun, not a deliberate decision.

T. Norman
Wednesday, February 25, 2004

It's about correctness, not performance.

Immutable objects are less error prone than mutable ones. Although this often involves a performance tradeoff, the designers of .NET decided this tradeoff was worthwhile.

They probably did this because they provide you with ways to get the performance of mutable strings when required (by StringBuilder). In most cases, this optimisation is probably not going to gain you anything (you often jam a few strings together in code that's rarely run), so you don't need to do it if you don't have a performance issue.

The compiler could use StringBuilder "under the covers", and I'm sure if Microsoft's testing shows this to be a worthwhile optimisation, they'll do it in a future version. Most likely they'd do it at JIT compile time, not at compile time from C# to IL. That way it would benefit all languages that run on .NET.

Sum Dum Gai
Wednesday, February 25, 2004

Mutabled,
Looking at std::string's "operator +", I can't see why it is any better than C#'s from a performance point of view.
It still has to allocate a new string with the combined length of the two strings, and it hides the reallocation the same way C# does.

WildTiger
Wednesday, February 25, 2004

"Interesting. To avoid the problems mentioned I would want that in a mutable-string world the hashtable in the example would hash from the value of the string and not the object itself (potentially by calling a getHashCode() method on the key object). This would mean that later changing the value of the object key would not affect the mapping in the hash table as the hashtable's not referencing the key object at all."

The hash table *has* to keep a reference to the object, or a copy of it.  The hash value alone is not enough, because there may be a collision.  So by making strings mutable you basically require the code to copy the string any time it is given to someone who shouldn't be able to change it.  Since strings tend to be used more like primitive types than compound types, it makes sense to make them immutable so they can't be changed by someone who shouldn't be doing so.
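To see why the table must hold the key itself, and why that key had better not change, the Java sketch below uses an `ArrayList` as a key. `ArrayList` is mutable but computes `hashCode` and `equals` from its contents, which is what a mutable string would have to do. After the key is modified, the entry is still in the map but can no longer be found by either the new or the old value. The class name is illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    static boolean[] demo() {
        Map<List<String>, Integer> map = new HashMap<>();
        List<String> key = new ArrayList<>(List.of("a"));
        map.put(key, 1);                 // stored under the hash of ["a"]

        key.add("b");                    // mutate the key after insertion

        // Looks under the hash of ["a", "b"], which the entry was never filed under.
        boolean foundByNewValue = map.containsKey(key);
        // Right hash, but equals() fails because the stored key is now ["a", "b"].
        boolean foundByOldValue = map.containsKey(List.of("a"));
        // The entry is orphaned, not removed.
        boolean stillStored = map.size() == 1;
        return new boolean[] { foundByNewValue, foundByOldValue, stillStored };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // false false true
    }
}
```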

Requiring a knowledgeable programmer to do efficient string processing is nothing new. A naive programmer in C will do much worse than someone who knows all about how strings in C work. I don't really see how it is a big deal. If you are going to be doing a lot of mutating of strings, create a temporary StringBuilder to do the work, then convert it back to a string, and you get the best of both worlds.
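A minimal Java sketch of that pattern: take a `String`, do the in-place edits on a local `StringBuilder`, and hand back a new immutable `String`. The `capitalizeWords` helper is a made-up example, not a library function.

```java
class TempBuilder {
    // Hypothetical helper: upper-cases the first letter of each space-separated word.
    static String capitalizeWords(String input) {
        StringBuilder sb = new StringBuilder(input); // private, mutable working copy
        boolean atWordStart = true;
        for (int i = 0; i < sb.length(); i++) {
            char c = sb.charAt(i);
            if (atWordStart && Character.isLetter(c)) {
                sb.setCharAt(i, Character.toUpperCase(c)); // edit in place, no new string
            }
            atWordStart = (c == ' ');
        }
        return sb.toString(); // back to an immutable String for callers
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("why are strings immutable")); // Why Are Strings Immutable
    }
}
```

The mutable builder never escapes the method, so callers still only ever see immutable strings.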

MikeMcNertney
Thursday, February 26, 2004
