Fog Creek Software
Discussion Board




Shlemiel the painter's algorithm

Reminded me of the article I read on this site...

http://www.eskimo.com/~scs/C-faq/q15.4.html

Uh isn't this kind of bad from a guy who wrote a pretty popular book?

http://www.amazon.com/exec/obidos/tg/detail/-/0201845199/qid=1054705827/sr=8-4/ref=sr_8_4/104-4076808-5225553?v=glance&s=books&n=507846

Sure the running time is not really relevant to the question, but I'm sure someone has copied that code and put it in their own program...

Choop
Wednesday, June 04, 2003

What's your problem with it?

He checks the length first, then allocates a buffer big enough to hold it, then copies the data.

That's the quickest way you can do it, unless I'm very much mistaken.

The painter's algorithm would be to reallocate after each string copy.

And the horse you rode in on
Wednesday, June 04, 2003

Search the site for "Shlemiel" and you will find the article Joel wrote about C strings.  He illustrated a common problem with using the standard library function "strcat", which concatenates two strings together, when you're using more than a couple strings.

Basically the algorithm is more inefficient than it has to be because of the way C chooses to represent strings, though it probably is not going to make a difference in most applications.

So the author of this book "C FAQs" wrote the inefficient "painter algorithm" version of a function to concatenate a bunch of strings together, while he could have done what Joel suggested in his article, which is much more sensible.  Though it might not make a difference in real life, it would probably grate on the nerves of any programmer who knew what the problem is, and how trivial it is to fix it.

To all the programmers out there, I see Joel's point but I actually have never seen a C program that DIDN'T use null-terminated strings... does anyone else program in C and not use C strings?  Most of the C programs I have written/seen in general do not use strings a lot , i.e. I have a skewed experience.

Andy

Choop
Wednesday, June 04, 2003

Weird, one of the posts disappeared.  I replied to someone who was a non-programmer asking what the deal was.

Anyway, read over Joel's article and note the difference between strcat and mystrcat.  Basically with the code in this C FAQ, on the ith iteration of the while loop with strcat, strcat will have to iterate over all the characters of the previous i-1 strings.  Thus turning a O(n) algorithm into an O(n^2) algorithm, roughly (ignoring the lengths of the individual strings).  If strcat returns a pointer to the end of the concatenated string, then you can use it to skip over all the i-1 strings coming before.

But you're right that reallocing after every concatenation would be another, even worse, way to "Shlemiel" this algorithm.  Which is basically what happens if you improperly use a C++-style string object with + overloaded, e.g.

for( int i = 0; i < numStrings; i++ )
  result = result + str[i];

instead of

  result += str[i];

with += overloaded NOT to realloc every single time or use strcat, basically using the length of the string intelligently.  (But even this is not as fast as checking the length first and never reallocing, and never iterating through already copied characters, but that's a bit nitpicky).

I think it's kind of funny that the guy uses stuff like (void)strcat, which is completely anal retentive and only obscures the code for a beginner, while not mentioning that his algorithm has a suboptimal running time.

Andy

Choop
Wednesday, June 04, 2003

It's inconvenient to not null-terminate strings, since you then can't pass them directly to libraries or system calls that want char*'s. But I do keep lengths sometimes - more for automatically lengthening buffers than for fast concatenation. My concatenator uses the dumb O(n^2) algorithm, but it's never been a performance issue. (although I do tend to use interned strings for a lot of things - i.e. one master hash table with precisely one instance of each string - makes it possible to compress long strings and do concatenation just working with pointers.).

Dan Maas
Wednesday, June 04, 2003

If you have to assemble records in structs then you tend to have to convert strings to chopping the null byte, usually replacing it with a size byte at the front of the string just like good ol' Basic.

Oh, the good reason for losing the null byte in records is that if you leave it in that null becomes data and not meta data.  It works fine if you use something like scanf to read it back in but other languages might bollix it up.

Simon Lucy
Wednesday, June 04, 2003

At a former employer I developed an application that used a "string object", yet in plain C.

The application (Unix only; MS systems were not up to the job) had to run setuid root and then switch between two user IDs.  Having seen too many buffer overflow exploits, I wrote a proper string library and used it throughout.

Probably the most stable application the company ever had. :-)

Today, with the advent of standard C++, I am no longer doing such silly things.

Inode
Saturday, June 07, 2003

*  Recent Topics

*  Fog Creek Home