Fog Creek Software Discussion Board

C faster than C++ Myth

In his column Joel stated:

Sometimes I choose raw C when I need blazing speed.

Especially given the rest of the column, Joel basically states that C is faster than C++. This is a very old myth that's been around since C++ was introduced, and has often been used by "old timers" who wanted to resist the need to change. C is not faster than C++, because C++ is essentially a superset of C. You can take almost any C program and compile it with a C++ compiler (watch out for variables named 'new' or 'delete', however) and get the same results.

But beyond that, you can use C++ features without incurring extra overhead, as long as you are aware of what you are doing. One common cause of C++ overhead is the temporary objects sometimes introduced by the compiler, but you can easily avoid them, e.g. by passing arguments by const reference instead of by value.
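
A minimal sketch of that difference (the function names here are my own, for illustration only):

#include <cstddef>
#include <string>

// Passing by value constructs (and later destroys) a temporary copy of
// the argument on every call.
std::size_t countSpacesByValue(std::string s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == ' ') ++n;
    return n;
}

// Passing by const reference reads the caller's object directly, so no
// temporary is created.
std::size_t countSpacesByRef(const std::string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == ' ') ++n;
    return n;
}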

I can even claim that using advanced C++ techniques based on generics (templates), you can achieve better performance than C, because some of the computation is done at compile time (see Blitz++ at http://www.oonumerics.org/blitz/ for a well-known example).
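
To make the compile-time point concrete, here is only a toy sketch (Blitz++ itself uses far more elaborate expression templates):

// The recursion is evaluated entirely by the compiler, so
// Factorial<5>::value is simply the constant 120 in the object code.
template <unsigned N>
struct Factorial { enum { value = N * Factorial<N - 1>::value }; };

template <>
struct Factorial<0> { enum { value = 1 }; };

int table[Factorial<5>::value];   // legal: the size is a compile-time constant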

About the only place C has the advantage is with some of the newer modifiers introduced in C9X in order to bring it up to par with FORTRAN for math computation - stuff like restricted pointers (see http://www.lysator.liu.se/c/restrict.html). These features will probably make it into C++ eventually as well.
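
As a rough illustration (the function name is made up, and the __restrict spelling is a common compiler extension rather than standard C++):

// In C9X the qualifier is spelled "restrict". It promises the compiler
// that dst and src never alias, so values can stay in registers and the
// loop can be unrolled or vectorized without reloading through memory.
void scale(double* __restrict dst, const double* __restrict src,
           int n, double factor)
{
    for (int i = 0; i < n; ++i)
        dst[i] = factor * src[i];
}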

Dan Shappir
Monday, May 06, 2002

In his book on C++'s design, Stroustrup makes it very clear that a huge criterion was to never include a feature that imposed costs when you weren't using it. So either Joel implies that C is faster, or he trusts his C compilers to have fewer issues in his quest for speed.

Kenneth Bloom
Monday, May 06, 2002

Besides, I don't see any reason why C should be faster than, say, Pascal. It just depends on how well the compiler you're using optimizes the code.

For example, C allows you to write i++ instead of i := i + 1, but the compiled code can/should be the same.

Frederik Slijkerman
Monday, May 06, 2002

Funny thing is, Pascal often makes it easier for the compiler to generate more efficient code. For example, in C/C++ the definition of the for loop is very loose:

for ( init ; cond ; incr ) ...

whereas Pascal's for is much more rigid:

for var := init-val to/downto end-val [step incr]

Also, in Pascal you can't modify the loop variable inside the loop. In other words, while the C/C++ compiler *can* generate code that is as efficient as the Pascal compiler, it has to be much smarter about code analysis in order to do so.

I think, however, that the main issue in the C vs. C++ debate is the perception that simpler, lower-level languages must be more efficient than higher-level ones - specifically, that procedural languages are more efficient than object-oriented languages. Maybe this perception is the result of the extra overhead of dynamic binding, though I'd argue that alternative techniques that achieve the same effect have similar overhead. And in C++ you need only use dynamic binding where it's actually needed. In any event, this is a perception that IMO is wrong.

Dan Shappir
Monday, May 06, 2002

And one more thing: in C++, if you want efficiency you should write ++i rather than i++. Obviously this doesn't matter if i is an int, but it can matter if i is a user-defined type (such as an iterator).
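
A small sketch of why (the Iter type is invented just for illustration):

struct Iter {
    int* p;
    // Pre-increment: advance and return *this - no copy.
    Iter& operator++() { ++p; return *this; }
    // Post-increment: must return the *old* value, which means
    // constructing a temporary copy on every use.
    Iter operator++(int) { Iter old = *this; ++p; return old; }
};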

Dan Shappir
Monday, May 06, 2002

Wow, thanks for the nostalgia. This thread makes me feel like I'm reading USENET in the freshman CS lab in 1991.

My dad can beat up your dad
Monday, May 06, 2002

Wow, thanks for the nostalgia.  It reminds me that not every bit of code is a web interface to a server database.  How refreshing!

Nat Ersoz
Monday, May 06, 2002

OK, technically speaking, I did NOT say that C was faster than C++.

But while we're on the subject -- those of us who had the unfortunate luck to remember 16-bit programming on the 386 processor (e.g. Windows 3.0 and 3.1) will remember that loading the segment pointer register was something like 40 times slower than the JSR instruction. So calling a virtual method of a class, which was always a far pointer call on a 386 in 16-bit mode, required a segment load and was thus dramatically slower than a traditional JSR.

A similar problem was that you tended to call a lot of constructors during startup. These constructors were scattered all over the code, requiring you to load bits of EXE from all over the hard drive. With the C compilers in existence it was easy to swap tune and make sure all initialization code was in the same segment thus minimizing the amount of disk I/O required to launch an app. But C++ compilers did not like to put constructors in a different segment than the rest of the class because they were trying to solve the previous problem.

This stuff may sound esoteric, but there was a time when swap tuning to make your app launch quickly was a significant competitive advantage.

For this reason (and others), in 16-bit Windows C++ was a definite speed hit; the first major Windows application written in C++ was Borland Paradox, which took something like a minute to load and even displayed a progress indicator while launching.

Joel Spolsky
Monday, May 06, 2002

Theory vs. Practice once more.

In theory, C++ is at least as good as C for anything, because it includes C as a subset. But that's about as informative as saying that all languages are equivalent because they are all Turing complete. True, but not really interesting. It's not the language that matters, it's the environment.

In practice, modern C++ programs don't have the ability to exercise the same kind of control that C programs do. The problem is with code you don't have full control of; STL does not give you enough control over memory allocation or run-time use. "Don't use STL then" is not really a viable option in many cases, because you have other people working on your project, and THEY'LL put STL in (they can be your collaborators, your clients, your contractors or your library vendors). Same goes for exceptions, templates, virtual functions, multiple inheritance, overloading etc. All of them are generally useful, and all of them have tremendous implications when you need full control - to the effect that, in most working environments, using straight C is the only solution. C people (library vendors, etc.) are much more "control" minded than the C++ people.

Another example - the C++ environment promotes the use of I/O streams, which are an extremely bad idea in terms of robustness and performance (in general, and especially in the C++ implementation; I haven't found one person with experience in robust, high-performance systems who does not agree with this assertion). Again, "don't use them then" is often not an option in practice because of the surrounding environment.

There are also myths that work in the other direction, e.g., that C++ has superior performance because many things are inlined (e.g., STL, Blitz, Boost, anything with "inline" in it). All those who believe this and use either gcc 2.9 or Visual Studio 6 are invited to turn on their compiler's "wasn't inlined" warning and see the extent to which this is actually true (hardly at all, in general use, except Blitz through gcc, which is surprisingly well optimized). But the mere assumption that this is the case leads to practically inefficient code, because "the compiler will probably inline, substitute and fold this code properly".

When I write C code (or C style code in C++, for that matter), I can only have pleasant surprises from the optimizer - it can do better than I expect. When I write common-practice C++ code, I'm usually unpleasantly surprised at how little of what it could have done the compiler actually does. Which is not to say it doesn't have its place - only that the cost is loss of complete control.

Which is why, like Joel, when I want blazingly fast code, I write it in C. For me it's Python when speed is less of an issue and C when it is (with Swig or native binding between them). It's a winning combination. Try it, kids - you won't regret it.

Ori Berger
Tuesday, May 07, 2002

What's the problem with I/O streams?

Frederik Slijkerman
Tuesday, May 07, 2002

In practice, C++ streams should be faster than C I/O because the value type is determined at compile time, using overloading, not at run time based on a format string. The only reason this is not always the case is that several C++ RTL providers have implemented C++ streams on top of the C I/O functions. As a result, you have the overhead of an extra layer. Add to this that many (most?) C++ programmers I'm familiar with stick with fprintf.
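
A small illustration of that difference (not code from the post):

#include <cstdio>
#include <iostream>

void print(int n, double x)
{
    // printf works out how to format its arguments at run time, by
    // scanning the format string character by character.
    std::printf("%d %f\n", n, x);

    // operator<< is resolved by overloading: the compiler already knows
    // that n is an int and x is a double, so nothing is parsed at run time.
    std::cout << n << ' ' << x << '\n';
}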

As to STL: one of the cool things about it is that despite its abstraction of containers and algorithms it's so damn efficient. Yes, you could probably write a case-specific container that is more efficient than STL, but it wouldn't be easy. And even if you could, most programmers out there can't, or at least don't have the time to. And with STL you get the added benefit that if you change the algorithm you can rather straightforwardly replace the container type.
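
For instance (an illustrative sketch, not code from the post), an algorithm written against iterators compiles unchanged for different containers:

#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

std::ptrdiff_t countZeros()
{
    std::vector<int> v(100, 0);
    std::list<int>   l(100, 0);

    // std::count only needs input iterators, so the identical call works
    // whether the elements live in a vector or a list.
    std::ptrdiff_t inVector = std::count(v.begin(), v.end(), 0);
    std::ptrdiff_t inList   = std::count(l.begin(), l.end(), 0);
    return inVector + inList;
}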

Basically the STL issue to me is not so much a C vs. C++ issue. It's a framework/library usage issue. Either you are willing to use/trust third-party code or you aren't.

I agree that C++ compilers aren't quite there yet. They probably never will be (though they are definitely getting better). This is why, when I was a game developer, we used to write the rendering code in handcrafted assembly (actually we would write C/C++ code, have the compiler generate the assembly, and optimize that). The Python/C combination is sort of reminiscent of this technique. I find that C++ gives me the right level of abstraction that lets me concentrate on the problems at hand, with good performance, without having to jump between languages. But that's just me.

Dan Shappir
Tuesday, May 07, 2002

Some people (including myself) had been turned off to C++ because of experiences with crappy STL implementations... If your impression of C++/STL comes from out-of-the-box VC++ 6 or GNU G++, then I can't blame you for feeling that bad taste in your mouth =). I encourage anyone who's been disappointed with executable size bloat and runtime speed to try out STLport... It was the first C++ library I found that truly delivers on the promise of high-level abstraction without unexpected overheads.

One thing to note re: inlining - inlining code is becoming less and less important these days due to the increasing gap between CPU speed and memory latency. Cache footprint is overtaking CPU cycle count as the #1 optimization priority... Unfortunately a lot of compilers (esp. G++) are still operating on the once-true-but-now-incorrect assumption that inlining is always a win. (case in point - a Linux kernel developer recently tried moving a few often-used functions out of line and saw something like a 10% speedup...)

Dan Maas
Tuesday, May 07, 2002

Hey Dan,

In relation to

"One thing to note re: inlining - inlining code is becoming less and less important these days due to the increasing gap between CPU speed and memory latency. Cache footprint is overtaking CPU cycle count as the #1 optimization priority... "

Are you saying that CPU core speed is increasing at a greater rate than memory latency is decreasing? Also, when you say cache footprint, are you talking about the amount of an application that may remain resident in cache during runtime? If it's an issue of size, what's better, more or less? If bigger is better, how are the penalties of overcoming a cache miss dealt with in code? Or is that even possible?

I used to be kind of a wannabe hardware junkie before I started coding, so some of the things you mentioned just put questions in my head.

Thanx,
TRC

TRC
Tuesday, May 07, 2002

TRC - CPU core speeds are still shooting up along Moore's Law, but the time to fetch bytes from RAM is improving much more slowly (maximum RAM *bandwidth* is improving rapidly, but bandwidth benchmarks assume that you are streaming data in a predictable manner from a huge contiguous array... Most real code involves many 'ifs' and unpredictable branches, and is therefore limited by RAM latency rather than bandwidth)

I don't have precise numbers for you, but an L2 cache miss means something like tens or hundreds of lost cycles, and a miss to RAM means hundreds or thousands of cycles go to waste. So minimizing the size of your program's working set (both code and data) can really help performance... e.g. in the past it was often beneficial to construct a lookup table to cache the output of a CPU-intensive function; but today it may be faster to just churn through the function (and avoid the possibility of a cache miss).
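
A rough sketch of that trade-off (all names and sizes below are invented for illustration):

#include <cmath>

// 32 KB of table data competing for cache (assume it was filled at startup).
static double sinTable[4096];

// Table lookup: one load that may miss the cache.
double sinFromTable(int i)  { return sinTable[i & 4095]; }

// Direct computation: more arithmetic, but no extra memory traffic.
double sinRecomputed(int i) { return std::sin(i * (3.14159265358979 / 2048.0)); }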

Dan Maas
Tuesday, May 07, 2002

I've had both the chance and the need to compare Visual C++ 6 streams to fopen/fread/fprintf I/O on multiple occasions, and even though the streams were sometimes as fast as classic C I/O, they were never faster; often they were slower (when complex formatting was involved, with a noticeable difference) - and the streams executable was significantly larger (when you're trying to create a 200K download back in the days of 14.4K modems, and find out that 500K out of your 900K download is the result of templated streams and vectors, well - you go back to C streams).

The problem with the whole stream paradigm is that it effectively (a) forces you to use blocking I/O, which means you suddenly have to deal with synchronization problems, thread overheads, etc - all of which are much more delicate than developers usually believe; and (b) promotes "on-the-fly" stream generation, which makes it relatively complex to efficiently parse and work with.

In "on the fly" generation, the content generator pushes data when it sees fit, and expects the consumer to adjust accordingly.  This nearly always results in inefficient code (you can't preallocate the right amount at the receiver since you don't know what it is), and usually in non secure code as well, because, e.g., a host name field is limited to 256 chars so why should we accomodate larger values for this field? [answer: because we don't want IIS to yield control to hackers].

A better way to handle this is to shift any possible concern to the generator, so that the consumer can work efficiently. Usually, the generator can do this with much less work, and in the worst case it will have to do the same work that a robust receiver would have had to do. I've been able to speed up programs by two orders of magnitude, while at the same time making them more robust, by using this technique. And it's pretty universal. It's not impossible to do this with any standard stream (after all, all TCP communication is streamed), but most stream implementations in programming languages make it extremely awkward, with "on-the-fly" generation dominant as a result.
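
One possible sketch of shifting the work to the generator (the names and limits below are made up): the producer announces the length up front, so the receiver can validate it and allocate exactly once instead of growing a buffer "on the fly".

#include <vector>

struct Message {
    unsigned length;          // announced up front by the producer
    std::vector<char> body;
};

bool readMessage(unsigned announcedLength, const char* data, Message& out)
{
    const unsigned kMaxLength = 64 * 1024;  // refuse absurd sizes immediately
    if (announcedLength > kMaxLength)
        return false;                       // robust: no unbounded growth
    out.length = announcedLength;
    out.body.assign(data, data + announcedLength);  // a single allocation
    return true;
}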

Another problem having to do with robustness is that of transactional state -  when serializing to or from a stream and detecting an error mid-way, robust recovery is in most cases impossible, unless all of the transactional semantics are extremely well defined; In 99% of the cases I've had a chance to meet so far, they aren't.

Streams are better in Lisp or Icon when you have better control of program flow through continuations. But streams, although simple to grasp and easy to deploy in simple cases, make it extremely hard to create a robust system.

The interchangeability of, e.g., different containers in STL is a myth - it works only for extremely simple settings, most (or perhaps all) of which are already implemented inside STL.

Let's say I have a new sort algorithm. I implement it for vectors, then try to apply it to lists. No go - lists have no random access iterators. So I rewrite the algorithm for lists to rely only on sequential iterators. Works well. Eventually, I try to switch back to a vector implementation - and it works, except that once in every 2000 runs, the program randomly crashes. (Bugs like this cost tens of thousands of dollars to find...) Eventually, I find it's because, in contrast with lists, vector iterators are invalidated by changes to the vector, and I was somewhere relying on an iterator to remain valid. But it only happens once in 2000 runs or so because the iterators USUALLY stay valid, but not always - I may have shipped a product with this bug because they USUALLY stay valid. [Note: STLport can catch this kind of error immediately in return for some execution overhead - USE IT AT LEAST IN DEBUGGING if you're not doing it already; CONSIDER USING IT IN RELEASE BUILDS AS WELL if you can stand the performance hit.]
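
A minimal reproduction of that class of bug (illustrative only):

#include <vector>

void invalidationBug()
{
    std::vector<int> v(4, 0);
    std::vector<int>::iterator it = v.begin();
    v.push_back(42);   // may or may not reallocate the vector's storage
    *it = 1;           // undefined behaviour if it did - crashes only sometimes
}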

Ok, so if I want to write an algorithm that works on all containers, I have to rely on the lowest common denominator. Which means I can't do anything effective. Sure, "find_if", "count_if" and friends are all abstract, but there's nothing else useful left that's abstract enough to work for all containers and isn't part of the STL already. Bugs like the one I described above are much more common than most people are aware because of low QA standards we now have, and the general notion that it's ok that "software sometimes crashes".

I don't trust 3rd party code until it has been proved trustworthy. MFC, ATL and all STL implementations except STLport haven't proved themselves that way in my book; FLTK and STLport have. My experience with frameworks is that, in general, they suck. Libraries are OK, and STL is a library more than it is a framework, but I still don't like it.

I try to stay away from writing ASM code these days - I tweak the C code until the ASM output is the way I want it. Usually, it does the trick, and remains perfectly portable (Even if it might harm performance on platforms other than the one I'm optimizing for). It's been a long time since I last did even that - but I've had my share of time writing pure asm because no compiler was smart enough.

Inlining is about a-hell-of-a-lot-more than just saving the branch/return. It's about being able to optimize within context. If you have a function

int f(int a, int b, int c, int d) { return a*b+c*d; }

Then inlining it within:

int x[1000];
for (int i = 0; i < 1000; ++i) x[f(0,0,1,i)] = f(1,2,3,4);

will allow it to compile into:

mov eax, 14
mov ecx, 1000
mov edi, offset x
rep stosd

And ... yes, gcc will do this in many cases. Unfortunately, this increases the code footprint. I'm not aware of a good heuristic that tells the compiler what should and what shouldn't be inlined.

Ori Berger
Tuesday, May 07, 2002
