Fog Creek Software
Discussion Board




Compile Times

Someone recently wrote:
"Even so, C++ is slow to compile. My C# projects can compile 10x -- sometimes even 50x -- as much code in the same time as the C++ compiler. "

1.  Is this true?  How do you measure how much code there is? Lines? Statements?  Typically in a higher-level language each statement will be compiled into more instructions, but the comparison between C++ and C# is hard because one compiles down to machine language while the other does not.

2.  MOST C++ projects, I would say, have less than optimal "physical designs" (a term I would attribute to Lakos), i.e. the header structure is a f***ing mess.  If you could get rid of the idea of translation units in C++, and maintain what I guess would be a database of types and prototypes across files, would it speed up the compilation or not?

3.  Is there anything else in the C++ language that would make it slower to compile than C# or Delphi or whatever?

4.  Any comments on the effect of optimizations on compile time?

I am interested in this because it seems like compile times are a problem for rapid development.  It's harder to get in the groove when you make a minor change to a header, then have to recompile half the damn project before you continue.

It would be nice to hear from someone who has written a compiler, and not just wild speculation.

Andy
Friday, June 27, 2003

I find that the compile times for some C++ projects are bad purely because of the physical design issue.

The coupling is often screwed and every file pulls in far more than it needs. Reducing the coupling quickly improves rebuild times and also means that an isolated change doesn't cause the whole world to rebuild.
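
A typical fix (a minimal sketch; the class names here are made up) is to replace an #include in a header with a forward declaration and move the #include into the .cpp file:

// Widget.h -- before, this file did #include "Gadget.h" just to declare a pointer member
class Gadget;                // a forward declaration is enough for pointers and references

class Widget
{
public:
    void Attach(Gadget* g);
private:
    Gadget* m_gadget;
};

// Widget.cpp -- the full definition is only needed where the type is actually used
#include "Widget.h"
#include "Gadget.h"

void Widget::Attach(Gadget* g) { m_gadget = g; }

Now a change to Gadget.h no longer forces everything that includes Widget.h to recompile.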

I blame precompiled headers and the wizard way of "include stdafx.h" everywhere. It may speed things up if done properly but the additional coupling isn't worth it IMHO.

I'm currently dealing with this kind of thing for a client. I've reduced the build time considerably by removing unneeded coupling.

So, to answer your question ;) I think C++ allows more scope to screw up the project structure than C# does...

Len Holgate
Friday, June 27, 2003

This has the taste of urban legend.  Before I believe any language can compile the same function/feature set 50x faster, I would like to see independent benchmark examples, with real side-by-side comparisons of the code, functionality and compiler output.

Compile time is a developer issue.  For non-trivial systems, inefficient compiler output is worse than making you wait.

Mike Gamerland
Friday, June 27, 2003

I have worked a lot both in Visual C++ and Delphi, and I can tell you that the compile speed of C++ is much lower than that of Delphi.

The compile time is the main reason I want a very fast machine.

K9
Friday, June 27, 2003

True, the effect of pulling in far more code than required is one helluva problem. I found that just taking all those header & cpp files which are normally just helpers for all the other code & which change once in a millennium :) out & making a separate dll out of them improved build times a lot.

Ajit
Friday, June 27, 2003

Does anyone have good links to sites that explain the best way to reduce coupling in header files? Examples would be really handy too...

jedidjab
Friday, June 27, 2003

While physical structure is a major point, there is one feature of C++ that can arbitrarily lengthen compile times which other languages just don't have: templates. Compile-time programming obviously can result in longer compile times =)

Sebastian Wagner
Friday, June 27, 2003

Why exactly do templates increase compilation times?

Frederik Slijkerman
Friday, June 27, 2003

It turns out that C++ templates constitute a full-blown "meta-programming language". You can do type computations, conditional statements, loops, (type) list processing, etc.  These are all compile-time computations, i.e. you abuse the compiler to do such calculations, which costs time. For more info, see http://www.boost.org/libs/mpl/doc/paper/html/index.html
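
The classic toy example (just a sketch, not something you'd ship) is a factorial evaluated entirely by the compiler through recursive template instantiation:

// Factorial computed at compile time; each value of N is a separate instantiation
template <unsigned N>
struct Factorial
{
    static const unsigned value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0>          // the specialisation terminates the recursion
{
    static const unsigned value = 1;
};

// usable anywhere a compile-time constant is needed, e.g.
// int buffer[Factorial<5>::value];   // an array of 120 ints

Every instantiation the compiler performs is work done at compile time, which is exactly where the extra time goes.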

No linker can deal with templates properly; I once heard that templates in C++ were the first feature that was beyond the abilities of old C style linkers. Can't say much more about the deeper reason, however.

- Roland
Friday, June 27, 2003

I can tell you from experience: C++ is damned hard to parse, because the language is about 60% ambiguous. C# and Java are simpler and can be approximated with an LL(k) parser. To write an intelligible (hence maintainable) parser for C++, IMO, you really need to tackle it with a Generalised LR parser that can cope with the ambiguity by spitting out an ambiguous parse tree. The only other way is essentially to merge the tokenising, parsing and symbol lookup phases with feedback.

One reason the toolset for C++ is generally so poor when compared to other languages is simply due to the ambiguity. It's really hard to get useful information. As I've posted on another thread, part of VC6's Wizard problems, where the Wizards can get confused and trash your source code, comes from the fact that there are at least five parsers, each of which reads the code in a different way: ClassWizard, ClassView, IntelliSense, Browse Information. Oh, and the compiler itself, of course ;)

I still prefer it to anything else, though. Clearly humans (or at least, some humans) can cope with more ambiguity than our current parsing algorithms can. Of course we all get tripped up occasionally where a whole statement could be read in more than one way, and the disambiguating rule in the language standard works the other way from what we were expecting (example: if a statement can be read as an expression or as a declaration, the declaration wins; you can then get semantic errors if it wasn't intended as one).
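
The standard example of that rule (a sketch; the class names are invented):

// This looks like "construct a Widget w from a default-constructed Timer",
// but the standard reads it as the declaration of a function named w that
// takes a pointer to a function returning Timer, and itself returns a Widget.
Widget w(Timer());

// ...so the first w.Start() you write produces a baffling error,
// because w is a function, not an object.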

Anyway, that's one reason C++ compile times are longer. Another reason is size of headers and templates. With metadata-exposed languages like VB, C# and Java, once a module is compiled, the compiler can read the exported declarations directly. The C++ compiler must read the header file completely, preprocess it to remove sections and comments, include contained header files, then parse the declarations and add them to the symbol table, even if few of the functions are used.

If you're compiling for Windows, you'll typically be including windows.h. This includes a whole load of other headers, which are huge - typically 13000+ lines.

If you have the option, use precompiled headers for anything you don't expect to change (the Windows headers are a good start). If you have a mixed C/C++ program, try to compile the whole lot as C++ if you can, or try two different precompiled headers, one for the C parts and one for the C++ parts. Alternatively build one or both parts as static libraries and get the linker to put it together. That way you can get precompiled header benefits on both parts of the program.

Other suggestions: use #pragma once with Visual C++, which will cause VC not to even try to read the header if it's already been included. It's rare that you'll want to include a header more than once in the same translation unit. If you can't do this, another good trick is to surround your #includes with #ifndef guards. An example: say the file MyHeader.h uses an include guard of __MYHEADER_H__. In a file that includes it, do something like:

#ifndef __MYHEADER_H__
#include "MyHeader.h"
#endif

This stops the compiler having to locate the file, open it, read the header guard #ifndef and then read the whole file to find the matching #endif. Compilation is largely I/O bound.
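
For reference, MyHeader.h itself would then look something like this (a sketch combining both techniques):

// MyHeader.h
#pragma once                 // VC-specific: don't even reopen the file next time
#ifndef __MYHEADER_H__       // portable guard, and what the external #ifndef tests
#define __MYHEADER_H__

// ... declarations ...

#endif // __MYHEADER_H__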

Mike Dimmick
Friday, June 27, 2003

C# has fewer file dependencies than C++. If you change one C++ header file, then tons of .cpp files (those that #include the header file) must be recompiled. Since Java and C# don't have header files, you just have to recompile the one .cs file.

runtime
Friday, June 27, 2003

Funny this topic should come up.  I spent a couple of days earlier this week trying to decouple headers in my project.  After two days of this (remove, compile, fix errors, repeat), I had removed many #includes and made NO measurable improvement in compile time.

So, I quit trying to do that.

BTW, this is Visual Studio .NET with C++ code.

Anyone know of some quick tips or tools to make this effective?

David
Friday, June 27, 2003

Well, it seems to me that the simplest possible solution to improving compile times is just to split up your C++ files.  I have seen files that are 7000-8000 lines long, which I think is ridiculous.  I heard somewhere that 500-2000 lines is reasonable; I find this to be a good rule of thumb.  I would like to have fewer than 20 functions per file.

When you split, of course you have to remove #includes that are no longer needed.  If you find that the splits haven't resulted in removing #includes, then perhaps your logical design has too many dependencies, and you will actually have to change code to get your compile times to be faster.

The reasoning here is: If you change a type T, you have to recompile ALL THE CODE that #includes the header which contains T.  That includes functions that just happen to be in the same file as a function which needs T, but which themselves do not need T.  And that includes entire files that just need the header for type T2, and not T.  So basically from that you can infer that statistically speaking, changing a header will require you to compile less code if your source files are on average smaller.

That said, I would say for many projects, there are simply unneeded #includes, or #includes that should be in the C file but occur in the .h file.  Getting rid of these obviously comes first.  Think about it: when you delete some code, do you always look at the #includes to see if there is anything you can delete?  Probably not.

Do any compilers do any sort of caching that basically invalidates what I just said?

Andy
Friday, June 27, 2003

Also, to Mike Dimmick:

I have heard that C++ is very difficult to parse and I can easily imagine why.  In fact I heard that many major compilers all use the same "front end", the EDG one.  If I am not mistaken, the front end for a compiler includes the parsing engine.

What do you mean by C++ being ambiguous?  I mean it obviously isn't ambiguous in one sense because the standard AFAIK specifies what every statement is supposed to mean.  But yes there seem to be a lot of hacks, like the typename keyword, and having to do > > to end nested templates because of ambiguity with >>.
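
For instance (a quick sketch of both quirks):

#include <vector>

// The space is required: without it, >> is lexed as the right-shift
// operator and the declaration doesn't parse.
std::vector<std::vector<int> > table;

// Inside a template, a dependent name needs 'typename' to tell the
// parser it is a type rather than, say, a static data member:
template <typename Container>
typename Container::value_type first(const Container& c)
{
    return *c.begin();
}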

What about VC7?  I heard there is a source code model to programmatically access C++ source.  Does this work well?  Or does it not go down to the statement level?

Andy
Friday, June 27, 2003

<BTW, this is Visual Studio .NET with C++ code.

Anyone know of some quick tips or tools to make this effective? >

If your project is not specifically set up for precompiled headers, you will probably get a speed boost by turning them off. So that is one thing to try.

(The reason is, I believe, that VC++ precompiles headers for a specific sequence of #includes with a specific set of #defines, so with auto use it will rebuild them afresh for each file where a new sequence occurs.)

Add WIN32_LEAN_AND_MEAN to your predefined preprocessor symbols if you are including windows.h a lot.
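
(The same thing in code, if you'd rather not touch the project settings; just a sketch:)

// Cuts rarely-used parts (DDE, OLE, shell, winsock, etc.) out of windows.h,
// which noticeably shrinks what the preprocessor has to pull in.
#define WIN32_LEAN_AND_MEAN
#include <windows.h>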

If you want to try adding precompiled headers to the project, you can get a demonstration of how they should be set up[0] by creating a new console application with precompiled headers using the new project wizard. All files are set to "use precompiled header" with the name of a .h file; a single .cpp file is set to "create precompiled header" with that .h file; _all files_ #include that .h file as their first one.

Include your headers-that-don't-change-often from this .h file, and hopefully you will then see decreased compile times.
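
A typical such header might look something like this (just a sketch; the exact contents depend on what your project actually uses):

// stdafx.h -- only stable, rarely-changing includes belong here
#include <windows.h>     // with WIN32_LEAN_AND_MEAN in the project settings, as above

#include <string>
#include <vector>
#include <map>

// Project headers that hardly ever change can go here too; anything you
// edit frequently would force the precompiled header itself to rebuild.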

Whilst this can help compile speeds greatly (it's a real pain compiling others' projects when they don't use this!) it's obviously not always the easiest thing to retrofit to an existing project :(

[0] -- well, there may be some other ways!

Tom
Friday, June 27, 2003

It was me who made the claim. I have two comparably sized projects, when considering on-disk size. In order to test, I did two of each compile type, and used the second (faster) compile. I have 1GB of RAM, so all the files can easily fit in the disk cache (and on the second compile, the disk doesn't make a sound).

The machine is a P4-2.53GHz, 1GB of DDR333, 80GB ATA-133 drive with 8MB on-board cache, running Windows XP Professional and Visual Studio .NET 2002.

One is the current C# project I'm working on. It's made up of 2533 .cs files comprising 5.4 megabytes of source code. It draws from most of the major parts of .NET, including ASP.NET and ADO.NET (no Winforms).

The other is a C++ project I worked on. It's made up of 434 .cpp files and 706 .h files comprising 5.9 megabytes of source code. It draws primarily from ATL, with no MFC support. It's properly using pre-compiled headers, and most of the code is relatively de-coupled.

C# (full recompile): 7 seconds
C# (incremental compile, nothing changed): < 1 second
C# (incremental compile, 1 file changed): < 1 second

C++ (full recompile): 2 minutes 46 seconds
C++ (incremental compile, nothing changed): 10 seconds
C++ (incremental compile, 1 file changed): 18 seconds

Just to verify that my pre-compiled headers were doing their job:

C++ (full recompile, PCH turned off): 10 minutes 5 seconds

Draw your own conclusions.

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, June 27, 2003

"The comparison between C++ and C# is hard because one compiles down to machine language while the other does not."

I disagree. MSIL is nothing more than a high level assembler. Emitting the appropriate assembler or byte-code is hardly the most expensive part of the process, unless you're doing EXTREME levels of optimization. The problem, as others have pointed out, is that C++ is hard to parse, has no standard meta-data, and has a poor architecture (header files).

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, June 27, 2003

Brad, you're right that it probably doesn't account for the majority of the difference, but the fact that C# compiles to IL is definitely significant.  The compiler doesn't have to do things such as register allocation (I assume, at least; I can't imagine it having to do this if the IL is in any way platform-independent).  Since optimal register allocation is NP-hard, this could be a substantial time saving (though I have no idea how long it typically takes, since obviously compilers don't implement perfect register allocation).  I'm sure there are other examples of optimizations that can take a long time that C# doesn't do when compiling to IL.

However my guess is that you're right in that the majority of the difference is because of parsing and the poor architecture of C++.

Your example may also be extreme if it uses a lot of templates.  Templates in my experience tend to slow down C++ compiles drastically.

Mike McNertney
Friday, June 27, 2003

You couldn't pay me enough to use C++ without templates. :)

You're right that registers are not an issue. I forgot about that. The .NET execution environment is stack based, not register based.

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, June 27, 2003

FWIW, our project, 5.3MB of .cpp files (~300 of them) and 1.2MB of .h files (a game, using windows.h, but no templates, MFC, etc., and properly using PCHs and LEAN_AND_MEAN), takes about 2 minutes for a full rebuild, ~10 seconds for an incremental, and from 10 to 40 seconds to link depending on how balky the linker is.

So that conforms pretty closely to the other example.  I'm really surprised C# can do a full compile on something of that magnitude in 7 seconds. I'd kill for 7-second full compiles.

Phil Steinmeyer
Friday, June 27, 2003

Yes, I'm extremely impressed by 7 second full compile too, even if it were a much smaller project.  Just changing one line in a C file takes about 7 seconds in my current project (a mess, as I already said).

What about Java?  Can anyone post some similar results?  What about other languages?

Speaking of compilation being I/O bound, the compiler is a separate executable in VS.NET (cl.exe), so I would guess that every time it runs it reloads the source files from disk.  Is this true, and does it matter at all?  I mean, maybe one advantage of an IDE over command-line tools is that it would be possible to keep a lot of stuff around in memory between iterations.

Another idea I had: if you're working on a team of 15 people, and one guy changes a header, then the next time everyone else does a get it will take, say, 3 minutes to recompile.  That's 3 x 15 = 45 minutes of lost time.  Over time this adds up, obviously.  If you're not working on that section of the code, there is no reason for you to recompile it; the other guy's compile will be identical to yours.  Since C++ has the idea of translation units, it should be possible to figure out which object files do not need to be recompiled.  Basically the idea would be to have some sort of system for distributing the object files compiled by the first person.  I know a lot of compilers don't keep around separate .O files anymore, but just checking the .O files in to source control might sort of work, except I don't know about merges and all that.

I know there is a distributed build product for Visual C++, whose name escapes me at the moment, but it's not quite the same idea.

Andy
Friday, June 27, 2003

Re I/O: Visual Studio .NET requires all C# source files to be saved to disk before compilation can take place so I'd guess that they're read from disk, too.  Also, the examples given in this thread were fairly big -- I doubt they have all those hundreds of source files loaded in the IDE!

For what it's worth, on my system VS.NET takes about 10 seconds for a complete rebuild of a solution with 4.6 MB of C# source code in 208 files.  Partial builds are faster, of course. It's so fast that I've adopted a "change a few lines, recompile, test" style of programming because there's hardly any time lost doing so. Truly a huge change from C++.

Chris Nahr
Saturday, June 28, 2003

In VS.NET, there are internal parsers, obviously, used for the wizards and the while-you-type parsing errors/compile errors (depending on which language you're using), but actual compiling is done by the SDK command-line compilers, just as prior versions of VC++ have launched the external commands to do their work (cl, link, midl, etc.).

Brad Wilson (dotnetguy.techieswithcats.com)
Saturday, June 28, 2003

Java compile times are probably comparable to the C# ones (ie really, really fast).

I think a big part of the C++ problem is the header files (as had already been mentioned several times).

The advantage that Java has is that it's easy for the compiler to glean class definitions from compiled bytecode, whereas C++ has to parse source every single time. Moreover, the C++ compiler can't assume you haven't modified the meaning of a class definition with preprocessor tricks from file to file.

This is why precompiled headers are such a big win (a project I worked on managed to reduce full builds from 1 hour to 10 minutes by getting precompiled headers working) - they remove a lot of unnecessary repeat parsing.

Andrew Reid
Sunday, June 29, 2003

"The comparison between C++ and C# is hard because one compiles down to machine language while the other does not."

Very significant:

In Java, when class A imports class B, the compiler just has to look at the class file for B; that way it has all the type info ready, so it is much easier to look things up.

In C++, when file A includes file B, the compiler has to parse B all the way through, probably again and again.
Chew chew chew.

... I guess the time saving in compilation is due to the age-old trick of precomputing results and reusing them.

Keeping type information in the executable file format is what enabled the language designers to get rid of separate compilation units.

It's the runtime environment/bytecode format that determines the higher-level language features;

The tail pretty much determines where the dog goes ;-)

Michael Moser
Monday, June 30, 2003

... by bytecode format I meant class file format / executable file format.

Michael Moser
Monday, June 30, 2003
