Fog Creek Software
Discussion Board

malloc()ing powers of 2?

Is this a good idea or not?

I just read Joel's "Back to Basics" article, and in it he recommends allocating memory with malloc() and realloc() in powers of 2. It seems to make sense: it limits the total number of calls to realloc() by always allocating more than enough memory; the worst case is allocating twice as much as you need; and it limits fragmentation.

But in the comments on the article, many posters advised against the practice, without much elaboration.

Could anyone comment on this further?

Monday, February 23, 2004

There was an interesting thread a few months back on comp.lang.c++.moderated where one of the STL implementors (Plauger?) was talking about how, in their newest release, they changed the std::vector growth strategy.  It had been using 2**n growth (as most implementations do), but they reduced it to 1.5**n.  The reasoning was that a lower base allows memory to be reused in subsequent allocations.
That is, 2**n growth prevents malloc from reusing memory that was freed on previous resizes.

I don't think I'm explaining it very well.  Look up the thread if you're interested.

Monday, February 23, 2004

I don't think there's a generic answer - testing and profiling on a case-by-case basis is probably the only way to know for sure.  Allocating more memory may kill performance by causing more page faults, for example.  If you're working in multi-proc environments, however, I highly recommend the use of Hoard.  See the Hoard site for details

Monday, February 23, 2004

As I think Joel's article about malloc() said, if you try to optimize your calls to malloc(), you will probably just confuse it. malloc() was implemented to optimize for certain use patterns. If your program does not follow those typical patterns, then malloc() may run more slowly than it normally would.

Plus, this is premature optimization. Why litter all of your code with size-rounding calls to malloc()? Though I guess you could just write a malloc() wrapper that did the power-of-2 rounding and then called malloc() for you.

The moral of this story: measure FIRST, then optimize hot spots.
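
For what it's worth, such a wrapper is only a few lines - a sketch, and the point of this whole thread is that you should measure before bothering with it:

```c
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next power of 2 by propagating the highest set
 * bit downward, then adding one. Valid for n up to the largest
 * representable power of 2 in size_t. */
static size_t round_up_pow2(size_t n)
{
    if (n == 0)
        return 1;
    n--;
    n |= n >> 1;  n |= n >> 2;  n |= n >> 4;
    n |= n >> 8;  n |= n >> 16;
#if SIZE_MAX > 0xFFFFFFFF
    n |= n >> 32;  /* only on 64-bit size_t; guarded to avoid UB */
#endif
    return n + 1;
}

/* malloc() wrapper that rounds every request up to a power of 2. */
void *malloc_pow2(size_t size)
{
    return malloc(round_up_pow2(size));
}
```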

Monday, February 23, 2004

If you look at the source for the MSVC run-time library, you'll see that it's already rounding up the size of your malloc requests to the nearest power of 2: i.e., there's no need to do this rounding in your application-level code; instead you should trust the run-time library that you're using.

Christopher Wells
Monday, February 23, 2004

Not quite. It's rounding them to a granularity that is itself a power of two.

I assume this is because the allocator allocates chunks in multiples of a power of two anyway, to ensure its own structures in the heap are aligned. It also makes the alignment a bit quicker, because you can avoid a divide and a modulus, but I hardly think that would be a bottleneck.

Insert half smiley here.
Monday, February 23, 2004

From memory (ugh, inadvertent pun): on i86, when malloc() runs out of the memory it last got from sbrk(), sbrk() partitions out more memory on paragraph boundaries (the low 4 bits are zero).

malloc() has an admin overhead for each chunk of memory it allocates, and that overhead itself should sit on an even boundary so that you don't get into the hi-low fetch swapping problem.

This argues for always asking for mod-16-sized allocations; however, you might then get into the packed struct/unpacked struct issues with wastage, not to mention side effects like having unused memory that can become the target of security exploits.

In the past whenever I've had to handle odd sized allocations for packed structures (frequently drivers and parameter blocks need this), I've ended up writing my own allocator to return byte aligned allocations.
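
A common way to build such an allocator on top of malloc() is to over-allocate and stash the raw pointer just before the aligned block - a sketch of the technique, not Simon's actual allocator:

```c
#include <stdint.h>
#include <stdlib.h>

/* Return a block aligned to 'align', built on malloc().
 * 'align' must be a power of 2 and at least sizeof(void *).
 * We over-allocate by align + one pointer, then store the raw
 * pointer immediately before the aligned address so that
 * aligned_free() can find it again. */
void *aligned_malloc(size_t size, size_t align)
{
    void *raw = malloc(size + align + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t base = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (base + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;   /* remember the real block */
    return (void *)aligned;
}

void aligned_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);     /* free the original block */
}
```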

Simon Lucy
Monday, February 23, 2004

I always thought the powers of 2 thing was to minimize memory fragmentation.

Just me (Sir to you)
Tuesday, February 24, 2004

Trying to second-guess your malloc implementation by feeding it sizes you think it will like is *insane*, even if Joel suggests doing it.

0. The first law of optimization: Don't do it.

1. Different implementations of malloc will work well with different sizes of request. You think you're being oh-so-clever by always feeding malloc a power of 2? So what happens if your program is run on a platform whose implementation of malloc allocates stuff in power-of-2 sizes *and stores some housekeeping data inside the block*, eh?

2. You may think you know what *the* implementation does. (This generally means "the one in the C library of the version of Microsoft Windows I'm using today".) There is no reason whatever why this shouldn't change in a later version of Windows, and then you lose.

3. On the (frankly, rather rare) occasions when this sort of micro-optimization might be helpful, you can generally win much bigger by taking more advantage of the allocation patterns of your own application. Which you can actually control, unlike the internal workings of the platform's implementation of malloc. Allocating lots of blocks of size 12? Make a special-purpose thing for allocating blocks of size 12.  You can save a lot of overhead this way. (You'd use the platform's malloc as a back end, allocate large enough blocks with that to make the overheads negligible, and use your own scheme for suballocation.)
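
A special-purpose allocator along those lines fits in a few dozen lines: grab big slabs from malloc(), carve them into fixed-size cells, and chain free cells into a list. A sketch with hypothetical names (a real pool would also think about alignment, thread safety, and returning slabs):

```c
#include <stdlib.h>

/* Fixed-size pool: carves large malloc()'d slabs into CELL_SIZE
 * cells and keeps freed cells on a singly linked free list. */
#define CELL_SIZE      16    /* >= sizeof(void *) and >= object size */
#define CELLS_PER_SLAB 1024  /* amortizes malloc overhead per slab   */

static void *free_list = NULL;

static int pool_grow(void)
{
    char *slab = malloc((size_t)CELL_SIZE * CELLS_PER_SLAB);
    if (slab == NULL)
        return -1;
    /* Thread every cell of the new slab onto the free list. */
    for (int i = 0; i < CELLS_PER_SLAB; i++) {
        void *cell = slab + (size_t)i * CELL_SIZE;
        *(void **)cell = free_list;
        free_list = cell;
    }
    return 0;
}

void *pool_alloc(void)
{
    if (free_list == NULL && pool_grow() != 0)
        return NULL;
    void *cell = free_list;            /* pop the head cell */
    free_list = *(void **)free_list;
    return cell;
}

void pool_free(void *p)
{
    *(void **)p = free_list;           /* push: two pointer writes */
    free_list = p;
}
```

Allocation and freeing are each a couple of pointer operations, with no per-block header - that's where the overhead savings come from.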

4. If rounding up actually does anything, then one thing it will do is make your program use more memory (so more page faults) and make the memory addresses of corresponding parts of two structures more likely to have identical low bits (so more cache line conflicts). This is not going to help performance.

5. Even if (1) you need your memory allocation to happen quickly and (2) you know that you'll only ever be running on one platform and (3) for some reason it's impossible or unpleasant to write a custom allocator and (4) you know that rounding up either won't use more space or won't use enough more to hurt ... just what do you think you're gaining by rounding up? It takes, like, maybe 5 cycles of branch-free code for the platform's allocator to do that. If optimizing that away is necessary for your application, then get the hell away from malloc and write your own memory management routines. In assembler. If you don't need that, then you don't need to save a few random cycles by rounding up your requests either.

Oh, and ... a wrapper that rounds up to a power of 2? Congratulations! You have just pessimized your program's use of both time *and* space!

Disclaimer: I've forgotten exactly what it was that Joel was originally allocating. Maybe it wasn't really something as stupid as rounding up all your malloc requests. For instance, if you have a resizable structure (an adjustable array, say) then you probably want to never grow or shrink it by less than a certain factor, so that you get constant amortized time for your operations. (If you grow a vector by a constant factor every time it needs to grow, then the time it takes to grow it to size N is proportional to N. If, e.g., you grow it by 1 item every time it needs to grow, then you can end up taking time of order N^2.) But this has nothing to do with "rounding" the sizes of your allocations, except for the coincidence that the latter is one way to do the former. I prefer to overallocate by a factor smaller than 2, anyway.
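
The grow-by-a-factor pattern, with a factor smaller than 2, might look like this - a sketch, not any particular library's vector:

```c
#include <stdlib.h>

/* Growable int array: capacity grows by ~1.5x whenever it fills,
 * giving amortized O(1) appends while avoiding the factor-2
 * memory-reuse problem mentioned earlier in the thread. */
typedef struct {
    int   *data;
    size_t len, cap;
} vec;

/* Append x; returns 0 on success, -1 if realloc fails. */
int vec_push(vec *v, int x)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap < 4 ? 4 : v->cap + v->cap / 2;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;       /* old buffer is still valid */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}
```

Usage: `vec v = {0};` then call `vec_push(&v, ...)` in a loop; N pushes trigger only O(log N) reallocations.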

Gareth McCaughan
Tuesday, February 24, 2004

Er, "... what it was that Joel was originally *advocating*", not *allocating*. Ahem. :-)

Gareth McCaughan
Tuesday, February 24, 2004

Order your data within the C struct so that it's packed as tightly as you can manage: all the doubles first, then on down until all the single bytes are together at the end.  Alloc just enough for that and let the compiler/runtime figure out the best way to give it to you.


Bathmophobic skier
Tuesday, February 24, 2004

If you're new-ing or malloc-ing individual structures or small classes, then you've already lost.  You really don't want to be doing that.

Tuesday, February 24, 2004

With a half-decent library implementation even the optimisations that improve the allocations of small objects should be unnecessary.

Like creating your own version of a library class, optimisations of this sort with new and malloc are interesting to experiment with - but don't actually use them.

Jon Hanna
Friday, February 27, 2004
