Fog Creek Software
Discussion Board




Bug in "Back to Basics"

"Smart programmers minimize the potential distruption of malloc by always allocating blocks of memory that are powers of 2 in size."

I hate to disagree, but there is a slight inaccuracy here that makes Joel's suggestion a very bad idea. When malloc allocates a block, it attaches a small, constant-size header in front of it to manage it. The allocator then allocates a chunk that both the header and the data region can fit in. So, if your requested block is exactly a power of 2, the total is 2**n+h, which has to be rounded up to 2**(n+1). This usually means nearly 100% waste.

What should be done is to allocate blocks a little smaller than a power of 2, so the header can fit in as well. I still recall this vividly from a "Structure of Operating System" exercise my partner and I had to do a year ago.

Cheers,

    Shlomi Fish

Shlomi Fish
Saturday, June 01, 2002

True. Just allocate what you need and let malloc() do its job.

I read about a "feature" in the Borland C compiler's malloc(). It would allocate an extra few bytes at the end of your allocated block. This was supposed to minimize the harm done by buffer overruns. Of course this is stoopid and not portable. When you ported buggy code to (say) Microsoft's C compiler, you program crashes.

Zwarm Monkey
Saturday, June 01, 2002


He wasn't saying that was a good idea; it was just one way to do things. I must confess that I used to do things like that, but only because I was taught to.

Leonardo Herrera
Saturday, June 01, 2002

Just malloc some memory and call _msize on the pointer. The allocated size returned is different from what you actually requested, so doing a 2^n allocation yourself and maintaining the actual size is counterproductive in this sense.

RK
Monday, June 03, 2002

I'm not sure how wise it is to try such tricky little optimizations. Everything that I read indicates that you shouldn't worry too much about optimizations, because the compiler may do a better job, or, as in this case, you don't know what the malloc() call will do.

My thinking is that you should allocate what you need and then let the compiler optimize it. I am of the opinion that if you are really /that/ crunched for memory in your application or platform, you should be using your own malloc(), because then you can be 100% sure of the behavior.

Mike Swieton
Monday, June 03, 2002

The reason to allocate only memory blocks that are powers of two has to do with speed, not memory (read the article). If you don't, malloc will sometimes be unpredictably slow. Horribly slow.

Shlomi Fish wrote: "So, if your block is exactly a power of 2, you'll get 2**n+h, which means it has to be rounded to 2**(n+1). This usually means a near 100% waste."

Malloc isn't going to round anything, so it won't lead to extra memory waste; it will just mean that you won't get your desired malloc speed-up.

Michiel de Mare
Monday, June 03, 2002

Michiel de Mare: "Malloc isn't going to round anything, so it won't lead to extra memory waste, it will just mean that you won't get your desired malloc speed-up."

Explaining again: malloc works by managing doubly linked lists of fixed-size blocks. The sizes of the blocks are powers of 2. However, every block includes a relatively small fixed-size header, which carries such information as its size, the previous element, the next element, etc. This means the size of the data is actually a power of 2 minus a constant.

What will happen if you call malloc(2**n) is that 2**n+sizeof(header) is just above 2**n, so the allocator will have to use a 2**(n+1) block. Maybe the kernel can reclaim the unused memory, but the block basically should not be allocated at that size.

Shlomi Fish
Thursday, June 06, 2002

Don't take such a simplistic approach; good memory allocators can get by with an overhead of less than one bit per allocation (averaged) in many cases. An example of how this works:

Suppose for the moment that all allocations are 80 bytes in size (pick any number you wish greater than 4). Preallocate one area of 80*10K = 800K bytes; Link the free allocations as a singly linked list. To allocate, take something from the free list; to deallocate, put it back on the free list. Now for an important observation - you can tell whether a block to be freed is part of this pool or not by checking whether the pointer falls within the pool. Thus, you can have pools catering for different sizes, without any additional per block storage.

For blocks smaller than 4 bytes you won't be able to use the memory itself as a linked list of pointers, and you'd generally need a bitmap to say what is allocated and what isn't. Alternatively, use indices instead of pointers as links: if every pool has at most 2^16 elements, only 2 bytes are needed per link, and if every pool has at most 2^8 = 256 elements, a 1-byte link means you don't need a bitmap even for allocations sized one byte.

There's still a reason to have a bitmap, though - if you want to detect 'double free' situations in constant time (if you can do with linear time, just traverse the free list to see the block isn't there yet).

There's still the question of how many pools of what size to preallocate - I leave that as an exercise for the reader. And there are of course several other schemes which reduce allocation overhead until it's negligible.

Ori Berger
Thursday, June 06, 2002
