Fog Creek Software
Discussion Board




Dynamic Memory Management

I have a theory that most of the security and stability problems we are seeing are the result of relying too heavily on garbage collection, automatic memory management, and to some extent the C/C++ heap.  Dynamic memory works great, when it works.  But how about when it fails?  My guess is that almost nobody does anything sane in low-memory situations, when the underlying runtime (whether that be Java, .NET, or even the C++ heap) cannot obtain the requested memory.  You might claim you do something, but my guess is that you don't know, or haven't even tested, all the points where a failure may occur.  Worse yet, most developers don't really even understand how their application can fail.  Almost always a memory failure results in undefined application behavior.  This is REALLY BAD if you are worried about the reliability and security of a public-facing server.
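One way to actually find and test every point where an allocation can fail is fault injection: run the same operation repeatedly, forcing the first allocation to fail on the first run, the second on the second run, and so on, until a run completes with no failure injected. The Python sketch below assumes the code under test receives its allocator as a parameter; `FailingAllocator`, `handle_request` and `run_fault_injection` are hypothetical names, and a C or C++ server would need to wrap `malloc` or `operator new` to do the same thing.

```python
class FailingAllocator:
    """Hypothetical allocator that raises MemoryError on call number fail_at (0-based)."""

    def __init__(self, fail_at):
        self.fail_at = fail_at
        self.calls = 0

    def __call__(self, size):
        current = self.calls
        self.calls += 1
        if current == self.fail_at:
            raise MemoryError(f"injected failure at allocation #{current}")
        return bytearray(size)


def handle_request(payload, allocate):
    """Hypothetical request handler with three allocation points."""
    try:
        header = allocate(16)                 # allocation #0
        body = allocate(len(payload))         # allocation #1
        body[:] = payload
        log_line = allocate(64)               # allocation #2
    except MemoryError:
        return "503 Service Unavailable"      # the one defined, tested failure path
    return "200 OK"


def run_fault_injection(payload):
    """Fail each allocation in turn; every run must end in a defined response."""
    results = []
    fail_at = 0
    while True:
        alloc = FailingAllocator(fail_at)
        results.append(handle_request(payload, alloc))
        if alloc.calls <= fail_at:   # the injected failure was never reached: all points covered
            return results
        fail_at += 1


print(run_fault_injection(b"hello"))
# → ['503 Service Unavailable', '503 Service Unavailable', '503 Service Unavailable', '200 OK']
```

The useful property is that the loop discovers how many allocation points there are on its own, so a newly added allocation is exercised without anyone remembering to write a test for it.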

In my opinion this is why all web applications should, at the very least, be proxied by an application that has a very strict memory management scheme.  I admit I am biased, though, as I spend many free hours developing such a proxy.
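A "very strict memory management scheme" can take many forms; a common one is to allocate every buffer up front and then serve requests only from that fixed pool. The Python sketch below illustrates the idea; `BufferPool`, the buffer count and the buffer size are hypothetical, and this is not a description of how the proxy mentioned above works. In Python the interpreter still allocates internally, so this only bounds the buffers themselves; in C or C++ the same pattern can keep the request path off the heap entirely.

```python
class BufferPool:
    """Fixed pool of preallocated buffers; no buffer is allocated after startup."""

    def __init__(self, count, size):
        self.size = size
        # Reserve all buffer memory up front, while the process is starting.
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        """Return a free buffer, or None if the pool is exhausted.

        Exhaustion is an ordinary, predictable outcome (e.g. answer 503),
        not an out-of-memory failure somewhere inside the runtime.
        """
        return self._free.pop() if self._free else None

    def release(self, buf):
        # Basic sanity check only: rejects buffers of the wrong size.
        if len(buf) != self.size:
            raise ValueError("buffer does not match this pool's buffer size")
        self._free.append(buf)

    @property
    def available(self):
        return len(self._free)


if __name__ == "__main__":
    pool = BufferPool(count=2, size=4096)
    a, b = pool.acquire(), pool.acquire()
    print(pool.acquire())   # None: a third concurrent request would be turned away
    pool.release(a)
    print(pool.available)   # 1
```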

I really enjoy programming in high level languages like Python.  It lets me focus on the problem and not the technical details, but I know there is the potential that something could go horribly wrong in a way I would never expect.  Hence I am always leery about putting such apps into production.  In the end the decision to do so is usually a statistical problem, and a business one as well.  For example, "what is the likelihood of failure, and how much would it cost to reduce that possibility?"

For test apps the cost of failure is typically low, so very high level languages are fine.  For an application such as Amazon's order processing system at Christmas the cost of failure could easily run in the millions of dollars.

christopher baus (www.baus.net)
Wednesday, June 16, 2004

"Dynamic memory works great, when it works.  But how about when it fails?"

Which is basically never. You'd have to be completely out of physical memory AND swap file space. And if that's the case, your app is just one of 30 processes (services, drivers, Windows itself, etc.) that will be out of memory, so even if you cleanly handle a new or malloc failure, the system is still likely hosed and there's nothing your app can do about it anyway.

And "Managed memory" does not improve security except for that ONE particular exploit where you dynamically allocate an array on the stack, and the exploiter knows the exact size and parameters to overwrite it and stick a phony return address so it returns to a different location.

Ron
Wednesday, June 16, 2004

I'm all for testing, but why not make sure your production app never gets in a low memory regime in the first place? Buy more memory, servers, whatever it takes to avoid running in the edge case.
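Staying out of the low-memory regime means sizing for the worst case rather than the average: if one request can hold at most M bytes of buffers at once, then N concurrent requests can need N * M bytes. One way to enforce that is to admit a request only when its worst case still fits a fixed budget. A minimal sketch; `MemoryBudget` and the 256 MiB / 64 MiB figures are hypothetical.

```python
import threading


class MemoryBudget:
    """Admit a request only if its worst-case memory use still fits the budget."""

    def __init__(self, total_bytes):
        self.total = total_bytes
        self.in_use = 0
        self._lock = threading.Lock()   # requests may be admitted from many threads

    def try_reserve(self, nbytes):
        with self._lock:
            if self.in_use + nbytes > self.total:
                return False            # reject up front instead of failing mid-request
            self.in_use += nbytes
            return True

    def release(self, nbytes):
        with self._lock:
            self.in_use -= nbytes


budget = MemoryBudget(total_bytes=256 * 2**20)
per_request = 64 * 2**20                # hypothetical worst case for one request
admitted = sum(budget.try_reserve(per_request) for _ in range(5))
print(admitted)                         # 4: the fifth request does not fit
```

The same arithmetic tells you how much memory to buy: multiply the per-request worst case by the concurrency you intend to allow.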

MilesArcher
Wednesday, June 16, 2004

> Which is basically never.

Not true.  Many developers incorrectly size buffers based on input.  If two requests for 2 GB come in, well, game over.  See Cisco's recent buffer overflow in their HTTP engine.  This won't happen in the lab, but if someone is trying to overrun you, it will happen.
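The usual defense against sizing a buffer from input is to treat the declared size as untrusted and check it against a hard limit before allocating anything. A minimal Python sketch for an HTTP `Content-Length` value; the 1 MiB limit and the name `body_size_from_header` are hypothetical, and a real server would choose its limit per endpoint.

```python
import re

MAX_BODY_BYTES = 1 * 2**20   # hypothetical limit: 1 MiB per request body


def body_size_from_header(value, limit=MAX_BODY_BYTES):
    """Parse a Content-Length value and refuse anything outside [0, limit].

    The size comes from the client, so it is untrusted: check it *before*
    using it to size a buffer.
    """
    if not re.fullmatch(r"[0-9]+", value):   # rejects '', '-1', '1e9', ' 5'
        raise ValueError("malformed Content-Length")
    size = int(value)
    if size > limit:
        raise ValueError(f"body of {size} bytes exceeds limit of {limit}")
    return size


# Only a validated size is ever used to allocate.
body = bytearray(body_size_from_header("512"))
```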

christopher baus (www.baus.net)
Wednesday, June 16, 2004

The problem is that ALL computers are finite machines.  On 32 bit systems, you are limited to a few gigs of addressable space. 

christopher baus (www.baus.net)
Wednesday, June 16, 2004

"Many developers incorrectly size buffers based on input"

If these developers are not validating data in a data-driven app, then that is the problem, not the failure to cleanly handle out-of-memory which is only one possible manifestation of not validating. An out-of-bounds request should never be made in the first place, same as trying to request 0.1 bytes of memory.

"On 32 bit systems, you are limited to a few gigs of addressable space."

That's right, and any programmer that handles large amounts of data knows to store it to disk.
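Storing to disk need not mean holding the data in memory first: it can be copied from its source in fixed-size chunks, so memory use stays around one chunk however large the input is. A minimal sketch using only the standard library; `spool_to_disk` and the 64 KiB chunk size are illustrative choices. (`tempfile.SpooledTemporaryFile` is a standard-library variant that keeps small data in memory and moves it to disk past a size threshold.)

```python
import shutil
import tempfile


def spool_to_disk(stream, chunk_size=64 * 1024):
    """Copy a binary input stream to a temporary file in fixed-size chunks.

    Memory use stays at about one chunk no matter how large the input is;
    the caller reads the data back from the returned file object.
    """
    spool = tempfile.TemporaryFile()                 # binary read/write, deleted on close
    shutil.copyfileobj(stream, spool, chunk_size)    # reads at most chunk_size bytes at a time
    spool.seek(0)
    return spool
```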

BTW the Cisco overflow was exactly that, an overflow, not a failure to allocate memory.

All the points you bring up are valid, but your theory that a significant cause of security and stability issues is failure to handle out of memory cases is just wrong.

Ron
Wednesday, June 16, 2004

> If these developers are not validating data in a data-driven app, then that is the problem

Yep, that is a problem, and a serious one, that happens ALL THE TIME.  It is easy to say "validate your input," but time and time again it DOESN'T HAPPEN.  Yet another reason for my validation server...

But it isn't the only problem.  Often temporary buffers get allocated while handling a request and then freed.  The application often isn't tested in a highly fragmented state where all these temporary buffers are at their maximum size while a large number of requests are being handled.  This could happen, for instance, when formatting a logging message string.  This is much more subtle, but again a problem if you are serious about:

1) reliability
2) predictability in error situations
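For the logging case, one way to keep the temporary string bounded is to clip each client-supplied field before formatting, so the size of the message no longer depends on the request. A minimal sketch; `FIELD_MAX`, `format_log_line` and the log format are hypothetical. The oversized input string itself still exists, but formatting never creates a second copy of it.

```python
FIELD_MAX = 64  # hypothetical cap on any single client-supplied field


def format_log_line(method, path, user_agent):
    """Build a request log line whose size is bounded regardless of input.

    Each client-controlled field is cut to FIELD_MAX characters *before*
    formatting, so an enormous path never becomes an equally enormous
    temporary string while the log message is built.
    """
    def clip(s):
        return s if len(s) <= FIELD_MAX else s[:FIELD_MAX - 3] + "..."

    return f"{clip(method)} {clip(path)} ua={clip(user_agent)}"
```

With three fields the line can never exceed 3 * FIELD_MAX characters plus the separators, which is the number to plug into worst-case memory estimates.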

Throwing your hands up and crashing (err, exiting gracefully with an error condition) is fine.  Just understand that is what you are agreeing to do.  I have software that does just this.  It just happens to auto-restart, and the request is passed to a redundant server, but that is a rare architecture that benefits from being mostly stateless.
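The "pass the request to a redundant server" arrangement relies on requests being stateless, so any one of them can be retried elsewhere. A minimal in-process sketch of the failover step only (not the auto-restart); the backends are plain callables standing in for separate servers, and `dispatch` is a hypothetical name.

```python
def dispatch(request, backends):
    """Send a request to the first backend that handles it without failing.

    Any exception (including MemoryError) is treated as "this server died",
    and the request moves on to the next, redundant backend. This is only
    safe when requests are stateless and can be retried.
    """
    errors = []
    for backend in backends:
        try:
            return backend(request)
        except Exception as exc:        # MemoryError is a subclass of Exception
            errors.append(exc)
    raise RuntimeError(f"all {len(backends)} backends failed: {errors!r}")
```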

Dynamic memory management is a serious problem for server based applications, and I will not be easily convinced otherwise.

christopher baus (www.baus.net)
Wednesday, June 16, 2004

Here's a reference to the cisco HTTP problem I mentioned.

http://www.cisco.com/warp/public/707/cisco-sn-20030730-ios-2gb-get.shtml

christopher baus (www.baus.net)
Wednesday, June 16, 2004

"It is easy to say, validate your input, but time and time again it DOESN'T HAPPEN."

Your solution to validate memory allocations cleanly will not fix this problem. Bad buffer allocations are just one of any number of possible manifestations of bad data.  (Just as one example, what if "record id" is expected to be a number, but contains letters? So your hash algorithm returns -4 instead of a positive number? Handling memory allocations cleanly won't fix that... what if the "record id" is supposed to be unique, but isn't? Handling memory allocations cleanly won't fix that... etc).

The solution to failing to validate data in a mission critical system is to validate data properly. There's no other way. If your programmers are not doing that, you need to have code reviews, you need to educate them on the need for it, or you need to hire only people who understand the concept.
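Both of the examples above (a "record id" containing letters, and one that is not unique) can be rejected at the boundary, before any hashing or buffer sizing happens. A minimal sketch; the 1-to-10-digit id format is an assumption made for illustration, as are the names `validate_record_ids` and `_RECORD_ID`.

```python
import re

_RECORD_ID = re.compile(r"[0-9]{1,10}")  # hypothetical format: 1 to 10 ASCII digits


def validate_record_ids(raw_ids):
    """Return the ids as ints, rejecting malformed or duplicate values up front."""
    seen = set()
    result = []
    for raw in raw_ids:
        if not _RECORD_ID.fullmatch(raw):
            raise ValueError(f"record id {raw!r} is not a number")
        rid = int(raw)
        if rid in seen:
            raise ValueError(f"record id {rid} is not unique")
        seen.add(rid)
        result.append(rid)
    return result
```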

Ron
Wednesday, June 16, 2004

That is correct.  I am spending A LOT of time thinking about this problem. 

I don't want to go into any more details, but I feel I have some unique solutions to the validation problem.  I would describe it in more detail, but I would rather wait until the software is ready.

cheers

Christopher

christopher baus (www.baus.net)
Wednesday, June 16, 2004

When you run out of memory in a language like Python, the entire application shuts down with an error message.  And that's good:  you're completely out of physical and virtual memory, so what's left to do?  In a language like C++, you have to be sure to handle all out-of-memory exceptions, then be sure the handler for those exceptions doesn't need to allocate any dynamic memory.  If you don't handle things gracefully, the application won't crash, but might trash the heap or do other bad things for a while before the user realizes it.  That's bad.  The Python solution is generally a big win here.

Also note that you can often make better use of memory in a garbage collected language, because you don't preallocate giant buffers that you don't actually need, for example.  And because it's much harder to leak memory (it's still possible, though it isn't truly a leak in the traditional sense).  And if you use a language with compacting garbage collection (not Python!), then you completely avoid heap fragmentation issues and can make even more efficient use of available memory.

Junkster
Thursday, June 17, 2004

"Almost always a memory failure results in undefined application behavior.  This is REALLY BAD if you are worried about the reliability and security of a public facing server."

Not in all languages - in http://www.lingolanguage.com a memory allocation failure leaves a value in a valid but empty state and then causes a normal program error. IMO all bad situations that could occur in a running program should be handled properly leading to termination of the program.
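The contract described for the language Bill links to (a failed allocation leaves the value valid but empty, then raises a normal program error) can be approximated elsewhere. The Python sketch below only illustrates that contract; it is not how that language implements it, and `SafeBuffer`, `ProgramError` and the injectable `allocate` parameter are hypothetical.

```python
class ProgramError(Exception):
    """Ordinary, catchable program error (hypothetical)."""


class SafeBuffer:
    """Byte buffer that is always left in a valid state, even if growing fails."""

    def __init__(self, allocate=bytearray):
        self._allocate = allocate   # injectable so an allocation failure can be simulated
        self.data = allocate(0)

    def append(self, chunk):
        try:
            grown = self._allocate(len(self.data) + len(chunk))
        except MemoryError:
            self.data.clear()       # valid but empty state, cleared in place
            raise ProgramError("allocation failed; buffer reset to empty") from None
        grown[: len(self.data)] = self.data
        grown[len(self.data):] = chunk
        self.data = grown
```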

Bill Rayer
Thursday, June 17, 2004
