Smart Pointers in Heron

My programming language Heron uses two kinds of smart pointers, à la Boost, to manage memory: a shared smart pointer (which I call a strong reference) and a weak pointer, for breaking cyclic dependencies.

The problem, in short, is making weak pointers safe no matter how they are used. For instance, someone could make a weak pointer to a stack object, but I want to guarantee that the weak pointer throws an exception if the object gets deleted before the weak pointer does.

I have described the problem in more depth, along with a proposed solution, at http://www.heron-language.com/blog.html , but I have a bad feeling that it sucks (it involves a whole lot of reference counting and double-width pointers). I am hoping some of the brains here will share their insight into the problem.
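
For heap objects the Boost-style pointers already give detection more or less like what I want; here is a rough illustration in C++ (the checked_weak_ref wrapper is invented for the example, it is not Heron code). The hard part is getting the same guarantee for stack objects and explicit deletion.

#include <boost/shared_ptr.hpp>
#include <boost/weak_ptr.hpp>
#include <stdexcept>
#include <iostream>

// Illustrative wrapper: every use of the weak reference checks, at that
// moment, whether the object is still alive, and throws if it is not.
template <typename T>
class checked_weak_ref {
public:
    explicit checked_weak_ref(const boost::shared_ptr<T>& strong) : w(strong) {}
    boost::shared_ptr<T> get() const {
        boost::shared_ptr<T> p = w.lock();   // null if the object was deleted
        if (!p) throw std::runtime_error("dangling weak reference");
        return p;
    }
private:
    boost::weak_ptr<T> w;
};

int main() {
    boost::shared_ptr<int> strong(new int(42));
    checked_weak_ref<int> weak(strong);
    std::cout << *weak.get() << "\n";        // fine, the object is alive
    strong.reset();                          // last strong reference released
    try { weak.get(); }                      // detected at time of use
    catch (const std::runtime_error& e) { std::cout << e.what() << "\n"; }
}

This only works because the object lives on the heap under shared_ptr control; a weak pointer to a stack object has nothing equivalent to lock() against.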

Thanks in advance all!

Christopher Diggins
Monday, July 26, 2004

Traditionally, languages that support weak pointers specifically have them NOT prevent the referent from being deleted. So you can have a "dangling weak pointer". In a real GC, this isn't an issue, since the GC can null out the weak pointer when it cleans up the object.

So, it sounds like you've coded yourself into a corner.

Chris Tavares
Monday, July 26, 2004

I am not sure I follow, I might be having some kind of brain freeze.

To be a bit more precise: I don't want my weak pointer to prevent deletion, but rather to prevent it from being used once the object has been deleted (whether through the smart pointer's automatic deletion, stack unwinding, or explicit deletion). I guess I am looking for a good way to detect, at time of use, whether a weak pointer is dangling. Am I overlooking something important?

Christopher Diggins
Monday, July 26, 2004

I agree that implementing that needs reference-counting (I see no way to avoid that).

I don't know what you mean by "double width pointers" ... it seems to me that you're storing the reference count in the thing being pointed to, which makes sense: in the stack frame for stack-resident objects, or inside the heap-allocated object that you're pointing to ... so it affects the storage/data layout of pointees, not of pointers.

Christopher Wells
Monday, July 26, 2004

Ahh, gotcha. You're on the right track there, then.

The only solution (absent a real GC) that I can think of is somewhat along the lines of what you're thinking of. Basically, I'd make the weak reference a double-indirect pointer. Every object would have a weakref field which points to another chunk of memory. That memory chunk would point back to the real object. The weak ref actually points to the other chunk.

So it looks something like this:

weakref ---> weakref "stub" <---> actual object

When the actual object is destroyed, it zeros out the pointer in its weakref stub. The weak reference needs to dereference the stub to get to the real object, so it finds out immediately if the referenced object has gone away.

The stubs can be reference counted to know when to delete them; this reference count only covers the weak references that are assigned, and does NOT include the reference from the real object to the stub.

Which, if I understood your blog entry, is pretty much what you're going to do.
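
In code, the stub scheme might look something like this (just a sketch -- the names are invented, copy assignment and thread safety are ignored, and none of it is Heron's actual implementation):

#include <stdexcept>

template <typename T>
struct weak_stub {
    T*  object;      // nulled when the real object is destroyed
    int weak_count;  // counts weak references only, not the object's own link
};

template <typename T>
class weak_ref {
public:
    explicit weak_ref(weak_stub<T>* s) : stub(s) { ++stub->weak_count; }
    weak_ref(const weak_ref& o) : stub(o.stub)   { ++stub->weak_count; }
    ~weak_ref() {
        // Whichever side dies last (the object or the final weak ref)
        // is the one that frees the stub.
        if (--stub->weak_count == 0 && stub->object == 0)
            delete stub;
    }
    T& operator*() const {
        if (stub->object == 0)
            throw std::runtime_error("dangling weak reference");
        return *stub->object;
    }
private:
    weak_stub<T>* stub;  // the double indirection: weak_ref -> stub -> object
};

// On the object's side, the destructor (or the strong pointer's cleanup) does:
//     stub->object = 0;
//     if (stub->weak_count == 0) delete stub;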

Chris Tavares
Monday, July 26, 2004

This is interesting stuff. I had been planning to have the weak pointer contain a pointer to the reference count and a pointer to the object simultaneously (hence double width), with the object oblivious to the reference count.
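
Roughly the layout I was picturing (illustrative C++ only -- fat_weak_ptr is a made-up name, and the real bookkeeping block would hold more than a bare count):

#include <iostream>

// "Double width": the weak pointer itself carries two machine words, while
// the pointee stays completely oblivious to the bookkeeping.
template <typename T>
struct fat_weak_ptr {
    T*   object;  // the object being referred to (heap- or stack-resident)
    int* count;   // bookkeeping block, allocated only when a reference is taken
};

int main() {
    std::cout << "plain pointer: " << sizeof(void*)
              << " bytes, weak pointer: " << sizeof(fat_weak_ptr<int>)
              << " bytes\n";  // e.g. 4 vs. 8 on a typical 32-bit machine
}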

The motivation for that line of thinking was that in Heron every variable is an object, including so-called primitives like integers, floating-point numbers, characters, etc. I expect only about 1% of all instantiated objects to ever be referenced. What I would ideally like is a system that is only invoked when needed (i.e. when an object is actually referenced).

What other methods exist to do this, and how could all of this be optimized for Heron? I want objects to be small and fast, while references can be big and slow. Kind of the opposite way of thinking from what I was used to with C++.

I am not opposed to using GC techniques to achieve the desired effect. I can't help but think there might be a way to leverage the fact that the GC wouldn't have to decide when to delete the object but rather just detect when an object is unusable. Along this line of thought, could we come up with a simple generational collector variant that only detected unusable weak pointers?

Christopher Diggins
Monday, July 26, 2004

Funny that everyone is named "Chris" ...

hoser
Monday, July 26, 2004

I think you are spending too much time in a C-induced haze.

Reference Counting is about the worst way to do GC.  It forces you to use weak references for cases where you really shouldn't and requires each and every alloc and free operation to touch the reference count.

Reference counting is why Greenspun's Tenth Rule is that "every large C program contains a *bad* implementation of at least 50% of Common Lisp" instead of just "every large C program contains an implementation of at least 50% of Common Lisp".  The Lisp folks gave up on pure reference counting years ago because it was too slow and because weak references are an ugly hack.

It sounds nonsensical, but by throwing caution to the wind and using a good copying or mark-sweep sort of system, you will end up with a faster result.

If you want a revolution, you should be trying to combine all of the nice parts of non-GC'd C++ and the idioms it allows (i.e. RAII) with the nice parts of having GC, while still allowing for a measurable performance benefit.

Ruby, for example, has two hacks to get around this.  One is the usual "finally"-style clause (ensure, in Ruby's case) on the exception handler.  The other is to allow you to create scope-locked objects.  But there's got to be a better way.

Oh, and having a 16-bit reference count is an incredibly, unbelievably stupid idea.  Sure, you can't think of a case where there would be more than 65535 references to an object hanging around, but I can assure you, there will be one.  Picture the object-oriented in-memory database where, for one reason or another, each "row" object has a pointer back to the "table" object.  Now what happens if there are more than 65535 rows?  That's where reputations for being a toy come from.

Flamebait Sr.
Monday, July 26, 2004

I was told once by somebody who should know, that research showed that you really only need three bits for your reference count.

The reason? If your number of references goes over 8, the object in general is going to last the whole lifetime of the program. So once you go past that eighth reference, the object flips into the "live forever" category.
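
In code, a sticky count like that might look something like this (just a sketch):

// A saturating ("sticky") three-bit count: once it reaches the maximum it
// never moves again, and the object is treated as living forever.
struct tiny_refcount {
    unsigned count : 3;   // holds 0..7

    tiny_refcount() : count(0) {}
    void incref() { if (count < 7) ++count; }              // 7 means "live forever"
    void decref() { if (count > 0 && count < 7) --count; } // never touch a stuck count
    bool unreferenced() const { return count == 0; }
};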

Chris Tavares
Monday, July 26, 2004

You have a very good point, Chris, and that works for a lot of code, but it's not the world's greatest idea for a general-purpose GC utility. ;)

Flamebait Sr.
Monday, July 26, 2004

> If you want a revolution, you should be trying to combine all of the nice parts of non-GC'd C++ and the idioms it allows (i.e. RAII) with the nice parts of having GC, while still allowing for a measurable performance benefit.

Sounds somewhat like Sutter++

From YAC (yet another chris)

christopher (baus.net)
Monday, July 26, 2004

There might be some confusion. I am *not* trying to implement a GC. The goal is to catch weak pointer violations. What I am doing is trying to improve on the RAII approach. So, in response to Flamebait Sr.: I think that weak pointers are important in that they represent usage of an object, but are not intended to extend the lifetime of an object. I am following a bit of the logic that inspired the auto_ptr design. With auto_ptr in C++, ownership is restricted to a single owner, which is too limiting.
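
A tiny illustration of what I mean about auto_ptr (this is plain standard C++, nothing Heron-specific): copying one transfers ownership, so there can only ever be one owner at a time.

#include <memory>
#include <cassert>

int main() {
    std::auto_ptr<int> a(new int(1));
    std::auto_ptr<int> b = a;  // "copying" actually transfers ownership to b
    assert(a.get() == 0);      // a has been stripped of the object
    assert(*b == 1);           // b is now the one and only owner
    return 0;
}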

The 3 bit idea is interesting, but unfortunately doesn't allow me to catch dangling weak pointers.

I wish Flamebait would contribute without being so insulting all the time; it isn't as if he is any smarter than anyone else here.

btw What is sutter++?

Christopher Diggins
Tuesday, July 27, 2004

Sorry, Chris, I'm obnoxious by nature.  I'm a troll, don't you know?  Besides, I only called you stupid when you limited reference counts to 16 bits.  ;)

What I keep trying to say is that you need to step back and look at the big picture.  Remember, the reason C++ rules is not that it was merely a set of small incremental changes over C; it was a new vision for programming implemented as a set of small incremental changes over C.

Now, to be fair, I did assume that you were doing a reference-counted GC and needed weak pointers.  But even though Ruby has "real" GC, it still has a concept of a weak reference.

I'm reading the documentation, and you are heaping a lot of criticism on GC when your smart pointers are basically GC, just restricted to a certain set of objects.

You want to know what would be really cool?  Make it programmable.  Keep the concept of a "weak reference", meaning a pointer to the address of an object that's non-lifetime-extending.  Keep the concept of a "strong reference", meaning a pointer to an object that is lifetime-extending and that you can call "free" on.

But have those be configurable inside of Heron so that a programmer could define one "weak reference" handler that would do no checks (i.e. a current reference or pointer) and a second "weak reference" handler that would reference count or otherwise check the weak references and throw exceptions as necessary.  Furthermore, you could make a really awesome debug-weak-reference class that would track the individual references so that you'd know exactly where the problem is.  Heck, you can even have a "Default" reference handler and also a way to specify which reference handler is to be used.
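
Just to make the idea concrete, here is one way it could look, sketched as C++ policy classes (every name here is invented for the example, and presumably Heron would expose something cleaner than templates):

#include <stdexcept>
#include <iostream>

// Sketch of "programmable" reference handlers: the checking behaviour is a
// policy the programmer picks, so a release build can swap in the unchecked one.
struct unchecked_policy {               // weak ref behaves like a raw pointer
    static void on_use(void*) {}
};

struct checked_policy {                 // throws on a dangling use
    static void on_use(void* obj) {
        if (obj == 0) throw std::runtime_error("dangling weak reference");
    }
};

struct tracing_policy {                 // debug handler: logs every use, then checks
    static void on_use(void* obj) {
        std::cerr << "weak ref used, target=" << obj << "\n";
        checked_policy::on_use(obj);
    }
};

template <typename T, typename Policy = checked_policy>
class weak_ref {
public:
    explicit weak_ref(T* p) : p_(p) {}
    void invalidate() { p_ = 0; }       // called when the target is destroyed
    T& operator*() const { Policy::on_use(p_); return *p_; }
private:
    T* p_;
};

int main() {
    int x = 5;
    weak_ref<int, tracing_policy> w(&x);
    std::cout << *w << "\n";            // traced and checked: prints 5
    w.invalidate();                     // pretend the target went away
    try { std::cout << *w << "\n"; }
    catch (const std::runtime_error& e) { std::cout << e.what() << "\n"; }
}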

This is why I say that you folks who just know template metaprogramming do not know the real meaning of metaprogramming.  Real metaprogramming lets wise wizards of code rip hunks out of the language if it gets in their way. ;)

Of course, the big problem here is that a reference-counted "strong reference" is subject to circular dependencies, whereas all of the other forms of GC are not so limited.  However, there's nothing preventing you from implementing "strong references" using a more modern approach and banishing reference counting to the basement.

Counting weak references is a CPU load that itself has the distinct possibility of causing performance problems in complex systems -- hence the release build could ship with weak reference counting turned off.

I'm not sure if any of the more advanced GC systems will work so well for tracking down memory issues with weak references.  The problem is that most of them are designed to be amortized operations, so you'd still catch a dangling weak reference, but you'd catch it only when the GC had run, so it wouldn't necessarily be easy to know exactly where it fired.

This, incidentally, is why modern GC can be so fast.  In most cases, the total number of CPU cycles expended on tracking memory is reduced by enough of a factor that having a little extra memory slosh is an acceptable penalty.  Furthermore, given that most of the GC systems out there do a very good job of ensuring that memory is in an orderly heap -- because they can move it around -- it ends up meaning that your heap manager CPU cycles, which most folks don't even think about, are greatly reduced.

Most C/C++ oriented GC systems are built around a variety of hacks because there's no good API available to work against.  But with Heron, you *know* what's a pointer and what's a reference.  This gives your users a lot of options that they wouldn't otherwise have.

Well, that, and you can then use it to write a sample application and see if you can really manage your memory better than the latest and greatest GC algorithms.  ;)

Flamebait Sr.
Friday, July 30, 2004
