Fog Creek Software
Discussion Board




C++ References

Would it be safe to say that a reference is an abstraction of a pointer? 

Which do you find yourself using most often, a reference or a pointer?

Dave B.
Tuesday, August 12, 2003

I wouldn't say that a reference is an abstraction of a pointer.  They are both abstractions of an address/location in memory.

Pointers came first, so usually you hear references described as "just like pointers except for a, b and c".  Basically safer pointers with some syntactic sugar.  But you could hear pointers described as "references WITH a, b, and c".

I use references when I can, and pointers otherwise.  The most common use is for passing by reference, naturally.

Andy
Wednesday, August 13, 2003

Typically, pointers are memory addresses to things that are allocated on the heap.  References usually deal with things on the stack.

People often pass as a parameter to a function the address of a variable on the stack using a pointer, but they can usually do this using a reference as well (it's a matter of preference).

One rule of thumb is, can the value of the address be NULL?  If so, you probably want a pointer.
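A minimal sketch of that rule of thumb follows. The name `find_score` and the map of scores are hypothetical, invented for this illustration. The lookup returns a pointer because "not found" is a legitimate answer, while its parameters are references because they must always name real objects.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical lookup helper (illustration only).
// "No such entry" is a valid outcome, so the result is a pointer that may be NULL.
// Both parameters must always name real objects, so they are references.
const int* find_score(const std::map<std::string, int>& scores,
                      const std::string& name)
{
    std::map<std::string, int>::const_iterator it = scores.find(name);
    if (it == scores.end()) {
        return NULL;          // absence is reported as a null pointer
    }
    return &it->second;       // points into the map; valid while the entry exists
}
```

The caller is expected to test the returned pointer before using it, which is exactly the check a reference return type would leave no room for.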

Anon
Wednesday, August 13, 2003

Safe?  What harm could come of it?

But it certainly isn't true to say it - references cannot be reassigned, cannot be assigned a null value in a well formed program, reference arithmetic doesn't exist, etc.

Danil
Wednesday, August 13, 2003

I never heard the distinction about pointers on the heap and references on the stack, but I guess it is true since new/malloc return pointers and you can only delete/free a pointer.  But there is no problem with either references or pointers on the stack.  I would say C programmers tend to use pointers for everything because that's what they know (don't most of Microsoft's APIs use pointers and not references?)

But I think that could be a bit confusing for a beginner... I would just stick with the technical distinctions.

I call references safer because they are restricted so you don't screw up with them.  I think this is pretty standard.  You can't reseat references, must initialize member references in the constructor, they can't be NULL, no reference arithmetic, etc.  Thus when you use a reference you have to think less than if you use a pointer (at least you don't have to check for null or prove to yourself that they can't be null).
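A small sketch of those restrictions, assuming nothing beyond standard C++. The function names and the `Counter` struct are made up for illustration.

```cpp
// Assigning to a reference writes through to the object it is bound to;
// it never rebinds the reference.
int assign_through_reference()
{
    int a = 1;
    int b = 2;
    int& r = a;   // r is bound to a for its entire lifetime
    r = b;        // copies b's value into a; r still refers to a
    b = 7;        // changing b afterwards has no effect on a or r
    return r;     // reads a, which now holds 2
}

// A pointer can be reseated to a different object at any time.
int reseat_pointer()
{
    int a = 1;
    int b = 2;
    int* p = &a;
    p = &b;       // p now points at b
    *p = 5;       // modifies b, not a
    return a;     // a is still 1
}

// A reference member has to be bound in the constructor's initializer list.
struct Counter
{
    int& target;
    explicit Counter(int& t) : target(t) {}
    void bump() { ++target; }
};
```

Leaving `target` out of the initializer list is a compile error, which is the kind of compile-time check that pointers do not get.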

With pointers if you screw up your arithmetic or reseat them incorrectly, you can trash memory, read garbage memory, and all that good stuff.

It is useful to remember that pointers and references have the same implementation at runtime.  They just have different compile time checks and different syntax.  It's purely a programming language issue.  A reference is dereferenced just like a pointer is (in the actual executed code), but it doesn't show in the source code because of the "syntactic sugar".

Andy
Wednesday, August 13, 2003

>> cannot be assigned a null value in a well formed program <<

int* p = 0;
int& r = *p;

What's not well formed about this? 

SomeBody
Wednesday, August 13, 2003

> int* p = 0;
> int& r = *p;
>
> What's not well formed about this? 

The very fact that you assign a referenced null pointer
makes the program not well formed.

Yes, that's how the logic goes

   
      you cannot assign
      referenced null pointers      -------->  if you do, it's
      to refs in well formed progs             not well formed
                 ^                                    |
                 |____________________________________|
                                         

Ignore my ignorance
Wednesday, August 13, 2003

"No," the Guru's quiet voice startled us both. Once again, she had appeared at the right moment, and now stood behind us, an open tome in hand. "There is no need of such an abomination. The Standard teaches that it is not possible to have null references."

http://www.gotw.ca/conv/002.htm

Ged Byrne
Wednesday, August 13, 2003

I think that this is missing the point.

Dereferencing NULL is *undefined*.  The compiler can do whatever it wants.  In some cases you will access the memory location 0 and cause an exception, but that is beside the point.

After you dereference NULL, you're already screwed, so whatever happens after that is meaningless (namely, the assignment to the reference).  Having the reference have a value of 0 is not something that can be expected with all compilers -- some could give it the value 0xDEADBEEF if they felt like it, and it would still be legal C++.

But 0 is probably common because that is the "default", "fall-through" behavior that doesn't require anyone to write any code to *define* this undefined behavior.

Andy
Wednesday, August 13, 2003

I never use plain references. Ever. My code is littered with const references though. I see them mainly as a way to preserve efficiency.
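To make the efficiency point concrete, here is a rough sketch. `Tracked` is a hypothetical type that counts its own copies; it stands in for any object that is expensive to copy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type that counts how often it is copied (illustration only).
struct Tracked
{
    static int copies;
    std::vector<int> data;
    Tracked() : data(1000, 1) {}
    Tracked(const Tracked& other) : data(other.data) { ++copies; }
};
int Tracked::copies = 0;

// Pass by value: the argument is copied into the parameter.
std::size_t size_by_value(Tracked t) { return t.data.size(); }

// Pass by const reference: no copy, and the callee cannot modify the argument.
std::size_t size_by_const_ref(const Tracked& t) { return t.data.size(); }
```

Calling `size_by_value` with a named `Tracked` object bumps the copy counter once, while `size_by_const_ref` leaves it untouched.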

Mr Jack
Wednesday, August 13, 2003

As with most anything else, you need to provide context when talking about references and pointers. In this case the context being the programming language you are talking about. This is because many (most?) programming languages provide either one or the other, but not both. Thus you might say that for a language that has only pointers, they are its references, and vice versa.

Moreover, references mean different things for different languages that do provide them. For example, Java references are very different from C++ references, behaving much more like C++ smart pointer objects.

The same is also true for pointers. Pascal pointers are very different from C pointers, e.g. no pointer math, no address-of operator.

C++ is one programming language that provides both pointers and references. For that language I would say that the major difference is that pointers are unique objects unto themselves while references are not. That is, a pointer exists independently of the object it's pointing at. A reference, on the other hand, has no independent existence; it's just an alternative name. This is why you can take the address of a pointer in C++ but not the address of a reference (taking the address of a reference returns the address of the referenced object).
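The address-of behaviour can be checked directly. This is only a sketch, and the function name is invented for the example.

```cpp
// Returns true when the reference has no address of its own,
// while the pointer is a separate object stored somewhere else.
bool reference_is_an_alias()
{
    int x = 42;
    int& r = x;    // another name for x
    int* p = &x;   // a distinct object holding x's address

    bool same_address = (&r == &x);   // &r yields the address of x
    bool pointer_is_separate =
        (static_cast<void*>(&p) != static_cast<void*>(&x));
    return same_address && pointer_is_separate;
}
```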

BTW, C++ got references because of its operator overloading feature.
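One way to see the connection to operator overloading is sketched below. `Vec2` and `add` are hypothetical names used only for illustration.

```cpp
// Hypothetical 2D vector type (illustration only).
struct Vec2
{
    double x;
    double y;
    Vec2(double x_, double y_) : x(x_), y(y_) {}
};

// Reference parameters let callers write a + b without copying the operands.
Vec2 operator+(const Vec2& a, const Vec2& b)
{
    return Vec2(a.x + b.x, a.y + b.y);
}

// Pointer parameters cannot be used for an overloaded operator, because an
// operator function needs at least one class or enumeration parameter.
// A named function works, but the caller has to write add(&a, &b).
Vec2 add(const Vec2* a, const Vec2* b)
{
    return Vec2(a->x + b->x, a->y + b->y);
}
```

With references the expression stays `a + b`; the pointer version gives up the natural syntax, which is presumably the motivation Dan is alluding to.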

Dan Shappir
Wednesday, August 13, 2003

A reference is not a pointer.

If you take the address of something (i.e. create a pointer), then the compiler must allocate that object to memory, e.g. on the stack.  It cannot keep the value in a register.

If you initialize a C++ reference with something, then a smart optimizing/inlining compiler can deduce that the something can still be contained in a register. Many compilers will implement references as pointers, especially if debugging is enabled, but that's an implementation issue.

David Jones
Wednesday, August 13, 2003

My example wasn't supposed to be taken so literally.  If you use your imagination a little, it's easy to think of some scenarios where the problem isn't so obvious.  The point is that it's possible to end up with null references in C++ code that otherwise compiles and runs. 

I'm suspicious of statements such as "[references] can't be NULL".  That simply isn't true for many implementations and in code that mixes references with pointers, it's very easy to end up with null references.  The typical scenario where I've seen it happen is when a pointer is dereferenced to pass into a function that takes a reference parameter. 

SomeBody
Wednesday, August 13, 2003

When taught "C++ References" we were told that references were just syntactic sugar over pointers -- and that's really the case (even if the compiler might optimize them differently).  A C++ Reference is just an auto-dereferencing pointer; all the evils you can imagine from that (including null references) are all true.

Almost Anonymous
Wednesday, August 13, 2003

<quote>
When taught "C++ References" we were told that references were just syntactic sugar over pointers
</quote>

Is it possible that what you were told was wrong?

Danil
Wednesday, August 13, 2003

Dan: notice the thread title : )  I don't think there is any generally accepted notion of "pointer" and "reference" outside the context of C++.  If someone refers to those concepts in general, then I don't know what they mean (specifically).

Somebody: to reiterate another way, it's also possible for new to fail every time in a program that compiles and runs, when you have sufficient memory (by trashing the heap in main, for example).

It is also possible for a virtual function of an object not to be called polymorphically through a pointer to the object (by trashing the v-table).

In each case you are doing something you shouldn't do, and causing a language feature to fail.  The "guarantees" are no longer valid once you do something like that.  Granted, your example is much more likely in practice than mine, but the point remains.

Andy
Wednesday, August 13, 2003

Almost anonymous, if that's how you think about references, you're missing the point IMO.  References are not simply an auto-dereferencing pointer.  They have other properties as well.  For example, the only way to get a NULL reference is to dereference a NULL pointer, an operation that is already undefined.  Otherwise, references can never be NULL.  Similarly with other invalid pointer addresses.  The point is that the pointer->reference interface is not quite safe, but once you have a reference to a valid pointer the rules are different from normal pointers.

That said, I don't find too much use for references in my day-to-day coding.  I don't like using them for pass-by-reference, because it is not clear from the call site that the parameters may be changed.  For this I prefer pointers.  I do use const references on occasion for saving a copy when passing objects, rather than using a pointer.  But that's about it.
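The call-site argument might look like this in practice. The function names are hypothetical, chosen just to contrast the three styles.

```cpp
// Out-parameter as a non-const reference: the call site gives no hint of mutation.
void reset_by_reference(int& value) { value = 0; }

// Out-parameter as a pointer: the caller must write &x, which flags the mutation.
void reset_by_pointer(int* value)
{
    if (value) { *value = 0; }  // the pointer version also has to consider null
}

// Input-only parameter as a const reference: the callee cannot modify it.
int doubled(const int& value) { return value * 2; }
```

At the call site, `reset_by_reference(x)` and `doubled(x)` look identical even though only one of them changes `x`, whereas `reset_by_pointer(&x)` makes the possibility visible.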

Mike McNertney
Wednesday, August 13, 2003

"Is it possible that what you were told was wrong?"

No.

Next question...

Almost Anonymous
Wednesday, August 13, 2003

For whoever gave this definition above. Pointers have nothing to do with whether or not something is on the heap.

void method()
{
  int i;
  int* p = &i;  // p points at a stack variable, not at the heap
}

Oren Miller
Wednesday, August 13, 2003

Oren - read the friendly comment. "Typically, pointers are memory addresses to things that are allocated on the heap.  References usually deal with things on the stack."

Reply
Wednesday, August 13, 2003

> Would it be safe to say that a reference is an
> abstraction of a pointer? 

No. A reference is an alias, not an object. It is only similar to a dereferenced pointer, not a pointer itself.

> Which do you find yourself using most often,
> a reference or a pointer?

I use references until I can't, and then I may use a pointer. They have different capabilities. To consider them interchangeable is to miss out on the differences in their constraints and liabilities.

Steven E. Harris
Wednesday, August 13, 2003

I get the whole "references can never be NULL" idea.  The point is that you'd probably be hard-pressed in the real world to find any amount of non-trivial C++ that qualifies as "well formed".  With real world code, null (or otherwise invalid) references are a reality.  The "references can never be NULL" idea smacks of something you'd read in a book or learn in a class.  Sure, it's theoretically true but I don't think it qualifies as a benefit of references over pointers.  If anything, this notion makes references LESS safe than pointers because people have the idea that references can never be null.

SomeBody
Wednesday, August 13, 2003

> With real world code, null (or otherwise invalid)
> references are a reality.  The "references can never
> be NULL" idea smacks of something you'd read in a
> book or learn in a class.

In my eight or so years working with C++, I have never written nor seen any code involving a potentially null reference, besides the undefined contrived example given above.

But maybe there is more than one "real world." In mine, references are free of null troubles. If some programmer is introducing null references, that's a sure sign that she's also introducing various other kinds of errors and probably lacks that "innate understanding of pointers" that Joel seeks out in his employees. Null references will then be the least of your worries.

Steven E. Harris
Wednesday, August 13, 2003

NULL references are unbelievably easy in C++...

--/ snip /--

void Test(const SomeClass& param)
{
    param.Method();
}

SomeClass* instance = SomeOperation();
Test(*instance );

--/ snip /--

If SomeOperation returns a NULL pointer then the reference in the Test method will be null.  This is very common.  Of course, this doesn't apply to references which are not function arguments.

Almost Anonymous
Wednesday, August 13, 2003

SomeBody: The importants of the distinction is not that having a reference prevents you from having a NULL dereference.  The point is that it moves the responsibility of checking for that NULL.  If your function takes a reference, you are explicitly stating that your code will not deal with NULL.  If someone then dereferences NULL and passes it in, your function is not the source of the bug, it is that person not obeying the constraints of the language.  Yes this is a silly semantic in small examples, but when designing large projects, it can be very very helpful to isolate what parts of code can cause faults, and using references can help you do that.

On the other hand, in some situations checking for NULL in the function is preferable to checking at the callsite.  It really depends on the expectations.  If NULL is a valid potential input, you should use a pointer rather than a reference.  If NULL can't be valid, using a reference is a good option because A) it clearly indicates that the function does not deal with NULL, and B) it prevents an error from trickling along and causing problems later where they will be harder to detect.
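A sketch of how that division of responsibility might look. Both function names are hypothetical. The reference-taking function states that it needs a real object, and the null check lives on the caller's side, before any dereference happens.

```cpp
#include <cstddef>
#include <string>

// Hypothetical function (illustration only). Taking a reference documents that
// it does not handle "no object"; callers must supply a real string.
std::size_t count_spaces(const std::string& text)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == ' ') { ++n; }
    }
    return n;
}

// Caller-side adapter: the null check happens before the dereference,
// so a reference is only ever bound to a valid object.
std::size_t count_spaces_or_zero(const std::string* maybe_text)
{
    if (maybe_text == NULL) {
        return 0;             // NULL is handled here and never dereferenced
    }
    return count_spaces(*maybe_text);
}
```

If NULL were a meaningful input to the core operation itself, the pointer signature would be the more honest one to expose.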

Mike McNertney
Wednesday, August 13, 2003

Ugh, that will teach me to not proof read... that should be "importance"

Mike McNertney
Wednesday, August 13, 2003

> Test(*instance );

There's your bug. The person who wrote the Test() function is off the hook.

This has nothing to do with null references; it's still about null pointers. You never made it to having a null reference in hand. Before the function is even called, dereferencing the null pointer invokes undefined behavior. There's no point in discussing anything that happens after that.

Steven E. Harris
Wednesday, August 13, 2003

Almost Anonymous's example is exactly the type of thing I was talking about. 

I don't get what's so controversial about what I'm saying.  Sure, if programmers were more knowledgeable, more careful, weren't rushed, had well-defined guidelines, etc., this sort of bug would never happen.  But that can be said about pretty much any bug. 

I'm not saying that you should stop using references or that you should check &r before using references.  All I'm saying is that references can end up null and I've seen (but did not cause) a few bugs due to null references in real world code (assuming experienced programmers working at a profitable company qualifies as the real world).

SomeBody
Wednesday, August 13, 2003

"> Test(*instance );
There's your bug. The person who wrote the Test() function is off the hook."

I totally agree with that.  If your function advertises that it accepts a reference then the caller should guarantee that it's not null. 

"I have never written nor seen any code involving a potentially null reference"

My example shows a potentially null reference!  I could say in the documentation that SomeOperation() is guaranteed not to return NULL...  but who knows!

It's all semantics really: references *can* be null but they *shouldn't* ever be null.  I think we can all agree on that!

Almost Anonymous
Wednesday, August 13, 2003

Almost Anonymous's example does NOT expose a NULL reference bug, because on most compilers you won't get past the dereferencing of the NULL pointer anyway.

As evidence, GNU and GHS compilers for x86 linux both seg fault at the dereference. If your compiler doesn't segfault on NULL dereference then you have a very strange tool. :P

Steven C.
Wednesday, August 13, 2003

The call to "Test(*instance)" does not access the pointer at that point.  The pointer will be dereferenced on the first access inside the function.

Also, dereferencing a null pointer is a runtime error, not a compile-time error.

Almost Anonymous
Wednesday, August 13, 2003

Actually, Almost, you are incorrect (at least as far as the compilers I have at my disposal indicate).

When you dereference a pointer in a function call, the dereference happens at the call site, not in the callee. So the seg fault will occur at the call site, thus you never had a NULL reference to cause issues.

Additionally, I am aware that the seg fault is a runtime error, but the point of this was to have an example where the code could have a NULL reference. As I have stated, you can never actually RUN to that point, so you don't need to ever worry about it (i.e., your program will die for other reasons earlier).

P.S. If you don't believe me on the call site thing, email me and I'll send you a copy of the source I used to test this, you can compile with your favorite compiler and look at the assembly; trust me, the NULL dereference occurs before the call.

Steven C.
Wednesday, August 13, 2003

> My example shows a potentially null reference!

Not quite. Read on.

> I could say in the documentation that SomeOperation()
> is guaranteed not to return NULL...  but who knows!

Right, who knows? I wouldn't trust such a comment unless I owned the source to that function. Again, the bug is in not testing the returned pointer, or, perhaps in the presence of a comment like you suggest, the bug is in the function that's returning null when it promises it won't.

> It's all semantics really: references *can* be null but
> they *shouldn't* ever be null.  I think we can all agree
> on that!

We're getting closer, but I must be firm on this: references cannot be null. In order to have a null reference, one would have to initialize it with a dereferenced null pointer. But dereferencing a null pointer is undefined, so nothing can happen after that within a C++ program.

Now, of course, a real program would keep running and eventually trip up on trying to read an invalid address. But you're not even running a valid C++ program; you're running some bastard program produced from source with undefined semantics, and the program is free to do whatever it wants.

I agree with you that one could write source like your example, compile it, run it, and watch it crash. I did just that before posting. We created a program with a "null reference," but we did not write a valid C++ program. We used C++ syntax to write a program with undefined semantics.

It would be nice if a compiler would warn about or even refuse to compile such code, but your example only manifests potential undefined behavior. If that called operation could never return a null pointer, the program would be unsafe only in surface syntax but still well-defined. It's easier to see the potential for undefined behavior in a given case than to prove that it exists.

Steven E. Harris
Wednesday, August 13, 2003

On the topic of "undefined behavior", I love the canonical story about this, involving the Fortran 77 (I think?) specification.

(Someone correct me if I get this wrong)

The Fortran spec essentially said, "behavior on compile errors is undefined", which meant that if your program had compile errors, a Fortran compiler could go ahead and output whatever it wanted -- like, say, a hello world program. Or your program, but with all the source generating errors ignored. Etc.

Steven C.
Wednesday, August 13, 2003

Steve C,

"When you dereference a pointer in a function call, the dereference happens at the call site, not in the callee. So the seg fault will occur at the call site, thus you never had a NULL reference to cause issues."

I just tried it with this very small example:

--/ snip /--

#include <cstddef>  // for NULL

class TestClass
{
public:
    virtual bool Method() {
        return true;
    }
};

bool TestFunction(TestClass& obj)
{
    return obj.Method();  // Fault occurs here
}

int main()
{
    TestClass* nullPointer = NULL;
    TestFunction(*nullPointer);  // undefined behavior: dereferences a null pointer
}

--/ snip /--

The failure occurs inside the TestFunction and not at the call to TestFunction.  This is with the Metrowerks compiler but I'm sure it's true for all compilers.

The dereference does not occur at the call site because the pointer is not dereferenced at that point.  Instead, because it's a reference parameter, the address is pushed onto the stack.

If the parameter was not a reference parameter then you would be right...  the dereference would occur at the call.

Almost Anonymous
Wednesday, August 13, 2003

"Now, of course, a real program would keep running and eventually trip up on trying to read an invalid address. But you're not even running a valid C++ program; you're running some bastarad program produced from source with undefined semantics, and the program is free to do whatever it wants."

Obviously it's good practice to test for null pointers...  but that doesn't make the program invalid.  If that were true, 99% of all C++ programs are invalid!  In another topic here, there was a discussion about how common it is not to check for memory allocation failures!

Almost Anonymous
Wednesday, August 13, 2003

Visual C++ 6 and 7 allow the null pointer dereference through.  The address of the reference becomes 0 and attempts to use it result in an access violation. 

At least for Visual C++, in assembly, "int& r = *p" doesn't do much beyond copying a value (the address).  I have heard of compilers that segfault on dereferencing null pointers but I'm guessing that they insert code to do this?  Personally, I'd rather *not* have such things inserted automatically (unless some debug flag is specified during compilation). 

SomeBody
Wednesday, August 13, 2003

SomeBody,

Compilers don't segfault on dereferencing null pointers; the fault occurs in the processor (an interrupt) when the memory address of zero is accessed.  Nearly all platforms have the address of zero marked as invalid and protected.  It doesn't require any extra code inserted by the compiler.

Almost Anonymous
Wednesday, August 13, 2003

Right.  That's what I was saying.  However, there do seem to be some platforms/compilers that will result in some sort of crash at the point of dereference (I was trusting Steven C. that it's a segfault).  In order to get this behavior, I'm guessing that the compiler inserts extra code at the point of dereference. 

SomeBody
Wednesday, August 13, 2003

The interrupt that occurs on a bad memory access is trapped by the operating system (to produce a "general protection fault") or by the debugger (to trigger a breakpoint). 

Nothing special is required by the program; this is entirely handled by the processor and the OS.

Almost Anonymous
Thursday, August 14, 2003

Right, but there's no memory access at the point of dereference (such as for the address held by p in "int& r = *p").  That's the point.  Am I talking in gibberish today or something? : )

SomeBody
Thursday, August 14, 2003

Asking for an address doesn't cause a dereference (as per your example).  In all the compilers I've tried no error would be issued for that.

Try my code example above -- the error only occurs at the point of *actual* dereference (the method call) and not when it's passed to the function.

Almost Anonymous
Thursday, August 14, 2003

AA, you are confusing the symptom with the disease. (You are also using the word "dereference" to mean "attempt to read the memory a pointer is pointing to", which is not what the word actually means; I think this is why this thread seems so confused.)

It is true to say that a reasonable compiler can produce code that will cause an illegal memory access attempt at some point after executing whatever code was emitted for the line with *nullPointer in it. But that illegal memory access is a symptom of the error, not the error itself.

Daryl Oidy
Thursday, August 14, 2003

"You are also using the word "dereference" to mean "attempt to read the memory a pointer is pointing to", which is not what the word actually means; I think this is why this thread seems so confused."

You are correct; dereference is a language concept.  But rarely do you dereference a pointer and then not access it!

A reference in C++ is just syntactic sugar... every access of a reference variable causes an implicit dereference of the underlying pointer just before the access.

"But that illegal memory access is a symptom of the error, not the error itself."

Aren't we all in agreement on that?!?

Almost Anonymous
Thursday, August 14, 2003

Geez, now I know why so many people dislike C++ programmers. : )  Because we need to be detail oriented to comprehend the language, and thus like to split hairs and beat a dead horse...

But in that spirit, I will split another hair.  There is *no such thing* as a NULL reference.  Not for the reasons stated here (which I also agree with, see earlier posts).

NULL is a language thing -- a special pointer value that is guaranteed to compare unequal to valid pointers.  It is only an implementation dependent coincidence (although a universal one AFAIK) that a NULL pointer happens to have the binary value 0.  (And I know NULL can be #defined to 0, but 0 in a pointer context is NOT the integer 0).  So what you really are discussing is whether a reference can have the binary value 0.

A NULL reference means nothing and does not exist *by definition* (can't argue with that).  A 0-valued reference is possible under many implementations if you assign it to a dereferenced NULL pointer.
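The language-level guarantees about null pointers can be written down without saying anything about bit patterns. This is a small sketch with an invented function name; it checks only what the language promises, not how any implementation represents a null pointer.

```cpp
#include <cstddef>

// Checks language-level guarantees about null pointers, none of which
// depend on how a null pointer is represented in memory.
bool null_pointer_guarantees()
{
    int x = 0;
    int* from_literal = 0;    // the literal 0 is converted to the null pointer value
    int* from_macro = NULL;   // NULL expands to a null pointer constant
    int* valid = &x;

    return from_literal == from_macro   // null pointers of the same type compare equal
        && valid != from_literal        // a null pointer never equals a valid pointer
        && !from_literal;               // a null pointer converts to false
}
```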

And it is still not the references' "fault" that they can be 0, nor does it make them a bad language feature, or make the standard untrue.  Consider my earlier example of overwriting the v-table:

Derived d;  // note: "Derived d();" would declare a function, not an object
Base* b = &d;
*(unsigned int*)b = 0; // will overwrite the v-table pointer on many implementations
b->virtualFunction(); // no longer calls Derived::virtualFunction

A program like this will COMPILE and RUN.  But wait, the standard says that virtualFunction should be called polymorphically in this case.  Isn't the standard wrong???  What a shitty feature, if it doesn't even work all the time!!!

No, it is no more the "fault" of references that they can be 0 than it is the "fault" of virtual functions here that the right function does not get called.

Andy
Thursday, August 14, 2003

Mea culpa -- I compiled with some extra runtime checks by accident, which led me to believe that I was correctly interpreting the issue.

In fact, AA is correct: the program (both his test and mine) won't segfault under vanilla options until inside the Test() function.

I still maintain the problem is NOT a NULL reference but the NULL dereference which created that reference.

But really, now we're just arguing in a circle.

Steven C.
Thursday, August 14, 2003

"But really, now we're just arguing in a circle."

I think, somehow, we actually all agree with each other. 

Frightening isn't it...

Almost Anonymous
Thursday, August 14, 2003
