Fog Creek Software
Discussion Board




Software Usability

Hi Joel,

Very cool idea, this 'Ask Joel' forum.

Anyway.  Every time I use another message board, I can't help but notice how insanely usable this site is.  I haven't read your entire archive, but I've not seen any essays on code usability.  Since the essays here are so insightful regarding UI usability, I'd love to know if you have any opinions/advice about the state of code usability. 

* What are your general thoughts on usability at the interface between the application and the library/platform/package/COM component? 

* Do you see a relationship between usable software and design patterns? 

* Do you believe that elegant interfaces into software might increase user productivity by, perhaps, orders of magnitude?

To me an elegant, easy to use object model is the quintessential aspect of beauty in the art of computer programming.

Adam N

anon
Tuesday, February 24, 2004

Good point. This was actually the first thing I did in my career: designing the VBA programming environment for ease of use.

There were a bunch of things we did there for usability considerations; I can't remember all of them. Here are some things -- you may think some are misguided, but they were all intended for usability:

(1) indexes are one based. That's how humans count. Zero-based is better, I agree, but one-based is what humans expect, and the program model must conform to the user model for ease of use.

(2) No APIs use abbreviations. Everything is spelled out, because when the API uses an abbreviation, you have to remember what the abbreviation is, but if everything is spelled out, this is one less thing to memorize.

(3) APIs should never require that you understand pointers. Pointers are one of the great dividing skills between hard core programmers and dabblers. 90% of the people who dabble in code and need to write macros just don't understand pointers and can't be expected to make them work. So for example in the Excel object model, there is >always< a way to get an index of something in some collection which you can later use to get the original thing back. Same effect as a pointer but easier to understand for the pointer-deficient.

(4) Lots of rules for consistency. Consistency is one of the best ways to get usability because it allows learned skills to be reused elsewhere. For example in the Excel object model every collection class has a plural name and every non-collection class has a singular name. If you have a collection class you are guaranteed to be able to For Each it, to call Count on it, etc.
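
A rough sketch of rules (1), (3) and (4), written in Python rather than VBA. The class name Sheets and its members (count, item, index_of) are made up for illustration and are not the Excel object model; the point is only that a plural-named collection can be one-based, countable, iterable, and can hand out a plain index in place of a pointer.

```python
class Sheets:
    """Hypothetical plural-named collection; not the real Excel API."""

    def __init__(self, names):
        self._names = list(names)

    @property
    def count(self):
        return len(self._names)

    def item(self, index):
        # One-based: valid indexes run from 1 to count inclusive.
        if not 1 <= index <= self.count:
            raise IndexError(f"index {index} is outside 1..{self.count}")
        return self._names[index - 1]

    def index_of(self, name):
        # Hand back a plain number the caller can store instead of a pointer.
        return self._names.index(name) + 1

    def __iter__(self):
        # Every collection can be walked "For Each" style.
        return iter(self._names)


sheets = Sheets(["Budget", "Forecast", "Notes"])
handle = sheets.index_of("Forecast")
```

Here handle is just a number; passing it to sheets.item(handle) later gets the original thing back, which is roughly the same effect as keeping a pointer.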

The original goal was to have ALL OLE Automation interfaces follow these rules, which they mostly do inside Microsoft, but outside vendors thought they were clever and did things like create 0-based collections in the OLE Automation interfaces, and nowadays when you work with an Automation object you have to study the documentation to figure out if it's 1-based or 0-based... which the documentation rarely mentions, for some reason.

These are some of the things that I remember.

Joel Spolsky
Fog Creek Software
Wednesday, February 25, 2004

Hi,

Just want to say something about the following :

---
The original goal was to have ALL OLE Automation interfaces follow these rules, which they mostly do inside Microsoft, but outside vendors thought they were clever and did things like create 0-based collections in the OLE Automation interfaces, and nowadays when you work with an Automation object you have to study the documentation to figure out if it's 1-based or 0-based... which the documentation rarely mentions, for some reason.
---

I think most (advanced) OLE controls and components are written by C/C++ developers, not by VB developers.

Imagine the frustration of those C/C++ developers, if they suddenly have to think in 1-based terms.


Greetings from Belgium,


Jeroen

Jeroen Jacobs
Wednesday, February 25, 2004

But you have to think in units of frustration/sec over the lifetime of the software product.


Wednesday, February 25, 2004

Agreed -- in an API designed for C/C++ programmers 0-based is the way to go; in an API designed for BASIC programmers and non-programmer dabblers 1-based is the way to go.

Joel Spolsky
Fog Creek Software
Wednesday, February 25, 2004

This thread is extremely interesting. :) Very wise words!

MX
Wednesday, February 25, 2004

If you're writing .NET code, you may want to look into FxCop.  It's a free download from the .NET community site (gotdotnet?) and it checks your assemblies for conformance against the .NET class library guidelines, like using Pascal casing in some places and camel casing in others (e.g. class and method names are LikeThis() but parameters are likeThis, interfaces start with a capital I, like IThis, etc.).  It is fully configurable and automatable.

Something else to look into is using the code documentation features of C#, which are a bit like JavaDoc.  Couple that with NDoc (which is open-source) and your class library can ship with documentation.  NDoc is actually more sensitive than the compiler and will sprinkle red "missing" sentences in its output when you forget to replace the "Summary description for class" comments, etc.

Plus there's always NUnit and then, while you're at it, script the whole thing with NAnt and you have also knocked off a few items from the Joel Test.

Oli
Wednesday, February 25, 2004


---
Agreed -- in an API designed for C/C++ programmers 0-based is the way to go; in an API designed for BASIC programmers and non-programmer dabblers 1-based is the way to go.
---

But COM and ActiveX are supposed to be language-neutral. Therefore it seems logical to use 0-based since this is what most programming languages use. Treating ActiveX and COM as VB APIs is a serious simplification (is that a correct word?) since COM/ActiveX are also used by Delphi and a lot of other languages. Of all those languages VB is the only one that's one-based.

Why should ActiveX developers follow the convention of one programming language and ignore the convention of all the other ones?

Besides, I'm programming VB and LotusScript (a VBScript-like language used in the Lotus Notes environment) and I've always used 0-based collections. (Maybe because I programmed assembly language and C back in the DOS days :-) )


Greetings :-)

Jeroen

Jeroen Jacobs
Thursday, February 26, 2004

COM is supposed to be language neutral but COM Automation (IDispatch) is specifically for scripting languages.

Use 0-based for COM interfaces and 1-based for IDispatch interfaces.

Joel Spolsky
Fog Creek Software
Friday, February 27, 2004

You can't be serious. So if someone does

dim obj as object: set obj = new CObject

you'll get different indexing than if someone did

dim obj as new CObject

Inconsistency within the SAME language on the SAME object???

Humbug
Friday, February 27, 2004

There is always a dilemma in UI: follow the user's habits, or take some time to make any operation 'one click.'

Alan Bekoev
Friday, February 27, 2004

I can easily agree with most of what you say here, but I'm afraid I don't agree on the subject of array indexing.

0-based indexing does require a small mental adjustment at first from a novice programmer, but I believe it's a one-time adjustment which will generally improve the readability of their code (because variable names and code structures can usually be expressed in a manner that comes closer to English usage if using 0-based as against 1-based indexing) and will serve them well if they do choose to move on to other programming environments.

A small (and, in my view, unnecessary) but fundamental concession to user expectation can lead to a great deal of confusion and problems with interoperability and mental context-switching - what a shame.

Thanks for explaining the reasoning, and sorry I can't agree on that point!

Gavin Greig
Friday, February 27, 2004

It's beyond me how zero based can be justified as being better than 1 based on any grounds that have to do with ease of understanding by the programmer.

We count /things/ from 1 - so the 1st item in the list has the index 0. This is logical how?

We put up with it for the dreaded "historical" reasons - surely it's a hangover from the need to manage stuff based on pointers, a convenience for C compilers...?

Murph
Friday, February 27, 2004

It's the compiler / collection's job to deal with efficiency.

Remember in Pascal / Delphi you can declare an array with a starting index of 13 if you need to. The indexes in the user's domain are what's important. It all depends on who your audience is. If you are making objects and intend to sell them to Excel hackers, use 1-based indexes by default.
If you are selling to C# or other COM developers, then by all means use 0-based indexes.
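
As a loose illustration of picking the index range to suit the user's domain, here is a small Python sketch. The class BoundedArray and its lower/upper attributes are invented for this example (they only mimic the spirit of Pascal's array[13..15]); Python itself has no such built-in.

```python
class BoundedArray:
    """Hypothetical fixed-size array whose first index is chosen by the caller."""

    def __init__(self, lower, upper, fill=None):
        if upper < lower:
            raise ValueError("upper bound must not be below lower bound")
        self.lower = lower
        self.upper = upper
        self._items = [fill] * (upper - lower + 1)

    def _offset(self, index):
        # Translate the caller's index into a zero-based storage offset.
        if not self.lower <= index <= self.upper:
            raise IndexError(f"index {index} is outside {self.lower}..{self.upper}")
        return index - self.lower

    def __getitem__(self, index):
        return self._items[self._offset(index)]

    def __setitem__(self, index, value):
        self._items[self._offset(index)] = value

    def __len__(self):
        return len(self._items)


readings = BoundedArray(13, 15, fill=0)
readings[13] = 7
readings[15] = 9
total = sum(readings[i] for i in range(readings.lower, readings.upper + 1))
```

The storage underneath is still zero-based; only the translation in _offset knows that, which is the sense in which the collection, not the caller, deals with the efficient representation.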

I have sinned grievously on this in my own implementations. The right answer isn't always obvious. (Hence the need for excellent documentation)

Christian Mogensen
Friday, February 27, 2004

> We count /things/ from 1 - so the 1st item in the
> list has the index 0. This is logical how?

It's logical in that you're counting displacement from the starting address (or item), not the number that the address (or item) would be assigned in a sequential list.

An early mentor (back in my Big (Blue) Iron days) told me that observing which way someone counted was one of her ways of telling whether that person "thought like a programmer" or not.

- former car owner in Queens
Friday, February 27, 2004

One non-computer based example of real-world zero-based indexing is floor numbering in buildings. In the UK, and some Commonwealth countries, the floor that is at street level is called the Ground Floor, and the 1st floor is up the stairs from there. I think the US starts floor numbering at 1 for the street-level floor. So for parts of the world, the street-level floor is the zeroth floor, and you have to climb a flight of stairs to get to the oneth (hey, it might be a word!) floor.

I wonder if there is any difference in the ease with which programmers of the different countries can learn zero-based indexing?

Dave Webb
Friday, February 27, 2004

I personally get annoyed with 1-based counting in general, but that's just me.  (I also get annoyed that they teach base-10 instead of, say, base-16, but that's treading into the unrealistic too.)

For numbers in general, it makes sense to do 0-based counting:  0-9 is the first ten, 10-19 is the next ten, etc.  It's logical:  The tens-place digit tells you which set you're in.  1-based counting requires an implied speculation to properly group counted things.  We're just used to it.
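
The grouping arithmetic can be spelled out in a few lines of Python. The function names are made up for this sketch; it only assumes groups of ten, as in the paragraph above.

```python
def group_zero_based(n):
    # Counting from 0: the tens digit is the group, the units digit the slot.
    return n // 10, n % 10


def group_one_based(n):
    # Counting from 1: shift down, divide, then shift back up.
    return (n - 1) // 10 + 1, (n - 1) % 10 + 1


zero_based = [group_zero_based(n) for n in (0, 9, 10, 19)]
one_based = [group_one_based(n) for n in (1, 10, 11, 20)]
```

With zero-based counting the group falls straight out of the division; the one-based version needs the extra -1 and +1 on each side to put 10 in the first group and 11 in the second.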

The fact that A.D. years started from 1 instead of 0 causes some confusion in computing B.C. dates.  You skip 0, you skip a place on the number line.

IMO, it was a mistake of any programming language to define 1-based indexing.  I see it as a semantic regression more than a usability feature.  I remember in GW-BASIC you could specify OPTION BASE 0 to use 0-based indexing.  At the time, I didn't appreciate why and thought it was silly.

My argument is thus:  1-based indexing screws me up as badly as 0-based indexing screws up, say, a VB programmer.  (It's my big beef with Numerical Recipes.)  I don't know, but I'd suspect that most programmers who will be using most custom APIs would prefer a 0-based index.  For language-independent APIs, I'd argue 0-based indexing, with the wish that all languages let you specify a preference as a pragma.

Trevor Schrock
Friday, February 27, 2004

Also on ships. 

The main deck is essentially the 'zeroth' deck.

The first one above is the 01 deck.

The first one below is the 1st deck.

If I remember correctly.  Been a while.

jorge fortuno
Friday, February 27, 2004

I have three apples:
apple#1, apple#2 and apple#3
Yes sir, I have three apples

I don't care how your computer addresses my three apples here on earth or on mars.

I still have my three apples
apple#1, apple#2 and apple#3

and that's the way I learned how to count.

FailedMAthInElementary
Friday, February 27, 2004

If you step into an elevator and press the button marked '1', where does the elevator stop? I'm sure someone will correct me, but I think in the US it's the ground floor, but in the UK it's the first floor above ground.

What about the other countries?

Interaction Architect
Saturday, February 28, 2004

It depends on what you're doing. Most things fit nicely into "1-based" indexes.

With the list ['a', 'b', 'c'], the "first item" is 'a' - index 1. We don't have a way of saying "zeroth item". If you ask someone where is 'b', they will tell you it is the second item.

With a "1-based" system, the tenth item is at index 10. The last item in a 7-item list is at index 7.

The only place I can think of where 0-based makes sense is if you are actually counting things and you need 0 to represent having no things. [0,1,2,3] is then a list of states - the state of having no apples, the state of having one apple, ...
What function do I call when I have n apples?  f = states[n]

I prefer not to give my full name.
Saturday, February 28, 2004

I too prefer one-based (rather than zero-based) indexes. Another syntax issue I still dislike is using the equal sign ( = ) to mean assignment and then using two equal signs to actually mean 'equal' ==. Everyone understands the equal sign, why redefine its meaning?

Mediocre programmer at best
Sunday, February 29, 2004

There is a distinction between counting and indexing.  If you are counting with whole numbers zero means that you have none and one always means that you have one and not two.

Zero based indices assume that the index is the "offset from the start."  So then the first entry has zero offset.  This is natural in the pointer arithmetic world of C where array[index] is equivalent to *(array + index), which is the element index * sizeof(type) bytes past the start of the array.  Sorry if I don't remember K&R exactly.
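
The offset view can be imitated in Python with a packed byte buffer standing in for a C array. This is only a sketch: the names ITEM, buffer and element_at are invented here, and the 4-byte little-endian integer layout is an assumption chosen for the example.

```python
import struct

ITEM = struct.Struct("<i")  # one little-endian 32-bit integer, 4 bytes

# Pack three integers back to back, the way a C array lays them out.
buffer = b"".join(ITEM.pack(v) for v in (10, 20, 30))


def element_at(buf, index):
    # Byte offset from the start of the buffer: index * element size.
    return ITEM.unpack_from(buf, index * ITEM.size)[0]


first = element_at(buffer, 0)
third = element_at(buffer, 2)
```

The first entry sits at offset zero, so no adjustment is needed before multiplying by the element size; a one-based index would need a -1 first.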

One based indices are more the counting kind where you count the number of entries that you have seen.

Where to start indexing is like the Lilliputian big-endian vs. little-endian war.  Use foreach, or loop from LBound() to UBound(), if you can, to enumerate an array.

Doug Ferguson
Sunday, February 29, 2004

Interaction Architect: in Mexico, pressing button "1" takes you to ground floor, except in elevators that, in addition to numbers, have one marked "PB" (which is Spanish for Ground Floor).  I would say the 1/PB button ratio tends to be 0.5

So I guess in this country everybody first looks at the first button and decides according to it. Which may be the reason why I had never heard anyone complaining about 0 or 1-based arrays... :-)

Dario Vasconcelos
Monday, March 01, 2004

A good API doesn't force me to count, so I won't be troubled by indexes starting with 0 or 1.
A good API offers an Iterator. A good language lets you just write "foreach".
I'm thinking of the BioPerl API, and Perl itself, but I think it held true the one time I wrote a script that used the Word Object Model. Or maybe I just unloaded all the collections onto Perl arrays and iterated using the language.

Dotan Dimet
Monday, March 01, 2004

The question whether 0-based or 1-based arrays
are in any way preferable depends on the way
you measure it. I guess that someone made a
list of code examples (in their favourite
programming language) and carefully counted
the keystrokes, for  example:

for(i=0;i<array.length;i++) // 27 characters

for(i=1;i<=array.length;i++) // 28 characters

The choice is about as arbitrary as on which
side of the road you should push your bicycle.

However, it is quite important that you use
_some_ convention, since  that tends to
minimize the amount of intellectual work you
need to deal with arrays. In particular, it
helps to avoid those off-by-one  errors. 

When designing for COM, or in fact any kind of
interface, you have to choose between the
convention shared by the people that read, fix,
maintain your code, or the people that use
your interfaces.

By the way, mathematicians also don't agree on
whether 0 is a natural number. A logician says
"yes" (because it helps when starting an
induction), a number theorist says "no" (because
division by 0 is awkward). They both agree,
however, that 0 is an integer.

Martin Roller
Monday, March 01, 2004

My personal favorite is a language distinction between what is "true" and "false".  I'm used to false=0 and true <> 0 (typically true = 1)

When using a VBX some years back (in a non-VB language) the docs were very clear that it returned "true" for success and "false" for failure.

It took a lot of hair pulling before we deduced that true=0 and false = -1. (I was told this was true for all VBXs - which may be apocryphal - but hey, this was my first one...)

I just _love_ programming.....

Bruce

Bruce Johnson
Tuesday, March 02, 2004

0 indexing goes right back to the days of machine programming, where you indexed from a Start Address (e.g. a machine word address) using an Index Register to hold the increment.

So with the data starting at word 4000, the index register was always set to 0 initially.

When the first high-level languages were developed with arrays these were frequently lower and upper bound, e.g. integer X[1:100]. Where you started from was personal preference or laid down in the company programming standards.

In the 70s with the development of BCPL and then C, which were meant to be slimline languages for systems programming, the use of arrays with implicit lower bounding of 0 became prevalent because it allowed the use of address pointers for speed. Computer Science had just squared the circle again, high level index registers.

As for APIs, array indexing is generally language or implementation dependent; it should be hidden from the API user.

Which is preferable? Well maybe it's cultural like the examples of elevators in Mexico. I wonder if those that start at 1 were imported from the USA and those with PB were local or from another Spanish-speaking country?

joneverett
Wednesday, March 03, 2004

Whether indices should be 0-based or 1-based is half of a specific instance of a more general question: when you represent an interval with upper and lower bounds, should the interval be open, closed, or half-open (and if half-open, on which end?)

I prefer half-open, for the following reasons:
- measuring the number of items in the interval is easy: merely subtract the bounds.
- dividing an interval into subintervals is easy: [N, M) is [N, X) plus [X, M), with no overlap or missed items
- knowing whether two intervals meet end to end, with no overlap or missed items, is easy: alice.upper = bob.lower.
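
The three properties above can be checked with a few lines of Python, where a half-open interval [lower, upper) is written as a plain (lower, upper) tuple. The helper names length, split and meets are made up for this sketch.

```python
def length(interval):
    # Number of items in [lower, upper): just subtract the bounds.
    lower, upper = interval
    return upper - lower


def split(interval, at):
    # [lower, upper) becomes [lower, at) plus [at, upper).
    lower, upper = interval
    return (lower, at), (at, upper)


def meets(a, b):
    # True when a ends exactly where b begins.
    return a[1] == b[0]


whole = (0, 10)              # [0, 10): indexes 0 through 9
left, right = split(whole, 4)
covered = list(range(*left)) + list(range(*right))
```

Python's own range uses the same half-open convention, which is why the two pieces cover 0 through 9 with nothing repeated and nothing missed.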

So, if you're going to use half-open, then the interval covering your entire 10-item array is either (0, 10] --- i.e. [1, 10] --- or [0, 10), which is the [0, 9] method we usually use in C.  Which you choose is largely a matter of preference, but it's sometimes simpler to iterate for (i = 0; i < 10; i++) than for (i = 10; i > 0; i--), and certainly more in accordance with how a human, or an input or output file, would do things.

So I think that's why "people who think like programmers" number arrays from zero.  The other available evils are counting backwards, or putting lots of +1 and -1 warts on your code because you decided to use closed intervals.

In high-level languages, though, I rarely use array indices.  Perl's excessively orthogonal subroutine calling mechanism makes it a bit of an exception, but these days, I usually use foreach, map, filter, foldl, and various whole-array operations, rather than writing a lot of code that explicitly indexes into arrays.  I mean, you only have to write the binary search routine once.

(PS: Binary search and quicksort are two examples of algorithms that have lots of +1 and -1 in them regardless of how you represent your intervals.)

Kragen Sitaker
Wednesday, March 03, 2004

Out of curiosity - how do VB.NET and C# implement this?

Walter Rumsby
Wednesday, March 03, 2004

I've read that the reason the Roman Empire fell was that, not having a symbol for 0, they had no way to indicate successful completion of their shell scripts.

Actually, I think the concept of counting generally assumes the existence of the things being counted.  If you're counting sheep you start with 1, unless you don't have any sheep, in which case you'd better start with 0.

John Seal
Friday, March 05, 2004

>> I also get annoyed that they teach base-10 instead of, say, base-16, but that's treading into the unrealistic too

And as you know, there are 10 types of people:  those who understand binary and those who don't.

GML
Monday, March 08, 2004

>> There is a distinction between counting and indexing.
To me, this is the crux of the matter. Indexing is meant to be an offset, and this relates to the importance of 0-based indexing. With 0-based indexing, indexes can be safely added together; with 1-based indexing, you have to add indexes and then subtract 1 (or more correctly, subtract 1 from each index, sum the indexes, then add 1).
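
A small Python sketch of that arithmetic, assuming the simplest case of a position inside a block that itself sits at a position inside a larger string. The function names and the sample text are invented for illustration.

```python
def combine_zero_based(outer, inner):
    # Offsets simply add.
    return outer + inner


def combine_one_based(outer, inner):
    # Both positions count their own first element as 1, so one of
    # those 1s has to be taken back out.
    return outer + inner - 1


text = "headerPAYLOADfooter"

# Zero-based: "PAYLOAD" starts at offset 6, and "L" is at offset 3 inside it.
zero_pos = combine_zero_based(6, 3)
# One-based: "PAYLOAD" starts at position 7, and "L" is its 4th character.
one_pos = combine_one_based(7, 4)
```

Both results point at the same character; the one-based version just carries the extra correction every time two positions are combined.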

In a separate comment..
>> (PS: Binary search and quicksort are two examples of algorithms that have lots of +1 and -1 in them regardless of how you represent your intervals.)

Not necessarily true. Here's a binary search written in Java that does a single "+1":

  public static int getIndex(int[] a, int alen, int key)
  {
    int low = 0;
    int high = alen;
    int mid;
    while (low < high)
    {
      mid = (low + high) >>> 1;  // Divide by 2 and trunc
      if (a[mid] < key)
        low = mid + 1;
      else
      {
        if (a[mid] > key)
          high = mid;
        else
          return mid;
      }
    }
    return -1;
  }

Paul Miner
Wednesday, March 10, 2004

If you want to iterate through a collection in Visual Basic 6, you can write

  For i = 1 To myCol.Count

because Visual Basic adopted Joel's philosophy of one-based indexing.  VB.NET also provides a collection class, but it is built from their zero-based containers.  You have to write

  For i = 1 To myCol.Count - 1

because there is a dummy element zero which the Count property unaccountably, uh, counts.

What a kluge, eh?  I hate that.

Bogon
Friday, March 12, 2004

I personally don't like 0-based indexes but I am used to it now that I've been coding in VB.NET for a good while.

I think the choice of collection bases in a language should reflect whether the language is a low-level language (LLL) or high-level language (HLL).

In a HLL, I would like to have high-level thoughts. So an index should serve as an *identifier* rather than a *locator*. I am more concerned about *which* item I want rather than *where* that item is (vice versa for LLLs).

1-based indexes behave like identifiers and say which item I want (first, second, third, etc), while 0-based indexes behave like offsets and say where the item I want is located (N'th item from location X).

Requiring a HLL (e.g. VB) to use 0-based indexes is not in tune with the nature and purpose of the language (same goes for 1-based indexes in a LLL like assembly).  Verdict? Choose the base of indexes based on the "level" of the language and build interop bridges when the different worlds need to interact (after all, we already do that when two technologies X and Y are similar but not the same and need to interact, e.g. COM/CORBA bridges).

Eric Mutta
Sunday, March 14, 2004

0-based indexes always give me the creeps, mainly for the reasons mentioned above.  Conceptually I think of an array as being nothing more than a numbered list, and if I want to look at, say, the fifth item in the list then MyList(5) makes an awful lot more sense than MyList(4).  Possibly having cut my teeth on 1-indexed arrays (I started with Sinclair BASIC) it's just habit, but I think it runs deeper than that.

For some purposes (bitmap addressing, etc) starting at 0 makes more sense (although as the origin of the screen tends to be at the top left, you can't quite claim it's akin to Cartesian co-ordinates, but it's close enough), but if I'm counting things I'll start at 1, thank you very much!

If it ain't broke, you didn't press hard enough.
Monday, March 15, 2004

Well, it really depends how you think about it. We're discussing and intermixing two concepts here; indexing and offsets. 0-based are offsets and 1-based are indexes.

When writing code you tend to think in terms of an offset, even though out loud you might say index. But when accepting input or writing output to the user, you better convert to a 1-based index if you want the user to understand it.

A program could make the distinction plainly visible with no confusion by using English (or other language) words such as first, second, 1st, 2nd etc. But we haven't evolved computer programming to make it user friendly like that. It's easier to make the user guess. A more advanced user who sees a "0" in the list might make the necessary connection and realize he's looking at a 0-based list; same thing for a list with the first item starting at 1.
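
One way to keep the two concepts apart, sketched in Python: offsets stay zero-based inside the program and are converted only at the point where the user sees or types a number. The list items and the function pick are invented for this example.

```python
items = ["alpha", "beta", "gamma"]

# Internally the offsets stay zero-based; only the printed label is shifted.
menu = [f"{position}. {name}" for position, name in enumerate(items, start=1)]


def pick(choice):
    # Translate the user's 1-based choice back to a 0-based offset.
    if not 1 <= choice <= len(items):
        raise ValueError(f"choose a number from 1 to {len(items)}")
    return items[choice - 1]
```

The conversion lives in exactly two places, the menu labels and pick, so the rest of the code never has to wonder which convention a number follows.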

Mark Bednarczyk
Tuesday, March 16, 2004

How's this for deciding:

If I'm in a high-level language, why should I have to bother with indexes? I love Perl's way of iterating through arrays

foreach $obj (@array_of_objs) {
    do_something_with( $obj);
}

(If you don't know Perl, don't sweat the $ and @)

I should only be using indexes into an array if I'm coding something low-level, so I'd rather use 0-based indexes, 'cause that seems to have the best impedance match with low level arrays (at least in C, C++, Java, the languages I've used).

Jonathan Shapiro
Friday, March 26, 2004

Actually, the reason that C uses zero-based indexing is quite simple.

By definition, in C the expression "x[i]" is the same as "*(x + i)". That is, indexing by "i" is the same as pointer arithmetic. So, since "x + 0" is always "x", that means that "x[0]" must be the same as "*x", the element at the very start of the array.

So, within the C language, zero-based indexing is a matter of consistency; it's just not consistent with the same concept as:
      "If you have a collection of i items, they will be x[1] to x[i], and adding another item would make that one x[i+1]."

Hence, the comment made earlier about "thinking like a programmer."

David Lathrop
Saturday, March 27, 2004

Not really directly on topic, but it has to be said: VB IS NOT A HIGH LEVEL LANGUAGE!

It's nowhere near expressive or powerful enough to qualify for that label (notwithstanding the fashionistas who label everything, including Java-the-language, as high-level these days).

yipyip
Sunday, March 28, 2004

True, "Visual Basic" just isn't a real visual programming language. It's a textual language with a GUI editor and a (semi-)WYSIWYG form painter. Check ProGraph or Sirius (originally on the Mac) for true visual programming languages. (I seem to recall posting a similar comment in another thread on this or another site recently.)

David Lathrop
Saturday, April 03, 2004

The defining quality of a number is that it is a scalar indicator of magnitude. The magnitude of a thing is a property of that thing. In the absence of that thing, its properties are also absent - to put it another way, magnitude is contingent on existence. Since zero indicates absence of existence it logically follows that it indicates absence of magnitude. Therefore zero is not a number: it is an anti-number.

Zero is a qualified null. It has type and indicates the absence of magnitude (if you think zero describes magnitude then I point out that if you allow this then you must accept that there are an infinite number of things sitting on your desk in zero quantities, including me. I'm evidently sitting on your desk so don't give me any lip, boy) and this typing makes its behaviour far more orthogonal; most arithmetic works with zero.

That said, array indexes are not scalar measures. They are ordinal identifiers. Every computer representation of the notion "zero" exhibits the same ordinal behaviour as all the represented numbers, so there is no good reason to forbid the use of the zero symbol as a hash key value. That's a fancy way of saying that zero works perfectly well as an array index.

Oh for crying out loud it doesn't MATTER so long as we are consistent. Let's vote. There are more script kiddies (I include all VB programmers in this category; this will no doubt annoy them to my lasting delight - small revenge for the damage they cause) than C or Java programmers, so there are more people who expect 1-based indexes.

So programmers of the world unite. 1-based only, so we don't have to keep teaching a horse to sing.

Peter Wone
Tuesday, April 06, 2004

As a former RPG programmer on the AS/400 (aka iSeries), I count array elements starting from one. As a Java programmer on the AS/400, I count array elements starting from zero.

I can live with that.

But does sysdate.get(Calendar.MONTH) have to return 0 for January and 11 for December?

My code is now littered with comments reminding me of this oddity, with mutterings on whether the parents of the responsible Sun programmer were married.

Doug Smith
Tuesday, April 13, 2004

I don't understand how anyone could put forward an argument for VB collections being 1 based while by default leaving arrays 0 based.

I realise that languages need to evolve but changing the accepted standard and adding inconsistencies into a language isn't a good start.

Joel has previously stated that a lot of people don't understand pointers; people making an argument for 1 based collections fall into this category.

I also have serious issues when January == 0 but the 1st day of a month == 1.


To David Lathrop: Hi, VB is a high level language. Go to Google and enter "High level language" and definition.
And don't shout; it makes you look like an idiot.

Joel V.
Saturday, April 17, 2004
