Fog Creek Software
Discussion Board




Joel's algorithm is flawed

>> "This turned out to be a great example of one of those places where "the last 1% of the code takes 90% of the time." The first draft of the code looked like this:

1. Open the file
2. Read it all into a big byte array
3. Store the byte array in a record

Worked great. For reasonable sized existing files it was practically instantaneous. It had a few little bugs, which I worked through one at a time."

Joel's problem is that his algorithm is flawed.  Step number 1 is ok, but steps 2 and 3 need to be improved.

Why would you read a file into 1 huge byte array?  Why would you not do it in chunks equivalent to the speed of the machine?  (If this is what he is doing then my question would be how would a child process improve the usability of the main application?  Would'nt the computer still be slow due to all the IO.)

Considering that he is using DAO and VB the algorithm would go something like this:

1. Open File, Establish Connection to DB, Initialize ProgressBar
2. Begin transaction on connection
3. DoEvents
4. Read in X Number of bytes from file into buffer where X is determined by the speed of the machine.
5. (optional) DoEvents
6. AppendChunk to OLE field in rs AND update rs.
7. Update Progress Bar
8. Repeat 3 through 7 until EOF = true
9. Rollback or Commit Transaction depending on error status

I don't see why you couldn't perform two of these at the same time.  I think even with a child process the computer would still be slow because of all the IO.

Just my $0.02.

EiEiO
Thursday, December 04, 2003

Your logic is flawed.

In particular you seem to think 'computer running slow' and 'application is totally locked, completely, showing the user no progress status or indication that it isn't running through a tight infinite loop' are equal.  They aren't.  One is unavoidable when dealing with large data sets, and one MUST be avoided if you expect customers to use your software.  Which one is which is left as an exercise for the reader.

Mr. Fancypants
Thursday, December 04, 2003

The trouble is the DoEvents.

Without it, the app is frozen and even the progress indicator won't be painted.

With it, there is a chance that the user will do something to the application that I'm not prepared to handle (e.g. insert another, smaller file, which finishes before I'm done, and then foreign keys may get crossed.)

For very short files, I'd rather not have a DoEvents at all. And they're short; might as well slap the whole thing into a byte array.

For very long files, DoEvents is dangerous and causes synchronization problems. For these files I'd rather have the copy going on in another process. To be fair, that process will be using the algorithm your provided, but it does not have any synchronization issues with DoEvents.

Joel Spolsky
Thursday, December 04, 2003

With the above algorithm the progress bar is updated just fine, the data is put into the database just fine and the computer doesn't stop responding or for that matter "slow down" much do to IO.  You need to explain yourself and your viewpoint more clearly.

EiEiO
Thursday, December 04, 2003

I was referring to Mr. Fancypant's comment.

EiEiO
Thursday, December 04, 2003

You spelled due wrong!

Mr. Fancypants
Thursday, December 04, 2003

Ok that makes sense.  DoEvents = Bad.  Bad = DoEvents.  DoEvents = False.  NULL == DoEvents.  Don't DoEvents.

EiEiO
Thursday, December 04, 2003

I love this kind of stuff.  Basically I had to try it out and see what Joel was talking about, so I created a sample program. (I make lots of those heh)

http://www.sswltd.com/downloads/big%20file.zip

(You need VB 6.0 and ADO 2.7? to run it.)

Run the Project1.exe and a form will come up with some rectangles being blitted.  There'll be two buttons Create File 1 and Create File 2.

Try Clicking 1 at a time and see what happens.  Notice that when you click both of them (on the same form) the first progress bar will stop and the other will keep going.  Same process, so execution simply continues in a different procedure.

Now click the P2 button.  This spawns a child process (1 allowed per form.)  Now click two buttons (1 on each form).  One progress bar on each form should be moving.  Separate processes so the processor is controlling who gets to run.

Now click all 4 buttons.  Things really start to slow down now. 

Lot of synchronicity issues that need to be dealt with here, but otherwise pretty neat.  Actually made me realize some issues that could arise in my code.  (Using DoEvents, but I usually display a modal form with a progress bar.)

Dave B.
Thursday, December 04, 2003

Joel,

<quote>
The trouble is the DoEvents.

Without it, the app is frozen and even the progress indicator won't be painted.
</quote>

Using Control.Refresh is better than using DoEvents.

<quote>
With it, there is a chance that the user will do something to the application that I'm not prepared to handle (e.g. insert another, smaller file, which finishes before I'm done, and then foreign keys may get crossed.)
</quote>

That is why Control.Refresh is better. Otherwise, use DoEvents but disable all the forms beforehand.

BTW, threads (or apartments in VB) are very safe. If you run two separate threads in the same EXE it is not significantly riskier than running an EXE that communicates with a separate ActiveX EXE due to the use of apartment threading. Admittedly, debugging is fun since the VB IDE does not support multiple apartments.

Seeya

Matthew
Thursday, December 04, 2003

Dave B. is the upper 20% programmer we are talking about in the other thread.

I like Dave B.
Thursday, December 04, 2003

Dave B,

<quote>
Lot of synchronicity issues that need to be dealt with here, but otherwise pretty neat. 
</quote>

The synchronicity issues are the hard bit - I think that is part of Joel's point. Your code is good in that it does show the importance when using DoEvents of only using local variables (not module or global variables) - which of course minimises problems that can occur. And it does show the dangers of using DoEvents too - for example, should the user be allowed be allowed to click Create File 1 twice on the same form (and what should happen if it IS allowed)?

<quote>
Now click the P2 button.  This spawns a child process (1 allowed per form.) 
</quote>

Using an ActiveX EXE with multiple threads and callbacks would be just as effective, plus allow easier communication between the multiple instances. Its been a while since I looked at them, but I believe the Coffee examples on the VB CD give a reasonable primer in that area...

Matthew
Thursday, December 04, 2003

What's wrong with just using DoEvents and a modal progress dialog?  It's an incredibly simple solution that can be thrown together in a few minutes, complete with a cancel button, and easily made reusable for similar things in the future. 

SomeBody
Thursday, December 04, 2003

SomeBody,

Joel's article states:

From a UI perspective, what I really wanted was for long operations to bring up a progress bar of some sort, along with a Cancel button. In the ideal world you would be able to continue doing other operations with CityDesk while the file copy proceeded in the background.

Note the final sentence. That is the fun bit. Your suggestion:

<quote>
What's wrong with just using DoEvents and a modal progress dialog?  It's an incredibly simple solution that can be thrown together in a few minutes, complete with a cancel button, and easily made reusable for similar things in the future. 
</quote>

does not allow the functionality as described in the second sentence.

Seeya

Matthew
Thursday, December 04, 2003

Aye Matthew.  Some things I would add (off the top of my head):

1. "Create File 1" and "Create File 2" should probably be made non-reentrant

2. Since the names of the temporary text files aren't different, one thread might have to wait while another finishes loading the file into the database.  Or maybe each file could have different names.

3. DoEvents should be removed and replaced with .Refresh

4. Error Handling for the database needs to be added.  Two process could be trying to write to the DB at the same time.

Lot of stuff could be added, but my original intent was just to see how how one process behaves verses many.  Anyway, neat stuff and thanks for suggestions.

Dave B.
Thursday, December 04, 2003

Matthew, I realize that Joel said that in his article but note that he included the disclaimer “In the ideal world”.  I took that line to be expressing more of a bonus from the solution rather than the primary justification for it.  If someone is importing a file, it’s probably because they want to do something with that file at that time so what use is the ability to do other operations during the import?

DoEvents combined with a modal dialog meets all of the non-ideal world goals – a progress bar, a cancel button, and the prevention of dangerous operations by the user – and can be completed in a fraction of the time required for the out of process solution and in a much less complicated manner.  The saved time could then be devoted to features and fixes that would be useful to more than just 1% of CityDesk users (I suspect that this estimate is way high). 

The multiple process solution brings up a ton of issues such as what do you do if the user quits the parent app?  Joel also mentioned some mechanism for the child to notify the parent when processing is done but what happens when the child crashes (as CityDesk just did when I tried to import a 300 MB file)?  Of course these things are solvable but it’ll take a heck of a long time to get it right. 

I’m not saying that Joel is wrong.  I’m sure he realizes the tradeoffs involved in spending so much time on such a minor thing.  I was just curious why he’d reject the simple but adequate solution in favor of the complex but ideal solution. 

SomeBody
Friday, December 05, 2003

i'll need to re-read the article to be certain, but there are other potential advantages to the 'separate process' architecture: future automation.

imagine the process is like this: launch citydesk.exe (or whatever it's called) with the commandline /importfile:c:\temp\file\name. it will then pop up a progress dialog with cancel, etc. the user can switch back to the main app, or whatever.

in the future, you could write a simple 'stick in citydesk' batch script which called the command line. like a test app.

also note this is the model which IE uses when you download a file: that little 'file download' window is basically a browser window, it just looks and acts different. 

mb
Friday, December 05, 2003

> I was just curious why he’d reject the simple but adequate solution in favor of the complex but ideal solution.

That was the whole point of the article.  Craftsmanship, to Joel, means that all the screws line up: the complex, ideal solution.  He's taking pride (understandably, I wish I were so lucky) in the fact that his organization CAN do this.

Ian Olsen
Friday, December 05, 2003

Which type of application would you prefer to use: an adequate one or the ideal one?

The Pedant (Brent P. Newhall)
Friday, December 05, 2003

>> "Which type of application would you prefer to use: an adequate one or the ideal one?"

Hrm... I would choose the ideal one.  As I think most people would.  Why?  Because they have a choice.  If a company came out with another spreadsheet product today, would you switch to it from Excel?  I don't think you would.  Especially a version one product.

All applications can be rated on the adequate to ideal scale.  And this scale is really just a measure of the time an application has been under development.

Adequate applications almost always equal applications that have been just been released, i.e. Version 1.0, or applications that have never been maintained but are still in use because no one maintains them or writes a new one.

Ideal applications have been around a while and people have conformed to them and are comfortable with them.  They are frequently maintained and new releases are often made every 12 - 18 months.

Bear
Friday, December 05, 2003

>>
Which type of application would you prefer to use: an adequate one or the ideal one?
<<

This is about an adequate versus ideal *feature*, not application.  Focusing on making fringe features ideal wastes time that could be spent on making more often used parts of the application better.  Head over to the CityDesk forum if you think there aren't other areas of the application that people would like improved. 

SomeBody
Friday, December 05, 2003

I normally agree with about 99% of what Joel says. Not this time.

This is that 1%.


I have to disagree with Joel on this one. It's all about opportunity cost (i.e., tradeoffs).  WHAT feature is he giving up to make this one perfect?

As a CD user, I think his effort was mispent.

I'd RATHER have other functions in CD than to have this one work perfectly.

I'd rather have the simple design (modal progress window with cancel button) AND some more assistance with importing my existing website into CD.  Not only will that make ME happier as a CD user, but I think it'll get him more sale$.

The case of transferring a big file on a slow connection, AND wanting to do something in CD is very very unique.  I'm willing to have a non-perfect solution to something I only user rarely.


"The perfect must not be the enemy of the good."

Entreprenuer
Tuesday, December 09, 2003

*  Recent Topics

*  Fog Creek Home