Fog Creek Software
Discussion Board




Calculation of Results VS Storage of Results

One of the neat things about the computing industry is that we have incredible amounts of processing today.

I can remember using a seek command in FoxPro 12 years ago. In the mid 90’s we saw the advent of systems and code being used to retrieve a record form a database (we are talking about SQL server here).

This is means, it is quite common to see a whole SQL string used to retrieve ONE RECORD. If someone had told me this would happen ten years ago, I would laugh at you. I mean come on:

      Build a SQL string
      Send SQL string to database server
      SQL text is tested for syntax
      SQL text is parsed and a query plan is built
      The query plan is optimized
      The database engine then uses the query plan to retrieve our single record.
      The record is sent to the client.

Without a doubt, the processing and huge amounts of systems required to retrieve ONE record requires many times the processing to run most applications 10 years ago. In other words, that simple seek command in FoxPro now requires more memory, more processing and more software then the WHOLE APPLCTION did 10 years ago.

While this concept is compete insanity to anyone came from a older efficient computing environment, today this type of humungious processing to retrieve one record is standard fair, and we rarely hear how incredible this fact is.

This brings me up to my subject title, and the issue of  how having incredible amounts of processing allows one to change system designs.

Lets take the typical database application with inventory. We just sold a book, and thus must reduce the inventory (in stock) of this book by one book.

In older systems the traditional approach is:

  Load the inventory record for that book
  Reduce the quantity (in stock) field by one
  Write results back to disk

In a new system, since we have so much processing, we can now do the following:

  Run a query to total the amount of books in stock (this could be one record in a table, or simple the last stock taking day).
  Run a query to total the amount of that type of book sold

Our In Stock is the difference between the two.

Notice how in the 2nd approach, we DO NOT actually store the number of books in stock anymore. That value simply does not exist in the database. We CALCULATE the value when we need it.

My designs now certainly are pretty well moving to the 2nd approach, and I am finding some amazing improvements in terms of reducing the code required. Of course, if you have some server side code at the table level (triggers), then the savings by using the “calc” approach are not as pronounced, but even then can still result in reduced code.

A good example is that now my customer invoices do NOT actually store the total of the invoice, nor do I store the payments “total” against the invoice. I simply use SQL to CALCULATE  the results. This simplifies the user interface code by a large amount, since if the user adds a payment, or even has to remove a payment, or change a value, then I don’t need any code to “update” the balance owning.

Do you folks as a general rule now not store the results in a database if you don’t have to ?

Due to so much processing, are your designs now taking this into account?

Mine are!...

Albert D. Kallal
Edmonton, Alberta Canada
Kallal@msn.com

Albert D. Kallal
Sunday, September 22, 2002

Dude,
A company by the name of Relational Software shipped a product called Oracle waaayyy back in 1979. 

jack booted thug
Sunday, September 22, 2002


  "Notice how in the 2nd approach, we DO NOT actually store
  the number of books in stock anymore. That value simply
  does not exist in the database. We CALCULATE the value
  when we need it."

You can still do it the old way. Add an extra field, and a simple UPDATE will do the trick "as efficient" as before (actually, more efficient, because of indexes, cache, async writing, etc)

You have a point when you complain about the multiple steps involved when you want to do a simple operation. But the steps involved aren't expensive when you have a client-server app that maintains a persistend connection.

Of course, sometimes I think that the old way would be enough for our current "Web Apps". I hate these things.

Yo
Sunday, September 22, 2002

>>Dude,
A company by the name of Relational Software shipped a product called Oracle waaayyy back in 1979.

My point here is not about fact that we have relational database systems. My point here is that due to 1000’s of times of processing we now design products where we don’t store typical values that just 10 years ago it would be silly to consider other wise. So, we might have had oracle 20 years ago, no one in their right mind would have written a application back then that did not store values for “on hand” for inventory. We simply did not have the processing to even consider the approach of calculating values on the fly.

For newer software developers when building a tour reservation system, the number of people booked on a tour bus will NOT be stored. If you are from the old school of design, the number of people booked on a bus, or booked in a hotel room (or bananas in inventory)  will actually be stored in the database. Code will be written to maintain these values. Today, we don’t have to write that code anymore. You save writing code!

The real question here is then what are trade off between storing values, and calculating values?

How can one decide what approach is better? What approach will reduce the cost of the system (or give us more features for less cost)?

Having values stored can make parts of the system *much* easier to write. Reports, and even screens that display the “on hand” values are much easier to write. Thus, code that checks, or displays the “on hand” values is easier to write...since you only have to retrieve a record to display results. 

However, code to update the number of people on the bus or in the hotel is going to be MUCH more complex. Hence, the solution today, due to the increase in processing, is to not store that information any more. In other words, you don’t store the number of seats used on a bus. We don’t store the number of rooms used in the hotel.

Thus, using one approach, or the other...which is better? Right now, I am leaning towards the calc approach. With the calc approach, I can now import records directly from my palm pilot into my database, and my bus seats used will be correct. Look mom...no code had to be written to update the number of people booked!

Also, if a user cancels a trip, to release seats on the bus..I just delete the booking...again...no code needed to update the on hand seats!

Also, if anyone has ever written software that updates the “on hand” values, you will *ALWAYS* find some bugs, or some update problem that means the values stored do not match the orders in the database! Hence, you wind up writing some code that goes through the data to sum the values to fix and correct the on-hand values anyway! (anyone who has written a lot of software will understand this!).

Anyway...the jury is still out on this new approach. I find big gains in one area...but big losses in other area’s. It also is a big question of how well this approach will scale. I have used this approach in a recent project, and I am impressed.

Over all I find that the development process for the “calc” approach is better. It is better since you can prototype the application faster. Thus, it suites the iterative type approach used in software development today (as opposed to the older waterfall type model).

Anyway, I will continue to make notes as to the pro’s and cons, but the answer here is not easy at all.


Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com

Albert D. Kallal
Sunday, September 22, 2002

Some food for thought from you, as always.

ryan ware
Monday, September 23, 2002

OK,  I'm sure I'm missing something, but I'll ask it anyway:

Suppose there are 2 travel agents "booking seats on a bus".  While you're making "offline" calculations for what may be the last few seats on the bus, your competitor books a small tour group into the remaining seats.  Only he issues the RPC write command and the database is informed of his commitment - but not yours.

While I'm no DB developer, it is generally my rule to work with the actual data rather than a shadow of it - unless there is some performance benefit (as you noted) and I can guarantee the write data lock.

FWIW.

d00bie
Monday, September 23, 2002

Doobie -- there are several ways of handling the situation you described. It's a common issue that has to be dealt with -- Lock on read vs. lock on update, contention vs. accuracy, etc. There are a few standard ways of handling it.

Albert -- you better beleive I store results. I take designs to 3NF, but then I denormalize for performance and convenience. Speed is essential for the product I work on, and we avoid calculation whenever possible. There are several waypoints from zero to final result set, and intermediate results are compared to the waypoint from the last time the calculation was run, and if at any point we're identical, processing stops there and the prior result set is returned. We used to calculate everything each time, but we've managed to reduce calculation times from an average of 20 seconds to an average of < 1. Not only does this improve response time (our life blood), but reduces system requirements as well since some resource-intensive steps later in the process can be avoided most of the time.

One of our selling points is response speed, so it's valuable for us to reduce cycles where possible. Employing some "old school" methods like you mentioned (read/alter/update) rather than calculate-on-the-fly makes all the difference for us.

Troy King
Monday, September 23, 2002

One thing you have to be wary of with some types of calc results is how they deal with change. Say you are calculating your totals for an invoice rather than storing them. How are you calculating the tax? You initially might just be grabbing it from a 'salesTax' property, say .05. But then what if that goes up (outside of Alberta that can happen ; ). Will all your old invoices still be accurate? Or more tricky, what about invoices for jobs that were completed before the raise, but still being processed? If you just store the total, and store the tax rate in the invoice itself, then you don't have to get into timestamping etc for every property.

For all properties, you have to consider this type of senario. What about the company address? It might be something you want updated on 'old' invioces once you move, but then if you ever compared the 'payment' object == the 'invoce' object, who knows it might not match anymore. Ok, dumb example, but I think that in general there is more power with calc'ing results, but I don't know if it easier or less code. I guess as always it depends on the situation...

Interesting subject : )

Robin Debreuil
Monday, September 23, 2002

If you want an easy way to get the value on hand without having the work of calculating it each time you need it then create a view.

A view will give you a simple interface to a more complex query.

Matthew Lock
Monday, September 23, 2002

But Robin, does this problem go away if you update instead of calculating on the fly? I don't think so. Your problem is more about keeping historical data and it's definitely difficult!

Btw, I agree with Albert's posting. Keeping the same data stored in different places is a bad practice that always creates problems somewhere along the line. IMHO hiding the calculations in a stored procedure is also very good in this case. You'll have better manouvering space later on.

Stefan Haglund
Monday, September 23, 2002

Actually Robin, I do copy the tax rate to EACH line of the invoice details. This is done due to your exact example of a tax rate change. In addition, some tours to the USA do not have GST tax, but certain options, and stuff like clothing purchases etc will in fact have a Canadian GST.

Also, in theory, the GST rate could change, and tours/services delivered *after* a particular date would have one rate, and tours/services delivered before a particular date would have another rate. This would be despite both of the tours being booked on the same day! In accounting rules, the tax rate is assumed is to be at the time of services delivered. Hence, I do in fact have a GST tax table *with* a date. Hence, my software will work if the GST tax rate is changed for a future date. I do in fact for each detail line copy the tax rate (thus, for some parts...I do in fact copy things..where they could perhaps be calculated).

However, I still don’t save values that I can easily calculate, and thus each detail item total is in fact a qty times the price + tax (I do not store the total..but only the unit price and tax rate). Thus, I DO NOT in fact store the total for each detail line. I re-calculate the total price for *each* line of detail, and then the total of each line is then the tour cost. This is done using what is called views in sql (as Matthew mentioned...).

Hence, I don’t store things I don’t have to!

Of course the primary thing here has to be the issue of performance.

I wrote the original reservation software 12 years ago. Back then, the total for a invoice was stored. Back then, the total number of people on a bus was stored. Thus, I had many years to think about how I would write this software differently next time around. Over time, I also had come across some articles and ideas on the concept of NOT storing results.

2 years ago that magic chance happened, and I had a chance to re-write my software. The old system was written in Pick (80,341 lines mv-basic code). The new system was written in VB. Actually it was ms-access/VB, and had 22,981 lines of code.

Deciding to store the number of people booked on a bus, or calculate the number of people on a bus was the hardest design decision I had ever made. I spent about 3 FULL DAYS thinking about his issue. I was actually pacing around in a room. Which path to take was not clear. Help!!! This is hard, and I felt I was about to jump out of a plane for the first time. I was scared, and still not 100% sure that this design was the way to go. There was also this nagging issue of performance, and I never attempted such a radical design. Worse, I had several people telling me that calculating everything was a dumb idea!!

Now I have written a complex system both ways. I have this rare chance to compare, and learn from the two different approaches.

As mentioned, different parts of the system get harder, and different parts get easier when you use the calc approach. However, two things occurred that give the edge to the calc approach:

1) Fear of performance problems did not occur. The little JET database engine is very fast, and can easily grab 10, or 20 thousand records in less than a second. In my case, only about 800-900 people would be booked to a tour, and thus performance never became a problem. It certainly does tax the system more then the old approach, but performance and response times are acceptable. Again, even in a large inventory system, this approach can work quite well, and response times should stay below one second.

2) I mentioned the issue of prototyping and testing. The calc approach is *much* more flexible this way.

For example, I stated to the client that you will have a clothing inventory “system” added to the reservation software (we did not have one in the original old system). Well, in fact for several months, they had a inventory table, and one could add, and price out clothing in a invoice. At this point I had not WRITTEN ONE line of code for inventory stuff!!. In other words, they had actually started using a system where clothing items were being sold, and *supposedly* removed from the inventory table! I had no inventory code, or reports at this point, but the system was live and in use!!!

Eventually the inventory screen did start to show the “in stock” field. However, this was calculated field! If I had used the traditional approach here, then I would have to have finished all of the code to update the inventory BEFORE the customer could start using this feature (or leave the code out, but then be faced with writing a bunch of code to total up and *UPATE*/fix the existing sales and instock values to the correct value in the inventory table).

In addition, once I did add the “instock” feature...no table structure changes were required. Also, no code changes were required to the invoice screens to “update” the inventory. Further, if some bug existed the in the inventory update code, I just had to fix the code that displays the totals. I NEVER had to write a bunch of routines to update, or fix incorrect values stored....since I was not storing any values! I did actually have a few bugs..but they were a minor inconvenience , since when I fixed the “code”, I was done!...no update code had to be run!

This kind of parallel approach where the client is running the software, and the features are still being added to a “live” system is not possible with the traditional approach to storing the on-hand values.

I have lot more notes on where benefits occurred, and where some real snags occurred with using the calc approach. This issue is perhaps not enough for a book...but it certainly needs a good essay on this subject. (ie: does the calc approach reduce software cost??). In fact, this post is becoming a too long of a essay now!!!.... I going for a coffee...


Albert D. Kallal
Edmonton, Alberta Canada
Kallal@msn.com

Albert D. Kallal
Monday, September 23, 2002

I really think calculating is a good idea, I'm sure there are many parallels with using accessors vs a public field, or even leaving the byte size of primatives undefined in C. I think when programming there are always two forces at play - you want to commit to as little as possible to keep things flexible, and you want to commit to as much as possible so you can be more certain of correctness at compile time. Obviously these work against each other somewhat. I also get the sneaking suspicion that there is a zero sum game happening here, and while it seems that the little annoyances are easy enough to fix, when all of them are fixed perfectly you are back to something much like your origninal problem with a new face, perhaps on a new layer...

This is a bit OT from the thread subject, but sometimes I think we always see a 'silver bullet' just over the horizon with some new method or technology. It often promises us that we will no longer have to commit (xml, interfaces, accessors, many design patterns, downcasting, exceptions, encapsulation, mvc...), or that we can be certain our code is correct (unit testing, class signatures, typesafe delegates, contracts, emums, strict compilers, oo). One half usually undoes the other half, but over time becomes a new layer. Really when you make a class, one thing you get is you are sure of its type. But then if you downcast to something that (hopefully) is the type you really want, and throw an exception if property x isn't there -- where are you? Probably closer to where you started than is comfortable. For example I just read an article that recommended always using'EventHandler' as your delegate type, and then downcasting to 'mouseEvent
Handler'. Hey dude, don't stop there, why not pass 'object' and downcast, after all, what if someone wants to pass a tomato rather than a delegate...? It's like workarounds in Java that allow you to avoid committing to a type, vs workarounds in Javascript that allow you to be certain of a type.

On more cynical days I see everyone in the early days, dealing with problems like 8 bit to 16 bit - 'what if the cpu changes?'. Then after racking their heads, coming up with things like 'int' which doesn't commit to the size. Languages are born, problems are solved. Sure its harder to prove the code is correct as its a bit more abstract, but well worth the trade.

Trying to get things provable again, data types, oo, method signatures, contracts etc are all made to increase verifibility. You compile and you are surer than you've ever been. Until something changes and you realize things aren't very flexible this way again. Really they say, its a problem if you commit to int in your field - 'what if your data changes'? So a bunch of fancy oo techniques are born, composition comes in vouge, inheritance is out, design patterns to make rigid classes more flexible.  Problems are solved.

Your program might look like 100 chunks of unrealted stuff, all passing the buck to the other, written in 12 languages, and glued together by ever evolving standards, but is it flexible if you can understand it. But now it is almost impossible to verify again - maybe you don't even control half of it.

I'm sure eventually all these patterns and interface formats will become a form of simplified meta language, and then instead of making some class factory with 12 levels of indirection, you will simply say something like 'metaNumber x = 5;'. And then someone will notice that you really shouldn't commit to metaInteger. Why commit on this layer when you can put it off until another layer?

At one point, it might be easier to program in two columns: 1) things I'm going to commit to 2) things I'm not. Then specify then all on one sheet of paper in one spot. Accept the not things are riskier, and the commitments are less flexible.

Well, not to say I don't like calculating data, I use accessors like candy at this point in time. Just that as with human relations, its very hard to commit and still have a flexible sex life. Maybe if marriages contracts were written in xml...

Robin Debreuil
Monday, September 23, 2002

It may sound like an obvious simplification, but generally I only store what I cannot calculate in a realistic time from the data at hand in the database.

One of the problems with the original example concerning number of books on hand is that it assumes that all book transactions from day 0 are still in the database.  In most larger production systems, historical data will be archived out after a certain time period, making it impossible to calculate totals from it.

Matt
Monday, September 23, 2002

That's cool.  I just wanted to point out that this hasn't been since "the mid 90's".  Ergo, this problem is not a new problem and it has been discussed numerous times by minds a lot bigger than mine & yours. 

As for your question.  It's mental masturbation.  Its too generic.  In general, I prefer to store results.  I hate to see things calculated over & over again.  But it depends on the situation, so there is no clear answer, nor even a general rule of thumb.

jack booted thug
Monday, September 23, 2002

"does this problem go away if you update instead of calculating on the fly? I don't think so. Your problem is more about keeping historical data and it's definitely difficult!"

For sure, you can choose your problem, but either way there are things to work around - the uncertainty of a dynamic system, or the inflexibity of a static one. If you store properties, you are sure of what went on the invoice, but it is harder to be sure its up to date etc. If you calc values, you are sure its current, but its harder to be sure its an accurate snapshot. It depends if you are using the address field to decide if GST should have been applied, or to send an email The problem is usually that it is for both. I've never gotten past just patching one or the other, though I'm sure there are better ways to do these things than what I usually do.

The other thing about calculating, is you have to decide when to stop putting it off and commit. Say if you count the number of bookings rather than have a totalPassengers field - but should you use bookings as a real object, or step down and calcuate a booking based on a successful transaction? What if someone hasn't paid within a week as required, suddenly that booking has expired. You get a lot of flexibility now on the 'booked' object, because it doesn't exist either. But for the trasaction, how can you be sure the credit card object hasn't failed, or the person object hasn't died in the meantime...? etc.

It is always the ponit where you commited to something that change screws you up. Then you kick yourself for having committed to that and make a noteto use an abstraction next time. Its like interfaces protecting you from changes within your system. "Just think very carefully about the interface, because it is a giant problem if it has to change". Sure, and "think very hard about the cpu you are targeting, because its a giant problem to change from 8 to 16 bit". Shit, if I could do that, I wouldn't need a solution to it, would I. I bet there are people today who would like to make some Java code CLS compliant, but have things like uint or type sensitivity problems. Who would have saw that coming? Should they have used virtual uints then, just in case something like that came along?

At one point, change will find your sore spot. I guess that is one of the attractions with things like Latin, Cobol, and a 30 year old atm system, at least the ground is still...

Robin Debreuil
Monday, September 23, 2002

SQL Server's maintained views are an excellent hybrid approach to this sort of thing, as long as it doesn't get too far out of hand.  No idea what Oracle has.

Jason McCullough
Monday, September 23, 2002

Matthew, you said: "If you want an easy way to get the value on hand without having the work of calculating it each time you need it then create a view."

That's beside the point. The view may save you some client coding, but the calculation is still performed on the server when the view is called. The view doesn't save you from the calculation, it just saves you from looking at the bigger query in your code editor.

Troy King
Monday, September 23, 2002

Whoa there pardner.  I've been told for several years now by people a lot smarter than me that the view was only modified if the data it operated on was modified.  Therefore it doesn't really recalculate it each time the view is called.

too lazy to look it up
Monday, September 23, 2002

You: "Whoa there pardner. I've been told for several years now by people a lot smarter than me that the view was only modified if the data it operated on was modified. Therefore it doesn't really recalculate it each time the view is called. "

Then you're talking about something other than a MS SQL Server view. Some DBMS may store something called a view that is actually stored data, so I probably shouldn't have spoken up so soon. This whole group seems MS-tool oriented, though, so I might have made a bad assumption.

However, if you're talking about views in MS SQL Server, then you're talking about nothing more than a stored query statement, one that runs a query each time you call it, including any calculations. Running a view in MS SQL Server is literally the same as running the raw SELECT statement that makes it up, except that SQL Server may use a stored execution plan rather than calculating costs each time the query is run. The query and calcs themselves, however, are run every single time.

Troy King
Monday, September 23, 2002

Are SQL Server 'maintained views' much the same thing as Oracle's 'materialised views'?

Materialised Views look like views but *do* store data in the database, and are updated via various mechanisms (that I have never really had the time or inclination to try to understand).

We have found that Oracle is far too slow to do calculations on the fly, so we have had to resort to calculated summary tables in some cases.

Our preferred approach is to pull raw data out of the Oracle database and manipulate it locally. Our preferred data manipulation tool? Microsoft Visual FoxPro (!)

Les C.
Monday, September 23, 2002

>>We have found that Oracle is far too slow to do calculations on the fly, so we have had to resort to calculated summary tables in some cases.

Well, I am sure that Oracle gives the Microsoft JET engine a run for the money. I used JET in a file share mode for this application. We are talking about only 50 related tables. It is not normalized not all the way to 3nf, but a good job was done. The relations in the tables don’t generally go more than 4 or 5 levels deep.

Hence, we are talking only about 50 related tables, and recordset sizes in the tables are not that large, only in the 30,000 record range. With five users in a file/share mode (we are not using a server here), it absolutely smokes from a performance point of view. I am sure that the calc method in Oracle would work fine for a system at least 10, or 100 times larger then this little jet system.

As mentioned, I also was worried about performance here, but I finding that a true mind set has to take place...since the new hardware is so unbelievely fast..that I am continued to be amazned.

My notes on the creating of this system can be read at:

http://www.attcanada.net/~kallal.msn/Articles/fog0000000003.html

A “early” ER sample of the table layout of the system can be seen at the end of the article.

Albert D. Kallal
Edmonton, Alberta Canada
Kallal@msn.com

Albert D. Kallal
Tuesday, September 24, 2002

Les C: "Are SQL Server 'maintained views' much the same thing as Oracle's 'materialised views'?"

I've never heard of "maintained views" or anything like them available in MS SQL Server, unless this is a made-up term for some OLAP data aggregation. I won't claim to know everything about MSSQL, but I'll claim to know a lot. I taught it for 5 years and have used it daily since v4.2. Bill Vaughn used my sample set and walkthrough for bcp in his "Hitchhiker's Guide to VB and SQL Server". I was the person that found the backup/restore bug that resulted in the original MSSQL 6.5 SP5 being pulled the week of its release, and I almost routinely watch my own bug reports/repro steps/workarounds make it into KB articles. I turned down 4 opportunities for SQL books to either co-author or technically review/edit.

It is possible something like "maintained views" exists in SQL Server and I just don't know about it, but it's not likely. I know I sound like an arrogant prick, but this is my area of expertise.

Troy King
Tuesday, September 24, 2002

Albert, a small system like that would probably be pretty quick no matter what :) JET is extremely under-rated as a db engine. IMO it got its bad reputation for the same reason VB did -- it made working with it so easy that people with no knowlege of the correct way to do things did them anyway, and frequently made a mess with their efforts. (Of course we all started out that way <g>).

Troy King
Tuesday, September 24, 2002

Troy,

maybe Les C. meant to refer to the "indexed views" of SS2KEE.

http://www.microsoft.com/sql/evaluation/features/indexed.asp

http://www.dell.com/us/en/esg/topics/power_ps4q00-microsoft.htm

Just me (Sir to you)
Tuesday, September 24, 2002

Doh, that's what it is.  Can't keep the names straight.

Jason McCullough
Tuesday, September 24, 2002

For some reason I was thinking in SQL 6.5 Mode. Les C is entirely correct -- Indexed Views do "materialize" the data. Sorry for spazzing out.

Troy King
Wednesday, September 25, 2002

*  Recent Topics

*  Fog Creek Home