Fog Creek Software
Discussion Board




Use of Bitmask appropriate in DB?

I am currently using a bitmask in my Access database to eliminate an intersection table.  As a simplified example let's suppose that I have two tables (I'm not actually dealing with appointments here):

tblAppointment:
- iDays

tblDays:
- sName
- iBitMask

Let's say that an appointment was scheduled on Monday and Friday, my program would build the appropriate value using the bitmasks from tblDays and store that value in tblAppointment field iDays.  The work is done in code and all is well. (Jet SQL does not support bitwise operators.)

The possibly more "normal" DB way to do this would be to use an intersection table.

tblAppointment
tblDays
tblApptDays

Using this approach you can simply use an SQL statement to do the work.

I chose the Bitmask because it prevents the size of the Access file from growing when appointments are rescheduled.  Using the intersection table method records are deleted from the file and the file has to be compacted quite often whereas using the bitmask method the file remains relatively stable in size.  In the real application there may be 500 - 1000 deletions or additions per week.

Any thoughts on how to improve this using the intersection table.  The only limitation I have run into using the bitmask  method is the number of "days" you can have; limited by the bit count in a long int.

Mountain Dew
Monday, February 02, 2004

As much as possible, data in a database should be "complete". That is, one should not need extra information to interpret it.

The problem with a bitmask is that what each bit means is somewhere else (it's not in the data).

njkayaker
Monday, February 02, 2004

"Glossary:
First Normal Form
Definition: A relation is said to be in First Normal Form (1NF) if and only if each attribute of the relation is atomic. More simply, to be in 1NF, each column must contain only a single value and each row must contain the same columns."

It's been my experience that in all but a very few circumstances, it is best to normalize data.

anon
Monday, February 02, 2004

1. Are the performance improvements enough to justify the (perhaps slight) added inconvenience/inelegance?  Usually the answer is "no" but there are a lot of situations where "yes" is a perfectly valid answer.  :)

1a. If performance is criticial, why use Access?  (and not SQL or whatever)  I guess it's a budget thing?

John Rose
Monday, February 02, 2004

>I chose the Bitmask because it prevents the size of the Access file from growing when appointments are rescheduled.

For sure, any one using JET should avoid deleting, and then creating records. That deleted record space is not recovered. Of course, this problem exists for most pc based database systems. Even FoxPro had the pack command.

It is strange that most developers don’t seem to make this issue a VERY important design priority. So, your goal, or question here is a most fair, and most important question. In fact, it is critical to the reduced maintenance of your application. So, your goal is a normal concern that any good developer should have!

However, I think the additional work, programming, and lack of a ability to use straight SQL means that I would NOT use the bitmap idea. Further, extending the data model, or simply modifying it will become very difficult when you have a bunch of bits in a field that represents your data model. Using bitmaps like that is not going to give  you much in the way of design flexibility.

A booking record should look something like:

    BookingStart  (date time) or just date
    BookingEnd.  (date time) or just date
              BookingInterval. (may not be needed)
              TimeOfDate      (only needed if we don’t use datetime above)
              TypeOfBooking etc. etc. etc
            Cust_id

Etc. etc. etc.

Hum...is this discussion is obviously being carried over from the other recent thread here?  I have written more reservation software then I can even remember at this point.

By only storing the booking start, and end date, then you can far more easily use standard SQL to grab and manage your record collisions.

Further, to change the above booking, I DO NOT have to delete a bunch of records! Worse, is your code to maintain, and update and delete, and create those records is going be come VERY HARD to maintain. Just the code to change the start date is going to be quite messy, as you have to now start adding more records to the booking table (or, if you are moving the start date forward...you are now deleting records!). With the above, I don’t’ have to add, or delete records to change a booking range.

Further, your UI code will be more difficult to write, since when a user has to move, or change a booking, a LOT of code will be required to change those bookings. Further, you are also dealing with the issue (or problem) of not wanting a high rate of deleting in that booking file. The above design  solves both of these problems.

By storing the intervals and NOT a record (or bit) for each slot of that interval, you eliminate all that code that has to update and delete those interval/slot records. You are much better off to check for collisions by using a date range and a query then having to “loop” through a bunch of records that represents the slots. Further, buy requiring those slots, you again have to write fairly complex code to add, or create those slots also! In other words, you have to write lots of code to make the slots that you book into also!

This concept of resource management extends to just about everything you can write concerning resource management.

For example lets have a bus with a capacity 46 seats. Often the temptation for the developer is to write out 46 records to a database that represents those seats/slots available.  I would NOT do that. But only have ONE record that says 46 seats are available. In fact, I don’t EVER even store how many seats of the bus are used!

So, I am suggesting that for both the bookings, and booking resource, you do NOT write out slots, but make everything based on the ranges entered. This also means that you can take resources off line MUCH easier. (often, you wind up adding a fake booking record to remove available for certain times).

This approach means that a bus capacity can be changed with great ease. All of a sudden, we might decide to use a 55 seat bus in place of that 46 seat bus. With my design, you simply edit the ONE capacity record for that buss available and change the 46 to 55. We are done! How simple can you get! We do not write out a whole bunch of records here! In fact, we don’t even have to write code to FIGURE OUT and calculate how many records are needed to change the capacity from 46 to 55 (ok..9 records...but that is still code we don’t write!). Even more problematic is if you decrease the bus size. To decrease the bus size you how have to search those 46 records for spots that are free and CAN BE deleted. All of a sudden a VERY simple problem starts to become very complex from a coding point of view.  The huge difference in code required in this simple bus example is due to NOT storing those slots...but only a value that represent the resource we are dealing with. Thus, when reducing the bus size I simply check/make sure that when reducing bus size, it is not reduced below the current number of people booked on that bus. Sure, that is a sql query, but then at least I don’t “search” or write code for WHICH free records to delete.

I will say that the above concepts and designs are more difficult at first, but result in a HUGE savings in the amount of code required. So, sure, having a bunch of records with the slots makes checking the one slot real easy, but the code and UI to manage those slots is MUCH harder to write and maintain! So, the non slot approach generally translates into a MUCH better UI for the user. The application will also generally be more flexibility as it is much easier to change/add/remove resources that you are trying to book into.

In my bus example, it is a piece of cake to increase the bus size.

The same goes for your bookings. You can move, or extend, or change a booking, but not have to deal will a large number of deletions or even additions to the booking table. The code to add new availability will also be much easier in the long run. I have also found that the design tends to run faster since the tables have MUCH less data in them. And, the UI code to update stuff also runs faster as once again the number of records being dealt with is very low.

So, if you want to eliminate the large number of deletions...then don’t store the slots.

You can read my notes and comments on the design of my reservation software at:

http://www.attcanada.net/%7ekallal.msn/Articles/fog0000000003.html

I even show most of my table designs at the end of the above doc.

I also explain the use of the class objects used in that reservation software here:

http://www.attcanada.net/%7ekallal.msn/Articles/WhyClass.html

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Monday, February 02, 2004

I just implemented two bitmask fields in a table as the simplest way I could think of to track 1) which facilities of three will be charged, and 2) which departments of 14 will be charged.

About the only drawback I could see was what another poster pointed out: you're making the value of the field opaque.  In my case, the object I built to reflect the record outside of the database held class constants to reflect the value of each bit, so the minor separation was worth it in terms of simplicity of implementation and saved storage space.

Justin Johnson
Monday, February 02, 2004

There are some problems with BitMasks even if they don’t violate 1NF (http://www.dbdebunk.com/page/page/622161.htm) the biggest of which is lack of declarative integrity constraints as shown in the above article.

MR
Monday, February 02, 2004

Albert, THANKS for the detailed EXAMPLE.  What do you SUGGEST for recurring EVENTS?  For example something SIMILAR to entering in a RECURRING event in Microsoft Outlook?  My FIRST stab would BE to go ahead AND run the algorithm to calculate the exact occurrences and POPULATE the database with THE event dates.

GuyINCOGNITO
Monday, February 02, 2004

>My FIRST stab would BE to go ahead AND run the algorithm to calculate the exact occurrences and POPULATE the database with THE event dates.

To be fair, I can’t really shoot the above idea of being wrong. I guess the real question is how often, or how easy does it need to be to change the above date range?

While I gave a good number of examples as to why not storing each slot, but using single record with the date range is better, there is some disadvantages to this range approach. I not saying that I sugar coated anything in regards to my examples, but there is always trade off in designs. These subtle lessons of the trade off are going to vary from project to project. However, while I am sold on only storing the date range, there are some drawbacks such as:

--  You might have less code, but often SOME of that existing code is more complex, or more difficult to write. Note I said SOME, as most of the time, things are easier. For example, to remove a booking that lasts a week, you only are deleting the one record.

-- Since often records don’t exist for date ranges, then the creating and writing of reports or even screen booking displays can be more difficult to write since you don’t have data to report on!

-- In addition, while much of the increased complexity of using ranges can be dealt with by using class objects, the problem is that sql does NOT work with these objects. The use of objects to attack, slice and dice down the complexity of the “coding problem” down to manageable levels is great. However, those cool class objects don’t work with sql! Kind of a kiss without the hug.

So, do I think you should try attempt to make a range work much like outlook?

    Yes, I most certainly do.

The range idea still does remove the need to write out those records. However, some tradeoffs will occur in others areas of the application. Only when ALL of the factors are taken into account can you really jump on the range idea.

The best answer is going to depend on the scope of the project, and how often those repeating bookings are to occur. Perhaps most bookings are single events, or NON repeating anyway. Also, some issues of the type of UI you have will also effect if I would use ranges vs writing out all the records.

So, using just a range type approach is usually more flexible, and usually less code, the design is also more complex also!

If you have a working design now, then changing it now probably is not such a good idea.

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Monday, February 02, 2004

I still haven't seen anyone address this issue -

If I'm involved in a scheduling/appointment/booking system, there is one fundamental question:
- Do we need to determine schedule conflicts?

It's nice, because the answer is always "yes." If the design specs say "no" the answer is still "yes." If the client swears up and down there is no way that will ever be necessary, the answer is still "yes"

Because to a client it appears to be a very simple change request to make, yet as seen in the two scheduling threads, it can affect the very design of the application.

If every appointment is designed with a "start" and "end" date/time, I can find schedule conflicts with a single SQL statement. Just about any other design is going to require iterating through the database. And since the logical requirement is "check for conflicts when creating a new appointment" then we're talking about iterating through the entire database for every new record.

That just plain sucks.

So you would be very, very, very hard-pressed to sell me on any design besides "one record per appointment, with a start time and an end time"

Philo

Philo
Monday, February 02, 2004

I think this thread has gone off on a tangent and some of the posts are not applicable to the OP's problem.

Given that you are using the Jet Engine, which does not support bitwise operators, I believe you have made a logical choice based on the fact that you do not want the size of the Access file to increase.

The OP said they were NOT dealing with appointments and simply gave them as an example.

======================================

Most decisions like these come down to weighing the facts on both sides of the argument and basing your decision on these facts and what requirments you have.

Using a bit field will:

Positive:

1. Eliminate an intersection table that would, over time,  through the addition and deletion of records, unreasonably increase the size of the database file.

2. Allow Jet Engine SQL statements to be built dynamically in code, which works around the Jet Engine limitation of not supporting the bitwise operators.

3. Upgrading the database to SQL Server would allow for the elimination of the code because SQL Server supports bitwise AND/OR.

Negative:

1. Force the use of code to determine if a bit is set.

2. Limits the number of entities which can be represented to the size of the bit field.

=====================================

Using an intersection table will:

Positive:

1. Provide ease of use through SQL statements for selection, insertion and deletion.

2. Enforce data integrity through the database constraints.

3. No additional code is involved except the SQL statement.

4. There is no limit (except database table size limit) to the number of entities that can be represented.

5. No need to worry about upgrades.

Negative:

1. The size of the database will expand when records are added and subsequetly deleted.  This will cause the database to have to be compacted.

2. Two insert statements may be required to add a record.  One for the original table and one for the intersection table.

My $0.02.

Dave B.
Monday, February 02, 2004

Another reason against bit fields would be no data integrity through the database.

As for the two insert statements being a negative, this would probably not apply to a database that supported transactions, I'm not certain if Access does or not.  I wouldn't think it would, unless the ADO provider does.

Dave B.
Monday, February 02, 2004

GuyINCOGNITO, your capitalization mocking has me busting a gut laughing. Brilliant. Absolutely brilliant.

Dennis Forbes
Monday, February 02, 2004

*  Recent Topics

*  Fog Creek Home