Fog Creek Software
Discussion Board




.Net Dataset - worth learning?

I've started working with typed datasets, but still have a lot to learn. What I'm wondering is if it's worth learning...

I can do everything I need with a hybrid of DataAdapters, typed datasets, and datatables, but I know sometimes it's kludgy. I also know that typed datasets could lend a lot of elegance and uniformity to my solutions.

The questions are:
1) Will it reduce development time?
2) Are they maintainable? (Typed datasets are *very* intricate code. Most people I've talked to just regenerate them if they make changes to the data, treating the dataset code as a black box)
3) Do they give enough extra capabilities?

So I guess- for anyone here that's really gotten to know the guts of the ADO.Net dataset, do you use typing, relations, etc. enough to consider the construct worthwhile?
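For concreteness, the "typing" in question is roughly this (a sketch only -- `NorthwindDataSet` is a made-up name for a typed DataSet class generated from a schema, e.g. with xsd.exe):

```csharp
// Untyped vs. typed access, assuming an Orders table with a CustomerID column.
using System;
using System.Data;

class TypedVsUntyped
{
    static void Untyped(DataSet ds)
    {
        // Untyped: string-keyed lookups and casts; a typo here only
        // surfaces at runtime.
        DataRow row = ds.Tables["Orders"].Rows[0];
        string customer = (string)row["CustomerID"];
        Console.WriteLine(customer);
    }

    // Typed (hypothetical generated class): the same access is strongly
    // typed, so errors surface at compile time instead.
    //
    //   NorthwindDataSet ds = new NorthwindDataSet();
    //   string customer = ds.Orders[0].CustomerID;
}
```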

Philo

Philo
Thursday, May 22, 2003

I would consider the Dataset a centerpiece of a DISCONNECTED data-driven application.  Basically an in-memory representation of a complete set of data.

Given that definition, I would answer your questions as:

1. Not necessarily.  You are simply reconstructing the database in the client computer's memory.

2. They are somewhat maintainable, since you can treat the dataset as an XML file, but once again I think their innate nature gives them a slightly higher maintenance cost.

3. It would depend on what you were doing with the dataset.  If you explicitly need a disconnected representation of the data, then yes.  If not, I would stick with the DataReader and Command object.
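For anyone who hasn't used the connected alternative Dave mentions, a minimal DataReader sketch looks like this (connection string and query are placeholders):

```csharp
// Forward-only, connected read with SqlDataReader: no in-memory copy of
// the data is built, unlike a DataSet.
using System;
using System.Data.SqlClient;

class ReaderExample
{
    static void Main()
    {
        using (SqlConnection conn = new SqlConnection(
            "Server=.;Database=Shop;Integrated Security=SSPI")) // placeholder
        using (SqlCommand cmd = new SqlCommand(
            "SELECT Sku, Description FROM Parts", conn))        // placeholder
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}",
                        reader["Sku"], reader["Description"]);
            }
        }
    }
}
```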

Just my $0.00000000000000000000000000000002. 

Dave B.
Thursday, May 22, 2003

That's the piece I was missing - now I understand when it really makes sense!

Thanks for your time, Dave- much appreciated!

Philo

Philo
Thursday, May 22, 2003

I personally dislike Datasets in n-tier applications because they replicate the database so closely.  Any changes to the database structure require changes to the client tier.  I am firmly in the camp of building objects to use on the client which insulate the database.  That said, I know LOTS of programmers who disagree with my view (and unfortunately I work with quite a few of them).  On the server it is faster to use other methods to get data and to update tables than datasets.

Billy Boy
Thursday, May 22, 2003

Well said Billy Boy. Passing disconnected datasets around the tiers defeats encapsulation and I've no idea why it's so popular amongst clever people!

Gwyn
Thursday, May 22, 2003

Using a dataset as the centerpiece of an application really makes it 2-tier at best. We don't use datasets at all in our application (we used to generate them for binding to the DataGrid, but found it easier to generate a collection of the objects we already had lying around instead).

In short, I don't think being a Dataset expert is necessarily a requirement, although it's good to know what it's for, and when to use it (and when not).

Personally, I used them a lot, and now I'm at the point where I believe that they're super for simple 2-tier mock-ups and simple apps. Beyond that, no.

Brad Wilson (dotnetguy.techieswithcats.com)
Thursday, May 22, 2003

"Using a dataset as the centerpiece of an application really makes it 2-tier at best."

Huh?  Whether an application is n-tier or not has nothing to do with whether it's centered around datasets or not.  If it's a 3 tier application (i.e., client, middleware, database) then you get benefits over client-server regardless of whether datasets are used or not.

It sounds to me like you're mistakenly thinking that some kind of persistent objects or object-relational mapping are a necessary part of an n-tier system.  They aren't.  And you get the benefits of an n-tier architecture even when it's centered around datasets.  Of course you don't get the benefits of object-relational mapping, but that's an entirely different issue.

If you don't trust my opinion on the issue, check out Chapter 8 of Martin Fowler's "Patterns of Enterprise Application Architecture."  Even he says that when using .NET it makes sense to have datasets as the centerpiece of your application (they're "the default choice for this platform" because of the "extra help" the .NET tools give you).  He says that doing full O/R mapping on .NET is just as easy as in any other environment, but that it doesn't usually make sense.

Herbert Sitz
Thursday, May 22, 2003

I think you marginalized my use of the word "centerpiece" in order to make your (IMO, invalid) point.

Brad Wilson (dotnetguy.techieswithcats.com)
Thursday, May 22, 2003

Brad -- Possible, I guess, but I don't think so. 

Fowler's book is basically about what patterns are useful with n-tier architecture and how they work.  Basically, he says that with the functionality you get with .NET datasets, it makes sense to base your architecture around a "table module" with "one class per table in the database".

Fowler sets out the advantages and disadvantages of doing this, as compared with doing an O/R mapping where you're passing objects from tier to tier, not just datasets. 

Of course, we could still be talking past each other.  But I'd be curious what you think of what Fowler has to say especially in Chapters 8 and 9 of his book.  He's not necessarily against O/R mapping.  In fact, in many cases he's in favor of it.  But I'm pretty sure that his preferred patterns with .NET do use, as you would say, "datasets as a centerpiece". 

Herbert Sitz
Thursday, May 22, 2003

I use the DataAdapter and Command to fill a DataSet. I think it's the easiest way to get data. Yes, the DataReader is more efficient in I/O and CPU cycles, but far inferior in coding efficiency. The DataSet provides handy methods for reading data.

I throw the DataSet away after I have read my data.
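The fill-and-discard pattern described above can be sketched with a DataAdapter (connection string, query, and column names are placeholders):

```csharp
// Fill a DataSet via a DataAdapter, read it, and let it go out of scope.
using System;
using System.Data;
using System.Data.SqlClient;

class FillExample
{
    static void Main()
    {
        DataSet ds = new DataSet();
        SqlConnection conn = new SqlConnection(
            "Server=.;Database=Shop;Integrated Security=SSPI"); // placeholder
        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT Sku, Price FROM Parts", conn);              // placeholder

        // Fill opens and closes the connection itself if it isn't open.
        adapter.Fill(ds, "Parts");

        foreach (DataRow row in ds.Tables["Parts"].Rows)
            Console.WriteLine("{0}: {1}", row["Sku"], row["Price"]);

        // No connection is held; the DataSet can simply be discarded.
    }
}
```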

Thomas Eyde
Friday, May 23, 2003

Herbert, perhaps you should put down the book and apply a little common sense.  If a dataset is effectively a representation of a table in memory, then to all intents and purposes it *is* that table. 

One class per table doesn't mean create a class for every table that acts as a de facto table itself.  Simply wrapping a table with an object that applies no rules or logic pertaining to the application, then exposing it to the presentation layer, defeats the entire purpose of having a middle tier.  To use a dataset in such a manner necessarily means pushing the application rules into the database or presentation layer, making it a two-tier application. 


Friday, May 23, 2003

Anonymous (or whoever you are) -- Below are a few short passages from Fowler's book.  I don't know where you got the idea that I was interpreting Fowler to say that a table class would only wrap a table and not do anything else (i.e., not include methods for storing, retrieving, validating data, applying business logic, etc.) 

The main issue, it seems to me, is whether you make your n-tier app using what Fowler calls a "domain model" -- which uses O/R mapping and persistent objects that are passed from middle layer to presentation layer -- or whether you use something like what Fowler calls a "table module" -- which is organized around passing datasets with structures based on the tables in a relational database.

Here are a couple quotes from Fowler that may clarify what he's talking about:

    "One of the problems with the _Domain Model_ [i.e., O/R mapping] is the interface with relational databases.  In many ways this approach treats the relational database like a crazy aunt who's shut up in an attic and whom nobody wants to talk about.  As a result you often need considerable programming gymnastics to pull data in and out of the database, transforming between two different representations of data."
    "A _Table Module_ organizes domain logic with one class per table in the database, and a single instance of a class contains the various procedures that will act on the data."
    "The strength of the _Table Module_ is that it allows you to package the data and behavior together and at the same time play to the strengths of a relational database."
    "Usually you use _Table Module_ with a backing data structure that's table oriented.  The tabular data is the result of a SQL call and is held in a _record set_ that mimics a SQL table.  The _Table Module_ gives you an explicit method-based interface that acts on that data.  Grouping the behavior with the table gives you many of the benefits of encapsulation in that the behavior is close to the data it will work on"
    "_Table Module_ is very much based on table-oriented data, so obviously it makes sense to use it when you're accessing tabular data using [a recordset].  It also puts that data structure very much in the center of the code [ed. as a 'centerpiece'?], so you also want the way you access the data structure to be fairly straightforward."

The main reason I've included these quotes is just to show that there are at least two schools of thought about how to build n-tier apps (as an earlier poster pointed out), that one of them is 'dataset-centric', and that the 'dataset-centric' method is especially appropriate when using recordsets in the .NET framework.  I got the feeling that some of the posts were suggesting that it is never appropriate to use a 'dataset-centric' approach when building n-tier apps, and I really think that idea is just flat out false.

I do have a bit of experience with n-tier apps (though none with those using a _Domain Model_).  I certainly don't think my suggestion is based on any misreading of Fowler, although I should say that Fowler certainly doesn't advocate the _Table Module_ approach as generally preferable to the _Domain Model_.  Quite the opposite, I think.  Anyway, if anyone is interested perhaps they should take a look at Fowler's book, since I don't know if we'll make any more progress here.

Herbert Sitz
Friday, May 23, 2003

Herb:

The question we responded to was about DataSets, not _Table Modules_.  From reading the quotes, it seems that these table modules would embed or derive from a dataset but would also have methods to work on the data, run validation rules etc.

This sounds better than using DataSet directly but can still cause problems.  Imagine we have a database that contains part information with pieces of information spread across more than one table.  There is a master table that holds things like sku, description, etc. and there is a locations table that holds price, aisle, etc.  If I move the description from the master table to the locations table (because we need to localize the description -- we expanded to Holland), how many tiers of the application should I have to touch?  Obviously the database access code needs to change.  Depending on how tightly the client is bound to the database, I may have to change the client.  This is why I believe that direct use of DataSets on the client is bad. 

Billy Boy
Friday, May 23, 2003

Billy Boy -- Well, I only used the term 'Table Module' because that's the name Fowler gives to one of the primary patterns in a dataset-centric n-tier app -- which I take to be an app where the objects passed between middle and presentation tier are merely datasets, not more complex objects that are part of the domain model of the application. 

You may be right that it's more difficult to make the client automatically adapt to changes in the data structure in this sort of app.  But it's not that hard.  It's no problem to make sure that all the SQL is located on the middle tier and can be changed in one spot. 

Philo's original question was about ADO.NET datasets and some of the special functionality they make available.  I got the feeling some of the posts were saying that he didn't need to bother learning any of the special .NET recordset functionality because you should _never_ build an n-tier app around datasets, anyway.  That's a pretty extreme view.

I just wanted to point out -- as you did in your original post -- that there are a lot of smart people (including Fowler) who disagree with that (i.e., they disagree with the statement that n-tier apps should _never_ have a dataset-centric architecture, while still leaving open the possibility that for some n-tier apps it might be a bad choice).  In fact, Fowler suggests that the special functionality of .NET datasets makes it especially appropriate to make your n-tier architecture 'dataset-centric'.  How exactly, I have no idea, since my n-tier experience is in the Delphi world.

Herbert Sitz
Friday, May 23, 2003

I think the point Brad is trying to make (and it's an old problem, .NET or not) is that we end up having to write this sort of thing:

<% Do While Not rs.EOF %>
<tr>
<td><%= rs("something") %></td>
<td><%= rs("somethingelse, what was it called, better check the stored procedure hasn't changed") %></td>
<td><% If rs("this") <> "" Then %><%= rs("this") %><% Else %><%= rs("theother") %><% End If %></td>
</tr>
<%
rs.MoveNext
Loop
%>

Whereas what middle tier business logic components promised us was:

<% For Each thing In things %>
<tr>
<td><%= thing.this %></td>
<td><%= thing.that %></td>
<td><%= thing.theother %></td>
</tr>
<% Next %>

But it never happens - someone always thinks 'if only I had the recordset...' and suddenly a method to send it through to the presentation layer appears (damn such follies as interfaces).

Also bear in mind I've only mentioned reading data here...

Basically, it's a nightmare.

Basil Brush
Friday, May 23, 2003

Herbert is still clutching onto his book of revelations, waiting for Fowler to show him the path to true enlightenment.

Proverbs 14:7 - Leave the presence of a fool, for there you do not meet words of knowledge.

Bye..


Friday, May 23, 2003

My view -- and you can feel free to disagree -- is that if you directly expose the artifacts of the data layer (i.e., DataSets), you haven't sufficiently isolated yourself from it. Now, for some people (and some apps), this is okay, because there's never going to be anything but a tabular, relational data model.

For me, personally, that's not n-tier. That's 2-tier. The presentation layer is using data layer artifacts, regardless of whether you've tacked code onto said artifacts or not.

Also, personally, argument-by-authority doesn't hold a lot of weight. Every idea must be considered, evaluated, and judged on the merits of the idea, not on the merits of the speaker.

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, May 23, 2003

Philo:

Introduction:
We have gained extensive experience over the course of the last eight months using datasets to create the 1st of two modules of a Winforms client/server "smart client" application. We have now decided to abandon datasets altogether for the 2nd module.

If you're using datasets in a Web service or in a Web application, I doubt you will experience the problems that we did. For a Winforms application using binding, you may run into the problems we did.

I'll explain further below. Disclaimer: It's possible our problems were related specifically to the level of control we wanted over the data in the datasets. Others may not encounter the same issues, and perhaps spending more time would have allowed us to figure out how to work some of the "automagic" stuff datasets do.

Our Experience:
Early on, when building the straightforward areas of the first module, our data needs were relatively straightforward, and the datasets worked out pretty well. An example of one part of the module is a "listing/detail" form that lists subcontractors on the listing tab and the detail for one subcontractor on the detail tab.

Benefits of Datasets:
Some of the benefits we found were:

- Storing multiple versions of the data. If the user wanted to cancel an editing session and "undo" the edits, we could simply roll back their changes to a different version.
- Forms that had parent-child elements, such as a grid on a form showing the contacts for a company, were easy because of the relations the dataset had built-in. As the user navigated between "parent" rows, the contacts grid would "automagically" show the child contacts.
- The Framework supports easy loading and saving of the datasets.
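The first benefit above -- rolling back an editing session -- comes from the DataSet's change tracking. A minimal sketch (table and column names are made up for illustration):

```csharp
// RejectChanges rolls a table back to its last accepted state, which is
// how "undo the whole editing session" falls out of DataSets for free.
using System;
using System.Data;

class UndoExample
{
    static void Main()
    {
        DataTable t = new DataTable("Subcontractors");
        t.Columns.Add("Name", typeof(string));
        t.Rows.Add(new object[] { "Acme" });
        t.AcceptChanges();                    // establish the baseline

        t.Rows[0]["Name"] = "Acme (edited)";  // user edits
        t.RejectChanges();                    // user cancels: edits rolled back

        Console.WriteLine(t.Rows[0]["Name"]); // back to "Acme"
    }
}
```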

Drawbacks of Datasets:
These benefits were overshadowed by the drawbacks. Now, many of these could have been due to our misuse of datasets, but overall, we found that as the functionality became more complex, the datasets became a hindrance as opposed to a help.

Here are some of the issues we found, off the top of my head (there are certainly more), in order of how much time and effort they cost us:

- Events. The eventing system built into the dataset/datatable framework was our number one problem. Once you bind a dataset to UI elements in a form, the events that inform the controls that the list has changed become extremely difficult to control. Our custom sorting and filtering scheme (explained below) requires us to modify many rows in the dataset without reloading it from the database.

When we did this, we experienced a massive slowdown because as we modified each row, events were fired to all bound controls, causing them to repaint themselves. Eventually, we figured out that we could simply disconnect all of the bindings while we did our operations on the underlying rows, but we still saw a significant slowdown due to the fact that for some reason, .NET added OVER 50 indexes to the datatable.

We spent about two weeks trying to figure out why before giving up. All but about 2 of the indexes were identical, but we couldn't figure out what created them, even after decompiling all of the source code for DataTable.
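One general ADO.NET technique for quieting the event storm during bulk row edits (not necessarily what Dave's team did) is BeginLoadData/EndLoadData, which suspends notifications, index maintenance, and constraint checking while you touch many rows:

```csharp
// Sketch: bulk-modify rows without firing per-row change events to bound
// controls. The "sortOrder" column is hypothetical; the table is assumed
// to contain it.
using System.Data;

class BulkEdit
{
    static void Recalculate(DataTable table)
    {
        table.BeginLoadData();            // suspend events and index upkeep
        try
        {
            foreach (DataRow row in table.Rows)
                row["sortOrder"] = 0;     // hypothetical bulk change
        }
        finally
        {
            table.EndLoadData();          // single reset notification at the end
        }
    }
}
```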

- Filtering. The default behavior of a dataview when a filter is applied is to apply the filter to the list every time a row is changed. If the user changes a column that removes the row from the filtered set, the row immediately disappears as soon as you do an EndEdit. Now, this wasn't the END of the world, until we started working on some more complex areas, which required complex calculations to be performed as the user modified information in the row.

One of the more frustrating aspects of the lack of control we had with respect to binding was the fact that no on-screen controls get updated with changed column information unless an EndEdit is performed on the row. And yes, you guessed it: we fired off an EndEdit once our calculation was performed so the text box got updated on-screen, and POOF, the row would disappear right out from under the user if the row didn't match the current filter criteria.

To solve this problem we created a column in the schema for the primary table that isn't in the database, called "isInFilteredSet." We roll through the rows behind the scenes and set this column each time the filter is changed. Then, we filter the dataview on this column. This allows us to control when the row gets filtered out. The problem is the performance issue outlined above.
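The workaround described above can be sketched like this (names mirror the post but the column types and "Price" filter are illustrative):

```csharp
// Filter a DataView on a client-only flag column so rows only drop out
// of the view when *we* recompute the flag, not on every EndEdit.
using System.Data;

class FlagFilter
{
    static DataView BuildView(DataTable parts)
    {
        if (!parts.Columns.Contains("isInFilteredSet"))
            parts.Columns.Add("isInFilteredSet", typeof(bool));

        DataView view = new DataView(parts);
        view.RowFilter = "isInFilteredSet = true";
        return view;
    }

    static void ApplyFilter(DataTable parts, decimal maxPrice)
    {
        // Recompute the flag in one pass; the view reflects the new set
        // only after this runs, giving us control over when rows vanish.
        foreach (DataRow row in parts.Rows)
            row["isInFilteredSet"] = (decimal)row["Price"] <= maxPrice;
    }
}
```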

- Sorting.  Same problem as with filtering. This time, it would be quite disconcerting if the user was editing a row in the list, entered a number that fired a calculation, and the row got reordered.  We added a "sortOrder" column which we set when certain UI events occur that should re-sort the list (such as when the user refilters).

- Can't build "smart" datarows. It would have been much easier for us to build some of the complex behavior required by the system if we had the ability to add behavior directly to a datarow, since then we could treat them as "entities." Unfortunately, we don't have the ability to do this.

What we could have done is wrapped our own class around a datarow, but then we wouldn't be able to have the system load up the datatables from the database. At that stage of the game, it makes more sense to build our own collection classes, implement IBindingList, and bind our own damned list to the grid!
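For anyone weighing the same trade-off, the "wrap a DataRow" alternative looks roughly like this (an illustrative sketch, not Dave's code -- the `Part` class and column names are made up):

```csharp
// An entity class that adds behavior but delegates storage to a DataRow.
// You keep DataAdapter loading, but binding still sees rows, not entities.
using System.Data;

class Part
{
    private readonly DataRow row;

    public Part(DataRow row) { this.row = row; }

    public decimal Price
    {
        get { return (decimal)row["Price"]; }
        set { row["Price"] = value; }
    }

    // Behavior lives with the entity instead of being scattered
    // through the UI or stored procedures.
    public decimal PriceWithTax(decimal rate)
    {
        return Price * (1 + rate);
    }
}
```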

Summary:
After a 2-week post mortem on phase one of our project, we have come up with the following conclusions:

1. For our needs, datasets don't give us the required level of control.
2. Since the functionality in the 2nd half of the application is much more complex than the 1st half, we feel that it will become even more difficult to work with datasets.
3. Our problem domain for the 2nd half lends itself very well to an object-oriented approach. We have to deal with hierarchical data with calculations that must occur up and down the hierarchy behind the scenes. Datasets force the business logic layer to be too "relational," making it very difficult to accomplish the required tasks in an OO manner. Basically, if we stuck with datasets, we would have to do O/R mapping in the problem domain, which we deem too difficult. We should have no knowledge of the fact that the data is stored in a relational manner when we're working on the business layer.

Conclusion:
We are abandoning datasets for our core application data, and may not use them at all from this point forward except perhaps for ancillary supporting data. We are going with an OO approach, using a domain model, using ideas very similar to the "business entities" and "data access logic components" approach outlined by Microsoft in the following paper:

Designing Data Tier Components and Passing Data Through Tiers
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/BOAGag.asp

If you have any specific questions about our experience, feel free to drop me an e-mail. We'd love to be able to help out anyone else--no sense having others pulling out all of their hair for weeks like we did.

Dave
Saturday, May 24, 2003
