Fog Creek Software
Discussion Board




Welcome! and rules

Joel on Software

.NET Web Application Architecture

Hey guys, i'm invovled in a dotnet web app, and I'd really like to do it the "right" way.  The problem is, I don't have a clue what that is.  I've looked through ASP.NET's site, and it seems like their stupid example applications don't even follow what they're preaching.  The SDK on ADO.NET is reinforcing using DataSets, light touch, yet their sample applications use plain old Data Objects with STORED PROCEDURES (yuck).  Am I suposed to use a collection of DataSets in addition to custom Data Objects? If anyone has experience on how to build a scalable and clean data layer, please let me know. 

Vincent Marquez
Wednesday, March 26, 2003

What is your problem with stored procedures ?

Damian
Wednesday, March 26, 2003

Stored procedures are great until you have to modify them.  They don't allow you to dynamicly modify the structure of a query, which causes all sorts of problems down the line.  Too many times i've written a stored procedures, just to have a request to add another join, or select another column under certain circumstances.  If all your queries are in strored procedures, prepare to do some messy "if begin"s.  If its in your data layer, do some string concatination :-)

 

Vincent Marquez
Wednesday, March 26, 2003

I've written self-modifying queries with T-SQL, but it's really not all that fun. :)

Stored procedures are a potentially important performance optimization, but one that's all-too-often done whether it's necessary or not. There's something to be said for being database agnostic and using parameterized queries in ADO.NET, which against some databases are very close in performance to stored procedures.

Brad (dotnetguy.techieswithcats.com)
Wednesday, March 26, 2003

I hate having to write dynamic SQL queries in stored procedures as well, but don't find that I have to very often. Apart from dynamic queries, I have found stored procedures much easier to work with. I wasn't a huge fan when I first started using them but now I have hate having to embed SQL queries in my code (usually for Access databases).

Ben
Thursday, March 27, 2003

I know you CAN write dynamic queries in stored procedures, but i'm pretty sure that takes away the whole advantage of pre-mapping and knowing how to optimize the query.  I realize this is a religious arguement, and for everyone that agrees with me, someone will probably think i'm an idiot :-) .

Any ideas on the Architecture?  I'm thinking of creating my own DataTable (using my custom dataobjects) by having each "data class" inherit from datarow.... Any other suggestions that allow for Object Oriented design practices, but still allow for the neat datagrid, paging, sorting, merging funtionalities?

Vincent Marquez
Thursday, March 27, 2003

"...I'm thinking of creating my own DataTable (using my custom dataobjects) by having each "data class" inherit from datarow"

-- oh per-lease! - do you normally reinvent wheels? ;-)

We use DataTables and, occasionally DataSets. Remember, DataTables can be created independantly of a DataSet. They have nice features such as filtering, sorting, calculated columns etc.

You can create your own strongly-typed DataSets/Tables using the XSD designer which then generates a DataSet subclass with all of your table names and column names. Nice "syntactic sugar", but we don't use it because a) the database schema is in a state of flux, and b) we can't be bothered to create a DataSet for the results of every partcular query or procedure.

We have a class that has a buch of static methods called things like: ExecuteTable() that executes some sql/proc and returns a table; likewise ExecuteDataSet(), ExecuteRow(), ExecuteScalar(), ExecuteNonQuery() etc, etc which are simply wrappers around various SqlDataAdapter calls. Microsoft have somethifb similar called the "Data Appicaton Blocks" or something which you can download from MSDN.

In addition, when the data we want is best represented in a form other than a flat table we use SQL Servers FOR XML AUTO and EXPLICIT to return more structured data. Our ExecuteXmlDoc() method gets us an XmlDocument which we then drill into using SelectNodes() etc. We have our own System.Web.UI.Page subclass which has various methods to help us with Binding DataTables and XmlNodes (kind of a wrapper around DataBinder.Eval()). We don't need to use XSLT because of this (thank goodness).

HTH

Duncan Smart
Thursday, March 27, 2003

If you can't pre-plan your data structure and from that your data I/O (i.e. your stored procedures), then you likely haven't properly planned your project at the forefront. Realistically most data bound applications can finish the database design first with a layer of stored procedures as the I/O to the database, and then write front ends around it. The database layer outlasts the majority of front ends. There are a variety of very important security reasons for constraining access to the database to stored procedures, though most of those should be long mastered so I won't delve into them.

P.S. Dynamic SQL is generally a security nightmare.

Dennis Forbes
Saturday, March 29, 2003

Dennis, your answer is great but I think someone forgot to tell you that 2 and 3 tier programming got a little old.  We have something called object oriented programming now, and N-tier, meaning there is more then just a "front end" and an "I/O" to the database.  That was the case back in asp 2.0 though.  Most programmers see the need for a middle tier between the database and front end, and thats precisely what I was talking about.  Also, when doing comlex database manipulation and reporting, dynamic queries arn't a style, they are a neccessity. 

Vincent Marquez
Saturday, March 29, 2003

Check out one of Microsoft's "patterns & practices" papers called "Designing Data Tier Components and Passing Data Through Tiers".  It discusses 5 different approaches you can take to implementing your data objects.
http://msdn.microsoft.com/library/en-us/dnbda/html/BOAGag.asp

or as a PDF: http://www.microsoft.com/downloads/release.asp?ReleaseID=44269

Jimmy Nilsson also has an interesting discussion on "Choosing Data Containters for .NET":

Part 1: http://www.informit.com/isapi/product_id~{433049EB-8868-4878-AAAB-45A66715DA4B}/content/articlex.asp

Part 2: http://www.informit.com/isapi/product_id~%7B7D2E70F8-7BA7-4608-A822-A7CDBA43C4D0%7D/content/index.asp

I'm eagerly looking forward to future parts of his discussion.

If you are going to roll your own, you would do well to check out Martin Fowler's latest book "Patterns of Enterprise Application Architecture", which covers issues that are relevant to this, with coverage on how the .NET Framework maps to these patterns.

The available sample chapter from the book is quite relevant to what you're doing; "Chapter 3: Mapping to Relational Databases":
http://www.aw.com/samplechapter/0321127420.pdf

The "Table Module" pattern is basically a .NET DataTable/DataSet.  He discusses alternatives to this, and when you would use which.  For a quick reference of all the patterns, see http://www.martinfowler.com/eaaCatalog/

I too wish there were some definitive guidance on the best way to implement data functionality, but from what I've read so far, it seems that the best answer is currently "it depends."

ODN
Saturday, March 29, 2003

You should read Joel's Architectural Astronaut article Vincent, as you clearly don't have a clue what you're doing and instead talk about the "right" way without knowing why.

The foundation of any data bound application is a good database with a secure I/O layer: ___VERY___ few enterprise applications make use of dynamic SQL, and many that do are the result of pure unadulterated ignorance, and someone who read the latest "VB for dummies" manual.

Dennis Forbes
Sunday, March 30, 2003

As a sidenote, Vincent, next time you should plan your vitriolic arguments a little better: What you described is called "3-tier" (and none of this has ANYTHING to do with "object oriented", although if you just learned that phrase I appreciate that you might want to use it in public), although apparently it's too 01d sk001 for you, and it in no way removes the need to have a properly designed database with proper database-level I/O (although I realize that you likely are a MS Access programmer so you find this difficult to comprehend). Secondly, the larger and more complex a database is, the LESS likely that one resorts to dynamic SQL.

Dennis Forbes
Sunday, March 30, 2003

Speaking of idiots, I worked with one astronaut who assauged himself that his elite Visual Basic skills merited great kudos from those around him, and one of his remarkably hilarious exploits was to make his "middleware" 'object oriented' by foresaking using distinct stored procedures, but instead he had a stored procedure that returned ALL rows in the table, and he then made use of the ADO filter property (a post recordset retrieval operation) to prune the resultset in a "object oriented" manner. I literally choked on my lunch when I saw that unbelievable example of ignorance (loading a client cursor with hundreds of thousands of records, forcing SQL Server to do a rowscan...to somehow put the load on the middleware component. There was no logic or reason to it). There are far too many idiots in this profession.

Dennis Forbes
Sunday, March 30, 2003

Dennis, you do realize that your first post on this thread sounds like you're advocating a 2-tier design.

And Vincent, you realize that your response sounds like you're bashing 3-tier designs just because calling them "N-tier" makes you feel smarter, when for the purposes of our discussion there is no need to make a distinction.  You basically want to know what is the best approach to take when you have a middle tier.

By the way, this question applies to any >=3-tier architecture, not just web application architecture.

With integrated query design support in VS.NET, I see no real reason to write your query strings in code vs. in stored procedures.  Stored procedures are no less flexible than in-code query string construction in the VS.NET environment.  Plus there's the advantage of design-time error checking when you write them in the query designer.

Dennis, I don't believe Vincent is an idiot.  From other posts he seems quite intelligent, but young.  He isn't yet a believer in the need to be protected from himself; he still feels invincible because he's quite intelligent.  Eventually he'll be bitten often enough by conditionally concatenated SQL strings that he'll prefer stored procedures, but it may take some time for him to get there.

ODN
Sunday, March 30, 2003

I agree, and honestly my reply was more focused at some peers who I've had vigorous debates with moreso than Vincent (he just happened to remind me of them). However I definitely disagree that a proper database architecture (which is a database which is blackboxed and secured through a stored procedure I/O interface. I will stand by the fact that most database designs far outlast whatever the latest fad in development is) equals 2 tier: It does insofar that one thinks development stops at the demarcation between database and whatever the middle or front-end is, however that is not the case--One can cluster on whatever number of "n" tiers they'd like, doing whatever middleware processing they'd like, but the database remains secure and functionally robust.

The whole x-tier argument is so convoluted now in any case. i.e. If Johnny makes a application that has a front end calling middleware .NET remoted components which does dynamic SQL and call SQL Server, and Bobby has a front end which calls stored procedures which process the data, do the number of tiers differ? Functionally and logically they do not, and scalability they do not either (given that via linked servers and clustered servers an infinite nubmer of scalability is open), but it's amazing how many times people will proudly proclaim that there is a major philisophical difference when there isn't in the slightest (there is a database vendor abstraction in some cases, however often it just moves the programming up the stream a bit. i.e. it saves no time in the long run and merely increases maintenance costs). I am definitely pro "n"-tier, however often the functionally reasons for the tiers is lost on people.

Dennis Forbes
Sunday, March 30, 2003

Dennis,  a couple things:

I agree that its important to have a single "abstraction" (or I/O as you refer to them) between the database and the App.  Usually, I do this with a static class that holds all of my SQL queries and either returns DataSets, or populates Objects.  In fact, in J2EE, queries are all OVER the place (in each Entity bean) (and if there are queries at all, CMP seems more and more popular), and is used quite often. 

No, i'm not an access programmer, but I realize how my overly agressive post along with a lack of information probably made it seem that way.  I work mostly with MS Sql (and MySql).  As for larger and more complex databases not needing dynamic SQL, I do happen to write stored procedures for a large car company.  Only 10 million or so rows in some tables.  We also employ a couple verticle tables.  Explain how, exactly, can you do reporting on a verticle table without dynamic SQL?

ODN: I apreciate you acting as referee; some of my comments should have been made more clear, as you pointed out.
  I brought up the N-tier terminology because classic ASP was pushed to be developed as 3 tier. (1 being ASP, 2 being stored procs for the business logic, and 3 being the database).  There was no need for Entity objects.  I was under the assumption, that an "N-tier" type architecture promoted the use of "Entity" type objects. 

Vincent Marquez
Sunday, March 30, 2003

Ah, gotcha.  I didn't realize that form of ASP development was promoted as 3-tier.  That explains the need to distinguish it from an object-based middle tier.

I found the Microsoft whitepaper I referenced above very interesting in that it discusses five different ways of implementing "entity" type objects.  I'm still sitting on the fence about the best way to do this, but I currently prefer my entity objects to act as wrappers around a collection of stored procedures.  It's mind-numbing to code, but it's the simplest to work with once it's done.  The choice of in-memory data structures is the interesting part, and that's what I'm still on the fence about.

ODN
Sunday, March 30, 2003

"Stored procedures are no less flexible than in-code query string construction in the VS.NET environment."

They do lose flexibility in one area that some people do care around: they tie you to a specific database. Some applications need to be database agnostic, thus the need to limit yourself to a reasable SQL dialect and use queries in code instead of stored procedures. It all depends on the way you intend to market and sell your software.

And, for what it's worth, stored procedures are NOT a performance optimization on some databases. In the case of SQL Server 2000, for example, the only optimization you get is no first-execution-compilation. All SQL code -- including dynamic and parameterized SQL, as well as stored procs -- is compiled and follows the same execution path. In the case of parameterized SQL, where the query doesn't change, the previous compilation of the query is used the next time it's executed. It's a bit like the JIT in .NET in that sense.

Brad (dotnetguy.techieswithcats.com)
Monday, March 31, 2003

Also implementing a business logic in stored procedures reduces application scalability.
Actually, the question was about application architecture, not about pros and cons of stored procedures. IMHO, dividing applications on layers should be the basic principle of any (well, OK, of the majority) project.
So, I suppose it would be more useful to tell which layers people pick out in their common work (and a short description).

Let me start:
1. Data - tables & constraints in some DBMS
2. Data access layer - set of componens that interact with DBMS and converts records to Domain objects and vice versa.
3. Domain objects - components that correspond to domain concepts
4. Business layer - components that process Domain objects to perform some operation and use Data access components to store results in database.
5. Presentation layer - components for displaying Domain object's data and interactively change it.

Does anyone use something else?

Mikhail Andronov
Monday, March 31, 2003

Nope, that pretty much describes my current project exactly.

And, in reality, we don't store data in the database, but rather serialized objects. Most relationships are handled by the object model (an acyclic graph with shortcut and many-to-many relationship support). This model maps much better to OO than relational tables in a database, in my experience, and we're fortunate enough to have an appropriate application for it (some, perhaps most, wouldn't be able to use a system like ours).

Brad (dotnetguy.techieswithcats.com)
Monday, March 31, 2003

Oh, and no stored procedures. ;)

Brad (dotnetguy.techieswithcats.com)
Monday, March 31, 2003

hey, all right, some people agree with me on the "no stored procedure" policy I enforce at work.  :-)
The reason I'm asking in this article, is it seems that microsoft is trying to get rid of the "Business Object" tier, and have us directly interacting with DataSets, which seem more like a data access layer type thing.  So now i'm having to re-invent the wheel to make sure my Business objects work cleanly with datagrid. :-( 
The original post was to see if others have dealt with a similar problem.

Vincent Marquez
Monday, March 31, 2003

I don't think a DataSet is an alternative for a business objects. I don't use DataSet now, but as I could understand it DataSet is a data storage for disconnected scenarios.
Hey, stop. It seem to me that DataSet can be used like a new layer. Between Data and Data access. This way you are able to abstract from exact database type and it's accessibility (wheater or not it is connected)
What do you think?

Mikhail Andronov
Monday, March 31, 2003

One of the fundamental problems I've seen in the forsaking of stored procedures and other trappings of a modern RDBMS is the "everything looks like a nail" problem: Teach someone something, be it object oriented programming, aspect oriented programming, etc, and they then look to modify every problem in a light that will fit their solution, rather than modifying their solution to fit the problem. This is astoundingly common in software development, and people will often use weak and unsupported reasonings to justify whatever the whim of the week is (for instance a common justification for the enormously wasteful, of both processing power and maintenance time, "data objects"--A set of objects which call the database and return a recordset in most cases, with astronomically costly proxying and the corresponding vast reduction in scalability [although its proponents will often claim the opposite]--is "database abstraction". Yeah, that's what OLEDB is, thanks. Data objects where a philosophy created when every database had its own specialized drivers and API and hence the abstraction was beneficial, but it is fascinating seeing the rote repetition of this tired and obsolete argument as if it's the inventing of something new). I mentioned in a prior post where a coworker decided that the database should be a basic vessel where he stores and retrieves vast numbers of recordsets, and because ADO offers that crazy "filter" property, well damnit he just had to use it! It just had to be great if it's there, and it makes his component more "object oriented" because it has less complex interactions with the database. Why bother making specialized search stored procedures when he could just pull the whole recordset and do a filter on it? (This was justified, btw, under the pretense that it "increased scalability" by "reducing the load on the server". For anyone with an ounce of knowledge of database servers, you know that the reality is the exact opposite, and instead it was a massively overbloated middle tier with a database server struggling to server up table scan after table scan). The problem is that ADO client-side cursors, like Datasets, and many variants, are very weak client side "databases" of sorts: They do not have the domain knowledge and abilities that a proper database with appropriate indexing has. I've seen this quite commonly where people basically end up making their own RDBMS in code, and the results are very, very ugly.

Dennis Forbes
Tuesday, April 01, 2003

Dennis,

I completely agree with you that too many people try to offload the database and put processing power onto the application servers when it is un-neccessary, but ther are also times when it is better.  Databases should be used for what they are good at: Searching through massive amounts of data. To do searching, filtering, and sorting when you can do it in a databae is redundant and not efficient. 

On the other hand, most enterprise apps only have 1 database, and can only HAVE 1 database (unless you want to get into bi-directional mirroring and that scares me). You can plug and play as many app servers as you want.  For 2k, I can double the proccessing power of my application by buying another rackmount from dell.    For somethings, it is GOOD to have entity objects on a middle tier.  This is what I was frustrated with in the first place.  I can do hundreds of more things with an object representation of a row, compared to a recordset (or datatable).  I can serielizet it, I can easily share that data accross multiple App servers, all kinds of cool stuff.  This is the way almost all enterprise applications are done. 

Vincent Marquez
Wednesday, April 02, 2003

I think that everything in this world need a common sense. I think that every component in application should do the work it can do the best way.
About data objects: it seem to me, that they aren't database abstraction. OOAD assumes that you create appliations in terms of domain objects, not data structures.
And speed is not the only application criteria.

Mikhail Andronov
Wednesday, April 02, 2003

Hi all,

I strongly recommend against the use of dynamically generated SQL query strings on servers. In many cases the risk of SQL injection attacks is simply too high to warrant the increased ease and flexibility of dyamically generated queries. 

Yes, it is a lot of work to tweak a whole lot of stored procedures.  It is probably _more_ work to do the kinds of security reviews you need to do to maintain secure, injection-proof code!

More important: it is typically _less_ work in the long run to do the _hard_ work up front. That is: to design the database such that it serves all your user scenarios well now _and_ is easily extensible in the future. 

Eric

Eric Lippert
Wednesday, April 02, 2003

Look, I understand its not a good idea to do
query = "select " + Request["field"] + " from table";
but, if you use an SqlCommand object, its just as secure as an SP.  Also, no one has explained how they can offer a non-dynamic query that would adaquetly handle reporting on a verticle table, which is quite common in complex databases. 

Vincent Marquez
Thursday, April 03, 2003

"no one has explained how they can offer a non-dynamic query that would adaquetly handle reporting on a verticle table, which is quite common in complex databases. "

Could you clarify this a bit Vincent as I honestly haven't seen the original request for ideas regarding this. I am intrigued about the potential for a complex design.

Having said that, often claims of "complexity" barring a more formal design (i.e. a database blackboxed by stored procedures, which themselves are usually blackboxing the tables, which hold the actual information) translates into HonestTalk of "not enough design work done up front, or the problem is not understood enough to start deriving solutions". I obviously can't say that with certainty in your case, however I will just say that it's often the case.

Dennis Forbes
Thursday, April 03, 2003

I'd like to address a slightly different issue that all of you have missed, ignored, or chosen not to raise.

Microsoft advocates using their proprietary data structures for interacting with their proprietary databases. I'm vaguely aware of converters that allow you to use their proprietary data structures with other databases, but the point remains: Microsoft wants you to use their solution. There is nothing inherently wrong with this, but let's look at that solution.

Microsoft would have you believe that the best way to encapsulate data is in data-specific structures. Their UI components receive direct-from-database objects (DataSets et al.) with embedded table and column names and types. The message is clear: pull it from the database and hand it straight to the consumer.

Experience has begun to teach much of the industry (and here I must remind all of us that the industry is rather larger than the Microsoft community) that applications written to specific data stores are inflexible and difficult to maintain. I grant that for single, small applications, a direct link to a database can realize simplicity (and occassionally performance) benefits that outweight more modular design. However, for anything vaguely like an enterprise architecture (by which I mean an integrated collection of diverse applications), modular design trumps most other needs. Construction of abstracted data layers which completely hide the data store implementation from the application is vital.

Microsoft's proferred database interaction is inadequate to this end. One can certainly translate DataSets into abstract data entity objects, but if one hopes to use that data in Microsoft interface components, the process must be subsequently reversed. Microsoft has failed to provide for abstracted, component-based architectures.

Regarding the discussion for and against stored-procedures, be they dynamic or static: in light of the above, it should be evident that any and all database-specific (and by this I mean data store implementation-specific) mechanisms should exist behind a thorough data abstraction layer, and should have no direct bearing on the application which consumes the data.

Matthew Schuyler Peck
Thursday, April 03, 2003

Matthew - you're ignorant of some key facts.

>"...use their proprietary data structures"
>"...UI components receive direct-from-database objects (DataSets et al.)"

DataSets are completely data-source agnostic. You have a database-source specific DataAdapter that *pushes* the data into the DataSet.


>"One can certainly translate DataSets into abstract data entity objects..."

That's EXACTLY what a DataSet already is!!!

Duncan Smart
Thursday, April 03, 2003

Your response reveals your parochial understanding of the subject.

I am well aware that DataAdapters Fill() the DataSet. The existence of an intermediary mechanism in no way invalidates my statement. The data in the DataSet is raw, effectively unstructured data, mapped or not. It came out of a query (be it a query in a stored procedure, view, or other structure) and was populated into the DataSet object in tabular format. This is, for all intents and purposes, direct-from-database.

Clearly, the limit of your conception of data abstraction is a data container that doesn't know the database details: "That's EXACTLY what a DataSet already is!!!" The mindset that considers the database data model (or some queried or mapped version of it) the canonical representation of data entities is what has led Microsoft to their limited, inadequate solution in the first place. Not until you understand that a database is an implementation-specific store of implementation-independent data entities can you hope to understand the limitations of the Microsoft solution.

Matthew Schuyler Peck
Thursday, April 03, 2003

You don't need to use the weakly typed version of the DataSet. You can strongly type it. Then, it's not "tabular data", but "objects populated by a data source" (note that the data source doesn't even imply a database, since ADO.NET -- like ADO before it -- does not mandate a database, but can instead be used to browse hierarchical object data stores like an Exchange Server). And, in case you weren't watching, ADO.NET can also feed from non-Microsoft data sources, like say an Oracle database.

Also bear in mind that DataSets are not at all required. Use the DataReader directly to read into objects; that's what we do. We do on occasion generate a DataSet, but we generate it from an object graph, not a database.

In short, I have no idea where you're coming from. You can say that using .NET in general ties you to Microsoft, and that I'll buy, but ADO.NET tying you to SQL Server and DataSets? Hardly.

Brad (dotnetguy.techieswithcats.com)
Thursday, April 03, 2003

Matthew, check out the article I referenced way back in this thread for evidence that Microsoft doesn't consider datasets the only valid kind of data object.  The describe 5 different approaches (not all of them involve datasets), and they discuss discrete layers like you suggest.
http://msdn.microsoft.com/library/en-us/dnbda/html/BOAGag.asp

ODN
Friday, April 04, 2003

I'll make one final observation:

The two standard/suggested mechanisms for retrieving data from a datasource, as outlined in MSDN documentation and Microsoft Press books, are DataAdapter and DataReader. The standard/suggested use of a DataAdapter, as outlined in MSDN documentation and Microsoft Press books, is to Fill() a DataSet. Whether the datasource is a relational database or some other data store is irrelevant: I have discussed my views on DataSets.

DataReaders are the second standard/suggested mechanism. While useful for quick, one-time retrieval of data, they are of little value beyond this due to the fact that they lock the connection until they are disposed of. While this behavior is reasonable, given synchronization and concurrency concerns, it is a roadblock to extensive use of DataReaders as a data retrieval mechanism.

Matthew Schuyler Peck
Friday, April 04, 2003

DataAdapters are built using DataReaders. It's a common myth that DataReaders lead to bad scalability. The DataAdapter usage pattern is a good one: open the DataReader, read the data into a secondary storage system, and close it. Do this as quickly as possible.

Using a DataReader by hand in this way is just fine. It will be no less scalable, as long as you aren't doing any significant computational work while the DataReader is open. One can argue that, from database usage, this is actually the better scenario, because the code is more optimized (no creation of DataTables and DataRows, just direct creation of type-specific objects), although I do believe that it's probably a micro-optimization for all but the largest class of application.

Brad (dotnetguy.techieswithcats.com)
Friday, April 04, 2003

*  Recent Topics

*  Fog Creek Home