Fog Creek Software
Discussion Board




When to use Stored Procedures?

I'm back onto doing some database design for the first time in about 18 months. The last time I did some, we made a policy that all access to the database would be done via stored procedures and SELECT * was not allowed. This meant there was a well defined interface to the database, which could be unit tested separately.

However, I've been discussing my thoughts with our local SQL guru and he says it's not necessarily a good thing, as it can have a performance impact, particularly when the query involved is a simple "SELECT Foo FROM Bar". Well, writing lots of well-encapsulated functions instead of quick hacks can have a performance impact too, but I'd still do it.

Am I missing something here?

Better Than Being Unemployed...
Thursday, July 24, 2003

When I last used a database I had one DLL which contained all SQL statements and which exported a finite API: the effect of this was similar to what you're describing ("a well defined interface to the database, which could be unit tested separately").

Christopher Wells
Thursday, July 24, 2003

Using stored procedures is usually faster, at least with MS SQL.  Stored procedures are compiled by SQL server the first time they are called (I believe).  That's the main reason for using them.  It may have the added benefit of making your unit tests easier, but I don't see that as _the_ reason to use them.

I believe your DBA is saying that if it's a really simple select statement then you won't see any performance improvement, and may see slightly worse performance.  But again, if it's really simple, the performance impact won't make much difference.

I usually use straight SQL queries until I've got the SQL portions pretty much set in stone, I then convert them to stored procedures.  Makes me feel more 'agile' I guess, but then again I'm not great at writing stored procedures.  I know this because I can actually read the SPs I write ;-) .

Steve Barbour
Thursday, July 24, 2003

If you use SELECT statements directly, you need to grant the account running them permission to SELECT from the tables themselves.  It is more restrictive to limit that account's permissions to executing SProcs only.

Ran Whittle
Thursday, July 24, 2003

"Using stored procedures is usually faster, at least with MS SQL."

If you're using an older version of MS SQL, sure. But if you're running 2000, then no, stored procs are not any faster than parameterized queries. Both are "compiled" the first time they're run, and cached until the database is shut down.
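For what it's worth, a parameterized query in SQL Server can be issued through `sp_executesql`; the table and column names below are made up for illustration:

```sql
-- The plan for this parameterized batch is cached and reused
-- across calls with different @ID values, much like a sproc's plan
EXEC sp_executesql
    N'SELECT Foo FROM dbo.Bar WHERE BarID = @ID',
    N'@ID int',
    @ID = 42
```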

Ran hit it on the head for one of the two reasons to use stored procedures that I've found. First, security. If you limit the user to sprocs, then you limit the exposure area. Second, data flow. If what you're doing requires manipulating a lot of database data in order to yield a much smaller amount of end-result data, then that kind of processing is better achieved inside an sproc (to prevent sending the mass of data over the wire to the client).
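A rough sketch of that permissions setup in T-SQL (all object and account names here are hypothetical):

```sql
-- The app login may call the procedure...
GRANT EXECUTE ON dbo.GetCustomerOrders TO AppUser
GO
-- ...but may not touch the underlying table at all
DENY SELECT, INSERT, UPDATE, DELETE ON dbo.Orders TO AppUser
GO
```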

You have to balance that against the negatives, though. If you're writing an internal app, it's not much of a big deal to tie yourself to a specific database vendor, but if you're writing "shrink wrap" software, then you need to work with the existing customer database investment.

Brad Wilson (dotnetguy.techieswithcats.com)
Thursday, July 24, 2003

Cool, I didn't realize that SQL 2000 did this.  Thanks for the tip.  Definitely spot on about reducing the data sent across the wire too.

Steve Barbour
Thursday, July 24, 2003

I note that you specifically mentioned 'SELECT * ...'. A good reason to disallow this is so that you don't get unexpected results when you add two new columns to the table nine months later. If you force all SELECTs to name the columns they need, you're a bit more resilient to change.
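For example (hypothetical table), naming the columns pins down the shape of the result set:

```sql
-- Still returns exactly these three columns even after someone
-- adds two new columns to Customers nine months from now
SELECT CustomerID, Name, Region
FROM dbo.Customers
```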

Also, another security aspect of only using stored procedures is that in most cases (unless the proc assembles dynamic SQL statements) they're safeguarded from SQL injection.
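That caveat matters: a proc that assembles dynamic SQL reopens the hole. A contrived T-SQL illustration (table and proc names are invented):

```sql
-- Safe: @Name is always treated as data, never as SQL text
CREATE PROCEDURE dbo.FindUser @Name varchar(50) AS
    SELECT UserID FROM dbo.Users WHERE Name = @Name
GO

-- Unsafe: @Name is pasted into the statement text, so a crafted
-- value can append its own SQL to the query
CREATE PROCEDURE dbo.FindUserDynamic @Name varchar(50) AS
    EXEC('SELECT UserID FROM dbo.Users WHERE Name = ''' + @Name + '''')
GO
```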

shoppe
Thursday, July 24, 2003

We were faced with the same choice on a project that wrapped a little while ago.  Using Stored Procedures only to access the database seemed to be the hands down winner, but we eventually decided to bundle all the SQL into a DLL, as mentioned in a previous post.

The main reason this route was eventually selected was the sheer number of procedures involved.  I guess there's no way in MS SQL Server to package procedures together into modules, so you get what amounts to a giant list of procedures.  Of course, since there's no way to manage context, each procedure needs a unique name, which gets really messy.  There were also problems with keeping the sprocs consistent during schema evolution, which the DLL approach reduced a bit.

How do the rest of you handle these problems?

anon
Thursday, July 24, 2003

We use SPs for security reasons only.  We also have a .dll that provides access to our SPs.  The only other thing that comes to mind is that while SPs are compiled and cached, dynamic SQL inside them isn't, and this will affect performance.  I think you need to weigh performance vs. security.

shiggins
Thursday, July 24, 2003

Stored procedures also make it easier to change the underlying db design without having to modify the client applications. We only use stored procedures for our applications for several reasons:

* Security is easier: You grant permissions on a per-proc basis rather than on tables themselves.
* It makes it easier to keep track of who calls what.
* You can do pre-processing of data before returning a result set.
* You can massage input parameters as well, such as faking variants, making some queries easier depending on how the client calls them. (Do this by accepting a varchar and then casting. This also makes CASEd WHEREs and ORDER BYs easier.)
* You can break functionality down further into sub-procs and thus reuse some code.
* You can modify table structures without the clients having to change their code.

There are more advantages, but these are just off the top of my head.
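To illustrate the parameter-massaging and CASEd ORDER BY points with a made-up example:

```sql
-- One proc serves two sort orders; the varchar parameter acts as
-- a poor man's variant that the proc interprets itself
CREATE PROCEDURE dbo.ListProducts @SortBy varchar(20) = 'name' AS
    SELECT ProductID, Name, Price
    FROM dbo.Products
    ORDER BY
        CASE WHEN @SortBy = 'price' THEN Price END,
        CASE WHEN @SortBy = 'name'  THEN Name  END
GO
```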

One disadvantage to using stored procedures is that you wind up with a lot of them in a large project, and with that comes the problems of managing them and just plain remembering which do what.

Troy King
Thursday, July 24, 2003

If you're not careful, you WILL run into serious performance problems, but not necessarily where you think.

If the developer can't write their own SELECT statement and the DB developer isn't conveniently responsive enough, then the app developer will do stupid things like call two existing SPs and join them manually on the webserver or something.

Richard Ponton
Thursday, July 24, 2003

I tend to put SELECT statements into views.
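For a plain SELECT, that might look like this (hypothetical names):

```sql
-- The view is the narrow interface; clients never see the table
CREATE VIEW dbo.ActiveCustomers AS
    SELECT CustomerID, Name, Region
    FROM dbo.Customers
    WHERE Active = 1
GO
GRANT SELECT ON dbo.ActiveCustomers TO AppUser
```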

John Topley (www.johntopley.com)
Thursday, July 24, 2003

>>* You can modify table structures without the clients having to change their code.

Hear hear! This is a real big one. There is no reason to strew in-line SQL throughout the application IF YOU CAN avoid it.

It is far better to place the SQL on the server than to dig through reams of code to replace or modify SQL.

Also, make all developers on the team use the ER tools in Enterprise Manager. I can't tell you how many times I've walked into a company and asked to see the ER/database schema. They all look at me like I am from another planet. Ok, so then I go talk to the developers, and they tell me they don't use RI, or the ER tools! (Yikes... all referential integrity in code... you are kidding!) Don't enforce RI everywhere in code; use the tools in SQL Server to do that.

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Thursday, July 24, 2003

"They all look at me like I am from another planet. "

I always thought that Albert was from another planet - but that was based on the length of his replies and signature.

What's wrong with referential integrity in code? It makes you fairly database independent.

DCS
Friday, July 25, 2003

Referential integrity in code is a problem because not everyone will access the database through your code.

Salespeople/managers/support people/etc will find a way to use Access to 'tweak' fields in your database, insert things they think should be there, remove 'obsolete' items, etc. Next thing you know, your database needs serious cleanup.

This is also a reason that DBA's will allow updates only through stored procedures. It helps keep the database clean.

RocketJeff
Friday, July 25, 2003

Specifically, there is nothing wrong with referential integrity in code — as long as there is referential integrity at the DB level as well.
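Engine-level RI is just a declared constraint. A minimal sketch with invented tables:

```sql
-- The engine itself refuses orphaned rows, no matter which
-- program or ad-hoc tool does the insert or delete
CREATE TABLE dbo.Orders (
    OrderID    int NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL
        REFERENCES dbo.Customers (CustomerID)
        ON DELETE CASCADE  -- deleting a customer removes their orders
)
```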

Richard Ponton
Friday, July 25, 2003

Random thoughts:

SQL in code is fine until you start filtering in code instead of in a WHERE clause. Now you're marshalling data across the wire that doesn't need to be.

*I* like SP's because I find them very++ easy to test, and it's often where the really hard to find bugs crop up. So I step through code until I get to the SP call and identify the parameter values. Then I run the sp by hand with those values to verify the returned results match the expected results.

Personally, I find it very easy to separate data access code in my mind from business/presentation code, and put the data access code in the SP. But I've never had a code review on my SP's, so I could be insane on this one. ;-)

Big huge megadittoes on SQL injection attacks - they're almost impossible if the only way code touches the database is via SP's.

SP's also (IMHO) make handling nontext fields easier in both SQL Server and Oracle (esp. dates).

Albert, while I agree how shocked I am at how few people have E/R diagrams of their data, I must also agree that you are, in fact, from another planet. :D

Regarding Select * - the real evil is when one of the two added fields is a blob. Suddenly your queries are hauling back several dozen megs of data. I'd think this would be easy to find, but if you're using select * so freely, how many places do you have to change it?

Philo

Philo
Saturday, July 26, 2003

Actually, I am well known for being from another planet ;-)

As for the RI in the code vs using the data engine?

I cannot tell you how strongly I feel on this issue. On my team, and in my software, RI goes into the data engine. That is the end of it!

It is a question of where you put the business rules. If your code has to enforce RI, then a lot of problems arise.

First, any new developer is going to have to learn those rules, and know when, or when not, to delete related records in other tables. A developer cannot possibly modify and work on a project of any complexity when all the data rules are in the code. That developer will have to KNOW the rules just to modify any code. Just trying to modify some code that deletes a record means that I have to go and find EVERY INSTANCE of that code in the application. So, if any RI rule is changed, then you must again look through reams and reams of code to find such things as simple SQL deletes.

Assuming we just added a new table, you now must add new code that deletes the child records. Worse, we might be adding another child table to the application for a new code module. Perhaps we don't even plan to touch the existing application. We might be adding a new program, or application, to the mix. However, the problem still exists that we must modify and check all the existing code and programs that delete a record. Further, the way I normalize data these days, we probably just added two or more tables to the database!

With RI in the code, just try adding a new child table to a complex application. You are a dead duck so fast it is not even funny. You are now on a huge code hunt to find every single spot, program, or even OTHER programs and processes that can delete a record. Today, that data source might be a Palm Pilot, a web site, or one of 15 other programs that interface with the system. Just try to change one thing in the database, and then figure out what programs and code need to be changed. You can argue that business objects (3-tiered applications) should be used, but really, most RI goes at the engine level.

It is COMPLETELY unacceptable to gum up nice code with reams of RI rules thrown all over the place when that stuff can be done in the database engine. It is better, more manageable, and costs less.

Further, nice ER diagrams and engine-level RI provide the BEST DOCUMENTATION that developers can have for the money. So, even if there were few other advantages, I would still strongly recommend engine RI for reasons of documentation alone. Try changing developers on the team. How do you possibly get them up to speed? How do they learn the data designs for the application?

When I walked into that client, they had spent a LARGE amount of money on developers (VB + SQL Server). No engine-level RI was to be seen. No ER diagrams could be produced by the company. How does a developer even know what fields belong in what table? The application that company has now is barely maintainable, and will cost at least 2 or 3 times more to maintain. (People seem to forget that dollars are precious things.) That money should be going into new features, and also into changing the code to REDUCE its maintenance cost.

By changing data designs, in one of my products, I reduced support costs by 10 hours per year.

Heck, even people who needed to query some data into Excel were completely lost at this company. Who knows what table relates to what table? It was all in the programmer's head, and the rest all in the code. I'm not sure which is worse: having the design in the code, or in that programmer's head!

Look, years ago this type of situation was common; today it is not acceptable anymore.

As a developer, the first thing I stick up on the wall is a nice big ER diagram. What do you do all day otherwise? Figure out what the table names are, and then figure out what field names to use? Without a nice ER diagram a huge amount of time is wasted just getting stuff like table names and field names.

Heck, just try building a simple query without knowing what tables relate to what tables. It is a joke to work in an environment like this.

In my opinion, that company was getting ripped off, and those developers were 2nd rate.

You got this whole incredible data engine, and the developers are running around writing all kinds of code to do things that the engine and some stored procedures can do.

Boy, good developers do not = good database people.

Ask any developer which is better:

    Really good code, and bad data designs
    Really good data design, and bad code

The good data design wins every time. You can't modify and work on a complex application when you don't have an understanding of the data tables and their relations. Further, deleting, adding, or modifying the data structures will usually require a VERY HIGH knowledge of the existing code base. At least with RI at the engine level, you reduce this problem by huge amounts. You can modify and fix bad code a lot quicker in a large application with a good data design. With good designs the code practically writes itself anyway.

If the code is good, but the data and RI are all messed up, you can't maintain the software at all.

Enforcing RI rules in code means you create an instant mess.

It is true that products like the old dBASE III/FoxPro, and even stuff like MySQL for years, did not have engine-level RI. So, it would be unfair of me to claim that good software cannot be written in these systems.

However, that was a good 10 years ago in the industry. The lack of good tools simply meant that applications were less feature rich, or more expensive to develop, back then.

Today, customers are FAR MORE demanding of an application with a RICH feature set. Customers are demanding not so much better UIs (we have that now), but they certainly are WAY MORE demanding in terms of a rich feature set. (Good data design = rich feature set.)

This software industry is all about turning code into dollars. If people do not take the high road when it comes to software designs, and especially programmer productivity, then our products will be crap for the money. We will price ourselves out of the marketplace.

It is a complete disaster not to use RI in any modern database system, with the fabulous ER and diagramming tools we have.


Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Sunday, July 27, 2003

Heh. The Reader's Digest version of Albert's treatise:

Put referential integrity in the database and the database becomes self-describing.

In real world terms, if your RI rules are in the database, anyone new to the project can open a database modeler, point it at the database, and have an E/R diagram in under five minutes.

Mind you, in my experience any shop that doesn't have an E/R diagram handy for people that need it won't put RI rules in the database either...

Philo

Philo
Sunday, July 27, 2003

Thanks for the comments, guys. The one thing that did concern me about using SPs was the inability to store them in modules. I suppose having a consistent naming convention is the best thing I can do there.

I agree with the comments about Referential Integrity as well. Certainly on SQL Server it's dead easy to fire up Enterprise Manager, create a table diagram, and then join up the relationships between the tables. You get a good design picture _and_ all the constraints are sorted out.

Better Than Being Unemployed...
Monday, July 28, 2003
