Fog Creek Software
Discussion Board

Welcome! and rules

Joel on Software

Weird SQL Server Performance

OK - I know this is really for .net questions, but I am having weird trouble with SQL Server 2000 (I am accessing it from a .net app, so it's not completely off topic!).

I have a stored procedure which mostly works great, but sometimes is really slow. I've been through all the normal performance tuning stuff, and have isolated a particular select statement which seems to be the problem. This is a really simple single table select with two criteria:

insert into @Results (ListingID, ListingType)
select top 200 id, 3
from Listing
where code = @Code
and id = @ID
(the variable names have been changed to protect the innocent - they are actually a bit more descriptive than that! And I know there's no order by, so the top 200 could be anything.)

The weird thing is that with statistics IO set on, when this single statement executes within the stored procedure, it usually reports around 300 logical reads. However, on the occassions when it runs for a long time, it reports around 300,000 logical reads. This is the same SQL being called with the same arguments on the same server. The data is not changed between reads, and the stored proc is not recompiled.

I understood that the number of logical reads was one factor which did not change between successive executions of the same query (assuming the same parameters and data), and yet I am seeing a thousand-fold increase in the number of reads. We have three identical servers, all of which are showing the same problem, which makes me suspect this is not down to faulty hardware. Any suggestions gratefully received...

Mark B
Wednesday, April 13, 2005

Can you post some of the sproc?

Wednesday, April 13, 2005

never mind, i'm not thinking posted parts of it

Wednesday, April 13, 2005

Mark --

I haven't hit this same exact scenario before, but one close enough that I will offer what I know.

I have a large table, with about fifty columns, and 4m rows. I would occasionally run queries similar to 'UPDATE Document SET BatchID = 10 WHERE BatchID IS NULL AND Name = "SurfaceOne"'.

The query would execute, but instead of just getting the SurfaceOne rows where BatchID was NULL, it would update ALL of the SurfaceOne rows.

In order to fix the problem we had two choices. The first was to reboot the server, and run the query on the machine while idle. The second was to add an index on BatchID and Name (before we only had an index on Name). The second solution was what we went with over the long run.

I bring this up because it appears that there is a bug in SQL Server which causes it to occasionally come up with faulty execution plans that do strange things (like above, and possibly your case).

My suggestion is to then try adding an index to the two columns you are using, and see if it makes a difference.

Let us know how it goes.

Wednesday, April 13, 2005

"The query would execute, but instead of just getting the SurfaceOne rows where BatchID was NULL, it would update ALL of the SurfaceOne rows."

You mean, SQL Server corrupted your data, by updating the wrong rows ?

If true, this is a huge deal. You should talk to Microsoft about this, as I'm sure they'd be happy to help.

I find it hard to believe that this is actually what happened though, with all due respect to L~. Could there be some other explanation ?

Joel Spolsky
Thursday, April 14, 2005

Hi, I'm a colleague of Mark's. To add more more information:

We created indexes on the table as suggested, and this worked great on one box. We rolled the indexes onto a 2nd box and the performance tanked. Obviously we pulled the indexes back off again.

Today we had another go, just to make sure that we didn't use the wrong script ;-) Similar result, accept this time we rolled to 2 out 3 boxes successfully, but the 3rd one tanked after 5 minutes.

Next step is to apply index by index to see what happens. If all else fails, then we'll speak to Microsoft about it... or we'll change to MySQL ;-)

Mark C
Thursday, April 14, 2005

Try prefixing any objects with the owner - e.g. dbo.Listing

Actively Disengaged
Thursday, April 14, 2005

"You mean, SQL Server corrupted your data, by updating the wrong rows ?"

Yes I mean SQL Server corrupted my data by updating the wrong rows. Yes, I realize this is a very BAD THING, and that the proper thing to do would have been to spend three weeks trouble shooting the issue with Microsoft Tech Support.

Let me just call up the CTO of the largest North American Financial Instituation and tell them their very important and already overdue project is on hold while we trouble shoot a SQL server issue with Microsoft. Oh wait, can't do that.

Of course the easier thing to do was to simply add the indexes.

You have to realize that the project in question was already overdue because our NAS server, a nice 60K dollar piece of equipment from Dell running Windows 2003 Storage Server (770N + 4 220S for those wondering) was corrupted while troubleshooting an issue with MS tech support. That cost us 6TB worth of data, and about two months worth of work.

Personally, I wanted to go back and spend the time troubleshooting both issues after the project ended, but was informed by my boss here that it wasn't a priority. His thinking was, to paraphrase, why should we burn engineering time on an issue that is obviously Microsofts problem.

As a result of the NAS issue, we've started evaluating Linux/Samba file servers, with good results.


Thursday, April 14, 2005

Mark -- I forgot to ask, is the query (or table) in question part of a transaction? You could have a lock contention issue. You can check into that using Enterprise Manager.

Let me just propose a quick scenario... thread A locks table, thread B (your stored proc) tries to insert a row into table, only to stall as a result of the lock.

You didn't say thing to indicate this was the case, but I thought I'd ask.

Thursday, April 14, 2005

I don't think locking is the issue here. We only read from this database, so should only ever have select locks. I am considering setting the database readonly, just haven't got round to fully testing it yet.

I am more concerned about the variability in the number of logical reads this select statement requires.  Everything I've read says that the same SQL on the same data and the same schema will ALWAYS require the same number of logical reads, yet I see the same statement mostly use n reads, but occasionally use 1000n reads. This sounds like a fundamental problem with the optimiser.

I should probably add that the select statement is in an "if" clause. I'm guessing that the first execution of the sproc didn't execute that path, so this isn't in the cached query plan, so gets replanned with every execution, and sometimes that replanning goes weird.

Mark B
Friday, April 15, 2005

*  Recent Topics

*  Fog Creek Home