Fog Creek Software
Discussion Board




Strange: Queries are slower after adding indexes!

We've just added a handful of carefully targeted indexes to our SQL Server 7 database.

I spent ages going through the execution plans of our worst-performing queries identifying the bottlenecks, and then introduced indexes to relieve them.

Some simple benchmarks showed a small performance improvement in our development environment.  The hope was that the improvement would be greater on the much larger production database.

However, the reverse is true.  After applying the indexes we see a significant slowdown against the production database.

How is this possible?  How can adding indexes reduce the performance of read-only queries?

I can understand why updates would run slower, since they have to maintain the indexes, but these queries are all read-only!

My guess is that, with the new indexes in place, the statistics SQL Server has gathered over time are no longer being used, and the new index structure will need a little time to "bed down."

This is just a guess, though, and I haven't been able to google any support for this theory.

So, am I wrong?  How else can this be happening?

Frustrated Maintenance Programmer
Tuesday, August 17, 2004

Indexes are not a magic thing that always improves performance.

Imagine a table that records name, first name, and so on, together with gender. Say, too, that gender is roughly equally distributed: 50% M, 50% F. No indexes.

You do a
  select x, y, z from my_precious_table where gender = 'M';

The entire table is read.

You add an index on the gender. And you do the same query again.

Now the entire index is read. Additionally, chances are the entire table is read as well, because the records are fetched from disk blocks, each of which may contain both M and F records.
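
The index in question would be something like this (the index name is just for illustration):

  create index ix_gender on my_precious_table (gender);

It narrows the search down to about half the rows, but those rows are spread across nearly every data page, so the engine still ends up touching the whole table.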

René Nyffenegger
Tuesday, August 17, 2004

Bi-value indices are usually no better than a table scan.  If you index a field that has just two values (true/false, male/female), or even only three or four values, your index is going to be mostly useless.

Because if you think about it, you've just created a list (in the bi-value case) in which roughly one half is one value and the rest the other.

Simon Lucy
Tuesday, August 17, 2004

"We've just added a handful of carefully targetted indexes to our SQL Server 7 database."

No, you didn't.  If they were carefully targeted, speed would improve.  Time to run a trace to see whether they are being used or not.

girl, you know it's true
Tuesday, August 17, 2004

"I spent ages going though the execution plans of our worst performing queries identifying the bottlenecks and then introduced indexes to relieve these."

Have you looked at the execution plans on the production DB after adding the indexes?

I'm not that familiar with these issues on SQL Server, but on Informix, when this happens, the first thing to do is look at the query plan to see which index(es) are being used.

Often, the problem is solved by running an update statistics, which refreshes the system tables where the DBMS keeps the index information used to determine the best query plan.
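
I believe the SQL Server equivalent is something along these lines (the table name is just an example):

  UPDATE STATISTICS my_precious_table

or, to refresh statistics on every table at once:

  EXEC sp_updatestats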

Paulo Caetano
Tuesday, August 17, 2004

If the indexes were poorly targeted, there would be no increase in performance: the optimiser would simply ignore them.

What we are seeing is a significant degradation of performance.  That is what I don't understand.

I am not indexing any bi-value fields.

Most of the indexed fields are either foreign keys or date-times.

Frustrated Maintenance Programmer
Tuesday, August 17, 2004

Paulo,

Thanks.  Yes, I am trying to get hold of the execution plans.  These are always difficult to get from production.

Frustrated Maintenance Programmer
Tuesday, August 17, 2004


INDICES!

Q
Tuesday, August 17, 2004

Did you set statistics io on and look at the number of "logical reads"? Any query that has more than a couple of thousand logical reads will be slow; those are the ones to rewrite or index. It's often surprising which table is causing the problem:

set statistics io on
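go
select x, y, z from my_precious_table where gender = 'M'
go

Running with statistics on prints something like this in the Messages tab (the query and the numbers here are made up):

Table 'my_precious_table'. Scan count 1, logical reads 2543, physical reads 120, read-ahead reads 0.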

Tom H
Tuesday, August 17, 2004

"Some simple benchmarks showed a small performance improvement in our development environment"

Your dev environment must exactly match production for your tests to be valid, including row counts.  If you are sure they match and production is slower than dev, then you need to look at page/index fragmentation...
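
To check fragmentation on SQL Server 7, something like this should work (the table name is made up):

-- look up the table's object ID first
SELECT OBJECT_ID('my_precious_table')
-- then feed the ID to SHOWCONTIG, e.g. if it comes back as 123456789:
DBCC SHOWCONTIG (123456789)

A low "Scan Density" or a high "Logical Scan Fragmentation" in the output suggests rebuilding the index, e.g. with DBCC DBREINDEX ('my_precious_table').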

Chris Peacock
Tuesday, August 17, 2004

... which just possibly means you developers are hogging a better machine, or are enjoying a less congested one. Just banging on a copy of the production database in isolation, without the normal production query traffic and other processes, may mean your DB app is now being swapped out a bit more on the production box, because you now have new indexes and their associated pages to hold in RAM.

I can't guide you much, as these things don't scale too well, but just maybe, if you can get all the stakeholders to agree to a test and the planets line up, you could jam some more RAM into the production box, either bought in or borrowed from your own gear, and see how it goes. Page-fault logs may exist to help your thinking on this.

I do remember a delicious meeting where production and development permanently swapped machines, so watch it!


trollop
Tuesday, August 17, 2004

"You add an index on the gender. And you do the same query again.

Now the entire index is read."

This is only true if you explicitly add an index hint to the query, e.g.

SELECT blah FROM blue WITH (INDEX(ix_my_binary_index)) WHERE blingo

The reason is that SQL Server includes a fairly intelligent query-plan generator that evaluates statistics to determine distribution (as a side note, statistics are automatically created when you add an index, though if you want to redo them you can use UPDATE STATISTICS). In the above case the selectivity of the binary field would be 0.5, which is huge (i.e. non-selective). In that case SQL Server would choose not to use the index and would just do a table scan (which is surprising to many people new to indexes).
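
You can inspect the recorded distribution yourself (reusing the table and index names from the hint example above):

DBCC SHOW_STATISTICS ('blue', ix_my_binary_index)

The density figure it reports is what the optimizer uses to judge selectivity.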

The reason it does this is that the process you described, called a bookmark lookup, is very expensive, and SQL Server will only use a non-clustered index that requires a bookmark lookup if the lookup touches something like less than 2% of the overall data. Note that the cost factors built into SQL Server are fixed constants, which is actually unfortunate: I've worked on one very large, SAN-based system that excelled at random access but less so at gross throughput. On that system bookmark lookups would have been a far better choice even for less selective sets, but SQL Server used the constants tuned for a traditional local RAID array. On the flip side, maybe the OP's setup is particularly bad at random access, and the bookmark lookups are actually slower.

As others have said, check the actual execution plan and see whether the indexes are being used. It's unlikely, albeit possible in edge cases, that the indexes are truly slowing down your queries.
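
A quick way to check from Query Analyzer (a sketch, reusing the placeholder query above):

SET SHOWPLAN_TEXT ON
GO
SELECT blah FROM blue WHERE blingo
GO

Then look for Bookmark Lookup, Index Seek, or Table Scan operators in the plan text.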

.
Tuesday, August 17, 2004

Part of your problem is that you're using MS SQL.

muppet
Tuesday, August 17, 2004

Oh, and some index hints to add to the above (yeah I posted "anonymously" as .):

- Try to make indexes that fully cover the target query, i.e. the selected columns can be served by the same index used for the WHERE predicates. This performs very well. Remember that non-clustered indexes on a table with a clustered index actually include the columns of the clustered index as hidden fields. A covering index requires no bookmark lookups, and since the number of index entries per page is often very high, it is extremely fast.
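
For instance (table and column names are made up):

-- a query of this shape:
--   SELECT order_date, total FROM orders WHERE customer_id = 42
-- is fully covered by:
CREATE INDEX ix_orders_cover ON orders (customer_id, order_date, total)

The whole query can then be answered from the index leaf pages without ever touching the table.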

Dennis Forbes
Tuesday, August 17, 2004

At the risk of teaching a grandmother to suck eggs, you did update the DB's statistics and rebuild the stored procedures after creating the indices, didn't you?
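
(Something along these lines, with a made-up table name:

UPDATE STATISTICS my_precious_table
EXEC sp_recompile 'my_precious_table'

sp_recompile marks every stored procedure that references the table for recompilation on its next run.)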

Mark Charsley
Tuesday, August 17, 2004

"At the risk of teaching a grnadmother to suck eggs, you did update the DB's statistics and rebuild the store procedures after creating the indices, didn't you?"

As some brief additional info, SQL Server automatically ages stored procedure plans out of the cache occasionally, and rebuilds them on first execution after a restart or on an explicit recompile request. Until that point, as Mark mentioned, they won't use new indexes.

Regarding statistics, I just wanted to mention that the statistics covering an index are created, and updated, when CREATE INDEX is run, so a new index should have accurate, current statistics from the beginning. It's still a good idea to run FULLSCAN statistics updates across the whole database regularly as general maintenance (indeed it's included as a step in the Database Maintenance wizard).
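
Per table that's something like (name made up):

UPDATE STATISTICS my_precious_table WITH FULLSCAN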

Cheers!

Dennis Forbes
Tuesday, August 17, 2004

It's been ages since I've used SQL Server 7, but I believe it's:
SET SHOWPLAN_TEXT ON
GO

SELECT ...

GO

You should be able to paste the ShowPlan in here.

Your best bet would be to go to DBForums.com's MS SQL Server forum:
http://www.dbforums.com/f7

They have better formatting tools there.

Captain McFly
Tuesday, August 17, 2004

I've had some horrible problems in SQL 2K with query plans not matching as they should.

I.e., running a query in Query Analyzer produces one plan, while running the same query in production uses another.

I doubt this is your problem, but it's quick to check, so you might as well.  It's called "parameter sniffing"...

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&c2coff=1&selm=%23LOH3ewDDHA.392%40TK2MSFTNGP12.phx.gbl&rnum=5
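
The classic workaround, if that turns out to be it, is to copy the parameter into a local variable so the optimizer can't sniff the caller's value (the procedure and names below are made up):

CREATE PROCEDURE get_orders @customer_id int
AS
BEGIN
    -- the plan is built without knowing the caller's value
    DECLARE @local_id int
    SET @local_id = @customer_id

    SELECT order_date, total
    FROM orders
    WHERE customer_id = @local_id
END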

Crackhead
Tuesday, August 17, 2004

How much data?  Maybe the amount of data doesn't warrant the use of indexes... it may just be quicker to table scan.

Yo
Tuesday, August 17, 2004

This is a lot of data.  Gigabytes' worth.

I'd like to stress that I have practical experience, having worked with SQL for over a decade.

I've also read extensively from Date and Pascal.  I know how relational databases work.  I understand the first four normal forms, and I'm even getting to grips with the fifth and sixth.

This is not some stupid error on my part.  Logically, all is sound.  Something screwy is going on at the implementation level.

I am willing to bet that the fault is in the benchmarking, but proving it is the problem.

Frustrated Maintenance Programmer
Wednesday, August 18, 2004

Frustrated Maintenance Programmer,

<quote>
This is not some stupid error on my part.
</quote>

I don't mean to sound rude, but *everyone* thinks like that. And they are usually wrong.

When diagnosing problems, I always think along the lines of 'what have I screwed up here?'. It usually works because:

a) It usually *is* my fault (if we're honest, I think that's the case for most programmers).

b) It focusses my thinking on what I can change rather than blaming it on something I can't change.

Anonymous
Wednesday, August 18, 2004
