Fog Creek Software
Discussion Board

Database tuning

I have a SQL Server 2000 database with a few tables. The main table has around 20 columns. As the number of records in that table increases, fetching the records with a "SELECT *" through a VC++ application takes a long time to return. With around 200,000 records, it takes around 20 seconds. How can I tune the performance, both in terms of the database and the querying?

John
Tuesday, April 08, 2003

1) Don't use SELECT *.
2) Do use stored procedures (there's a sketch after this list).
3) Use indexes on the fields you're selecting on.
4) Normalize the database.
5) Ask C++ gurus about optimal methods of pulling SQL data - you may not be using the best method.
6) Examine why you're pulling what you're pulling, and whether you really need all of it. Generally, running a SELECT * on a table with a lot of rows and columns sets off alarms in my head.
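
For 1) and 2), a minimal sketch of the idea - every table and column name here is made up, since we haven't seen your schema:

-- Hypothetical names throughout; adjust to the actual schema.
-- Name only the columns the app displays, wrapped in a stored procedure:
CREATE PROCEDURE dbo.GetRecentRecords
    @StartDate datetime
AS
    SELECT RecordId, CreatedDate, Description   -- not SELECT *
    FROM MyBigTable
    WHERE CreatedDate >= @StartDate             -- filter on an indexed column
GO

Then the app calls EXEC dbo.GetRecentRecords with a date instead of building the SELECT itself.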

Philo
Tuesday, April 08, 2003

Indexes are only going to help if you use a WHERE clause, which goes with your point 6.

What sort of app needs to display 200,000 records?

Damian
Tuesday, April 08, 2003

The application is querying for a set of records satisfying a time range, to be displayed in a listview control. The query looks like this:

SELECT *, DATEPART(yyyy, TimeStamp) AS 'Year', DATEPART(mm, TimeStamp) AS 'Month'
FROM EventsTable
WHERE (TimeStamp >= '2002-12-01 12:34:09' AND TimeStamp <= '2003-01-31 14:09:09')

When there are about 200,000 rows, this query takes a while to return, even though it may fetch only 10 records.

John
Tuesday, April 08, 2003

The table is not indexed on TimeStamp.

John
Tuesday, April 08, 2003

If it need not be filled into one stupendously huge ADO recordset, consider a BCP export. There's a SQL-DMO object model for the thing; it's like three commands. Easy peasy. Or try XML export (and then read it in with client-side ADO) - it might slow you down, might not; parsers are pretty fast these days. Fast enough? Hard to say. Development time: medium.
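
(If you'd rather skip SQL-DMO, the command-line bcp utility does the same export. Server name, database name, and output path below are all made up:)

-- bcp is a command-line tool; one way to drive it from T-SQL is xp_cmdshell:
EXEC master..xp_cmdshell
    'bcp "SELECT * FROM MyDb.dbo.EventsTable" queryout c:\events.dat -c -S myserver -T'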

Round trips you're making with ADO Connection / Recordset and other objects can be bundled into one ADO Command call on a stored procedure with IN and OUT parameters. It's never pretty that way, but you could easily get 10x the performance under the right circumstances (assuming you're sure you don't need to scale any further than that). Development time: light.
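
For example, a hypothetical procedure (the non-TimeStamp column names are guesses) that returns the rows and a count in a single round trip:

-- One call instead of several: the rows come back as a recordset and the
-- count comes back through an OUTPUT parameter, so no second query is needed.
CREATE PROCEDURE dbo.GetEventsAndCount
    @StartTime datetime,
    @EndTime   datetime,
    @Rows      int OUTPUT
AS
    SELECT EventId, TimeStamp, Description   -- column names are guesses
    FROM EventsTable
    WHERE TimeStamp >= @StartTime AND TimeStamp <= @EndTime

    SET @Rows = @@ROWCOUNT
GO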

How fast do you need to manipulate it in RAM (on the C++ side)? To read that BCP export into RAM you'll want to write a custom C++ data structure... preferably one as awe-inspiring as Dali (Bell Labs), TimesTen, kdb, or at the very least DataSet (.NET). Write a COM object that gives free-threaded access to it. That's the fastest way, but that's probably not the answer you are looking for... Development time: more than God's got fingers for.

Li-fan Chen
Tuesday, April 08, 2003

Actually Philo, he might be issuing so many selects that the pause is all in the selects. That way you'll wait 20 seconds even on gigabit links connecting Xeons. Denormalization can help. But again, this assumes you'll use something cute (see BCP/DTS) to transfer the data over the network.

Li-fan Chen
Tuesday, April 08, 2003

A very deep question. 

Check out www.sql-server-performance.com.  It has tons of great pointers.

BoredAtWork
Tuesday, April 08, 2003

Well, the first thing is to throw an index on that column. Depending on how the PK on the "main" table is used elsewhere, you may want to make that index a clustered index and create a nonclustered index for the PK.
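
Something like this - the constraint and column names are guesses, since the real schema isn't posted:

-- If the PK is currently the clustered index, rebuild it nonclustered first:
ALTER TABLE EventsTable DROP CONSTRAINT PK_EventsTable
ALTER TABLE EventsTable ADD CONSTRAINT PK_EventsTable
    PRIMARY KEY NONCLUSTERED (EventId)
GO
-- Then cluster on the column the range queries scan:
CREATE CLUSTERED INDEX IX_EventsTable_TimeStamp
    ON EventsTable (TimeStamp)
GO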

After that, I'd open up Query Analyzer and check performance with and without those DATEPART functions (be sure to turn on Show Execution Plan). If the difference is big, you might want to consider getting the year and month out of the timestamp column in the client code. This makes a little more sense in the .NET world (the column in question would map to a DateTime - easy as hell to grab the year and month).
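
That is, compare something like these two (both lifted from the query posted above) in Query Analyzer:

-- As posted, with the DATEPART calls:
SELECT *, DATEPART(yyyy, TimeStamp) AS 'Year', DATEPART(mm, TimeStamp) AS 'Month'
FROM EventsTable
WHERE TimeStamp >= '2002-12-01 12:34:09' AND TimeStamp <= '2003-01-31 14:09:09'

-- Without them, letting the client pull year/month out of the DateTime:
SELECT *
FROM EventsTable
WHERE TimeStamp >= '2002-12-01 12:34:09' AND TimeStamp <= '2003-01-31 14:09:09'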

Andy
Tuesday, April 08, 2003

Dare I add, try executing the query from within Query Analyzer and have a look at the execution plan. This will tell you where all the time is going, and it can help you plan indexes and fine-tune your query.
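
If you prefer text to the graphical plan, the same information is available that way too:

-- In Query Analyzer, show the estimated plan as text instead of executing:
SET SHOWPLAN_TEXT ON
GO
SELECT * FROM EventsTable
WHERE TimeStamp >= '2002-12-01 12:34:09' AND TimeStamp <= '2003-01-31 14:09:09'
GO
SET SHOWPLAN_TEXT OFF
GO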

Geoff Bennett
Tuesday, April 08, 2003

LOL - I didn't check the second part of the initial question, so forget what I said. Consider right-clicking on the table icon in Enterprise Manager and picking Add Index.

In the wizard/dialog box, add clustered indexes or non-unique indexes (dates are usually non-unique).

Should you need to scale to millions of rows: order by date with a clustered index, then create partitioned tables joined by a view. So one table for 2000, one for 2001, one for 2002 - you get the point. Again, there are online articles on this.
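
A rough sketch, with made-up table names and only two columns to keep it short:

-- One table per year; the CHECK constraint is what lets the optimizer
-- skip the tables a given date range can't possibly touch:
CREATE TABLE Events2002 (
    EventId   int      NOT NULL,
    TimeStamp datetime NOT NULL
        CONSTRAINT CK_Events2002 CHECK (TimeStamp >= '20020101' AND TimeStamp < '20030101'),
    CONSTRAINT PK_Events2002 PRIMARY KEY (TimeStamp, EventId)
)
CREATE TABLE Events2003 (
    EventId   int      NOT NULL,
    TimeStamp datetime NOT NULL
        CONSTRAINT CK_Events2003 CHECK (TimeStamp >= '20030101' AND TimeStamp < '20040101'),
    CONSTRAINT PK_Events2003 PRIMARY KEY (TimeStamp, EventId)
)
GO
-- The view glues the years back together; queries go against AllEvents:
CREATE VIEW AllEvents AS
    SELECT EventId, TimeStamp FROM Events2002
    UNION ALL
    SELECT EventId, TimeStamp FROM Events2003
GO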

Li-fan Chen
Tuesday, April 08, 2003

I went through the tables in the database. In the query

SELECT *, DATEPART(yyyy, TimeStamp) AS 'Year', DATEPART(mm, TimeStamp) AS 'Month'
FROM EventsTable
WHERE (TimeStamp >= '2002-12-01 12:34:09' AND TimeStamp <= '2003-01-31 14:09:09')

EventsTable is actually not a table! It's a view doing a complex inner join, left join, union, etc. on various fields from 6 other tables (including the main one).

Right now I am going through the tips on www.sql-server-performance.com and using MS Query Analyzer's "Show Execution Plan" and other tools in there.

John
Tuesday, April 08, 2003

Do you use bulk fetching, or do you retrieve the records one by one (e.g. MoveNext)?

na
Tuesday, April 08, 2003

John, you can still actually put an index on the timestamp column in the view. They're called "indexed views", funnily enough. That said, their usage is quite restricted by various caveats. Check out Books Online.
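
The mechanics look roughly like this (names invented - and whether John's actual view qualifies is another question, since constructs like UNION and outer joins disqualify a view from being indexed):

-- The view must be schema-bound, name its columns (no SELECT *), and
-- reference base tables by two-part names:
CREATE VIEW dbo.EventTimes
WITH SCHEMABINDING
AS
    SELECT EventId, TimeStamp
    FROM dbo.Events   -- hypothetical base table
GO
-- The first index on a view must be unique and clustered:
CREATE UNIQUE CLUSTERED INDEX IX_EventTimes
    ON dbo.EventTimes (TimeStamp, EventId)
GO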

Duncan Smart
Tuesday, April 08, 2003

Mind you, it's only available in SQL Server 2000 and Oracle.

Geoff Bennett
Tuesday, April 08, 2003
