Fog Creek Software
Discussion Board




Reporting Techniques...

Let’s say I run a car dealership and I want to return various facts and figures for all of my salespeople.

For example:
For each salesperson:
* I’d like to see aggregate sales figures for the last year and comparisons to the last two years
* I’d like to see every car they sold in the last six months
* I’d like to see their attendance record for the last year
* I’d like to see their customer solicited comments for the last month

This is stored in a vanilla SQL DBMS.

Now, all of these can be generated separately by running three or four SQL queries. Due to the different types of data, though, it’s obviously impossible to combine them all into one big honkin’ query.

The way we’ve done it in the past (before I broke out the cluestick) was to issue a ‘driver’ query to get all salespeople’s names/IDs and then, as we loop over that result set, run the three or four ‘detail’ queries with the salesperson ID from the driver query and print out their results. When the driver query moves on to the next row, we issue a new page statement (this generates a PDF document) and repeat. This gets quite slow as soon as the number of 'detail' queries starts to ramp up.

This is certainly doable in this case, but it’s far from the optimal way to query a DBMS. We’d have number_of_salespeople * number_of_detail_queries queries every time we run this report. Due to some vendor-imposed restrictions (e.g. lack of a salesperson index), some of the queries have to full-scan a large table, further compounding the inefficiency.
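
In rough Python-ish pseudo-code (not our actual reporting language, and the table/column names below are made up), the current loop looks something like this:

def run_report(conn, render_page):
    # Sketch of the existing "driver + detail" pattern; assumes a generic
    # DB-API style connection and invented table/column names.
    cur = conn.cursor()

    # Driver query: one row per salesperson.
    cur.execute("SELECT salesperson_id, name FROM salespeople")
    salespeople = cur.fetchall()

    detail_queries = [
        "SELECT * FROM sales      WHERE salesperson_id = ?",  # cars sold
        "SELECT * FROM attendance WHERE salesperson_id = ?",  # attendance
        "SELECT * FROM comments   WHERE salesperson_id = ?",  # customer comments
    ]

    for sp_id, name in salespeople:
        sections = []
        # salespeople x detail_queries round trips in total -- this is
        # what ramps up so badly as detail queries are added.
        for sql in detail_queries:
            cur.execute(sql, (sp_id,))
            sections.append(cur.fetchall())
        render_page(name, sections)  # new PDF page per salesperson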

The language we’re using does not support objects (oh the horror), so I can’t run the queries separately, put the results in objects, and then iterate over them after the queries have finished. It *does* have arrays, though, so I figure I could run each query once and store its results in an array, but I think that would create maintenance problems later (each new query would require an array and the requisite array-maintenance code).

Any thoughts/ideas?

Thanks in advance,

The Cap’n

Captain McFly
Tuesday, August 10, 2004

It sounds like you might not have written the queries right.  If you can use nested selects, you can probably accomplish the same thing.  A lot of people try to use SQL Server CURSORs or other such things instead of looking to reduce the complexity of the query itself.  Try that before resorting to arrays.
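
For example, the aggregate figures can come back as one row per salesperson in a single query instead of one query per salesperson. Something along these lines (I'm guessing at your table and column names, so treat it as a sketch):

AGGREGATE_SQL = """
SELECT    s.salesperson_id,
          SUM(CASE WHEN sa.sale_date >= :this_year_start THEN sa.price ELSE 0 END) AS this_year,
          SUM(CASE WHEN sa.sale_date >= :last_year_start
                    AND sa.sale_date <  :this_year_start THEN sa.price ELSE 0 END) AS last_year
FROM      salespeople s
LEFT JOIN sales sa ON sa.salesperson_id = s.salesperson_id
GROUP BY  s.salesperson_id
"""

def fetch_aggregates(conn, this_year_start, last_year_start):
    # One query returns every salesperson's totals; the named parameters
    # assume a driver that supports the :name style (e.g. sqlite3).
    cur = conn.cursor()
    cur.execute(AGGREGATE_SQL, {"this_year_start": this_year_start,
                                "last_year_start": last_year_start})
    # Keyed by salesperson ID for quick lookup in the report loop.
    return {sp_id: (this_year, last_year)
            for sp_id, this_year, last_year in cur.fetchall()}

The other detail sets can be pulled the same way, one query each, and matched up by salesperson ID as you print.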

sir_flexalot
Tuesday, August 10, 2004

Use MS Access to access the data store, and then use nested subreports to generate your reports.

Can be done quite easily.

Tapiwa
Tuesday, August 10, 2004

I concur; Access's subreports are awesome for that task. I've only had one report that ever needed that type of layout, and subreports covered it perfectly. As far as I know, Crystal Reports and ActiveReports don't match Access for subreporting.

Justin
Tuesday, August 10, 2004

As a few others have already mentioned, a solution will depend on what tools you have.

People don’t write code to lay out and format reports anymore... do they? (I wish my clients could afford to pay for that kind of stuff!)

Your approach of “looping” over the main table sounds reasonable.

Hence:

>When the driver query moves on to the next row, we issue a new page statement (this generates a PDF document) and repeat. This gets quite slow as soon as the number of 'detail' queries starts to ramp up.

I would assume that you generate the “sub” queries, print the data, and then release the resources used (your arrays or whatever you used to store the results of the queries).

There is no question that each of these “extra” queries you build is going to be a serious performance hit and will slow things down. However, there are few alternatives here.

Each query takes a lot of resources to open and run. Often, though, if you have a join such as:

tableSalesMain -> tblCarsSold

then you can either process that joined data, or simply continue to use your “main” table as the driver loop. So, you wind up with two queries:


tableSalesMain
tableSalesMain -> tblCarsSold

Now, as you loop over tableSalesMain, you do a scan on the 2nd query above to pluck out the child records that are already in memory. My point here is that scanning the results of a join loaded into memory is often MUCH faster than executing a query to return the cars sold each time through the loop (and you have rightly pointed this out already).
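
In rough Python-ish pseudo code (your tools and names will differ; this is just to show the shape of the idea):

from collections import defaultdict

def preload_children(cur, sql, key_index=0):
    # Run one child query and group its rows by salesperson ID.
    # Assumes the ID is in column key_index and the whole result
    # set fits comfortably in memory.
    cur.execute(sql)
    by_salesperson = defaultdict(list)
    for row in cur.fetchall():
        by_salesperson[row[key_index]].append(row)
    return by_salesperson

def run_report(conn, render_page):
    cur = conn.cursor()
    # One query per child table instead of one per salesperson per table.
    cars_sold  = preload_children(cur, "SELECT salesperson_id, make, model, price FROM sales")
    attendance = preload_children(cur, "SELECT salesperson_id, day, status FROM attendance")

    cur.execute("SELECT salesperson_id, name FROM salespeople")
    for sp_id, name in cur.fetchall():
        # In-memory lookups replace the per-row detail queries.
        render_page(name, cars_sold.get(sp_id, []), attendance.get(sp_id, []))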

Not knowing your tools and your environment, any suggestions that help more than the above would likely require knowledge of your current setup. It is also not clear how large your data sets are. For example, if the joined data table for cars sold is too large, it will take up too much memory, and the scanning time will approach or exceed that of simply executing a query “each time” to grab the cars sold.

With VB/JET/ms-access, for example, that crossover point is about 10,000 records (i.e. executing a query each time through a loop vs. loading all the child records into memory). I have taken reports that took 30 seconds and gotten them down to below 2 seconds using the above knowledge.

So, anything that avoids opening a query and loading records “each” time through the loop will speed things up.

And, yes, those comments about subreports in ms-access being incredible are so true (everyone loves to hate ms-access... but you can't beat that report writer!).

You likely don’t have, or can’t use, ms-access in this case, but it is darn good, takes virtually no code, and in a matter of minutes you would be done.

All you do is build the simple SQL for the cars sold:

Select * from tblCarsSold where
    datediff(“m”,[PurchaseDate],date()) <= 6

The above will return all cars sold in the last 6 months.

You then drop in the fields you want on the report:

Make          Model              SoldCost  etc.

So, you drag and drop the fields you need for this columnar report. Save the report.

You now create a report based on tableSalesMain, and in the detail section you simply drop in the above report, and presto, you are done (the master/child link fields are usually set automatically for you here, and this restricts the subreport to EACH salesman for EACH detail record). You can continue this process for as many sub-tables as you need, and each sub-table/report can have its own columns etc. (and, even more cool, you can grab totals from each sub-report into a summary result).

And, if the ms-access reports don’t do what you like, you can always write VB code “inside” of the report to do what you will. Such a rich report system that allows you to write VB code inside is what makes the ms-access report writer second to none. It is hard to believe that this report writer is 10+ years old now and the rest of the industry still doesn’t have anything that matches it!

However, it doesn’t sound like you have, or can use, ms-access. I would assume that you have cobbled together some code to build and lay out your output. As mentioned, about the best speedup is to pre-load “all” of the child records into your temp structures and scan them...


Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Tuesday, August 10, 2004

Thanks for the ideas everyone.

Mr. Kallal is correct in that we don't have MS Access available. Well, we *do* have MS Access, but these reports are kicked off automatically on a Linux server, so there's no real way to take advantage of the power/convenience of Access in this case.

I’m not really sure what we’ll end up doing, but there might be options with the reporting language to load tables into memory for client-side joining. This is, obviously, suboptimal but might be what we have to do.

Now that I’m thinking about it, I might be able to do something like my original idea (loading result sets into memory), but by creating a temporary file for each salesperson which contains all of their info – e.g. the first query would create the file and write its data; the second query would append its data; etc. I would then open each file and format the data “all pretty like” to the PDF.
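
Something along these lines, in Python-ish pseudo-code again (the section tags, helper names, and schema are all made up):

import os

def spool_query(cur, sql, section, out_dir):
    # Append one detail query's rows to a per-salesperson spool file.
    # Assumes column 0 of every detail query is the salesperson ID
    # and that out_dir already exists.
    cur.execute(sql)
    for row in cur.fetchall():
        path = os.path.join(out_dir, "salesperson_%s.txt" % row[0])
        with open(path, "a", encoding="utf-8") as f:
            f.write(section + "\t" + "\t".join(str(v) for v in row[1:]) + "\n")

def build_spool_files(conn, out_dir):
    cur = conn.cursor()
    # Each detail query runs exactly once; its rows fan out to the right
    # salesperson's file. The PDF step then reads one file per salesperson
    # and formats each tagged section.
    spool_query(cur, "SELECT salesperson_id, make, model, price FROM sales", "CARS_SOLD", out_dir)
    spool_query(cur, "SELECT salesperson_id, day, status FROM attendance", "ATTENDANCE", out_dir)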

Captain McFly
Wednesday, August 11, 2004

Why not just schedule Access to fire off a report every month or whatever?

The reporting tool in Access is extremely powerful, and is very easy to use.

You create the report once. Your scheduling tool fires it off every month.

All you need is to ensure that the Access 'database' has a live connection to the data source. Even if your database is on a Linux server, it will work, as long as the Windows box has read rights.

It really is very easy. The beauty is that you design it once, and run it many times.


If you are really not happy with Windows, check out freshmeat.net and sourceforge. There are a lot of Linux or platform-independent (Java) report-writing engines.

I have not used any of the free ones in a while, so I can't really comment, but Jasper seems to have good write-ups.

http://freshmeat.net/projects/jasper/
http://freshmeat.net/projects/jasperassistant/

Tapiwa
Wednesday, August 11, 2004

We have no Windows servers, nor do we really want to move reports out of our reporting environment portal. I certainly don't want to have to manually run a report every month on my box (or ensure that it fired automatically) and then move it to the intranet by hand (the portal is 3rd-party and not easy to manipulate).

Captain McFly
Wednesday, August 11, 2004
