Fog Creek Software
Discussion Board




Excel Automation vs. HSSF

As a few of you may know from previous postings, I'm looking for a highly flexible reporting engine that can do graphs, shaded columns, underlines, etc. Basically, any reporting system that supports all or nearly all of the features you can achieve by using Microsoft Excel by hand. My quest for such a thing has failed miserably. A few of the alternatives suggested in previous postings are great reporting systems, but I am looking for something that can generate near-marketing quality documents in full color that can go to the end customer.

That said, I am leaning towards using Excel Automation for this. I've used it before to produce similar complex reports and it was quite good to me. The only issue I have is with performance. Automation is slow as dirt when you are running thousands of reports. It's also very hard on memory and seems to die if you send too many windows messaging requests in a row (this is documented somewhere deep in MSDN). As an alternative, I was considering generating the Excel format using an API, such as Apache's HSSF. I was wondering if anyone had any performance numbers for a comparison between COM automation and HSSF, or general suggestions on my situation.

BTW, this is a production system that is expected to generate thousands of reports per month and will be both request and batch based. I have evaluated Crystal Reports, DataDynamics Active Reports 2, and Cognos Impromptu. These reports can be generated on any target system that will work, from MVS through Windows 2K, .NET, Java, or Linux.

I apologize. The words aren't flowing today. I blame the teeth.

Dustin Alexander
Monday, May 19, 2003

I've used HSSF a reasonably amount and I don't think it is flexible enough (YET!) to meet your current needs.  Graph support is very thin on the ground yet, though POI is fantastic and I'm eternally grateful to Andrew, Greg and the other main committers for it even though I lurk around and never post anything to the mailing lists.

Konrad
Monday, May 19, 2003

We used Excel Automation for a while, but it was slow and awkward to program. And I wouldn't even think of running multiple instances of Excel on a server.

We switched to ExcelWriter, a COM dll that runs in- process.  We use it in a desktop application, but you can get a licence to put it on a server.

It has a few quirks, but basicaly it does everyhing we have asked of it: cell formating, multiple pages, formulas, etc. It does charts too, but you will have to see if it has what you need.

We a library in Delphi to call it. Overall, we're happy.     

anon
Monday, May 19, 2003

I don't know very much about HSSF, but I have another suggestion:

Have you considered generating XML sheets using  Excel's  schema?

Using the excel's XSD, you can express everything you can put in an excel worksheet, including formatting and graphics but XML is a lot faster to generate (than using Automation) and Excel can open the xml file as if it were a .xsl sheet.

That way you would only need to generate one XSLT for each of your reports (that can be done by 'desingning' your worksheet in Excel, saving it as XML, and modifying it to insert the xslt related tags).

To run the report you would only need to output your report data as XML form your datasource and apply the specific XSLT.

Sergio
Monday, May 19, 2003

Excel is nice because you can put data into it.

However, you can automate "Microsoft graph" as a separate object. I don’t have a quick ref for that.

On the other, perhaps the Microsoft Data Analyzer is what you need?

Check out:

http://www.microsoft.com/office/dataanalyzer/default.asp

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Monday, May 19, 2003

"The only issue I have is with performance. Automation is slow as dirt when you are running thousands of reports. It's also very hard on memory and seems to die if you send too many windows messaging requests in a row"

Yup, sounds like office automation to me.

Before you get all caught up in static reports for everyone, think about a business intelligence solution.  We use DI Diver.  It has cut down on the number of reports we have had to right dramatically.

Cedric
Monday, May 19, 2003

You could look at doing this the very old way, which is to use DDE to create a conversation between the two apps.  I've used this in the past, along with some simple UI in Excel VBA to kick off the connection.

If you're patient in setting it up and testing it you can run Net DDE to have it display, report on different machines.

Simon Lucy
Monday, May 19, 2003

Haven't used HSSF. 

My info may be way obsolete.
Some years ago I did use another tool which exported in Excel format (it was a fairly sophisticated spreadsheet control from an outfit called Farpoint).  Performance was lightening quick vs painfully slow using Automation, however there wasn't an equivalent for every Automation operation you might want.  I'd expect that this is still the case with most such tools that can export to an Excel format.

If HSSF doesn't work out, it might be worth searching around on "OLE Structured Storage" which is semi self documenting - probably more so with freely available typelibs. 

I'd also look into Sergio's suggestion of using XML sheets with Excel's schema... this really strikes me as the nicest approach for future maintability and general coherence to some standard.  Sergio pls tell: "does Microsoft publish this info somewhere?".  I'm guessing you could probably also find useful stuff at Sun's OpenOffice site.

John Aitken
Monday, May 19, 2003

"Automation is slow as dirt when you are running thousands of reports. "

I have no experience with this but am quite interested. Where is the bottleneck for your automation: CPU, Disk I/O, Memory footprint -> swapping, ...?

Just me (Sir to you)
Tuesday, May 20, 2003

The bottleneck is kindof a CPU thing, I say 'kindof' because it's all task switching (process switching / threadswitching / apartment switching) overhead, maybe with some COM timeout extra-waiting built in.  Cross-process and even 'cross-apartment' marshalling can get ugly and easily overwhelm (performance-wise) the essence of what you're trying to do.

John Aitken
Tuesday, May 20, 2003

In that case running on multiple VPC instances might only make matters worse. Farming it out to a set of 300$ crunch PC's might be a cheap alternative though.

Just me (Sir to you)
Tuesday, May 20, 2003

That is an interesting idea that I had as well. We could build an automation system with a centralized controller that farms out reports to a group of machines and then aggregates their data.

In my experience, the issue with Automation's slowness is neither processor nor memory intensive, although an increase to either does see a positive result. I've tested some fairly large (10,000 row x 3000 reports) spreadsheets on everything from 400 mhz to 1.4 ghz and seen very little speed improvement. The issue seems to be something in the underlying messaging API. As someone mentioned, the overhead is incurred when you make calls that cause the system to switch processes (between your app and an out of process Automation app). This would be interesting to study on a lower level. Does making Automation calls create a thrashing situation? It seems like there is a mandatory overhead built into windows messaging. This might have something to do with the ability of earlier versions to become 'confused' when thousands of messages are sent.

Note that one simple way around automation slowness with Excel is to pass in arrays of values instead of formatting one row or range at a time. Theoretically, its possible to pass in ALL of your data in one array. This would result in one Automation call instead of thousands.

On the XML issue, does Office '97 and up support an Excel schema for xls files of which I am unaware? This needs to be compatible down the the '97 level. Or are we speaking of a style sheet for transformation between my reports and Excel data on the fly, using a document system like Cocoon?

Dustin Alexander
Tuesday, May 20, 2003

"Automation's slowness is neither processor nor memory intensive .... The issue seems to be something in the underlying messaging API."

In that case I would suggest to give it a try with a few VMWare instances on a machine with plenty of RAM.
You can always use the free trial version for this.

Also maybe

"In some cases, Automation servers (such as Microsoft Excel) will always start a new instance, even if one is already running. To remedy this and improve performance, use the GetObject( ) function instead of CreateObject( )."

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fox7help/html/dgoptimizing_activex_controls.asp

has some info that could be of use.

Just me (Sir to you)
Tuesday, May 20, 2003

Did you also consider:

"Besides the technical problems, you must also consider the feasibility of such a design with respect to licensing. Current licensing guidelines prevent Office Applications from being used on server to service client requests, unless those clients themselves have licensed copies of Office. Using server-side Automation to provide Office functionality to unlicensed workstations is not covered by the End User License Agreement (EULA)."

http://support.microsoft.com/?kbid=257757

Just me (Sir to you)
Tuesday, May 20, 2003

Complementing my last post:

Using XML, Excel's XSD and XSLT transformations, you don't even need Excel installed in the server:

XML in Excel and the Spreadsheet Component

http://msdn.microsoft.com/library/en-us/dnexcl2k2/html/odc_xlsmlinss.asp?frame=true

XML Spreadsheet Reference

http://msdn.microsoft.com/library/en-us/dnexcl2k2/html/odc_xmlss.asp?frame=true

Transform XML Files with XSLT When Importing into Microsoft Excel 2002

http://msdn.microsoft.com/library/en-us/dnexcl2k2/html/odc_xlflatnr.asp?frame=true

I have olnly done one report using this approach, and the main disadvantage is that Excel does not generate the XSLT sheet for you. Instead, you have to save a workseet formated as you report 'template' as XML and then you need to edit by hand the XML and add all XSLT tags.

The problem is that if you need to reopen the report in Excel to do some redesing, you have to do it with the original XML sheet and you have to generate the XSLT again from scratch.

Sergio
Tuesday, May 20, 2003

We automate Excel on the server - but do it this way: all the "automation" code is actually in an Excel VBA macro -- all we do is invoke that single macro from our (C#) code. That way, there's very little interop/marshalling going on to slow things down. And, for some of the reports, rather than passing a big object[]/variant array with the data - we get the Excel VBA macro to "pull" it in using ye olde ADO.

Seems to do the trick.

Duncan Smart
Tuesday, May 20, 2003

The slowness in automation comes from two things:

1) Cross-process marshalling overhead. You can get an order-of-magnitude slowdown from this over in-proc calls.

2) The overhead of calling the IDispatch interface. Calling via IDispatch (which is what automation is) requires table lookups, argument packing into variants, unpacking them out of variants, packing return values into variants, unpacking return values, etc ad nauseum. The tools hide the pain for the most part, but there's a lot of overhead here.

Chris Tavares
Wednesday, May 21, 2003

Response to Chris Tavares:

So long as you're not using a scripting client (which seems to be the case) you should be able to avoid the IDispatch overhead.

John Aitken
Thursday, May 22, 2003

It depends on the server. There are some automation servers that ONLY implement IDispatch. Excel used to be one of them, but I haven't checked the recent versions.

And even if you do have a vtable interface, you've still got the cross-process marshalling overhead.

Chris Tavares
Thursday, May 22, 2003

*  Recent Topics

*  Fog Creek Home