MS Word .doc word counting
MS Word stores the result of the last Word Count run on the document in a field of the document header itself. If that is what you want, you can pick it up from that field. Otherwise you will have to parse the file and do the counting yourself :-) Not an easy job, given that the format is binary and not really publicly documented. I don't think the real word count is stored in the document.
Uh-oh! That means the word count is not updated every time the document is edited. It's just the result of the last Word Count the user ran through the UI.
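If the stored count can be stale, the counting half of "do it yourself" is at least simple once you have the text out of the file. Here is a minimal Java sketch of that step only. It assumes the document text has already been extracted to a String by some other means, and it treats a word as any run of non-whitespace characters, which may not match Word's own counting rules. The class name WordCounter and the method countWords are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {
    // Assumption: a "word" is a maximal run of non-whitespace characters.
    // Word's own rules (hyphens, footnotes, text boxes) may differ.
    private static final Pattern WORD = Pattern.compile("\\S+");

    // Counts words in text that has already been extracted from the document.
    public static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        Matcher matcher = WORD.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("The quick brown fox")); // prints 4
    }
}
```

Getting the text out of the binary .doc is still the hard part, and this sketch does not attempt it.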
You can use Word's COM interface to get the word count of a document from most languages. Here's an example using VBScript that displays the word count of c:\test.doc in a message box.
There are a number of packages that let you access COM from Java. Here's one from Microsoft's site: http://www.microsoft.com/java/resource/java_com2.htm.
There is also an Apache project named POI, a pure-Java library for reading Microsoft Office file formats.
I was recently looking into the feasibility of rolling a VB app that used Word's COM interface into a DHTML app. But our webmaster didn't think it would work without installing Word on the IIS server, which he didn't want to do.
Big thanks to everyone! Now I at least have a clue!
More packages listed at http://www.geocities.com/marcoschmidt.geo/java-libraries-word.html