Fog Creek Software
Discussion Board




Reading Word Files as OLE Compound Documents

Can anyone guide me to some material regarding using OLE to access the contents of Office files?

I am considering reading the Jakarta project's POIFS code, but I am coding in unmanaged VC++.

Thanks!

Scott Rogers
Wednesday, April 28, 2004

Coincidentally, I just wrote a whole pile of code to do exactly that.  Let me share with you how I did it:

1) Go to www.google.com.
2) In the little box, type "IStorage MSDN"
3) Click "I'm Feeling Lucky".
4) Read ALL the documentation.  And I do mean ALL of it.  There are plenty of "gotchas" in the structured storage code.

I highly recommend using C++.  Until the Longhorn APIs come along, using the structured storage interfaces from managed code is going to be a pain in the rear.  I can tell you that from bitter experience; the best mistakes to learn from are other people's.

Also, I'm curious to know why you are digging into the Word file format.  (I'm doing it because I'm adding a whole tonne of features to Word and Excel programmability.)  What are you doing that you care about the storages?  Believe me, there's not much in there that's understandable.  Cracking open the storage is trivial -- making sense of the streams is not.
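To give a feel for how little "cracking open the storage" tells you, here is a minimal, portable sketch that only checks whether a buffer starts with a plausible compound-file header. It is not the IStorage route recommended above (on Windows you would normally go through StgOpenStorage and the IStorage/IStream interfaces); it reads the raw bytes directly. The names parseCompoundHeader and CompoundHeaderInfo are made up for illustration, and the field offsets are taken from the published compound file binary format, so treat them as assumptions to verify against that specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: just the two header fields this sketch extracts.
struct CompoundHeaderInfo {
    std::uint16_t majorVersion;  // expected to be 3 or 4
    std::uint32_t sectorSize;    // 512 for version 3, 4096 for version 4
};

// Read a little-endian 16-bit value at the given offset.
static std::uint16_t readLe16(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

// Returns true and fills 'out' if 'bytes' begins with a plausible
// compound-file header; returns false otherwise.
// Assumed layout: 8-byte signature at offset 0, major version at 26,
// byte-order mark at 28, sector shift at 30.
bool parseCompoundHeader(const std::vector<std::uint8_t>& bytes,
                         CompoundHeaderInfo& out) {
    static const std::uint8_t kSignature[8] = {
        0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    if (bytes.size() < 512) return false;  // the header occupies 512 bytes

    for (std::size_t i = 0; i < 8; ++i) {
        if (bytes[i] != kSignature[i]) return false;
    }

    // The byte-order field is 0xFFFE, marking the file as little-endian.
    if (readLe16(bytes, 28) != 0xFFFE) return false;

    // The sector size is stored as a power of two.
    const std::uint16_t shift = readLe16(bytes, 30);
    if (shift != 9 && shift != 12) return false;

    out.majorVersion = readLe16(bytes, 26);
    out.sectorSize = static_cast<std::uint32_t>(1) << shift;
    return true;
}
```

Read the first 512 bytes of a .doc file into a vector and pass it in. A true result means only that the container looks like structured storage. It says nothing about the Word streams inside, which is where the real work is.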

Eric Lippert
Wednesday, April 28, 2004

I am investigating producing Word files with embedded links to other Word files based upon semantic analysis.

Scott Rogers
Thursday, April 29, 2004

Scott, I tried to reply to your email but it bounced.

What I don't understand is why you are trying to do so by reverse-engineering the Word file format instead of using the existing Word object model. 

Does the Document.Hyperlinks.Add method not do exactly that?  Or am I misunderstanding you?

Eric Lippert
Thursday, April 29, 2004
