
Search engine to be built into a J2EE-based product
Our company has a web-based J2EE product in which we need to incorporate document search facilities. Currently there is a basic text-based search of the database, but our database contains documents in Word, PDF, etc., which need to be searched.
I guess that would require a search engine. The constraints are that it should search close to 200 known document formats and that it can be integrated completely into our product (no .exe's and such). The documents reside inside the database, not on disk or behind a URL.
I have looked at some of the things available out there, from the Google toolbar to Jakarta Lucene. If necessary we can do some coding, but we also don't mind an off-the-shelf solution.
It should be possible to bundle it into the product, which uses J2EE and runs on a Windows machine.
Anon
Wednesday, April 14, 2004
That's all very interesting, thanks for sharing. Oh wait...you meant to include a question about it all?
Sorry, Couldn't Resist
Wednesday, April 14, 2004
Yeah! Forgot to ask the question: has anyone done something similar before?
Anon
Wednesday, April 14, 2004
I believe the most widely used solution to index binary formats is Oracle Intermedia, which only works on top of the Oracle database.
But since I guess you're looking for a self-contained solution, my only bet would be Jakarta Lucene, plus some coding and third-party tool integration to build the filters that convert binary formats to text. I don't know whether these converters are currently provided with Lucene; last time I looked at the package they weren't.
I'm also interested in a solution that can be easily packaged in shrinkwrap software and handles the most popular file formats (MS Office, PDF, ...).
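To make the filter-then-index idea a bit more concrete, here is a minimal sketch in plain Java. It assumes the documents arrive as byte arrays read from the database, and every name in it (BlobSearchSketch, TextFilter, registerFilter and so on) is made up for illustration. It does not use the Lucene API; the toy in-memory index only stands in for whatever indexing library ends up being used, and a real Word or PDF filter would have to wrap a third-party parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical names throughout; this is NOT the Lucene API.
public class BlobSearchSketch {

    /** A filter that turns the bytes of one document format into plain text. */
    public interface TextFilter {
        String extract(byte[] content);
    }

    // One filter per format name (e.g. "txt", "pdf", "doc").
    private final Map<String, TextFilter> filters = new HashMap<>();
    // Toy inverted index: lower-cased token -> ids of documents containing it.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    public void registerFilter(String format, TextFilter filter) {
        filters.put(format.toLowerCase(Locale.ROOT), filter);
    }

    /** Extracts and indexes one document; returns false if no filter handles the format. */
    public boolean add(int docId, String format, byte[] content) {
        TextFilter filter = filters.get(format.toLowerCase(Locale.ROOT));
        if (filter == null) {
            return false;
        }
        String text = filter.extract(content).toLowerCase(Locale.ROOT);
        // Split on anything that is not a letter or a digit.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return true;
    }

    /** Returns the ids of documents containing the single term, case-insensitively. */
    public Set<Integer> search(String term) {
        Set<Integer> hits = index.get(term.toLowerCase(Locale.ROOT));
        return hits == null
                ? Collections.<Integer>emptySet()
                : Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        BlobSearchSketch search = new BlobSearchSketch();
        // Plain text needs no real conversion, so it makes a convenient stand-in filter.
        search.registerFilter("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
        search.add(1, "txt", "Quarterly sales report".getBytes(StandardCharsets.UTF_8));
        search.add(2, "txt", "Sales forecast".getBytes(StandardCharsets.UTF_8));
        System.out.println(search.search("sales"));
        System.out.println(search.search("forecast"));
    }
}
```

The point of the split is that the index only ever sees plain text, so supporting another format means registering one more filter rather than touching the search code.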
Z
Wednesday, April 14, 2004
Does Google license any of their PDF/PPT indexing technology?
I know they do the Google appliance, but maybe they also offer some sort of binary-only licence?
Chris Ormerod
Thursday, April 15, 2004