Fog Creek Software
Discussion Board




Intranet needs a search function

My company's intranet is home-grown and .NET-based, with very granular security (i.e., security can be applied to individual links, documents, menu options, etc.).  Of course it is all database-driven, with very little static content.

There is no search function, so finding things is a nightmare.  We want to add one, but we don't want people to see summaries or get hits for things they are not allowed to access.

How best to implement it?  The two options I've looked at are:

1) Database search - Write search functionality that is fully cognizant of the security system.  This search would not use a crawler (i.e., it would work by searching the database tables directly).
  Advantage: Would understand the security scheme.
  Disadvantage: A lot of code to write.  How would it search attachments or linked items (.doc, .ppt, etc.) that are not in the database?  It has to be aware of database entities such as announcements, discussions, and calendar entries (and more as time goes on).

2) Web Crawler - Roll our own web crawler that would crawl all the pages and be cognizant of META tags indicating security requirements.
  Advantage: Could crawl dynamic pages, as well as pages that include linked content (.doc, .pdf, .ppt) on shared drives, etc.
  Disadvantage: I'd rather not reinvent the wheel, nor do I have the expertise to write a crawler, an index, boolean search parsing, etc.
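
To make option 2 a bit more concrete, here is a minimal Python sketch of a security-aware index, not a real crawler. It assumes each page declares its requirements in a hypothetical <meta name="required-roles" content="..."> tag, and that a page with no such tag is visible to everyone; the tag name, the role names, and the helper functions are all made up for illustration.

```python
from html.parser import HTMLParser


class SecurityMetaParser(HTMLParser):
    """Collects page text plus the roles listed in a hypothetical
    <meta name="required-roles" content="hr,managers"> tag."""

    def __init__(self):
        super().__init__()
        self.required_roles = set()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if attr.get("name") == "required-roles":
            content = attr.get("content") or ""
            self.required_roles |= {r.strip() for r in content.split(",") if r.strip()}

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def index_page(url, html):
    """Build one index entry: lowercased text plus the roles the page requires."""
    parser = SecurityMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "text": " ".join(parser.text_parts).lower(),
        "required_roles": parser.required_roles,
    }


def search(index, term, user_roles):
    """Return URLs that match the term AND that the user may see.

    Assumption: a page is visible if it requires no roles, or if the
    user holds at least one of the required roles."""
    term = term.lower()
    hits = []
    for entry in index:
        visible = not entry["required_roles"] or entry["required_roles"] & set(user_roles)
        if visible and term in entry["text"]:
            hits.append(entry["url"])
    return hits
```

The point of storing the roles alongside the indexed text is that the filter runs before any hit or summary is produced, so a restricted page never shows up in the result list at all.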

I've looked at off-the-shelf products such as Google Search Appliance (wouldn't support our security scheme) and Ultraseek (again, wouldn't support our security scheme).

Maybe open source?

Any input is appreciated.

SearchlessinMaitland
Tuesday, March 30, 2004

You may want to check out htdig

http://www.htdig.org/

It's an open-source web crawler, indexer, and search engine.  You can configure the settings to conform to your security setup (URL filtering) or create multiple indexer databases (one for each security level or area).
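
The "one index per security level or area" idea can be sketched in a few lines of Python. This is only an illustration of the partitioning approach, not htdig's actual configuration or API: each index is modeled as a plain dict of URL to text, and the area names are hypothetical.

```python
def searchable_indexes(indexes_by_area, user_areas):
    """Pick only the indexes for areas the user is cleared for."""
    return [indexes_by_area[a] for a in sorted(user_areas) if a in indexes_by_area]


def search_partitioned(indexes_by_area, user_areas, term):
    """Search each permitted index and merge the hits, skipping duplicates.

    Indexes for areas the user cannot access are never opened, so their
    contents cannot leak into hits or summaries."""
    term = term.lower()
    hits = []
    for index in searchable_indexes(indexes_by_area, user_areas):
        for url, text in index.items():
            if term in text.lower() and url not in hits:
                hits.append(url)
    return hits
```

The trade-off is that this only works cleanly when security maps onto a small number of areas; with per-link or per-document permissions the number of separate indexes grows quickly.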

It does boolean expression searching as well as file system searches.

lumberjack
Tuesday, March 30, 2004

The Windows Indexing service has already defined filters so that it can do searches inside word docs, ppts, etc. The API is documented; you could use them to do the actual search through documents once you've figured out which documents are legal to search.
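
A rough Python sketch of that two-step idea: first ask the database which documents the user may see, then run the content search only over those. The table names, columns, and roles below are hypothetical, and extract_text is a stand-in for whatever document filter does the real text extraction; nothing here is the Indexing Service API.

```python
import sqlite3


def allowed_paths(conn, user_roles):
    """Return the paths of documents that at least one of the user's roles may read."""
    roles = sorted(user_roles)
    if not roles:
        return []
    placeholders = ",".join("?" for _ in roles)
    rows = conn.execute(
        "SELECT DISTINCT d.path FROM documents d "
        "JOIN doc_permissions p ON p.doc_id = d.id "
        f"WHERE p.role IN ({placeholders}) ORDER BY d.path",
        roles,
    )
    return [path for (path,) in rows]


def search_allowed(conn, user_roles, term, extract_text):
    """Search document contents, but only for documents the user may read.

    extract_text(path) is a caller-supplied function (an assumption here)
    that returns the plain text of a .doc, .ppt, etc."""
    term = term.lower()
    return [p for p in allowed_paths(conn, user_roles) if term in extract_text(p).lower()]
```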

Chris Tavares
Tuesday, March 30, 2004

SharePoint. :-D

[Disclaimer: I work for Microsoft]
Philo

Philo
Tuesday, March 30, 2004

Try MondoSearch - they specialize in search technology for MS CMS and the like.  I've used it, and their .NET API allows you to customize search results based on security restrictions, etc.

www.mondosoft.com

GiorgioG
Tuesday, March 30, 2004

Take a look here: www.searchtools.com

DogCat
Tuesday, March 30, 2004

Thanks all.

I've got an eval of SharePoint lined up,
I've got an eval of MondoSearch lined up as well (that one sounds really promising - thanks),
and I'm checking out searchtools.com.

htdig was interesting, but my initial take was that integrating it with our complex security scheme would be a bitch.  Gotta like the price though ;)

Thanks again.

SearchlessinMaitland
Wednesday, March 31, 2004
