Fog Creek Software
Discussion Board




HTTP Access logging

Quick question. 

Do any of you use a centralized server for HTTP access logging? 

If you do, what web server do you use, and what logging protocol?  For example are you using syslog?

The systems I've worked on don't have a lot of HTTP origin servers.  They primarily scale behind the HTTP server, so access logging is still centralized. 

TIA...

christopher (baus.net)
Thursday, July 29, 2004

My company uses squid.  It runs on NT and authenticates with our NT domain.  Much nicer than the msproxy junk of years gone by.  We disable downloading a lot of nasty file types, which does help circumvent some of the bho/adware/spyware crap.

Plus when IE tells users there is a new version available, they can't get it.  (Hint: Microsoft, change this brain-addled misfeature.  Any company with any sense locks ordinary users down - no admin rights.  But what does IE do?  It proceeds to download and start installing, but lo and behold it needs admin rights - and doesn't figure this out before fsking up the system but good.)

Mike
Thursday, July 29, 2004

Maybe I should clarify my question.  When I say HTTP access, I mean HTTP server access. 

Apache writes to a file by default for performance reasons, but this is really inconvenient if you have, say, 50 servers. 

Logging is a really strange problem.  At first glance it seems so simple, but the more you investigate the problem, the more complicated it gets.  It is really easy for the logger to be the bottleneck of the system. 

christopher (baus.net)
Thursday, July 29, 2004

well.....actually, you can use Squid for this too. I have a couple Squid servers (running on Linux) configured in a reverse-proxy fashion in front of 8 content servers for a few busy sites. Squid provides load balancing and caching and that's where all my server access logs are generated.  The logs are then processed on another dedicated server which does nothing but crunch the logs and produce pretty reports for the clients.

Jerry
Thursday, July 29, 2004
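
[Editor's note: Jerry doesn't post his configuration, but a 2004-era Squid 2.5 accelerator (reverse-proxy) setup along those lines might look roughly like this.  The addresses and options are illustrative, not his actual settings:

```conf
# Squid 2.5 running as an HTTP accelerator in front of content servers
http_port 80
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_uses_host_header on

# One cache_peer line per backend content server, balanced round-robin
cache_peer 192.168.1.11 parent 80 0 no-query round-robin
cache_peer 192.168.1.12 parent 80 0 no-query round-robin

# Write access.log in Apache common format so log-crunching tools can parse it
emulate_httpd_log on
```

With a setup like this, every client request passes through Squid, so the access logs for all backends are generated in one place.]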

I forgot to mention that my logs are all stored directly in a MySQL database which is running on yet another small cluster of servers.  In total, I process logs for around 40 Apache servers and hundreds of sites.....and it WAS the bottleneck until I got all the scripts dialed in to the point where it's a totally automated system. Adding a new site and having it magically appear in the logging system requires only the correct CustomLog directive in the Apache config file.

Jerry
Thursday, July 29, 2004
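
[Editor's note: Jerry doesn't show the directive itself, but the usual way to feed per-site Apache logs into an external pipeline is a piped CustomLog.  The log script name here is hypothetical; the LogFormat codes are standard Apache mod_log_config syntax:

```conf
# Include %v (the virtual host name) so one pipe can serve many sites
LogFormat "%v %h %l %u %t \"%r\" %>s %b" vcommon

# Pipe every record to a long-running process, e.g. one that batches
# inserts into MySQL, instead of writing to a local file
CustomLog "|/usr/local/bin/log2mysql" vcommon
```

Apache starts the piped program once and keeps it running, so the program can buffer records and batch its database writes.]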

Is this what you are using?

http://sourceforge.net/projects/squidlogger2sql/

I'm implementing something similar to squid, with a different focus.  Right now the logger is the bottleneck, and it is causing some problems since the server is single threaded. 

What happens is that under load the syslog() call blocks, which is really bad: instead of servicing another request that isn't waiting on logging, the whole server stalls. 

I am tempted to write my own syslog client, but that seems like a lot of work.

christopher (baus.net)
Thursday, July 29, 2004
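
[Editor's note: a hand-rolled syslog client is less work than it sounds, since the wire format (RFC 3164) is a one-line UDP datagram.  A minimal non-blocking sketch in Python, assuming a syslog daemon on UDP port 514; the class and parameter names are illustrative:

```python
import socket

class NonBlockingSyslog:
    """Send RFC 3164-style syslog records over UDP without ever blocking."""

    def __init__(self, host="127.0.0.1", port=514, facility=16):  # 16 = local0
        self.addr = (host, port)
        self.facility = facility
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.setblocking(False)  # sendto() will never stall the event loop

    def format(self, severity, tag, msg):
        # RFC 3164 priority value: facility * 8 + severity
        pri = self.facility * 8 + severity
        return "<%d>%s: %s" % (pri, tag, msg)

    def log(self, severity, tag, msg):
        try:
            self.sock.sendto(self.format(severity, tag, msg).encode(), self.addr)
        except (BlockingIOError, OSError):
            # Socket buffer full or transient error: drop the record
            # rather than stall a single-threaded server.
            pass
```

The trade-off versus the blocking syslog() call is explicit: when the kernel can't accept the datagram, the record is dropped instead of the server stalling.]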

We're using a setup very similar to this: http://www.linux-mag.com/2003-08/lamp_01.html with numerous little tweaks and enhancements (mostly small custom-developed helper scripts and applications).

I would also point out that the 2 Squid boxes I referred to earlier have lots of RAM and very fast drive sub-systems and that everything in our racks is plugged into gigabit ethernet switching.

We keep a 100 Mbit pipe steadily saturated at around half of its capacity and 95% of that traffic is being delivered from database intensive web-applications (more than 5000 queries per second being handled by the various database servers).

Jerry
Thursday, July 29, 2004

Jerry, sounds like a king of porn setup.

Bicuspix
Friday, July 30, 2004

We use webtrends smart source data collector. Admin wanted to get away from iis logs.

Pro: one box for logging, has some pretty decent reporting and data collection.

Con: to implement it, one has to add code to every page (since we tend to use common headers (top left) and footers (bottom right), that reduces the time to implement it by a lot), and instead of linking directly to PDFs, you have to have an intermediate/redirect page.

Peter
Friday, July 30, 2004

Webtrends takes a much more invasive approach.  They certainly get a lot more information that way.  I suspect some sites use server-side access logging in combination with something like Webtrends.

christopher (baus.net)
Friday, July 30, 2004
