Fog Creek Software
Discussion Board

Syndication & Scaling

There is a lot being said about syndication and scaling on various blogs (Chad Dickerson, Sam Ruby, etc.).

There are some things I wanted to mention/discuss that might help with this problem.

Currently the biggest problem is that every aggregator tries to fetch feeds at the top of the hour, every hour (every 3 hours by default if you use FeedReader).

One solution might be to divide the hour up by timezone: your desktop aggregator checks which timezone you are in and sets the refresh to x minutes past the hour. Would this be a good idea?
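The timezone-stagger idea above might look something like this sketch, where the local UTC offset picks a minute-of-the-hour slot. The slot mapping and function names are my own assumptions, not an established scheme:

```python
# Sketch: stagger feed polling within the hour based on the client's UTC
# offset, so aggregators in different timezones don't all hit publishers
# at the same instant. The slot mapping here is an illustrative assumption.
import datetime

def poll_offset_minutes(utc_offset_hours, slots=60):
    """Map a UTC offset (e.g. -5 for EST) to a minute-of-the-hour slot."""
    return int((utc_offset_hours % 24) * slots / 24) % slots

def next_poll_time(now, utc_offset_hours):
    """Next scheduled poll: the assigned minute within this hour or the next."""
    minute = poll_offset_minutes(utc_offset_hours)
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += datetime.timedelta(hours=1)
    return candidate
```

Two clients five timezones apart would then poll about twelve minutes apart instead of simultaneously.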

The other way to do this would be for the feed producer to set a default refresh rate based on how frequently they publish articles.
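RSS 2.0 already has a mechanism for this: the channel-level `<ttl>` element, which states the minutes a feed may be cached before refreshing. A minimal sketch of an aggregator honoring it (the default and the minimum clamp are my assumptions):

```python
# Sketch: read the RSS 2.0 <ttl> element (minutes between polls) so the
# aggregator's refresh interval follows the publisher's stated frequency.
# DEFAULT_TTL and MIN_TTL are illustrative assumptions, not spec values.
import xml.etree.ElementTree as ET

DEFAULT_TTL = 60   # minutes, used when the feed declares nothing
MIN_TTL = 30       # never poll more often than this

def refresh_minutes(rss_xml):
    root = ET.fromstring(rss_xml)
    ttl = root.findtext("channel/ttl")
    if ttl is None or not ttl.strip().isdigit():
        return DEFAULT_TTL
    return max(int(ttl), MIN_TTL)
```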

Those are some thoughts off the top of my head. What do you think?

Prakash S
Wednesday, July 21, 2004

Publishers, aggregator writers, and users all have responsibilities here.

Publishers need to provide a robots.txt to prevent unwanted spidering, and they need to make use of HTTP status codes (404, 410, etc.).  They also need to enable gzip compression.
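For the robots.txt part, something along these lines would keep well-behaved crawlers on the feed and out of everything else (the paths are illustrative, and Crawl-delay is a non-standard extension that only some crawlers honor):

```
# Illustrative robots.txt for a feed publisher. Paths are assumptions.
User-agent: *
Disallow: /archives/
Disallow: /search
# Non-standard, but respected by some crawlers:
Crawl-delay: 60
```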

Aggregator writers need to respect robots.txt and HTTP status codes (specifically 304 Not Modified).  They need to handle gzip-compressed responses.
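The 304 point is the big win: an aggregator that sends a conditional GET with the cached ETag and Last-Modified values gets back a near-empty 304 when the feed hasn't changed. A sketch of the client side (the cache-entry shape and function names are my assumptions):

```python
# Sketch: build conditional-GET headers from a cached ETag/Last-Modified,
# and update the cache from the response, so unchanged feeds cost a tiny
# 304 instead of a full download. The cache dict layout is an assumption.
def conditional_headers(cache_entry):
    """cache_entry: dict possibly holding 'etag' and 'last_modified'."""
    headers = {"Accept-Encoding": "gzip"}  # also accept compressed bodies
    if cache_entry.get("etag"):
        headers["If-None-Match"] = cache_entry["etag"]
    if cache_entry.get("last_modified"):
        headers["If-Modified-Since"] = cache_entry["last_modified"]
    return headers

def handle_response(status, cache_entry, body=None, etag=None, last_modified=None):
    """Return (changed, cache_entry) after a fetch."""
    if status == 304:                 # Not Modified: reuse the cached copy
        return False, cache_entry
    if status == 200:                 # fresh content: replace the cache entry
        return True, {"etag": etag, "last_modified": last_modified, "body": body}
    return False, cache_entry         # 4xx/5xx: keep old data, back off
```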

Users should adjust their aggregators to poll only as often as the sites they read actually update.

None of this is far off from how one should already serve, fetch, and read HTML.

Wednesday, July 21, 2004
