Fog Creek Software Discussion Board

RSS Stuff

First of all, I imagine a big reason why NetNewsWire Lite is more popular than Radio is that it doesn't cost any money, whereas Radio costs $40 a year (granted, Radio is much more than a news aggregator).

Second, regarding RSS aggregators pounding your site: I'm one of those people who subscribe to your RSS feed; thanks for providing it. My aggregator (AmphetaDesk; see http://www.disobey.com/amphetadesk/ ) defaults to 180 minutes between fetches, and I left it at that setting. There's no way to specify in the RSS feed itself how often aggregators should check. Dave Winer and others have done some work on publish/subscribe systems, like weblogs.com, where an updated weblog pings the server to announce that it has new content (a quick sketch of that ping is below).

If you need to reduce the bandwidth the RSS aggregators are using, you could do as you suggested and give out only a headline (many sites, especially news sites, do this). I would also recommend reducing the number of stories in your feed. You've currently got 10; you could easily cut that to 3 or 4 and satisfy most news junkies.
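For what it's worth, the weblogs.com ping is just a tiny XML-RPC call. Here's a minimal sketch in Python, assuming the classic weblogUpdates interface (substitute your own site name and URL):

    import xmlrpc.client

    # Tell weblogs.com the site has fresh content; aggregators that watch
    # weblogs.com can then fetch the feed instead of polling blindly.
    server = xmlrpc.client.ServerProxy("http://rpc.weblogs.com/RPC2")
    result = server.weblogUpdates.ping("Joel on Software",
                                       "http://www.joelonsoftware.com/")
    print(result)  # a struct with 'flerror' and 'message' fields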

Luke Francl
Saturday, October 19, 2002

Thanks Luke! I'll try reducing it to 3 items.

Joel Spolsky
Saturday, October 19, 2002

Also, you might consider setting an Expires header in the HTTP response. This allows proxies to cache your content and serve requesters without placing load on your server.

The downside is slightly less control over what's delivered (since you might make a change before the cache expires), and slightly less accurate statistics in your web server logs.
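Here's a rough sketch of computing those headers in Python; nothing here is tied to any framework, and the three-hour lifetime is just an example:

    from datetime import datetime, timedelta, timezone
    from email.utils import format_datetime

    def cache_headers(max_age_minutes=180):
        # Tell clients and shared proxies the feed stays fresh for three hours.
        expires = datetime.now(timezone.utc) + timedelta(minutes=max_age_minutes)
        return [
            ("Expires", format_datetime(expires, usegmt=True)),
            ("Cache-Control", "public, max-age=%d" % (max_age_minutes * 60)),
        ]

Any proxy between you and the subscriber can then answer repeat requests itself until the expiry time passes.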

Anyway..

The HTTP 1.1 RFC covers this (http://www.ietf.org/rfc/rfc2068.txt?number=2068)

Specifically, page 101 is the beginning of the functionality you're after.

HTH.

Jeremy Dunck
Sunday, October 20, 2002

Does anyone know of an online tutorial for making a web-based RSS aggregator using ASP?

Chi Lambda
Sunday, October 20, 2002

As yet another idea re: handling the RSS issue, I wonder if RSS 2.0 is extensible enough to include a "refreshFrequency" tag. This would let the RSS author embed a directive to RSS-gathering applications. The tag could be either a number (in hours?) or a semantic value ("hourly," "daily," "weekly"), but preferably only one type or the other. I believe you are in a better position to suggest this to Brent and Dave than I am.

Personally, I enjoy not having to jump from my newsreader to the WWW to read the content on your site, but what you do with your RSS feed is up to you.

Matt Jadud
Sunday, October 20, 2002

As explained in the "ETag" paragraph at http://bitworking.org/2002/06/02.html , an RSS aggregator does not need to request the entire RSS file on every check.

This reduces wasted bandwidth and speeds up RSS update checks.
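The server side is small, too. Here's a sketch of the idea in Python, not tied to any particular server (a static file server such as Apache computes the ETag and answers the 304 for you):

    import hashlib

    def respond(feed_bytes, if_none_match=None):
        # Fingerprint the feed body; any value that changes when the
        # content changes will do as an ETag.
        etag = '"%s"' % hashlib.md5(feed_bytes).hexdigest()
        if if_none_match == etag:
            return 304, {"ETag": etag}, b""   # client's copy is current
        return 200, {"ETag": etag}, feed_bytes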

An example of an aggregator that supports this feature is Aggie.

Bernard Vander Beken
Sunday, October 20, 2002

refreshFrequency: it turns out that was actually one of the things Dave was thinking about when he added the ttl element, even though the spec only mentions using it for Gnutella.
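For illustration, here's how an aggregator might honor ttl (the spec says it's a number of minutes), sketched in Python with a made-up default:

    import xml.etree.ElementTree as ET

    def minutes_until_next_poll(rss_text, default=180):
        # <ttl> inside <channel> is the feed's suggested minimum
        # number of minutes between polls.
        channel = ET.fromstring(rss_text).find("channel")
        ttl = channel.findtext("ttl") if channel is not None else None
        return int(ttl) if ttl and ttl.isdigit() else default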

Cutting back: I don't know how much control you have over the RSS (well, without rewriting the routine that generates it), but someone in my comments suggested full content for the first few items, and then just a title or a title plus short description for the rest. For people who add your feed from somewhere other than your website, and so haven't seen the older items, that would work better than having only two or three items.
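Sketched out, the generating routine might look something like this (the field names are invented for illustration):

    def feed_items(stories, full_count=3, max_count=10):
        # Full text for the newest few items; title and link only after that.
        items = []
        for i, story in enumerate(stories[:max_count]):
            item = {"title": story["title"], "link": story["link"]}
            if i < full_count:
                item["description"] = story["body"]
            items.append(item)
        return items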

Also from my comments ( http://philringnalda.com/archives/002359.php#comments ), Morbus (the author of AmphetaDesk) is curious about just which UAs are the ones giving you trouble by asking for too much too often.

Phil Ringnalda
Sunday, October 20, 2002

The HTTP HEAD method seems like the right way to go. In fact, it already works for me as a publisher, because my site is served by a traditional web server out of static files.

Looking at my logs, it's pretty clear that AmphetaDesk is already doing this and saving about half a meg a day per subscriber as a result. Radio, NetNewsWire, and "Mozilla/3.0+(compatible)" (whoever that is) are not.
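For the curious, the HEAD check is only a few lines. A sketch in Python, assuming the server sends a Last-Modified header (static file servers do):

    import http.client

    def feed_changed(host, path, last_seen):
        # HEAD returns the headers only, so the check is nearly free.
        conn = http.client.HTTPConnection(host)
        conn.request("HEAD", path)
        response = conn.getresponse()
        current = response.getheader("Last-Modified")
        conn.close()
        return current != last_seen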

Joel Spolsky
Sunday, October 20, 2002

You would save on bandwidth if RSS aggregators made conditional rather than unconditional GET requests. This is as simple as adding an If-Modified-Since header to the GET request and handling both 200 OK and 304 Not Modified responses.

This means the servers behind RSS feeds need to respond to the conditional GET properly, but that should be easy enough to do.
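Here's a sketch of the client half in Python; last_modified is whatever Last-Modified value the server sent on the previous successful fetch:

    import urllib.request, urllib.error

    def conditional_get(url, last_modified=None):
        request = urllib.request.Request(url)
        if last_modified:
            request.add_header("If-Modified-Since", last_modified)
        try:
            response = urllib.request.urlopen(request)
            # 200 OK: new content; remember Last-Modified for next time.
            return response.read(), response.headers.get("Last-Modified")
        except urllib.error.HTTPError as e:
            if e.code == 304:
                return None, last_modified   # 304: only headers were sent
            raise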

Frank Leahy
Sunday, October 20, 2002

I'm surprised (well, maybe not) that Radio aggregators are hammering your RSS file. A fix using ETags was discussed back in May and brought to the notice of at least one of their developers:

http://www.pocketsoap.com/weblog/stories/2002/05/19/bdgToEtags.html

I assume your logs distinguish between requests which return the file and those which return the 304 status?
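If they don't split it out already, a few lines of script against a common-log-format file will. A sketch in Python (the /rss.xml path is just a stand-in for wherever the feed lives):

    from collections import Counter

    def tally(logfile, feed_path="/rss.xml"):
        # Combined log format: ... "GET /path HTTP/1.1" status bytes ...
        counts, bytes_sent = Counter(), Counter()
        for line in open(logfile):
            parts = line.split()
            if len(parts) < 10 or feed_path not in parts[6]:
                continue
            status = parts[8]
            counts[status] += 1
            if parts[9].isdigit():
                bytes_sent[status] += int(parts[9])
        return counts, bytes_sent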

Charles Cook
Sunday, October 20, 2002

Yeah, I was looking at the total bytes.

Joel Spolsky
Sunday, October 20, 2002

"Mozilla/3.0+(compatible)" is FeedReader -- http://www.feedreader.com/ -- and is the aggregator I use.  Buggy as all hell, but the closest I've found to NetNewsWire on the PC.

I've already opened a bug on their site asking them to send a genuine User-Agent string.

rOD.

rOD
Monday, October 21, 2002

aha!

thanks.

Joel Spolsky
Monday, October 21, 2002

Using conditional GET is the right way to go, not HEAD. Setting the expiration header is the right way to tell the client not to do another GET for a specified period of time.

These ideas are built into HTTP and will tell proxies all the way down the line what to do with your data. You could have thousands of subscribers, and only one request would ever hit your server. If you're not serving over HTTP, then the old META HTTP-EQUIV tag comes up: a kludge, but functional.

You got it: these ideas aren't new and don't need to be re-solved a different way. In fact, I'm surprised this has come up at all. Based on Dave Winer's quick poll, the servers were already working right, though probably not sending out the predictive (expiry) info. Do people write their own HTTP stacks on the client? I sure hope the available ones have caches and use If-Modified-Since.

mb
Monday, October 21, 2002

The Expires header is certainly the elegant favorite here. This brings up an important point: when designing a piece of software that relies on existing standards, it never hurts to actually browse the documentation. It's so easy to get lost in the trees despite the forest around you. :)

As far as boring us to death goes: trust me, this topic is oodles more exciting for me than discussions of the latest translation of your book.

Brent Rockwood
Tuesday, October 22, 2002

The conditional-GET stuff looks good and will give the most gain. However, another HTTP feature is also worth mentioning: compressing the output. This costs server CPU power, but saves network bandwidth.
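A sketch of the server half in Python, using the standard gzip module (the client signals support with an Accept-Encoding header):

    import gzip

    def maybe_compress(feed_bytes, accept_encoding=""):
        # Compress only when the client says it can cope.
        if "gzip" in accept_encoding:
            return gzip.compress(feed_bytes), {"Content-Encoding": "gzip"}
        return feed_bytes, {}

RSS is verbose, repetitive XML, so it compresses very well.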

Adriaan van den Brand
Tuesday, October 22, 2002

In my blog http://epeus.blogspot.com/2002_10_01_epeus_archive.html#85582608 today I suggested a couple of answers:

As others have said, adopting HTTP's If-Modified-Since timestamp fetch can help here, by doing a full fetch only when the RSS has changed. In addition, adopting RFC 3229's way of sending only the changes will help reduce the bandwidth of the RSS fetches (I mentioned this back in January when it first came out: http://rfc3229.x42.com ).

However, this doesn't reduce the number of HTTP setups and teardowns. To do that, the aggregators need to get smarter. They can do this by estimating an update frequency for each feed; something modelled on TCP's congestion control (exponential back-off, with 'no change' treated as congestion) would probably suit well. If the aggregator polls the feed and finds no changes, it doubles the polling interval. If it polls and finds changes, it decrements the polling interval by the number of changes found multiplied by the overall polling frequency. The lower bound is the maximum polling frequency set by the user (once an hour is common). You could set an upper bound, or let the intervals themselves reveal which blogs are moribund.
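Here's one way to read that policy as code, sketched in Python (intervals in minutes; the bounds are example values):

    def next_interval(interval, changes, min_interval=60, max_interval=7 * 24 * 60):
        # 'No change' is treated like congestion: back off exponentially.
        if changes == 0:
            interval *= 2
        else:
            # An active feed pulls the interval back down, faster when
            # more changes were found.
            interval -= changes * min_interval
        return max(min_interval, min(interval, max_interval))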

Kevin Marks
Tuesday, October 22, 2002
