Fog Creek Software
Discussion Board

Bayes filtering, invented by...

Thomas Bayes, shorely.

First applied to spam filtering by Paul Graham.


Rod Begbie
Sunday, November 23, 2003

Actually, this publication from Microsoft Research predates Paul Graham's "A Plan for Spam":

Sunday, November 23, 2003

Jason Rennie published a paper on Bayesian filtering of e-mail in 1998, and has been working on his implementation of it (called ifile) since at least 1997.

Rob Mayoff
Sunday, November 23, 2003

Paul Graham himself mentions that there were others (specifically at MS) who worked on Bayesian filtering before him. The problem wasn't the idea -- it was the implementation. I

Sunday, November 23, 2003

How sad that Microsoft was sitting on this idea for years, but couldn't develop a halfway decent spam filter for Outlook until Outlook 2003.

Robert Jacobson
Sunday, November 23, 2003

> The problem wasn't the idea -- it was the implementation.

It wasn't the implementation or the idea; it was a matter of visibility. His essay about Viaweb and Lisp had been widely circulated some time earlier.

Bayesian learning is about 15 years old.

Sunday, November 23, 2003

Invented? No. Popularized? Yes, sure.

POPFile was also around before Graham's article, and it's still among the very best. The main feature it has over any other filter (except ifile, I suppose) is that it isn't limited to spam vs. not-spam: it can sort mail into any category you like. You want all securities-related mail in a separate bucket? Or all mailing lists in the same bucket? It handles that beautifully.

You know POPFile is the best the first time you receive email from someone you've never corresponded with before and it's still filtered into the correct bucket, based solely on content. For example, when I sign up on some site and receive the password, POPFile just /knows/ where I want such mail.
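POPFile's own code isn't reproduced here, but the behaviour described (any number of buckets, decided by content alone) is what a multinomial naive Bayes classifier does. A minimal Python sketch, where the tokenizer, the `BucketClassifier` name, and add-one smoothing are illustrative assumptions rather than POPFile's actual choices:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class BucketClassifier:
    """Multinomial naive Bayes over any number of user-defined buckets.

    A sketch of the general technique, not POPFile's actual code.
    """

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # bucket -> Counter of word occurrences
        self.doc_counts = Counter()              # bucket -> number of training messages
        self.vocab = set()                       # every word seen in training

    def train(self, bucket: str, text: str) -> None:
        words = tokenize(text)
        self.word_counts[bucket].update(words)
        self.doc_counts[bucket] += 1
        self.vocab.update(words)

    def classify(self, text: str) -> str:
        if not self.doc_counts:
            raise ValueError("train at least one bucket first")
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocab]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_bucket, best_score = "", -math.inf
        for bucket, counts in self.word_counts.items():
            n_words = sum(counts.values())
            # Work in log space: log prior + sum of log likelihoods.
            score = math.log(self.doc_counts[bucket] / total_docs)
            for w in words:
                # Add-one (Laplace) smoothing so a word unseen in this bucket is not fatal.
                score += math.log((counts[w] + 1) / (n_words + vocab_size))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


if __name__ == "__main__":
    clf = BucketClassifier()
    clf.train("spam", "cheap pills buy now limited offer")
    clf.train("accounts", "your password for the site is below")
    clf.train("lists", "digest of the python mailing list")
    # A sender never seen before; only the content decides the bucket.
    print(clf.classify("here is your new password"))  # prints: accounts
```

Nothing in the model knows about senders, which is why a first-time correspondent can still land in the right bucket; adding a bucket is just training on a new label.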

Sunday, November 23, 2003

"Thomas Bayes, shorely."

Thomas Bayes died in 1761. He came up with Bayes's rule, i.e. the rule for reversing conditional probabilities. He didn't invent Bayesian filtering.

Sunday, November 23, 2003

It's always puzzled me why it's called Bayesian filtering when Paul Graham's article doesn't contain anything that even looks like an example of Bayes' theorem.
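For what it's worth, Graham's combining rule can be recovered from Bayes' theorem. A sketch of the derivation, assuming (1) words are independent given the class (the "naive" assumption) and (2) spam and non-spam are equally likely a priori. Here S is spam, H is non-spam, and p_i = P(S | w_i) is the spam probability of word w_i; the second step applies Bayes' rule once per word, P(w_i | S) = p_i P(w_i) / P(S) and P(w_i | H) = (1 - p_i) P(w_i) / P(H).

```latex
\begin{align*}
P(S \mid w_1,\dots,w_n)
  &= \frac{P(S)\prod_{i=1}^{n} P(w_i \mid S)}
          {P(S)\prod_{i=1}^{n} P(w_i \mid S) + P(H)\prod_{i=1}^{n} P(w_i \mid H)} \\
  &= \frac{P(S)^{1-n}\prod_{i} p_i \,\prod_{i} P(w_i)}
          {P(S)^{1-n}\prod_{i} p_i \,\prod_{i} P(w_i) + P(H)^{1-n}\prod_{i} (1-p_i)\,\prod_{i} P(w_i)} \\
  &= \frac{\prod_{i} p_i}{\prod_{i} p_i + \prod_{i} (1-p_i)}
     \qquad \text{when } P(S) = P(H).
\end{align*}
```

The last line has the form of the combining formula in "A Plan for Spam" (product of the p's over that product plus the product of the (1 - p)'s), so the theorem is there, just with the priors and word frequencies cancelled out.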

Sunday, November 23, 2003

"Bayesian" in this sense is just an adjective that means "uses subjective probabilities". Any technique built on the idea that you update the probability that a given statement is true (e.g. that a message is spam) as new evidence arrives can be said to fall under Bayesianism.
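Read that way, a filter can treat each word as a piece of evidence and update P(spam) one word at a time. A minimal Python sketch, assuming each per-word probability p was estimated with equal spam/non-spam priors (so its likelihood ratio is p / (1 - p)) and that words are independent given the class; the function names are hypothetical. Starting from a 50/50 prior, folding in the words one by one gives the same number as Graham's all-at-once formula, product of p over (product of p plus product of 1 - p).

```python
from math import prod


def update(p_spam: float, p_word: float) -> float:
    """Fold one piece of evidence into the current belief P(spam).

    p_word is a hypothetical per-word estimate of P(spam | word) made under
    equal priors, so its likelihood ratio is p_word / (1 - p_word).
    Both arguments must lie strictly between 0 and 1.
    """
    odds = p_spam / (1.0 - p_spam)      # current belief as odds
    odds *= p_word / (1.0 - p_word)     # Bayes' rule in odds form
    return odds / (1.0 + odds)          # back to a probability


def sequential(word_probs, prior=0.5):
    # Update the belief once per word, in order.
    belief = prior
    for p in word_probs:
        belief = update(belief, p)
    return belief


def graham_combine(word_probs):
    # All-at-once combining rule: prod(p) / (prod(p) + prod(1 - p)).
    spam = prod(word_probs)
    ham = prod(1.0 - p for p in word_probs)
    return spam / (spam + ham)


if __name__ == "__main__":
    probs = [0.99, 0.2, 0.9]  # made-up per-word probabilities
    print(sequential(probs), graham_combine(probs))
```

The odds form keeps each update to a single multiplication, and it makes clear why a word with p = 0.5 changes nothing: its likelihood ratio is 1.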

Sunday, November 23, 2003

Credit correctly goes to Paul Graham, because after he published "A Plan for Spam" there was an explosion of open source efforts to control spam by statistical means.

Graham never claims to have invented the idea, by the way:

It's also curious that MS has a patent on some aspects of Bayesian spam filtering.

Matthew Lock
Sunday, November 23, 2003

Check CiteSeer -- there's a whole stack of papers predating PG's work on applying Bayesian classifiers to spam filtering.

But Graham's article certainly popularised it very nicely ;)

Justin Mason
Sunday, November 23, 2003

Probably the main factors were how interesting Graham's writing is and the timeliness of his paper, which came out just as spam was becoming a major headache.

Matthew Lock
Sunday, November 23, 2003

Don't filter spam, just legislate it away:

I'm serious...and don't call me Shirley.

Interaction Architect
Sunday, November 23, 2003

Naive Bayes for classifying text has been around for a while; I read about it in Tom Mitchell's Machine Learning textbook five years ago:

Colin Evans
Monday, November 24, 2003
