Fog Creek Software
Discussion Board




SPAM Techniques that REALLY work!!!

Do you hate spam? Sure we all do.  That's why with a simple one dollar a day....

OK, our company was just getting killed by spam.  I put in a FreeBSD sendmail box as a "firewall" that runs MIMEDefang, so we now have a three-layer e-mail filter.

Layer one is sendmail, which uses SpamCop to block known spammers (sometimes it blocks AOL, lol).

Layer two is MIMEDefang, which strips all viruses and suspect attachment types.  The offending e-mail addresses are then added to the sendmail access database so they are blocked until further notice.

Layer three is SpamAssassin, which marks high-scoring messages with SPAM!! in the subject line.  Employees set up client-side Outlook rules to filter these into a separate folder.

Any e-mail that scores at least twice the SpamAssassin threshold is rerouted to /dev/null.
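
The tag-or-discard rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual sendmail/MIMEDefang configuration; the threshold value and the names `route`, `tag_subject`, `SPAM_THRESHOLD` and `DISCARD_MULTIPLIER` are assumptions made up for the example.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    TAG = "tag"
    DISCARD = "discard"


# Hypothetical values: the real threshold is whatever the site has configured.
SPAM_THRESHOLD = 5.0
DISCARD_MULTIPLIER = 2.0


def route(score, threshold=SPAM_THRESHOLD):
    """Decide what to do with a message, given its spam score."""
    if score >= threshold * DISCARD_MULTIPLIER:
        return Action.DISCARD  # the "/dev/null" case
    if score >= threshold:
        return Action.TAG  # delivered, but marked in the subject line
    return Action.DELIVER


def tag_subject(subject, marker="SPAM!!"):
    """Prefix the subject with the marker unless it is already there."""
    if subject.startswith(marker):
        return subject
    return f"{marker} {subject}"
```

Keeping the discard cut-off as a multiple of the tagging threshold means only one number needs tuning; anything between the two values still reaches a folder where a person can review it.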

I get about one piece of spam a week now.

Jon Kenoyer
Friday, November 15, 2002

How about a guesstimate of how much legitimate mail you lose, since you clearly have no way of knowing for sure?

Stephen Jones
Friday, November 15, 2002

No complaints so far (it's been six months). Believe me, I would know if my bosses or fellow employees were not getting e-mail ;P

Jon Kenoyer
Friday, November 15, 2002

The two major problems with SpamCop seem to be:

1. They adopt a "guilty until proven innocent" approach, so that if someone complains, they block you. If someone signs up for your mailing list and forgets, you get blocked...

2. They don't seem to have an appeals process, so if situation 1 arises, there is no way to get unblocked!

The impression I get is that it's by no means the worst of the blacklists, but it still highlights the problem of blacklisting.

Now, the Bayesian filter approach looks interesting...

James

James Shields
Friday, November 15, 2002

If you haven't seen it already, this article by Paul Graham is worth a look:

http://www.paulgraham.com/spam.html

He describes a "statistical method" for identifying spam that sounds promising.  I don't know if any existing spam filters use his method, but he says it lets through only 5 spam messages per 1000, with zero false positives.
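
For anyone who wants a feel for the idea, here is a small sketch loosely following the approach in that article: estimate a spam probability per token from two training piles, then combine the most "interesting" tokens. It is a simplification under assumptions (toy corpora, a naive tokenizer, no minimum occurrence count), not Graham's exact code, and the function names are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer; an assumption, real filters also look at headers."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(messages):
    """Count how often each token appears across a pile of messages."""
    counts = Counter()
    for msg in messages:
        counts.update(tokenize(msg))
    return counts


def token_probability(token, good, bad, ngood, nbad):
    g = 2 * good[token]  # good counts are doubled to bias against false positives
    b = bad[token]
    if g + b == 0:
        return 0.4  # token never seen before: treated as mildly innocent
    good_rate = min(1.0, g / ngood)
    bad_rate = min(1.0, b / nbad)
    p = bad_rate / (good_rate + bad_rate)
    return max(0.01, min(0.99, p))


def spam_probability(text, good, bad, ngood, nbad, top_n=15):
    probs = [token_probability(t, good, bad, ngood, nbad)
             for t in set(tokenize(text))]
    # Keep the tokens whose probability is furthest from neutral (0.5).
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    prod, inv = 1.0, 1.0
    for p in probs[:top_n]:
        prod *= p
        inv *= 1.0 - p
    return prod / (prod + inv)


# Toy training data, invented purely for illustration.
spam_pile = ["buy cheap viagra now", "cheap mortgage offer click now"]
ham_pile = ["meeting notes for the project",
            "lunch on friday with the project team"]
bad, good = train(spam_pile), train(ham_pile)
nbad, ngood = len(spam_pile), len(ham_pile)
```

A message would then be treated as spam when `spam_probability(...)` exceeds some cut-off such as 0.9. With piles this small the numbers are meaningless; the method only starts to work with thousands of messages in each pile.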

programmer
Friday, November 15, 2002

If you don't let someone's server send mail, how can they ever let the person in your company know that they are not getting the messages?

In fact, unless you send a specific message saying that the mail was blocked because the server was on a spam list, the sender is going to think either that his message got through or that it is your server that has problems.

I know this because our college server was on a spam blocking list because of an open relay, and it was months before I got a message saying why an email didn't get through, and contacted the sysadmin to do something about it. I calculated that about 3% of the messages we'd sent in the last few months hadn't got through because of that. And nobody before me had mentioned or noticed anything.

Stephen Jones
Friday, November 15, 2002

Well, if you're using SpamCop to "block known spammers," you're also blocking Joel on Software newsletter subscribers. And I assure you that I'm not a "known spammer."

Joel Spolsky
Friday, November 15, 2002

SpamNet should eventually work effectively. They just need to get an Outlook Express version out and tweak their algorithms to better avoid blocking legit email. Jon and other anti-spammers don't seem to understand that blocking legit email is a bigger issue than letting spam through.

pb
Friday, November 15, 2002

Just a note on the penny-an-email idea. It probably won't eliminate or even reduce spam very much. It will probably just raise the quality of the spam.

As proof, look at physical junk mail. It costs more than a penny to bulk-mail a piece of physical junk mail, and I still get many pieces of junk mail in my physical mail box each day.

An anonymous spam recipient
Friday, November 15, 2002

I guess it's a question of fault tolerance: what ratio of lost valid e-mails to blocked spam are we willing to accept?

First, I should clarify that I am not a system administrator but a programmer who wears many hats for a small company.  We only have 10-15 employees, so my "solution" does not scale well at all.

Real system administrators with a larger user base to maintain would take a zero-tolerance approach: it is not acceptable for any valid e-mail to be discarded.

I am the first to admit that SpamCop is far from perfect. Its current strategy ( http://spamcop.net/fom-serve/cache/297.html ) seems to punish valid e-mail senders if the ISP they are using has too many reported instances of originating SPAM.

So I guess a blacklist is plagued with the "Sed quis custodiet ipsos custodes" (who watches the watchmen) problem.

Jon Kenoyer
Friday, November 15, 2002

postini.com's default spam tagger settings did indeed tag Joel's "whatcounts" message as spam.  I don't know exactly what triggered the tag.

But at least I was able to review the message, and add Joel to my approved list.

David Blume
Friday, November 15, 2002

I'm using PopFile these days for Bayesian filtering, but I've noticed that unless I've placed one or more specific mailing list emails within the legitimate training list they are usually tagged as spam.

Thankfully I don't automatically delete the spam and instead file it for review, but the structure of a mailing list email is much closer to spam than legitimate email from a Bayesian statistical point of view.

I simply classify these as false positives and once a week add all false positives to the training set. So far, only mass mailings have been falsely identified. I have a training set of over 2000 messages for both spam and legitimate mail.

The one-cent-an-email approach, while creative, isn't likely. SpamCop and others like it need some sort of members system where blacklisted servers are voted for and against and members are peer reviewed in a Slashdot-style approach. Otherwise, they should stick to flagging and not bouncing email.

Michael Glenn
Friday, November 15, 2002

As posted on http://slashdot.org yesterday, Mozilla's new mailer ( http://www.mozilla.org/mailnews/spam.html ) uses Paul Graham's ( http://www.paulgraham.com/spam.html , mentioned above) statistical rules for spam filtering.

brian ashe
Friday, November 15, 2002

You know, it's just that sendmail's default action is to bounce e-mail that SpamCop has listed.  I bet with a bit of modification I could have it just mark it as SPAM!! in the subject line instead.

(Wanders off to give this a try.)

Jon Kenoyer
Friday, November 15, 2002

The Paul Graham spam method has been implemented by Autodesk legend John Walker.  Take a look:

http://www.fourmilab.ch/annoyance-filter/

Also, how does Joel's $0.01/message delivery scheme work for mailing lists?  Let's say the JoSW list is 1/10 the size of the Slashdot crowd - ~30K people.  Does Joel really want to spend $300 every time he sends an update?
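
The arithmetic behind that figure, as a quick sketch (the 30K list size is the guess from the paragraph above, and `mailing_cost` is just a name made up for the example):

```python
from decimal import Decimal


def mailing_cost(subscribers, price_per_message=Decimal("0.01")):
    """Total postage in dollars for one mailing to the whole list."""
    return subscribers * price_per_message


cost_per_update = mailing_cost(30_000)  # 30,000 x $0.01 = $300.00
```

`Decimal` is used instead of floats so that a cent stays exactly a cent when multiplied up.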

J. Peterson
Friday, November 15, 2002

"SpamCop and others like it need some sort of members system where blacklisted servers are voted for an against and members are peer reviewed in a Slashdot style approach. "

Or a whitelist instead of a blacklist. Check out the other thread "email postage" for a discussion of bonded email. It's just a whitelist you indebt yourself to be added to.

No I have no interest in the company, just the idea. And it's better than postage for a number of reasons.

mb
Friday, November 15, 2002

It seems like large ISPs like Earthlink or AOL could employ some sort of heuristics that would sort E-Mails that were "substantially similar".  Then, someone could go through manually, look at the E-Mail and remove it from all inboxes on the ISP.

For example, AOL has 30,000 similar (or identical) Viagra messages sent to its customers.  Someone at AOL gets alerted and purges these.

Having spam removed from the inbox 15 minutes after it's received would be more acceptable than doing nothing.

Granted, it's putting the "Is this spam?" question in the hands of a stranger, but much of the spam problem is "generic spam".  Mortgages, Viagra, XXX, Nigeria, etc.  I would feel comfortable having a stranger filter out the obvious ones.

Bill Carlson
Friday, November 15, 2002

To the nobody really cares department:

Well, I turned off sendmail's RBL feature.  I now have it so that the MIMEDefang filter checks the SpamCop DNS blackhole list and just inserts "SPAM!!" in the e-mail's subject line.

The employees can then use client-side filtering rules to dump all the e-mails in a folder for later perusal.

This way we aren't swamped with so much spam that e-mail is almost useless as a communication tool, but no e-mails are ever bounced.
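
For the curious, the lookup-and-tag logic goes roughly like this. MIMEDefang filters are written in Perl, so this Python version only illustrates the idea and is not the actual filter; the function names are made up, and the `resolve` parameter is an assumption added so the lookup can be swapped out for testing.

```python
import ipaddress
import socket

SPAMCOP_ZONE = "bl.spamcop.net"


def dnsbl_query_name(ip, zone=SPAMCOP_ZONE):
    """Build the DNSBL name: the IPv4 octets reversed, then the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone=SPAMCOP_ZONE, resolve=socket.gethostbyname):
    """A listed address resolves; an unlisted one fails to resolve."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:
        # Lookup failures (including DNS outages) count as "not listed",
        # so a broken resolver never causes mail to be tagged.
        return False
    return True


def tag_if_listed(subject, ip, resolve=socket.gethostbyname):
    """Mark the subject instead of bouncing the message."""
    if is_listed(ip, resolve=resolve) and not subject.startswith("SPAM!!"):
        return "SPAM!! " + subject
    return subject
```

The point of tagging rather than rejecting is that a wrong listing costs the recipient a glance at a folder instead of costing the sender a lost message.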

All in all, a fun little project for a Saturday morning.

Jon Kenoyer
Saturday, November 16, 2002

As a major cheap filter just search on '!'
It doesn't even have to be '!!', so long as you have it as the last filter and filter on known good senders first.
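
A sketch of that rule ordering, assuming the '!' check is applied to the subject line (the post doesn't say whether it is the subject or the body) and using made-up names and folder labels:

```python
def classify(sender, subject, known_good):
    """Apply the known-good check first, the cheap '!' check last."""
    if sender.lower() in known_good:
        return "inbox"
    if "!" in subject:
        return "suspect"
    return "inbox"


# Hypothetical whitelist of trusted senders, stored in lower case.
KNOWN_GOOD = {"boss@example.com"}
```

The order matters: an excitable subject line from a trusted sender never reaches the '!' rule.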

I maintain my own blacklist on my own mail server; I'm not happy depending on some external service to fiddle with my mail or decide what's blacklisted and what's not.

Simon Lucy
Saturday, November 16, 2002

Would there be any chance of Microsoft including some kind of spam filtering (preferably statistical) in Outlook Express?

Frederik Slijkerman
Sunday, November 17, 2002

I've had some success with the Distributed Checksum Clearinghouse, or DCC. The DCC works by taking checksums of mail as it passes through the server. It keeps track of how many times each checksum has been seen (DCC servers share counts, so the system is distributed). Messages with high counts are either spam or mailing lists, so this method requires you to whitelist mailing lists. However, it's pretty good at spotting spam, with no false positives so far. The checksums used also include some "fuzzy" checksums to spot common variations on the same message.

It seems preferable to blacklists and heuristic methods for when it's important not to have any false positives (as long as you remember to whitelist those mailing lists!)
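
The counting idea can be sketched like this. It is a toy, single-machine illustration only: the real DCC shares counts between servers and uses its own fuzzy checksum algorithms, whereas the normalisation below (lower-casing and dropping everything but letters) and the names `fuzzy_checksum` and `BulkCounter` are assumptions for the example.

```python
import hashlib
import re
from collections import Counter


def fuzzy_checksum(body):
    """Hash a normalised body so trivial variations share one checksum."""
    text = re.sub(r"[^a-z]+", " ", body.lower()).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class BulkCounter:
    def __init__(self, threshold=3, whitelist=()):
        self.threshold = threshold
        self.whitelist = set(whitelist)  # e.g. mailing lists you subscribe to
        self.counts = Counter()

    def is_bulk(self, sender, body):
        """Record one sighting and report whether the message looks bulk."""
        digest = fuzzy_checksum(body)
        self.counts[digest] += 1
        if sender in self.whitelist:
            return False
        return self.counts[digest] >= self.threshold
```

Usage would be one `BulkCounter` per server, with `is_bulk(sender, body)` called on each incoming message; whitelisted senders are still counted but never flagged.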

Paul Wright
Monday, November 18, 2002

I'm also having good luck with DCC and SpamAssassin.  I had to raise the SpamAssassin score for DCC because the default in SpamAssassin 2.43 is ridiculously low, and in the 4 months I've been running it, I've never seen legitimate mail with the DCC check tag.

Chris

Chris Blaise
Monday, November 18, 2002

Is there a reason everyone pretends there's no legal options for stopping spam?

Jason McCullough
Monday, November 18, 2002

"Is there a reason everyone pretends there's no legal options for stopping spam?"

Because the spammer 1) doesn't follow the law and 2) probably isn't within jurisdiction anyway.

I'd be all for banning advertising completely, across the globe!

hmm2
Tuesday, November 19, 2002
