Fog Creek Software
Discussion Board

Spammers are stupid

I added Bayesian spam detection to my e-mail reader a few weeks ago, as I was getting fed up with all the spam I was receiving. A short while later it was mentioned on Slashdot as well as on here. I am using spamassassin with kmail, but I am sure my comments are generic enough to apply to other Bayesian systems and mail readers.

With spamassassin, you have to train the filter by giving it both spam and good mail (ham) to chew on and build a database from. During this initial period only the fixed filters are operational, but even so they were doing an excellent job, with no false positives and only about 5% of spam getting past them. Once the Bayesian side had seen enough spam and ham it was enabled, and now 100% of my spam is caught with no false positives. I take a quick look at my spam every day, send it to the Bayesian learning program so that it can become even better (!!), and then consign it to the trash.

Coming back to my main point, the fixed filters alone were doing a wonderful job, which would probably be enough for most people's needs. It seems to me that the fixed filters work so well because spammers are stupid: they leave a large number of telltales in their spam that shout out that it is spam. If intelligent people were to get involved in spamming, I think we would be in deep shit. The telltales are not just in the message contents but also in the headers they use.

Just for illustration, here's the spamassassin report for a random spam I received today. Note that even with the Bayesian trigger taken out of the equation, it is still classified as spam.

pts  rule name              description
---- ---------------------- --------------------------------------------------
4.1  FROM_NUM_AT_WEBMAIL    From address is webmail, but starts with a number
4.2  DATE_SPAMWARE_Y2K      Date header uses unusual Y2K formatting
0.8  PORN_16                BODY: Possible porn - nasty, dirty, little etc.
2.4  FREE_ACCESS            BODY: Contains 'free access' with capitals
0.1  HTML_FONTCOLOR_UNKNOWN BODY: HTML font color is unknown to us
0.1  HTML_MESSAGE           BODY: HTML included in message
0.3  HTML_FONT_BIG          BODY: HTML has a big font
5.4  BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.0000]
0.3  MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
0.6  HTML_FONT_INVISIBLE    BODY: HTML font color is same as background
1.0  HTML_IMAGE_ONLY_04     BODY: HTML: images with 200-400 bytes of words
0.6  MIME_HTML_NO_CHARSET   RAW: Message text in HTML without charset
1.9  DATE_IN_FUTURE_03_06   Date: is 3 to 6 hours after Received: date
1.6  MISSING_MIMEOLE        Message has X-MSMail-Priority, but no X-MimeOLE
1.0  FORGED_OUTLOOK_HTML    Outlook can't send HTML message only
1.0  FORGED_OUTLOOK_TAGS    Outlook can't send HTML in this format
1.1  MIME_HTML_ONLY_MULTI   Multipart message only has text/html MIME parts
2.6  FORGED_MUA_OUTLOOK     Forged mail pretending to be from MS Outlook
4.2  OBFUSCATING_COMMENT    HTML comments which obfuscate text
0.0  UPPERCASE_50_75        message body is 50-75% uppercase
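
To put a number on that last point, here is a small Python sketch that adds up the points in the report above, with and without BAYES_99. The 5.0 threshold is an assumption: it is SpamAssassin's default required_score, and the setting actually in use here isn't stated.

```python
# Points copied from the report above, keyed by rule name.
REPORT = {
    "FROM_NUM_AT_WEBMAIL": 4.1,
    "DATE_SPAMWARE_Y2K": 4.2,
    "PORN_16": 0.8,
    "FREE_ACCESS": 2.4,
    "HTML_FONTCOLOR_UNKNOWN": 0.1,
    "HTML_MESSAGE": 0.1,
    "HTML_FONT_BIG": 0.3,
    "BAYES_99": 5.4,
    "MIME_HTML_ONLY": 0.3,
    "HTML_FONT_INVISIBLE": 0.6,
    "HTML_IMAGE_ONLY_04": 1.0,
    "MIME_HTML_NO_CHARSET": 0.6,
    "DATE_IN_FUTURE_03_06": 1.9,
    "MISSING_MIMEOLE": 1.6,
    "FORGED_OUTLOOK_HTML": 1.0,
    "FORGED_OUTLOOK_TAGS": 1.0,
    "MIME_HTML_ONLY_MULTI": 1.1,
    "FORGED_MUA_OUTLOOK": 2.6,
    "OBFUSCATING_COMMENT": 4.2,
    "UPPERCASE_50_75": 0.0,
}

# Assumption: SpamAssassin's default required_score, not a value from the post.
REQUIRED_SCORE = 5.0


def total_score(report, exclude=()):
    """Sum the points of every rule that fired, skipping any excluded rules."""
    total = sum(pts for rule, pts in report.items() if rule not in exclude)
    return round(total, 1)


with_bayes = total_score(REPORT)
without_bayes = total_score(REPORT, exclude={"BAYES_99"})

print(f"all rules:        {with_bayes}")
print(f"without BAYES_99: {without_bayes}")
print(f"still spam at {REQUIRED_SCORE}? {without_bayes >= REQUIRED_SCORE}")
```

The fixed rules alone come to 27.9 points, several times that assumed threshold, so the Bayesian score is not what tips this message over.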

Friday, November 28, 2003

There was a profile in the New York Times recently about some former "big-time" spammer.  The guy lived and worked in a beat-up trailer in Florida.

I think a bunch of these spammers are simply dopes trying to eke out some kind of living somehow.  Can't hold a job and Amway didn't work for them.

You know the type.

Mitch & Murray (from downtown)
Friday, November 28, 2003

We can beat spam now, but it shall always be there

A letter is an unannounced visit, the postman the agent of rude surprises. One ought to reserve an hour a week for receiving letters and afterwards take a bath.
-- Friedrich Nietzsche, 1844-1900

Friday, November 28, 2003

Quotes are ok every once in a while but after that they become hideously overused.
-- A random JOS reader

Rand() % 2
Friday, November 28, 2003

I read somewhere that a lot of SPAM and internet fraud comes out of Boca Raton, FL because that's where shops dealing in telephone-delivered securities fraud were most numerous, and the people who worked in the one field feel very comfortable with SPAM.

Most SPAMmers that I've read about seem to be seedy semi-underworld types, not exactly the sort that would belong to Jaycees and would stand up in front of a classroom and tell the kids about bright career prospects in the future... ;-)

Bored Bystander
Friday, November 28, 2003

The advantage of Bayesian filtering is that you can have keywords which indicate NON-spam. Everybody has different keywords from their life which counterindicate spam. For example very few spammers would send me a message containing the word "CityDesk" so when that appears in a message it's highly likely to be nonspam.
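
As a rough illustration of that idea, here is a toy Python sketch. It is a textbook naive Bayes with add-one smoothing, not the scheme spamassassin or any particular mail reader actually uses, and the class name and training messages are made up for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; each token counts once per message."""
    return set(text.lower().split())


class TinyBayes:
    """Hypothetical toy filter: per-token counts combined as naive Bayes log-odds."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        # Start from the (smoothed) prior odds of spam versus ham.
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for tok in tokenize(text):
            # Add-one smoothing keeps unseen tokens from zeroing the product.
            p_tok_spam = (self.spam_counts[tok] + 1) / (self.n_spam + 2)
            p_tok_ham = (self.ham_counts[tok] + 1) / (self.n_ham + 2)
            # Tokens seen mostly in ham contribute a negative term here.
            log_odds += math.log(p_tok_spam / p_tok_ham)
        return 1 / (1 + math.exp(-log_odds))


filt = TinyBayes()
# Made-up training mail, purely for illustration.
filt.learn("free access to hot offers", is_spam=True)
filt.learn("free offers click now", is_spam=True)
filt.learn("citydesk template question", is_spam=False)
filt.learn("new citydesk release notes", is_spam=False)

p_plain = filt.spam_probability("free offers")
p_citydesk = filt.spam_probability("free offers for citydesk")
print(f"'free offers':              {p_plain:.2f}")
print(f"'free offers for citydesk': {p_citydesk:.2f}")
```

With this tiny corpus the same spammy phrase scores lower once "citydesk" appears in it, because that token has only ever been seen in good mail.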

Joel Spolsky
Friday, November 28, 2003

I hope spammers don't read this forum... :)

van pelt
Friday, November 28, 2003

"I hope spammers don't read this forum... :)"

Don't worry, we don't.

Friday, November 28, 2003

Not all spammers are stupid. I just got a very clever spam today. Last month I met with an artist friend and he asked for my email address, I gave it to him. Now I find I am on his newsletter about his doings. Now this would not be bad even if he spammed me with information about paintings he was selling or new works he wanted to call my attention to on his website.

But instead, it starts with a paragraph about the weather at his place and then says "Speaking of the weather, I recently had an opportunity to try out the new PERMA-DUCK all weather boots, which are available at SPORTY-WORLD.COM for ONLY $49.99 this week ONLY. PermaDuck all weather boots are available in blue green and yellow and are the #1 choice of Arctic explorer Ribyn Donahue." (details changed to protect the guilty.)

The newsletter continued for page after page, each with a lead-in sentence about him fixing his toilet or going for a walk, then some near non sequitur leading into cut-and-paste ad copy, not in his speaking or writing style, for some product completely unrelated to him and his business.

Sort of reminded me of those *bad* situations when some stupid friend joins a multi-level marketing cult and then starts to capitalize on the perceived economic value of your friendship. For people like me it backfires, since I put people who do this stuff on my blacklist and never do business with them.

Anyone else seeing these sorts of stealth spams?

Dennis Atkins
Friday, November 28, 2003

In Romania companies are very uneducated about the border between one-time email and spam.

One company that I gave my email address to at a trade show, specifically to get some product information for a PDA, then continued spamming me with weird price updates and huge XLS files with prices for wires, connectors etc.

One time they sent a mass mail announcing how Mr. X and Mr. Y were retiring from their company and they will all miss them.

I fired off an angry email, and apparently they removed me. But the thing is, your average just-graduated manager walks into a bookstore, looks up the business section and sees this wonderful book about "email marketing". I browsed through the book myself; it has things like "collect email addresses wherever you can, add them to your list of 'potential clients'." So '90s.

Friday, November 28, 2003

The creepiest spam I ever got was this:

I received a birthday greeting on my birthday.  The spam message was promoting some kind of "" clone site.  The website was registered in the Philippines.

I have no idea how they found out my actual birthday.

Alex Chernavsky
Saturday, November 29, 2003

Well you were born in 1965 or 1966 and that's as far as I got so I have no idea either. Did you ever give your real birthday when signing up for yahoo or such?

Privacy Maven
Saturday, November 29, 2003

It's not Oct. 30, 1966, is it?

Privacy Maven
Saturday, November 29, 2003

Sorry, it's December 22, 1965.

Privacy Maven
Saturday, November 29, 2003

Er, no.  None of the above.

Alex Chernavsky
Saturday, November 29, 2003

I think some spammers work like sleeper cells: they may be infiltrators who got into large enterprise web farms and made off with lots of personal profiles. The first profile details people give away freely are often somewhat inconsequential ones like a birthday. That was often the very first thing asked on BBSes back in the day, and it is asked even more frequently by any service restricted to people over 18 (due to violent content or content of an adult nature). And people fork it over pretty freely. Another is first name and last name. A sleeper cell hacker group could go for years not really sure what to make of this information, until they get bitten by the spammer bug and start spamming people freely.

Without coming up with such conspiracies, the average joe data administrator can walk off with millions of personal profiles from any of the mass online marketing clearing houses on a USB memory stick.

Li-fan Chen
Sunday, November 30, 2003


keep up posting the quotes. Ignore the maroon.

Monday, December 1, 2003

Ignore the mauve and the aquamarine too.

Breandán Dalton
Friday, December 5, 2003


Saturday, February 28, 2004
