Fog Creek Software
Discussion Board




An end to blog spam

Spammers post their links to www.freeviagra.com on a million blogs and Google suddenly concludes that a million sites point to this mega-popular website.

In other words, PageRank gets tricked. What all bloggers should do is agree on some made-up tag that essentially "instructs" Google to ignore a link and not count it towards the target's PageRank.

So the HTML would look like this:

  <nopagerank>

  Wow! I found this cool website! Check it out! www.freeviagra.com

  </nopagerank>

Browsers will automatically skip over the unknown tag, so no harm done. Eventually Google will acknowledge the existence of this tag and tweak its googlebot accordingly.
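
To make the "tweak its googlebot" part concrete, here is a rough sketch of a link extractor that sets aside anything found inside the proposed tag. It assumes the spam arrives as ordinary `<a href>` links, and the names `LinkCollector` and `collect_links` are made up for illustration. It says nothing about how Google's crawler really works.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href targets, separating those inside <nopagerank>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current nesting level of <nopagerank>
        self.counted = []   # links that would count towards ranking
        self.ignored = []   # links found inside the proposed tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag names already lowercased.
        if tag == "nopagerank":
            self.depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                (self.ignored if self.depth else self.counted).append(href)

    def handle_endtag(self, tag):
        if tag == "nopagerank" and self.depth:
            self.depth -= 1


def collect_links(html_text):
    """Return (counted, ignored) lists of link targets for one page."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.counted, parser.ignored
```

A ranking step would then use only the first list and drop the second.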

Choke on it spammers.

Alex
Friday, November 21, 2003

Too human-intensive, I think. Got a way to automate it?


Friday, November 21, 2003

The blog software will automate it.
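
A minimal sketch of what that automation could look like, assuming the blog software renders each comment through one function. `render_comment` and the markup around the comment are hypothetical, not taken from any real blog package.

```python
import html


def render_comment(author, body):
    """Return comment markup wrapped in the proposed <nopagerank> tag.

    The body is escaped first, so a commenter cannot type a closing
    </nopagerank> themselves and break out of the wrapper.
    """
    safe_author = html.escape(author)
    safe_body = html.escape(body)
    return (
        "<nopagerank>\n"
        f'<div class="comment"><p>{safe_body}</p>'
        f"<cite>{safe_author}</cite></div>\n"
        "</nopagerank>"
    )
```

The blogger never touches the tag by hand; it lives in the comment template.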

.
Friday, November 21, 2003

But the nopagerank tag could be used by web designers on links other than comment spam: no links from my site to my competitors. That means the Google results get biased, a slippery slope...


Friday, November 21, 2003

Alex, you can't use a tag like that because people could use it everywhere, and basically it bans Google from taking an interest in indexing your page. I don't expect little web publishers to ever want to ban Google from indexing, but a major media company would probably use it to ban all indexing. And with a tag like this you can't assume it works unless Google respects the tag everywhere; you can't selectively observe it.

Li-fan Chen
Friday, November 21, 2003

Why would anyone in a real world scenario not want the links on their site indexed?

Give me examples.

Alex
Friday, November 21, 2003

It's a good suggestion, I think. It's a way to tell google that even though it _does_ index this page, a part of the content is actually not controlled (in a sense) by the site's owner, and should be considered less vital in determining the popularity of the _pointed to_ sites.

It has nothing to do with banning indexing in the first place; anyone who wants to do that already does that through robots.txt or filtering the useragent "Googlebot".

This achieves exactly what Joel achieves by redirecting the links back to fogcreek, and only then redirecting back to the real site -- except that, if Google honours this tag, it'll be a matter of updating the page templates for most message boards, which is significantly simpler and easily done by many people.
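
For comparison, the redirect approach might be sketched like this. The endpoint `http://example.com/redirect` and its `url` parameter are placeholders; the actual Fog Creek mechanism is not shown in this thread, and whether a crawler credits the final target is up to the crawler.

```python
from urllib.parse import parse_qs, quote, urlparse

# Hypothetical redirect endpoint on the board's own site.
REDIRECT_BASE = "http://example.com/redirect"


def redirect_link(target, base=REDIRECT_BASE):
    """Rewrite an outbound link so it points at the board's own redirector."""
    return f"{base}?url={quote(target, safe='')}"


def redirect_target(link):
    """Recover the original target; the redirector would send the visitor there."""
    return parse_qs(urlparse(link).query)["url"][0]
```

The page then links only to the board's own host, and the redirector forwards the visitor.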

Ori Berger
Friday, November 21, 2003

I want a phone that's a phone, you know?

Something with buttons my fingers can push without feeling clumsy, something I can read at a regular distance.  Something that's nicely illuminated but for the purpose of legibility not a beacon to every mugger within a hundred yards.

I want a phone that has a battery that doesn't die ever more rapidly.

I don't want to surf web sites, get news, pay for someone else's adverts, or send incomprehensible messages with all the vowels missing or replaced by numbers.

I just want a phone.

Simon Lucy
Friday, November 21, 2003

Ok, that was bizarre.  I'll let people guess where it was supposed to be posted. 

Perhaps what I really want is a brain.

Simon Lucy
Friday, November 21, 2003

Actually you could set up an MT plugin to expand the characters in the linked site's URL. I'd imagine Google would misinterpret the link but the rendered site would still work.

It's much like the email hiding by expanding the email address's characters. You could set the plugin/code to only do this on comment area submissions.

But someone would have to verify that Google would indeed misinterpret the results it read.  Last I heard the googlebot just scanned the source file and not a rendered equivalent, so it should work.
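
A small sketch of the character expansion, assuming "expanding" means writing each character as a decimal HTML character reference, as in the email-hiding trick. The function name is made up. Any reader that decodes character references (as `html.unescape` does here) gets the original text back, so this only helps against a crawler that does not decode them, which is exactly the part that needs verifying.

```python
import html


def expand_to_entities(text):
    """Replace every character with its decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    encoded = expand_to_entities("www.freeviagra.com")
    print(encoded)                 # the form stored in the page source
    print(html.unescape(encoded))  # what a browser would render
```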

Lou
Friday, November 21, 2003

Simon, I'm not going to guess, but I do know what you mean. I'm damn sure there is a market for a neat looking mobile that makes and receives calls and... that's it. Maybe a phonebook so that I can keep track of everyone's numbers, but that's all.

Less is more
Friday, November 21, 2003

This is something Google needs to fix on their own, not through telling bloggers to add some new tag.  How arrogant would that be...

chris
Friday, November 21, 2003

I don't understand.  If the blogger is putting that comment on their site, presumably they're doing it because they WANT google to pick it up, so what's the blogger's incentive to use your fancy new tag?

Foolish Jordan
Friday, November 21, 2003

the blogger (the person writing the blog) is being attacked and their site is being abused to generate rankings for other sites the blogger wants nothing to do with.

so they want to add 'nofollowlinks' (yes, there is ALREADY a tag for this, META NAME="ROBOTS" CONTENT=" NOFOLLOW") to their comment template, thus instructing the robot not to follow links. of course, a robot might not follow it.
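
here is a rough sketch of how a well-behaved robot might check for that page-level tag before following links. the class and function names are invented for illustration, and nothing obliges a real robot to behave this way.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values keep their case.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def page_is_nofollow(html_text):
    """True if the page asks robots not to follow any of its links."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return "nofollow" in parser.directives
```

note that this is all-or-nothing for the page, which is why a tag that wraps only the comments is a different thing.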

what's new in this particular suggestion is that this be made a 'nesting' tag, so particular links can be marked as nofollow. this might be useful if comments and regular entries are mixed together on the same page.

or you can just make sure comments don't allow live URLs. as happened here for a while--bad for an active message board, but good for rarely used comments.

mb
Friday, November 21, 2003

I thought this was an interesting anti-spam measure.

http://www.hutteman.com/weblog/2003/11/20-144.html

Although it would require a coordinated effort among the bloggers (the downfall, I think), it's a different way of looking at the battle.

shiggins
Friday, November 21, 2003

I am sorry, but I don't see how this is a good idea.

I mean, if I post something saying 'hey this is a good site' then it should affect the google page rank, because I have obviously thought it important enough to share with my colleagues, which is what pagerank is all about.

so 100s of people post a link to www.viagra.com, well then there must be something special at that site.

Aussie Chick
Friday, November 21, 2003

what's not a good idea?
page rank?
comment spam?
trying to redirect comment spam?
trying to eliminate comment spam?
blogging?
linking?

mb
Friday, November 21, 2003

Chick,

Please tell me you are being sarcastic. How is it "ok" to utilize others' sites for your advertising?

shiggins
Friday, November 21, 2003

Actually I find Alex's proposal quite smart. Easy. Simple. Yet efficient.

And to those who find such a measure too intrusive: C'mon, it's an optional thing, no one would force you to use it. It will do no harm to user agents who do not recognize it, too.

To those who find the measure too complicated: I guess Google will introduce such features in the near future anyway. People use standardized tags for browser clients, why not introduce a standard so search engines know how to treat your pages? As long as it's free, i.e. not limited to Google, I think it's just another cool extension, probably in the form of a custom namespace.

Still not convinced? People use -moz-whatever in their stylesheets, and no one bothers (actually, it's standards-compliant, too).

Johnny Bravo
Friday, November 21, 2003

1. No one will use it, because
2. No one cares.

Matt
Friday, November 21, 2003

Very interesting idea. Somehow along the lines of Berners-Lee's Semantic Web -- each piece of information (in this case: site content, user comments on a page) is described with a characterization, which makes it easier for agents/search engines to separate between the different chunks of information, and handle it differently.

I am sure we will see some of this in future markup languages.

(For an introduction to the Semantic Web concept see e.g. http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21 )

Martin Dittus
Friday, November 21, 2003

Sorry, first I was referring to the original poster, and secondly, I don't agree with advertising this way etc.

What I am referring to is when I find a neat site, and post it here because I know (well, am pretty sure) that you lot will find it interesting. I think these links deserve to affect the Google page rank.

I stuck with the www.viagra.com example because it had been used. My assumption being that it was 100s of genuine people interested in viagra who have said to their mates, hey 'check out this great viagra site...'

Aussie Chick
Friday, November 21, 2003

I generally agree with you. But the problem is that the Internet has a different content structure from when PageRank was designed: today we have more sites where users can add content (comments), which can also include links.

This makes the evaluation of a link more difficult: PageRank builds on the assumption that all links on a site are to be treated the same (at least I assume that), so links in user comments get rated with the PageRank of the site on which they are posted; but the author of the comment is normally not the author of the site. Weblog spam abuses this assumption.

It's hard to automatically distinguish valid links from spam, and you are right in that the existence of website spam should not affect the "value" of user comments in terms of search engine relevance. But by comparing the ratio of user-provided links to author-provided links one could make assumptions about their importance: if a given site has been linked in n % more user comments than website articles this can either mean the links are spam, or that maybe there is a new trend which has not yet caught the attention of content authors. An analysis of the surrounding text might help here (i.e. do all comments look alike? -> probably spam).
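
The ratio idea could be sketched roughly as follows. It assumes link targets have already been extracted and sorted into "from comments" and "from articles"; the function names and the threshold of 5 are arbitrary choices for illustration. As said above, a high ratio is only a hint, since it may also indicate a new trend.

```python
from collections import Counter


def comment_link_ratio(comment_links, article_links):
    """Map each host linked in comments to comment_count / article_count.

    Hosts that never appear in author-written articles get float('inf').
    """
    in_comments = Counter(comment_links)
    in_articles = Counter(article_links)
    ratios = {}
    for host, n_comments in in_comments.items():
        n_articles = in_articles.get(host, 0)
        ratios[host] = n_comments / n_articles if n_articles else float("inf")
    return ratios


def suspicious_hosts(comment_links, article_links, threshold=5.0):
    """Hosts whose comment/article ratio reaches the (arbitrary) threshold."""
    ratios = comment_link_ratio(comment_links, article_links)
    return sorted(host for host, ratio in ratios.items() if ratio >= threshold)
```

A text-similarity check on the comments themselves would be the natural next filter for the hosts this flags.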

When evaluating a website, search engines treat some of the elements of the document differently from normal text (page title, meta tags, headlines, ...). I think it would make sense to add means of distinguishing between site owner/site user content.

More ideas? Interesting discussion!

Martin Dittus
Friday, November 21, 2003

To Aussie Chick...

When you post an interesting link to a blog, everyone *will* be able to see it, click on it, etc. That's the point of sharing in a community.

BUT it won't ARTIFICIALLY INFLATE THAT SITE'S RANKING IN GOOGLE.

Which is what spammers exploit.

Alex
Friday, November 21, 2003

Aussie Chick,

any market mechanic that can be exploited in a way that effort < revenues WILL be exploited. Although I must admit I find your naivety charming.

Johnny Bravo
Saturday, November 22, 2003

99.99% of blogs ARE spam, God, who wants to read this

November 20 2003
Woke up today and scratched my scrotum

November 21 2003
Scrotum is still itchy, I scratched it again. By the way did I talk about me yet?

November 22 2003
Posted to this cool forum http://discuss.fogcreek.com/ then I scratched my scrotum

That's all for today, more of the same tomorrow

Marx
Saturday, November 22, 2003

While I admit that blog spam is a pain, and in general I feel that my state should open a hunting season on spammers, I don't think that automation is the key.  As a blog owner, it seems like there's some responsibility to maintain the site.  If my site dedicated to personal scratching accepts comments, and there are some comments that I think need to get the boot, then I should get in there and put the boots to them.  If I'm not interested in taking the time to moderate the discussion on my site, I should probably discontinue the discussion portion of the site.

Clay Dowling
Saturday, November 22, 2003

I pretty much agree. Maintaining a clean blog should be every blogger's responsibility.

Maybe this tag would come in handy when the sheer volume of blog comments grows to become unmanageable.

Alex
Sunday, November 23, 2003

Johnny,

My naivety, yes, well, I will admit to that!
But I do know that anything that can be exploited will be exploited; this is the sad case of the world. Where there is genuine need, there are people who will take just because they can.

Aussie Chick
Sunday, November 23, 2003
