Fog Creek Software
Discussion Board





Warning! Content Management Systems Can Damage Sea



Warning! Content Management Systems Can Damage Search Engine Positioning


http://www.searchenginewatch.com/searchday/article.php/2221731

Just found this. May be of interest to us.

John Cesta
Monday, June 16, 2003

It has to do with dynamically generated websites where the URL is unique based on how you got to any given page.

/12434,,sdfs35,4657 if you came from one direction, and /,13434,,458f,se,456, if you came from another place. Amazon is a good example of this. Mapquest

CityDesk - thankfully - generates static pages that aren't aware of how you got there and don't change themselves based on how you got there.

It also doesn't require a lot of server processing to make the pages, so even a large site can sit happily on a less powerful server.

This is a known problem amongst webmasters.

Amazon does this, a lot of message boards do this, and it's a constant battle to remove the session ID tags that prevent search engines from spidering the site. They do it so they don't need cookies to keep track of you as you go through the site and can keep you logged in. Mapblast (now MSN Maps and Directions) has an interesting way of battling this: they put the session ID in parentheses (). Another way is to create a second, static version of your website with static URLs and limited functionality for search engines. Another is to remove session IDs and other unique identifiers based on the browser: if the browser is Google, generate static URLs.

The Joel on Software forum also suffers from this. Because &ixReplies=0 is appended to the end of each link, a search engine sees completely different links each time it visits the home page. Eventually it realizes spidering the site is a hopeless endeavor and gives up.
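
To illustrate why changing parameters make the same page look new, here is a minimal sketch that strips visitor-tracking parameters from a URL so that two "different" links compare equal. The example.com URLs, the ixPost parameter, and every entry in the parameter list except ixReplies are assumptions for illustration, not the forum's actual URL scheme.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters that track the visitor rather
# than identify the page. Only ixReplies comes from this thread.
TRACKING_PARAMS = {"ixreplies", "sessionid", "sid", "phpsessid"}

def canonical_url(url):
    """Return the URL with tracking parameters and the fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )

# Two links to the same topic that differ only in the reply counter.
first_visit = canonical_url("http://example.com/default.asp?ixPost=123&ixReplies=0")
later_visit = canonical_url("http://example.com/default.asp?ixPost=123&ixReplies=7")
```

Both calls yield http://example.com/default.asp?ixPost=123. A spider that does not do this kind of cleanup treats each variant as a separate page.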

www.MarkTAW.com
Monday, June 16, 2003

FYI, Google happily spiders my site and I use CityDesk to maintain it.

http://www.google.com/search?q=mark+wieczorek #1

http://www.google.com/search?q=pettycoat #1

http://www.google.com/search?q=build+a+home+studio #3

So does Joel on Software

http://www.google.com/search?q=joel+spolsky #1

www.MarkTAW.com
Monday, June 16, 2003

The session ID in parentheses is just an indicator that the site is using ASP.NET's "cookieless cookies" for session persistence. You can see the same thing happen at my wife's site, https://www.quiltindex.com/MallCrawl/welcome.aspx . This is to get around people's paranoia over temporary cookies, not particularly a search-engine optimization.
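
As a rough sketch of what that looks like from the outside: a cookieless session shows up as an extra path segment wrapped in parentheses, and removing that segment gives back the plain page path. The exact segment shape and the sample session ID below are assumptions for illustration.

```python
import re

# Assumed shape: one path segment wrapped in parentheses,
# e.g. /MallCrawl/(abc123)/welcome.aspx
SESSION_SEGMENT = re.compile(r"/\([^/]+\)(?=/)")

def strip_session_segment(path):
    """Remove a parenthesized session segment from a URL path, if present."""
    return SESSION_SEGMENT.sub("", path, count=1)

plain = strip_session_segment("/MallCrawl/(abc123)/welcome.aspx")
```

Here plain is /MallCrawl/welcome.aspx, and a path without a parenthesized segment comes back unchanged.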

Mike Gunderloy
Monday, June 16, 2003

I'd only read a bit about it and only skimmed what I did read, but I thought I'd heard that there was a reason it was in parentheses.

www.MarkTAW.com
Monday, June 16, 2003



"Just found this. May be of interest to us."

Not implying that CD has a problem. Just may be of general interest. ;)

John

John Cesta
Tuesday, June 17, 2003

I thought the cropped title was more interesting - "Warning! Content Management Systems Can Damage Sea"

John Topley (www.johntopley.com)
Tuesday, June 17, 2003

Mark -- why would Google give up just because the content appears to be fresh? I ask because I was thinking of using this scheme myself for a message board project...

Ryan Tate
Thursday, June 19, 2003

A couple of message boards like this already exist, one in ASP and one in PHP. You may want to look at them, unless it's purely for the experience of coding it.

Google decides that your site is infinitely large because every time it comes back it's completely different: it has 100% more pages (same old pages, new links), and all the old pages are gone.

http://www.google.com/webmasters/guidelines.html

"Allow search bots to crawl your sites without session ID's or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page."

"If your company buys a content management system, make sure that the system can export your content so that search engine spiders can crawl your site."

Someone came up with a technique that checks the browser's user agent and, if it's Google, turns off all extras and just gives straight links, but I don't know yet if it works, and you'd have to add other search engines. Other people create a static-link version of the site that's basically identical but missing the ever-changing part of the URLs.
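
A minimal sketch of the user-agent idea, assuming the forum builds its topic links in one place. The crawler list is illustrative and incomplete, and the /default.asp path and ixPost parameter are made up for the example; only ixReplies comes from this thread.

```python
from urllib.parse import urlencode

# Illustrative, incomplete list of substrings found in crawler user agents.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")

def is_crawler(user_agent):
    """Guess whether the request comes from a search engine spider."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)

def link_for_agent(post_id, reply_count, user_agent):
    """Build a topic link, leaving off the changing part for crawlers."""
    params = [("ixPost", post_id)]
    if not is_crawler(user_agent):
        params.append(("ixReplies", reply_count))
    return "/default.asp?" + urlencode(params)

spider_link = link_for_agent(42, 3, "Googlebot/2.1 (+http://www.googlebot.com/bot.html)")
browser_link = link_for_agent(42, 3, "Mozilla/4.0 (compatible; MSIE 6.0)")
```

The spider gets /default.asp?ixPost=42 on every visit, while a regular browser gets /default.asp?ixPost=42&ixReplies=3.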

Maybe the best alternative is to turn it on with a cookie. Just set a cookie, and if they accept it, turn on the ixPosts=xx. If they don't, leave it off.
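
A rough sketch of the cookie approach, under the assumption that the server sets a probe cookie on the first response and checks for it on later requests. The cookie name, the /default.asp path, and the ixPost parameter are hypothetical; the changing parameter is written as ixReplies.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode

PROBE_COOKIE = "accepts_cookies"  # hypothetical cookie name

def link_for_cookies(post_id, reply_count, cookie_header):
    """Add ixReplies only when the visitor sent the probe cookie back."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    params = [("ixPost", post_id)]
    if PROBE_COOKIE in cookies:
        params.append(("ixReplies", reply_count))
    return "/default.asp?" + urlencode(params)

no_cookie_link = link_for_cookies(42, 3, "")
cookie_link = link_for_cookies(42, 3, "accepts_cookies=1")
```

Spiders generally don't send cookies back, so they would see the stable /default.asp?ixPost=42 link without any user-agent list to maintain.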

www.MarkTAW.com
Thursday, June 19, 2003

errr.. ixReplies

www.MarkTAW.com
Thursday, June 19, 2003
