Fog Creek Software
Discussion Board





Google problem for CityDesk

I got this response to an enquiry to Google (regarding Google News, not the general search). It was to do with my ITKiwi.com news website. Any thoughts on how to get numbers automatically into the URLs?
-----------------------------------
Hello Richard,

The problem we are finding with your URLs is that there is no number associated with them. Our web crawlers need to see a large section of numbers in an article's URL so they know that it is an article and not a section. In the example we provided you, http://www.easternecho.com/news/news.html, notice how the articles have a set of numbers in each URL? We recommend the numbers be at least 6 characters long. Please let us know if you have any additional questions.

Regards,
The Google Team
------------------------------------

Richard Wood
Tuesday, August 12, 2003

To CityDesk old-timers: bring back the fog0000000 URLs?

tk
Tuesday, August 12, 2003

Couldn't you just start each article's name with the date in YYYYMMDD format? e.g. "20030813 NZ Tech Workers Are Revolting"

That way, it'd have numbers in its URL to keep Google happy, plus readers would be able to see at a glance how old an article is just from the URL. This is handy when people are emailing URLs to friends etc.

It might even be handy for you in maintaining your CityDesk site. A single glance down the article list would show you how long since you posted an article in a particular category, etc.

Darren Collins
Tuesday, August 12, 2003
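
A minimal sketch of the date-prefix naming suggested above, in Python. The function name dated_name is hypothetical and this is not a CityDesk feature; it only shows what the YYYYMMDD prefix looks like and why such names sort by date.

```python
from datetime import date


def dated_name(title: str, filed: date) -> str:
    """Prefix an article title with its filing date in YYYYMMDD form."""
    return f"{filed:%Y%m%d} {title}"


if __name__ == "__main__":
    names = [
        dated_name("NZ Tech Workers Are Revolting", date(2003, 8, 13)),
        dated_name("An Older Story", date(2003, 7, 2)),
    ]
    # Plain string sorting puts the oldest article first,
    # because the prefix is zero-padded and most-significant-first.
    for name in sorted(names):
        print(name)
```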

I'm having a hard time believing that Google requires numbers in URLs, because 1) a good many sites don't use numbers in their URLs and 2) pages on sites with no numbers in their URLs have top-ten rankings in Google.

David Burch
Tuesday, August 12, 2003

I think, looking at their email, that we're specifically talking about Google News here. They're saying they want to be able to distinguish between "articles" and "sections" on news sites, and they've been using the existence of numbers in article URLs to do that.
Seems a bit of a hairy approach to me, but I guess if that's what Google News currently wants, and you want your news articles in Google News, then you don't have much choice.

Richard Wood
Tuesday, August 12, 2003
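
To make the rule in the email concrete, here is a rough Python sketch. It is only a guess at the kind of test being described, not Google's actual crawler logic: the function name has_article_number and the reading of "at least 6 characters" as "six or more consecutive digits" are both assumptions.

```python
import re


def has_article_number(url: str, min_digits: int = 6) -> bool:
    """Return True if the URL contains a run of at least min_digits digits."""
    # Builds a pattern such as \d{6,} from the requested minimum length.
    pattern = rf"\d{{{min_digits},}}"
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    for url in (
        "http://www.example.com/news/20030813.html",   # hypothetical article URL
        "http://www.easternecho.com/news/news.html",   # section page from the email
    ):
        print(url, has_article_number(url))
```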

A good answer would be for CityDesk to allow variables in file names. I just tested, and it doesn't at the moment.

Then you could put a variable called, say, .Numbering into the name of the folder that all the articles are in.

That variable could be a date-related sequence of numbers, as suggested, derived from the current record's filing date.

By doing it this way we'd have the flexibility to create such URLs if we want to, or default to the usual method.

Richard Wood
Tuesday, August 12, 2003

I doubt Fog Creek would be interested in adding an obscure feature to CityDesk that very few people would really need. At least, not at this stage of the product life-cycle. They've got too many meaty features to work on.

You'd be better off writing a Perl script that goes through your site fixing up all the links before you upload it. CityDesk can automatically run that script for you.

Darren Collins
Tuesday, August 12, 2003
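
One possible shape for such a link-fixing script, sketched in Python rather than Perl. Everything here is an assumption for illustration: the names add_numbers and fix_site are hypothetical, the "fix" chosen is to append a number after a ? on local .html links, and a regular expression is used instead of a real HTML parser, so treat it as a starting point only.

```python
import re
from pathlib import Path

# Matches local links such as href="story.html" or href="news/story.html".
# Links that already contain ':', '?' or '#' (absolute URLs, existing
# query strings, fragments) are deliberately left alone.
HREF = re.compile(r'href="([^"#?:]+\.html)"')


def add_numbers(html: str, number: str) -> str:
    """Append ?number to every local .html link in a page."""
    return HREF.sub(lambda m: f'href="{m.group(1)}?{number}"', html)


def fix_site(root: Path, number: str) -> None:
    """Rewrite every .html file under root in place."""
    for page in root.rglob("*.html"):
        text = page.read_text(encoding="utf-8")
        page.write_text(add_numbers(text, number), encoding="utf-8")


if __name__ == "__main__":
    print(add_numbers('<a href="story.html">Story</a>', "20030813"))
```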

What is obscure about wanting to have your pages show up in Google News?

CityDesk is ideal for news websites and, I would suggest, is one of very few such products accessible at the low end. Most content management systems are so targeted at corporate intranets, communities, or retail shops as to be hopeless for a news or magazine publisher.

If CityDesk doesn't have a lot of such customers yet, then it certainly could gather them with a little bit of marketing, but not being Google-friendly is going to be a big turnoff to any publisher.

Richard Wood
Tuesday, August 12, 2003

Oh my. One of the reasons I just purchased CD was its great URLs. My news stories already carry the date in the URL.

I'm trying to get away from meaningless numbers in the URL.

Ron Lane
Wednesday, August 13, 2003

What a strange requirement.

The answer is simple.

<a href="{$x.link$}?345,6876,798789,8745">

www.marktaw.com
Wednesday, August 13, 2003

Can you put this into the template?

Extension: ".html?s=2344564,5758679,878973242,132"

www.marktaw.com
Wednesday, August 13, 2003

I must be missing something, Mark. How do I get CityDesk to automatically create such URLs for its documents when it publishes?

Richard Wood
Wednesday, August 13, 2003

I see what you're saying. In the extension field under the template properties you can add .html? and then any set of numbers.

That may be worth a go. I'm still waiting to hear back from Google whether the numbers have to be different for different pages or whether it's the existence of any numbers that helps.

I'll come back with that when I find out.

Richard Wood
Wednesday, August 13, 2003

To take Mark's idea one step further, and (almost) make it look like you actually have dynamic content, what about this:

{$ setDateTimeFormat "*" "yyyy,M,dd" "hh,mm" $}

{$ forEach x in (folder "articles") $}

<a href="{$ x.link $}?{$ publishDate $},{$ publishTime $}">{$ x.headline $}</a>

{$ next $}

Kevin
Wednesday, August 13, 2003

That should have been:

{$ setDateTimeFormat "*" "yyyy,M,dd" "hh,mm,ss" $}

{$ forEach x in (folder "articles") $}

<a href="{$ x.link $}?{$ x.filedDate $},{$ x.filedTime $}">{$ x.headline $}</a>

{$ next $}

Kevin
Wednesday, August 13, 2003

Kevin... Even better! And the number should be static & unique for each document.

And it will look like all those news sites as well.

I guess the only difficult thing then becomes linking to a document from inside an article. There is no way to transmit this information via magicname. But that might be a trade-off you have to live with.

KUDOS Kevin!

www.marktaw.com
Wednesday, August 13, 2003

I still don't see how that approach can end up in the URLs on the site. Doesn't this just give you a list within a document?

Richard Wood
Wednesday, August 13, 2003

Richard, try it out and see. What it does is put the time and date info after the ? in the URL. Anything after the ? can be used by that page as input parameters. If the page doesn't use them, it doesn't matter. But when the search engine goes looking for URLs (link addresses), it sees the numeric stuff.

Joel Goldstick
Wednesday, August 13, 2003

A simple solution (not knowing how you want your site structure) would be to organize your news articles in folders like 2003/08/24/ (YYYY/MM/DD).  You would not have to try these workarounds (although they are all very ingenious!).  This is the recommended approach for weblogs as well.

Russ Hollmann
Wednesday, August 13, 2003

> You would not have to try these workarounds

I agree, but since the request was to *automatically* generate numbers, that's what we're working with.

www.marktaw.com
Wednesday, August 13, 2003

Forgive me, but I'm still lost. Where is it being suggested the looping code be placed? In a template? Then how does it affect the names of the HTML files generated? The generated name, as far as I can see, is based on the file and folder names in the folder view.

Richard Wood
Wednesday, August 13, 2003

Richard-

Google doesn't look at the literal name of the file on the server, it looks at the URL, which is not necessarily the same thing.  Using the foreach loop shown above, you will append some meaningless numbers onto the URL, without changing the actual file name.

For example, an article published as excitingnews.html will have links pointing to it as excitingnews.html?2003,08,13.  The numbers will be ignored by the web server, but noticed by Google.  You would use this loop in the page listing "current news items", or whatever.

The other option, mentioned already, is to just name files numerically, such as 20030813.html.

Kevin
Wednesday, August 13, 2003

And, just to clarify, the foreach loop doesn't actually have any effect on the published name of the items.  You could still link to excitingnews.html without problem.  The difference is that when Google scans your "recent news" page, it will find URLs with numbers on the end, although they still point to the same thing.

Kevin
Wednesday, August 13, 2003
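
The split Kevin describes, between the file the server looks up and the extra text after the ?, can be seen with Python's standard URL parser. This is just an illustration; the host name www.example.com and the helper name split_link are made up.

```python
from urllib.parse import urlsplit


def split_link(url: str) -> tuple[str, str]:
    """Return (path, query): the part naming the file and the part after '?'."""
    parts = urlsplit(url)
    return parts.path, parts.query


if __name__ == "__main__":
    path, query = split_link("http://www.example.com/excitingnews.html?2003,08,13")
    print(path)   # the file a static server would look up
    print(query)  # extra text that a plain HTML page never reads
```

Both the plain link and the numbered link resolve to the same path, which is why the published file name does not need to change.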

Any time you create a loop, you would use the A HREF tag that Kevin brilliantly supplied.

<a href="{$ x.link $}?{$ x.filedDate $},{$ x.filedTime $}">{$ x.headline $}</a>

becomes

<a href="LatestStory.html?20030813,230136">Latest Story</a>

depending on how you format filedDate and filedTime, if they're configurable: yyyymmdd,hhMMss or whatever.

The page should ignore it unless you have some sort of form on it.

The forum here does a similar thing...

default.asp?cmd=show&ixPost=8724&ixReplies=22

where ixPost=8724 is important, but ixReplies=22 does nothing, it just fools your browser into thinking it's a new link.

www.marktaw.com
Wednesday, August 13, 2003
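
For anyone generating such links outside CityDesk, the same transformation can be sketched in Python. The function name numbered_link is hypothetical, and the format codes are Python's strftime codes, not CityScript's date patterns.

```python
from datetime import datetime
from html import escape


def numbered_link(link: str, headline: str, filed: datetime) -> str:
    """Build an anchor tag whose URL carries the filing date and time after '?'."""
    suffix = f"{filed:%Y%m%d},{filed:%H%M%S}"
    return f'<a href="{link}?{suffix}">{escape(headline)}</a>'


if __name__ == "__main__":
    filed = datetime(2003, 8, 13, 23, 1, 36)
    print(numbered_link("LatestStory.html", "Latest Story", filed))
```

Because the suffix comes from the filing time rather than the publish time, it stays the same each time the site is regenerated.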

I have resolved the problem by changing the extension in the template options from .html to a full stop, then a series of digits, then .html.
It may seem dumb, because the random set of numbers is the same for every file, but Google News has advised that it should do the trick.
You can see this by visiting my site, www.itkiwi.com, and clicking on any story.

Richard Wood
Friday, August 29, 2003
